Model Selection

WhisperKit supports all official OpenAI Whisper model variants, from tiny to large-v3. Choosing the right model involves balancing accuracy, speed, and memory usage based on your application’s requirements.

Available Models

Whisper models come in five sizes. Every size has a multilingual variant, and all sizes except large also have an English-only (.en) variant:

Model Variants

Tiny (39M parameters)

Best for: Real-time streaming, constrained devices, quick prototyping

Fastest inference
Smallest model (~75 MB on disk, ~150 MB in memory)
Acceptable accuracy for clear audio
Available: tiny (multilingual), tiny.en (English-only)

let whisperKit = try await WhisperKit(model: "tiny")

Base (74M parameters)

Best for: Mobile apps, moderate accuracy requirements

Good balance of speed and accuracy
~140 MB on disk, ~250 MB in memory
Suitable for most mobile applications
Available: base, base.en

let whisperKit = try await WhisperKit(model: "base")

Small (244M parameters)

Best for: Production applications, higher accuracy needs

Good accuracy for production use
~460 MB on disk, ~600 MB in memory
Slower than base but more accurate
Available: small, small.en

let whisperKit = try await WhisperKit(model: "small")

Medium (769M parameters)

Best for: High accuracy requirements, server-side processing

Very good accuracy
~1.5 GB on disk, ~1.8 GB in memory
Slower inference
Available: medium, medium.en

let whisperKit = try await WhisperKit(model: "medium")

Large (1550M parameters)

Best for: Maximum accuracy, offline batch processing

Best accuracy
~3 GB on disk, ~3.2 GB in memory
Slowest inference
Available: large, large-v2, large-v3

let whisperKit = try await WhisperKit(model: "large-v3")

See ModelVariant

ModelVariant Enum

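public enum ModelVariant: CustomStringConvertible {
    case tiny
    case tinyEn
    case base
    case baseEn
    case small
    case smallEn
    case medium
    case mediumEn
    case large
    case largev2
    case largev3

    // True for multilingual models, false for the English-only (.en) variants
    var isMultilingual: Bool {
        switch self {
        case .tiny, .base, .small, .medium, .large, .largev2, .largev3:
            return true
        case .tinyEn, .baseEn, .smallEn, .mediumEn:
            return false
        }
    }

    // `description` (required by CustomStringConvertible) is omitted from this excerpt
}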
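
When only a model name string is at hand (for example, a value stored in user settings), the same multilingual check can be sketched without the enum. The `VariantSketch` type below is hypothetical and not part of WhisperKit; its raw values are the model names listed above, and it assumes that English-only names always end in `.en`.

```swift
// Hypothetical stand-in for illustration; not WhisperKit's ModelVariant.
enum VariantSketch: String, CaseIterable {
    case tiny = "tiny"
    case tinyEn = "tiny.en"
    case base = "base"
    case baseEn = "base.en"
    case small = "small"
    case smallEn = "small.en"
    case medium = "medium"
    case mediumEn = "medium.en"
    case large = "large"
    case largev2 = "large-v2"
    case largev3 = "large-v3"

    // Assumption: English-only model names always carry the ".en" suffix.
    var isMultilingual: Bool {
        !rawValue.hasSuffix(".en")
    }
}

// Collect the names of all English-only variants.
let englishOnlyNames = VariantSketch.allCases
    .filter { !$0.isMultilingual }
    .map(\.rawValue)
print(englishOnlyNames)
```

Unknown names map to `nil` through `VariantSketch(rawValue:)`, which gives callers a natural place to fall back to a default model.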

Recommended Models

WhisperKit provides device-specific recommendations:

Get Recommended Models

// Get locally computed recommendations
let localSupport = WhisperKit.recommendedModels()
print("Default model: \(localSupport.default)")
print("Supported models: \(localSupport.supported)")

// Get recommendations from remote config
let remoteSupport = await WhisperKit.recommendedRemoteModels(
    from: "argmaxinc/whisperkit-coreml"
)
print("Recommended: \(remoteSupport.default)")

See WhisperKit.recommendedModels and WhisperKit.recommendedRemoteModels
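
One way to use these recommendations is to honor a user's preferred model only when the device supports it. The sketch below is a hedged illustration: `ModelSupportSketch` and `chooseModel` are hypothetical names, and the struct assumes only the two properties printed above (`default` and `supported`), both taken to be plain strings.

```swift
// Hypothetical stand-in for the value returned above; only the two
// properties used in the example (`default` and `supported`) are assumed.
struct ModelSupportSketch {
    let `default`: String
    let supported: [String]
}

// Returns the preferred model when the device supports it, else the default.
func chooseModel(preferred: String, support: ModelSupportSketch) -> String {
    support.supported.contains(preferred) ? preferred : support.default
}

let support = ModelSupportSketch(default: "base", supported: ["tiny", "base", "small"])
print(chooseModel(preferred: "small", support: support))     // supported, so kept
print(chooseModel(preferred: "large-v3", support: support))  // unsupported, falls back
```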

Device-Specific Recommendations

Recommendations are based on device hardware:

let deviceName = WhisperKit.deviceName()
print("Running on: \(deviceName)")

// Example device identifiers:
// - "iPhone15,2" (iPhone 14 Pro)
// - "iPad13,16" (iPad Air, 5th generation)
// - "Mac14,2" (MacBook Air, M2)

See WhisperKit.deviceName
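
If an app wants to branch on the device family or generation itself, identifiers of this shape can be split into parts. This is a minimal sketch, assuming the "Family<major>,<minor>" format shown in the examples above; `DeviceIdentifier` and `parseDeviceIdentifier` are hypothetical helpers, not WhisperKit API. Strings that do not fit the pattern yield `nil`.

```swift
// Hypothetical helper types for illustration; not part of WhisperKit.
struct DeviceIdentifier: Equatable {
    let family: String
    let major: Int
    let minor: Int
}

func parseDeviceIdentifier(_ raw: String) -> DeviceIdentifier? {
    // The family is the leading run of non-digit characters, e.g. "iPhone".
    let family = String(raw.prefix(while: { !$0.isNumber }))
    // The remainder is expected to look like "15,2".
    let numbers = raw.dropFirst(family.count).split(separator: ",")
    guard !family.isEmpty, numbers.count == 2,
          let major = Int(numbers[0]), let minor = Int(numbers[1]) else {
        return nil
    }
    return DeviceIdentifier(family: family, major: major, minor: minor)
}

if let device = parseDeviceIdentifier("iPhone15,2") {
    print("\(device.family) generation \(device.major), variant \(device.minor)")
}
```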

Downloading Models

Automatic Download

By default, WhisperKit downloads models automatically:

// Downloads and loads the default recommended model
let whisperKit = try await WhisperKit()

// Or: downloads and loads a specific model
let baseWhisperKit = try await WhisperKit(model: "base")

See WhisperKitConfig.download

Manual Download

Download a model without initializing WhisperKit:

let modelFolder = try await WhisperKit.download(
    variant: "large-v3",
    from: "argmaxinc/whisperkit-coreml",
    progressCallback: { progress in
        print("Downloaded: \(progress.fractionCompleted * 100)%")
    }
)

print("Model saved to: \(modelFolder.path)")

See WhisperKit.download
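
Printing `fractionCompleted * 100` directly can produce long decimal fractions in a progress label. The sketch below shows one way to turn a Foundation `Progress` value into a whole-number percentage; `progressLabel` is a hypothetical helper name, and the sample `Progress` object only simulates a download.

```swift
import Foundation

// Formats a Progress value as a whole-number percentage, clamped to 0...100.
func progressLabel(_ progress: Progress) -> String {
    // Guard against non-finite fractions before converting to Int.
    let fraction = progress.fractionCompleted.isFinite ? progress.fractionCompleted : 0
    let clamped = min(max(fraction, 0), 1)
    return "\(Int((clamped * 100).rounded()))%"
}

// Simulated progress: 50 of 200 units complete.
let sample = Progress(totalUnitCount: 200)
sample.completedUnitCount = 50
print("Downloaded: \(progressLabel(sample))")
```

The same function could be called from inside the `progressCallback` closure shown above.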

List Available Models

let availableModels = try await WhisperKit.fetchAvailableModels(
    from: "argmaxinc/whisperkit-coreml"
)

print("Available models:")
for model in availableModels {
    print("  - \(model)")
}

See WhisperKit.fetchAvailableModels

Local Models

Use pre-downloaded or bundled models:

// Use a local model folder
let whisperKit = try await WhisperKit(
    modelFolder: "/path/to/model/folder",
    download: false  // Disable automatic download
)

See WhisperKitConfig.modelFolder

Bundle Models in App

// Get bundled model path
guard let modelPath = Bundle.main.path(
    forResource: "openai_whisper-base",
    ofType: nil
) else {
    fatalError("Model not found in bundle")
}

let whisperKit = try await WhisperKit(
    modelFolder: modelPath,
    download: false
)

Bundling large models increases app size significantly. Consider downloading on first launch instead.
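
A common middle ground is to use a bundled model when one is present and fall back to downloading otherwise, rather than crashing with `fatalError`. The following is a sketch of that decision only; `ModelSource` and `resolveModelSource` are hypothetical names that do not exist in WhisperKit.

```swift
// Hypothetical type for illustration; not part of WhisperKit.
enum ModelSource: Equatable {
    case bundled(path: String)
    case download(model: String)
}

// Prefers a bundled model folder and falls back to a named model to download.
func resolveModelSource(bundledPath: String?, fallbackModel: String) -> ModelSource {
    if let path = bundledPath {
        return .bundled(path: path)
    }
    return .download(model: fallbackModel)
}

// In an app, `bundledPath` would come from Bundle.main.path(forResource:ofType:).
print(resolveModelSource(bundledPath: nil, fallbackModel: "base"))
```

The `.bundled` case corresponds to the `modelFolder:` initializer with `download: false` shown above, and `.download` to `WhisperKit(model:)`.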

Model Repositories

WhisperKit downloads models from Hugging Face repositories:

Default Repository

// Default: argmaxinc/whisperkit-coreml
let whisperKit = try await WhisperKit(model: "base")

Custom Repository

let whisperKit = try await WhisperKit(
    model: "base",
    modelRepo: "your-username/your-repo",
    modelToken: "hf_your_token_here"  // If repo is private
)

See WhisperKitConfig.modelRepo

Custom Endpoint

let config = WhisperKitConfig(
    model: "base",
    modelEndpoint: "https://your-custom-endpoint.com"
)

let whisperKit = try await WhisperKit(config)

See WhisperKitConfig.modelEndpoint

Download Configuration

Background Downloads

Enable background downloads for large models:

let whisperKit = try await WhisperKit(
    model: "large-v3",
    useBackgroundDownloadSession: true
)

See WhisperKitConfig.useBackgroundDownloadSession

Custom Download Location

let customBase = FileManager.default.urls(
    for: .documentDirectory,
    in: .userDomainMask
).first!

let whisperKit = try await WhisperKit(
    model: "base",
    downloadBase: customBase
)

See WhisperKitConfig.downloadBase

Model States and Loading

Prewarming Models

Prewarm models to reduce peak memory usage:

let whisperKit = try await WhisperKit(
    model: "medium",
    prewarm: true  // Load and unload models sequentially
)

See WhisperKitConfig.prewarm

Prewarming loads and unloads each model in turn, so Core ML specialization runs without every model being resident at once. This roughly doubles load time but lowers peak memory.

Deferred Loading

// Download but don't load models yet
let whisperKit = try await WhisperKit(
    model: "base",
    load: false
)

// Load later when needed
try await whisperKit.loadModels()

See WhisperKitConfig.load

Unload Models

// Free memory when models aren't needed
await whisperKit.unloadModels()

// Reload when needed
try await whisperKit.loadModels()

See WhisperKit.unloadModels

Multilingual vs English-only

When to Use Multilingual Models

Transcribing content in multiple languages
Language is unknown in advance
Need automatic language detection
Translation to English (.translate task)

let whisperKit = try await WhisperKit(model: "base")  // Multilingual

let (language, _) = try await whisperKit.detectLanguage(
    audioPath: "audio.wav"
)
print("Detected: \(language)")

When to Use English-only Models

Only transcribing English audio
Slightly faster inference
Marginally better English accuracy

let whisperKit = try await WhisperKit(model: "base.en")

let options = DecodingOptions(language: "en")
let results = try await whisperKit.transcribe(
    audioPath: "audio.wav",
    decodeOptions: options
)
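
The choice between the two families can be reduced to building the right model name. The sketch below is illustrative only: `ModelSize` and `modelName` are hypothetical helpers, the `.en` naming follows the variants listed earlier, and `large` is mapped to `large-v3` as an arbitrary choice for the example.

```swift
// Hypothetical helper types for illustration; not part of WhisperKit.
enum ModelSize: String {
    case tiny = "tiny"
    case base = "base"
    case small = "small"
    case medium = "medium"
    case large = "large-v3"  // Assumption: use the newest large variant
}

// Builds a model name, appending ".en" where an English-only variant is listed.
func modelName(size: ModelSize, englishOnly: Bool) -> String {
    // The variant list shows no English-only model for the large size,
    // so large always stays multilingual.
    if englishOnly && size != .large {
        return size.rawValue + ".en"
    }
    return size.rawValue
}

print(modelName(size: .base, englishOnly: true))
print(modelName(size: .large, englishOnly: true))
```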

Model Performance Comparison

Performance varies by device. These are approximate values for reference.

Model	Size on disk	Parameters	Relative speed (large-v3 = 1x)	Memory	Accuracy
tiny	75 MB	39M	32x	~150 MB	Good
base	140 MB	74M	16x	~250 MB	Better
small	460 MB	244M	6x	~600 MB	Very Good
medium	1.5 GB	769M	2x	~1.8 GB	Excellent
large-v3	3 GB	1550M	1x	~3.2 GB	Best
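
The table can also be read as a lookup: given a memory budget, pick the largest model that fits. The sketch below hard-codes the approximate Memory column from the table; `largestModel(fittingMB:)` is a hypothetical helper, and real usage varies by device, so treat the result as a starting point rather than a guarantee.

```swift
// Approximate in-memory sizes in MB, copied from the Memory column above,
// ordered from smallest to largest model.
let approximateMemoryMB: [(model: String, memoryMB: Int)] = [
    ("tiny", 150),
    ("base", 250),
    ("small", 600),
    ("medium", 1800),
    ("large-v3", 3200),
]

// Returns the largest model whose approximate memory use fits the budget,
// or nil when even the smallest model does not fit.
func largestModel(fittingMB budget: Int) -> String? {
    approximateMemoryMB.last(where: { $0.memoryMB <= budget })?.model
}

print(largestModel(fittingMB: 700) ?? "none")
```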

Selection Guidelines

Real-time Streaming

Recommended: tiny, base

Fast enough to transcribe live audio without lag on most devices.

Mobile Apps

Recommended: base, small

Balance of accuracy and app size. Consider on-demand download instead of bundling.

High Accuracy

Recommended: medium, large-v3

Best for offline processing, server deployments, or high-end devices.

Constrained Devices

Recommended: tiny

Only option for devices with limited memory or older hardware.
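
These guidelines can be encoded as a small lookup so the choice lives in one place in an app. This is a sketch only; the `UseCase` enum is hypothetical, and the model names are the recommendations listed in this section.

```swift
// Hypothetical enum for illustration; not part of WhisperKit.
enum UseCase {
    case realTimeStreaming
    case mobileApp
    case highAccuracy
    case constrainedDevice

    // Model names recommended for each scenario, smallest first.
    var recommendedModels: [String] {
        switch self {
        case .realTimeStreaming: return ["tiny", "base"]
        case .mobileApp: return ["base", "small"]
        case .highAccuracy: return ["medium", "large-v3"]
        case .constrainedDevice: return ["tiny"]
        }
    }
}

print(UseCase.mobileApp.recommendedModels)
```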

Next Steps

Configuration

Transcription
