On-Device Speech AI for Apple Silicon

Deploy state-of-the-art speech-to-text and text-to-speech models directly on iOS, macOS, watchOS, and visionOS with real-time streaming, voice activity detection, and more.

Quick start

Get up and running with WhisperKit in minutes

Install via Swift Package Manager

Add WhisperKit to your project as a package dependency, either in Xcode or in your Package.swift file:

Package.swift

dependencies: [
    .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0")
]

Then add the products you need as target dependencies:

.target(
    name: "YourApp",
    dependencies: [
        "WhisperKit", // speech-to-text
        "TTSKit",     // text-to-speech
    ]
)
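
For reference, the two fragments above might sit in a complete manifest like the sketch below. The tools version and the platform minimums are assumptions made for illustration, so check them against the package's own manifest. The explicit .product(name:package:) form is an alternative spelling of the string shorthand that states which package each product comes from.

```swift
// swift-tools-version:5.9
// Minimal manifest sketch. The tools version and platform minimums are
// assumptions for illustration, not values taken from the WhisperKit docs.
import PackageDescription

let package = Package(
    name: "YourApp",
    platforms: [.macOS(.v13), .iOS(.v16)], // assumed minimums
    dependencies: [
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0")
    ],
    targets: [
        .target(
            name: "YourApp",
            dependencies: [
                // Explicit product references name the providing package.
                .product(name: "WhisperKit", package: "WhisperKit"), // speech-to-text
                .product(name: "TTSKit", package: "WhisperKit"),     // text-to-speech
            ]
        )
    ]
)
```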

Initialize WhisperKit

Import and initialize WhisperKit in your Swift code:

import WhisperKit

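Task {
    // try? turns any download or load error into nil, so check the result
    let pipe = try? await WhisperKit()
    print(pipe != nil ? "WhisperKit initialized and ready" : "WhisperKit failed to initialize")
}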

WhisperKit automatically selects and downloads the optimal model for your device. For custom model selection, see the model selection guide.

Transcribe audio

Use the transcribe method to convert audio files to text:

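// Run this inside the Task that created `pipe`.
// Optional chaining yields nil instead of crashing if initialization failed.
let transcription = try? await pipe?.transcribe(
    audioPath: "path/to/audio.wav"
)?.text
print(transcription ?? "Transcription failed")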

WhisperKit supports multiple audio formats including WAV, MP3, M4A, and FLAC.

Generate speech with TTSKit

For text-to-speech, import TTSKit and generate audio from text:

import TTSKit

Task {
    let tts = try await TTSKit()
    let result = try await tts.generate(text: "Hello from TTSKit!")
    
    // Or play directly with streaming
    try await tts.play(text: "Hello from TTSKit!")
}

Core features

Everything you need for on-device speech AI

Speech-to-Text

Deploy Whisper models on-device with real-time streaming transcription and word-level timestamps

Text-to-Speech

Generate natural speech with multiple voices and languages using Qwen3 TTS models

Voice Activity Detection

Automatically detect speech segments with built-in energy-based VAD
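
To make the idea of energy-based VAD concrete, here is a minimal, self-contained sketch. It is not WhisperKit's implementation: the function names, the frame length, and the threshold are all hypothetical, chosen only to show the technique of comparing per-frame RMS energy against a fixed threshold.

```swift
/// Root-mean-square energy of one frame of samples.
func rmsEnergy(_ frame: ArraySlice<Float>) -> Float {
    guard !frame.isEmpty else { return 0 }
    let sumOfSquares = frame.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(frame.count)).squareRoot()
}

/// Returns the sample ranges whose frames exceed the energy threshold.
/// Hypothetical helper for illustration; not part of WhisperKit.
func detectSpeech(samples: [Float], frameLength: Int, threshold: Float) -> [Range<Int>] {
    precondition(frameLength > 0, "frameLength must be positive")
    var segments: [Range<Int>] = []
    var segmentStart: Int? = nil
    var index = 0
    while index < samples.count {
        let end = min(index + frameLength, samples.count)
        let isSpeech = rmsEnergy(samples[index..<end]) > threshold
        if isSpeech, segmentStart == nil {
            segmentStart = index                // a loud frame opens a segment
        } else if !isSpeech, let start = segmentStart {
            segments.append(start..<index)      // a quiet frame closes it
            segmentStart = nil
        }
        index = end
    }
    if let start = segmentStart {
        segments.append(start..<samples.count)  // audio ended mid-segment
    }
    return segments
}

// Usage: silence, then a loud square wave, then silence again.
let silence = [Float](repeating: 0, count: 1600)
let tone: [Float] = (0..<1600).map { $0 % 2 == 0 ? 0.5 : -0.5 }
let segments = detectSpeech(samples: silence + tone + silence,
                            frameLength: 160, threshold: 0.1)
print(segments)
```

A fixed threshold is the simplest choice. Production detectors usually add smoothing or hangover frames so that short pauses do not split one utterance into several segments.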

Multi-Platform

Run on iOS, macOS, watchOS, and visionOS with optimized Core ML models

Local Server

OpenAI-compatible API for local speech processing with streaming support
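
As a rough illustration of what "OpenAI-compatible" could mean for a client, the sketch below builds a multipart request in the shape of OpenAI's POST /v1/audio/transcriptions endpoint. It assumes the local server mirrors that route and its `file` and `model` form fields. The host, port, model name, and helper function are placeholders for illustration, not documented values.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Builds a multipart/form-data POST shaped like OpenAI's transcription request.
/// Hypothetical helper; the route is assumed to match OpenAI's API.
func makeTranscriptionRequest(
    baseURL: URL,
    audio: Data,
    filename: String,
    model: String,
    boundary: String = "Boundary-\(UUID().uuidString)"
) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/audio/transcriptions"))
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)",
                     forHTTPHeaderField: "Content-Type")

    var body = Data()
    func append(_ string: String) { body.append(Data(string.utf8)) }

    // "model" form field
    append("--\(boundary)\r\n")
    append("Content-Disposition: form-data; name=\"model\"\r\n\r\n\(model)\r\n")
    // "file" form field carrying the raw audio bytes
    append("--\(boundary)\r\n")
    append("Content-Disposition: form-data; name=\"file\"; filename=\"\(filename)\"\r\n")
    append("Content-Type: application/octet-stream\r\n\r\n")
    body.append(audio)
    append("\r\n--\(boundary)--\r\n")

    request.httpBody = body
    return request
}

// Usage: host, port, and model name are placeholders.
let request = makeTranscriptionRequest(
    baseURL: URL(string: "http://localhost:50060")!,
    audio: Data([0x52, 0x49, 0x46, 0x46]), // stand-in bytes, not a real audio file
    filename: "audio.wav",
    model: "tiny"
)
print(request.url?.absoluteString ?? "")
```

Sending the request with URLSession.shared.data(for:) would then return the server's response, assuming it follows OpenAI's schema.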

Model Management

Automatic model downloading from HuggingFace with custom model support

Explore by topic

Deep dive into WhisperKit capabilities

Real-Time Streaming

Stream audio from microphone with live transcription using AudioStreamTranscriber

Streaming Playback

Real-time audio streaming with intelligent buffering strategies for smooth playback
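
One simple buffering strategy is to hold playback until a minimum amount of audio has arrived, then fall back to buffering again after an underrun. The sketch below shows only that idea. The type, its properties, and the thresholds are hypothetical and do not reflect how TTSKit buffers audio internally.

```swift
/// Gate that starts playback only once enough audio is buffered.
/// Hypothetical type for illustration; not part of WhisperKit or TTSKit.
struct PlaybackBuffer {
    let sampleRate: Double
    let minimumBufferedSeconds: Double
    private(set) var samples: [Float] = []
    private(set) var isPlaying = false

    init(sampleRate: Double, minimumBufferedSeconds: Double) {
        self.sampleRate = sampleRate
        self.minimumBufferedSeconds = minimumBufferedSeconds
    }

    /// Seconds of audio currently waiting to be played.
    var bufferedSeconds: Double { Double(samples.count) / sampleRate }

    /// Adds generated audio; playback starts once the threshold is reached.
    mutating func enqueue(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
        if !isPlaying && bufferedSeconds >= minimumBufferedSeconds {
            isPlaying = true
        }
    }

    /// Hands out up to `count` samples; returns nothing while still buffering.
    mutating func dequeue(count: Int) -> [Float] {
        guard isPlaying else { return [] }
        let n = max(0, min(count, samples.count))
        let output = Array(samples.prefix(n))
        samples.removeFirst(n)
        if samples.isEmpty { isPlaying = false } // underrun: buffer again
        return output
    }
}

// Usage: 24 kHz audio with a half-second pre-buffer (both values are placeholders).
var buffer = PlaybackBuffer(sampleRate: 24_000, minimumBufferedSeconds: 0.5)
buffer.enqueue([Float](repeating: 0, count: 6_000))
let beforeThreshold = buffer.dequeue(count: 4_800) // still buffering, so empty
buffer.enqueue([Float](repeating: 0, count: 6_000))
let afterThreshold = buffer.dequeue(count: 4_800)  // threshold reached, audio flows
print(beforeThreshold.count, afterThreshold.count)
```

A larger pre-buffer trades startup latency for fewer audible gaps when generation is slower than playback.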

Custom Models

Fine-tune and deploy your own Whisper models with whisperkittools

Performance Optimization

Optimize inference speed and memory usage with compute unit selection

Resources

Additional resources to help you succeed

Model Catalog

Browse available Whisper and TTS models with performance benchmarks

Benchmarks

View performance metrics across different Apple Silicon devices

Contributing

Learn how to contribute to WhisperKit development

Discord Community

Join the community for support and discussions

Ready to get started?

Build powerful on-device speech applications with WhisperKit today
