On-Device Speech AI for Apple Silicon
Deploy state-of-the-art speech-to-text and text-to-speech models directly on iOS, macOS, watchOS, and visionOS with real-time streaming, voice activity detection, and more.
Import and initialize WhisperKit in your Swift code. The framework automatically downloads the recommended model for your device:
import WhisperKit

Task {
    let pipe = try? await WhisperKit()
    print("WhisperKit initialized and ready")
}
WhisperKit automatically selects and downloads the optimal model for your device. For custom model selection, see the model selection guide.
3
Transcribe audio
Use the transcribe method to convert audio files to text:
let transcription = try? await pipe!.transcribe(
    audioPath: "path/to/audio.wav"
)?.text
print(transcription)
WhisperKit supports multiple audio formats including WAV, MP3, M4A, and FLAC.
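Before handing a path to the transcriber, it can help to reject files whose extension is clearly not audio. The sketch below is a hypothetical pre-flight helper, not part of WhisperKit: the name `hasKnownAudioExtension` and the extension set are assumptions drawn from the format list above, which says "including" and so may not be exhaustive.

```swift
import Foundation

// Extensions taken from the documented format list. WhisperKit may accept
// more than these four, so treat this set as an assumption.
let knownAudioExtensions: Set<String> = ["wav", "mp3", "m4a", "flac"]

/// Hypothetical helper (not a WhisperKit API): returns true when the path's
/// extension is one of the documented formats, ignoring case.
func hasKnownAudioExtension(_ path: String) -> Bool {
    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    return knownAudioExtensions.contains(ext)
}

// Usage
print(hasKnownAudioExtension("path/to/audio.wav"))
print(hasKnownAudioExtension("notes.txt"))
```

This only inspects the file name. A file with a matching extension can still fail to decode, so errors from the transcription call should be handled as well.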
4
Generate speech with TTSKit
For text-to-speech, import TTSKit and generate audio from text:
import TTSKit

Task {
    let tts = try await TTSKit()
    let result = try await tts.generate(text: "Hello from TTSKit!")

    // Or play directly with streaming
    try await tts.play(text: "Hello from TTSKit!")
}
Core features
Everything you need for on-device speech AI
Speech-to-Text
Deploy Whisper models on-device with real-time streaming transcription and word-level timestamps
Text-to-Speech
Generate natural speech with multiple voices and languages using Qwen3 TTS models
Voice Activity Detection
Automatically detect speech segments with built-in energy-based VAD
Multi-Platform
Run on iOS, macOS, watchOS, and visionOS with optimized Core ML models
Local Server
OpenAI-compatible API for local speech processing with streaming support
Model Management
Automatic model downloading from Hugging Face with custom model support
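To give a feel for what "energy-based" voice activity detection means, here is a minimal, self-contained sketch of the general technique. It is not WhisperKit's implementation: the `SpeechSegment` type, the `detectSpeech` function, the 16 kHz sample rate, the 100 ms frame length, and the fixed threshold are all assumptions chosen for illustration. Each frame's root-mean-square energy is compared against the threshold, and consecutive frames above it are merged into one segment.

```swift
import Foundation

/// A contiguous run of frames judged to contain speech, in sample indices.
struct SpeechSegment: Equatable {
    let startSample: Int
    let endSample: Int  // exclusive
}

/// Hypothetical helper: marks a frame as speech when its RMS energy exceeds
/// a fixed threshold, then merges adjacent speech frames into segments.
func detectSpeech(
    in samples: [Float],
    frameLength: Int = 1600,        // 100 ms at an assumed 16 kHz sample rate
    energyThreshold: Float = 0.02   // assumed value; tune for the noise floor
) -> [SpeechSegment] {
    precondition(frameLength > 0, "frameLength must be positive")
    var segments: [SpeechSegment] = []
    var segmentStart: Int? = nil

    var frameStart = 0
    while frameStart < samples.count {
        let frameEnd = min(frameStart + frameLength, samples.count)
        let frame = samples[frameStart..<frameEnd]

        // Root-mean-square energy of this frame
        let sumOfSquares = frame.reduce(Float(0)) { $0 + $1 * $1 }
        let rms = (sumOfSquares / Float(frame.count)).squareRoot()
        let isSpeech = rms > energyThreshold

        if isSpeech, segmentStart == nil {
            // Speech begins at this frame
            segmentStart = frameStart
        } else if !isSpeech, let start = segmentStart {
            // Speech ended at the previous frame boundary
            segments.append(SpeechSegment(startSample: start, endSample: frameStart))
            segmentStart = nil
        }
        frameStart = frameEnd
    }

    // Close a segment that runs to the end of the buffer
    if let start = segmentStart {
        segments.append(SpeechSegment(startSample: start, endSample: samples.count))
    }
    return segments
}

// Usage: one silent frame, two frames of a 440 Hz tone, one silent frame
let silence = [Float](repeating: 0, count: 1600)
let tone: [Float] = (0..<3200).map { n in
    let phase = 2.0 * Double.pi * 440.0 * Double(n) / 16_000.0
    return Float(0.5 * sin(phase))
}
let segments = detectSpeech(in: silence + tone + silence)
print(segments)
```

With this input the function returns a single segment covering samples 1600 up to 4800. A fixed threshold is the simplest choice; practical detectors usually adapt the threshold to background noise and add a short hangover period so brief pauses do not split an utterance.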