Overview

GenerationOptions controls all aspects of the speech synthesis pipeline, including sampling parameters, chunking strategy, and concurrency. All fields have sensible defaults, so the zero-argument initializer works for most use cases.
public struct GenerationOptions: Codable, Sendable
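Because the type is Codable, a set of options can in principle be saved and restored as JSON. The sketch below uses a hypothetical stand-in struct, OptionsSketch, that mirrors only a few of the documented fields so that it compiles without the library; the real type and its encoded key names may differ.

```swift
import Foundation

// Hypothetical stand-in for illustration only; the real type is GenerationOptions.
// Field names and defaults follow this page, but the encoded layout is an assumption.
struct OptionsSketch: Codable, Sendable, Equatable {
    var temperature: Float = 0.9
    var topK: Int = 50
    var repetitionPenalty: Float = 1.05
    var maxNewTokens: Int = 245
    var concurrentWorkerCount: Int = 0
    var instruction: String? = nil
}

// Encode to JSON and decode again, as one might when persisting user settings.
func roundTrip(_ options: OptionsSketch) throws -> OptionsSketch {
    let data = try JSONEncoder().encode(options)
    return try JSONDecoder().decode(OptionsSketch.self, from: data)
}

let saved = OptionsSketch(temperature: 0.7, topK: 30)
let restored = try roundTrip(saved)
```

Fields that are left at their defaults survive the round trip unchanged, so only the overridden values need to be tracked by the caller.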

Initialization

public init(
    temperature: Float = GenerationOptions.defaultTemperature,
    topK: Int = GenerationOptions.defaultTopK,
    repetitionPenalty: Float = GenerationOptions.defaultRepetitionPenalty,
    maxNewTokens: Int = GenerationOptions.defaultMaxNewTokens,
    concurrentWorkerCount: Int = 0,
    chunkingStrategy: TextChunkingStrategy? = nil,
    targetChunkSize: Int? = nil,
    minChunkSize: Int? = nil,
    instruction: String? = nil,
    forceLegacyEmbedPath: Bool = false
)
temperature
Float
default:"0.9"
Sampling temperature. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.5) make it more deterministic.
topK
Int
default:"50"
Top-K sampling parameter. Only the K most likely tokens are considered at each step.
repetitionPenalty
Float
default:"1.05"
Repetition penalty to discourage repeating tokens. Values > 1.0 penalize repetition.
maxNewTokens
Int
default:"245"
Maximum number of tokens to generate in the autoregressive loop.
concurrentWorkerCount
Int
default:"0"
Number of concurrent workers for multi-chunk generation:
  • 0: all chunks run concurrently in one batch (default, fastest for non-streaming use cases)
  • 1: sequential, one chunk at a time; required for real-time streaming with play()
  • N: at most N chunks run concurrently
chunkingStrategy
TextChunkingStrategy?
default:"nil"
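How to split long text into chunks. nil resolves to .sentence. Set to TextChunkingStrategy.none to force single-pass generation without sentence splitting; because the parameter is optional, Swift reads a bare .none as nil (Optional.none), which falls back to .sentence.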
targetChunkSize
Int?
default:"nil"
Target chunk size in tokens for sentence chunking. nil resolves to TextChunker.defaultTargetChunkSize at the call site.
minChunkSize
Int?
default:"nil"
Minimum chunk size in tokens. nil resolves to TextChunker.defaultMinChunkSize at the call site.
instruction
String?
default:"nil"
Optional style instruction for controlling speech characteristics (e.g., "Very happy"). Prepended as a text-only user prompt before the main TTS segment. For Qwen3, this is only supported by the 1.7B model variant.
forceLegacyEmbedPath
Bool
default:"false"
Force the legacy [FloatType] inference path even on macOS 15+ / iOS 18+. When false (default), the MLTensor path is taken on supported OS versions. Set to true in tests to exercise the pre-macOS-15 code path on current hardware.

Properties

Sampling Parameters

temperature
Float
Sampling temperature. Default: 0.9
topK
Int
Top-K sampling parameter. Default: 50
repetitionPenalty
Float
Repetition penalty to discourage repeating tokens. Default: 1.05
maxNewTokens
Int
Maximum number of tokens to generate. Default: 245
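To make the three sampling parameters concrete, here is a minimal sketch of how a repetition penalty, temperature, and Top-K filtering are conventionally applied to a vector of logits before sampling. It is not taken from the library's source; the function name adjustedProbabilities, the order of the steps, and the CTRL-style penalty rule are assumptions for illustration.

```swift
import Foundation

// Conventional sampling pre-processing; NOT the library's implementation.
// `history` holds the ids of tokens that were already generated.
func adjustedProbabilities(
    logits: [Float],
    history: [Int],
    temperature: Float = 0.9,
    topK: Int = 50,
    repetitionPenalty: Float = 1.05
) -> [Float] {
    var scores = logits

    // Repetition penalty: make already-seen tokens less likely.
    // Positive logits are divided, negative logits are multiplied.
    for token in Set(history) where scores.indices.contains(token) {
        scores[token] = scores[token] > 0
            ? scores[token] / repetitionPenalty
            : scores[token] * repetitionPenalty
    }

    // Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    let t = max(temperature, 1e-6)
    scores = scores.map { $0 / t }

    // Top-K: keep only the K highest-scoring token ids.
    let k = min(max(topK, 1), scores.count)
    let keep = Set(scores.indices.sorted { scores[$0] > scores[$1] }.prefix(k))

    // Softmax over the kept tokens; everything else gets probability 0.
    let maxScore = keep.map { scores[$0] }.max() ?? 0
    let weights = scores.indices.map { index -> Float in
        keep.contains(index) ? Float(exp(Double(scores[index] - maxScore))) : 0
    }
    let total = weights.reduce(0, +)
    return weights.map { $0 / total }
}

// Token 0 was already generated, so its logit is halved before Top-K picks two tokens.
let probabilities = adjustedProbabilities(
    logits: [2.0, 1.0, 0.5, -1.0],
    history: [0],
    temperature: 1.0,
    topK: 2,
    repetitionPenalty: 2.0
)
```

A token id would then be drawn at random from the returned distribution, and maxNewTokens bounds how many times that loop repeats.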

Chunking and Concurrency

concurrentWorkerCount
Int
Number of concurrent workers for multi-chunk generation. Default: 0 (all chunks concurrently)
chunkingStrategy
TextChunkingStrategy?
How to split long text into chunks. Default: nil (resolves to .sentence)
targetChunkSize
Int?
Target chunk size in tokens for sentence chunking. Default: nil (uses TextChunker.defaultTargetChunkSize)
minChunkSize
Int?
Minimum chunk size in tokens. Default: nil (uses TextChunker.defaultMinChunkSize)
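The concurrentWorkerCount values described above can be read as a limit on how many chunks are in flight at once. The following sketch shows one way such a limit could be enforced with a task group; resolvedWorkerLimit and processChunks are hypothetical helpers, not library APIs, and the library's actual scheduling may differ.

```swift
// Hypothetical helper (not a library API): turn the documented
// concurrentWorkerCount values into a concrete in-flight limit.
func resolvedWorkerLimit(concurrentWorkerCount: Int, chunkCount: Int) -> Int {
    guard chunkCount > 0 else { return 0 }
    // 0 means "no limit"; treating negative values the same way is an assumption.
    if concurrentWorkerCount <= 0 { return chunkCount }
    return min(concurrentWorkerCount, chunkCount)
}

// Hypothetical driver: runs `work` on every chunk with at most `limit` tasks
// in flight and returns the outputs in the original chunk order.
func processChunks(
    _ chunks: [String],
    concurrentWorkerCount: Int,
    work: @escaping @Sendable (String) async -> String
) async -> [String] {
    let limit = resolvedWorkerLimit(
        concurrentWorkerCount: concurrentWorkerCount,
        chunkCount: chunks.count
    )
    return await withTaskGroup(of: (Int, String).self) { group -> [String] in
        var outputs = [String](repeating: "", count: chunks.count)
        var next = 0

        // Seed the group with the first `limit` chunks.
        while next < limit {
            let index = next
            let chunk = chunks[index]
            group.addTask { return (index, await work(chunk)) }
            next += 1
        }

        // Each time a chunk finishes, record it and start the next pending one.
        while let (finished, output) = await group.next() {
            outputs[finished] = output
            if next < chunks.count {
                let index = next
                let chunk = chunks[index]
                group.addTask { return (index, await work(chunk)) }
                next += 1
            }
        }
        return outputs
    }
}
```

With a limit of 1 the chunks complete strictly in order, which is why sequential mode suits streaming playback: the first chunk's audio is ready before any later chunk starts.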

Style Control

instruction
String?
Optional style instruction for controlling speech characteristics. Default: nil. Only supported by the Qwen3 1.7B model variant.

Advanced

forceLegacyEmbedPath
Bool
Force the legacy [FloatType] inference path. Default: false
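The path selection described for forceLegacyEmbedPath amounts to an override in front of an OS availability check. A minimal sketch, assuming a hypothetical EmbedPath enum and selectEmbedPath function that do not exist in the library:

```swift
// Hypothetical names for illustration; the library's internal dispatch may differ.
enum EmbedPath {
    case tensor  // MLTensor path, macOS 15+ / iOS 18+
    case legacy  // [FloatType] path
}

func selectEmbedPath(forceLegacyEmbedPath: Bool) -> EmbedPath {
    // The flag wins regardless of the OS version.
    if forceLegacyEmbedPath { return .legacy }
    // Otherwise take the newer path only where the OS supports it.
    if #available(macOS 15.0, iOS 18.0, *) {
        return .tensor
    }
    return .legacy
}

let pathUnderTest = selectEmbedPath(forceLegacyEmbedPath: true)
```

This is what makes the flag useful in tests: the same machine can exercise both branches without needing an older OS.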

Static Properties

defaultTemperature

public static let defaultTemperature: Float = 0.9
value
Float
Default sampling temperature: 0.9

defaultTopK

public static let defaultTopK: Int = 50
value
Int
Default Top-K sampling parameter: 50

defaultRepetitionPenalty

public static let defaultRepetitionPenalty: Float = 1.05
value
Float
Default repetition penalty: 1.05

defaultMaxNewTokens

public static let defaultMaxNewTokens: Int = 245
value
Int
Default maximum number of tokens to generate: 245

Example Usage

Default Options

let result = try await tts.generate(
    text: "Hello, world!",
    voice: "ryan"
)

Custom Sampling

let options = GenerationOptions(
    temperature: 0.7,
    topK: 30,
    maxNewTokens: 500
)
let result = try await tts.generate(
    text: "A longer piece of text.",
    voice: "ryan",
    options: options
)

Sequential Generation (for Streaming)

let options = GenerationOptions(
    concurrentWorkerCount: 1  // Required for play() streaming
)
let result = try await tts.play(
    text: "This will stream audio chunk by chunk.",
    voice: "ryan",
    options: options,
    playbackStrategy: .auto
)

With Style Instruction (1.7B only)

let options = GenerationOptions(
    instruction: "Very happy and excited"
)
let result = try await tts.generate(
    text: "I'm so glad to meet you!",
    voice: "ryan",
    options: options
)

Disable Chunking

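let options = GenerationOptions(
    // Spell out the type: for an optional parameter, a bare `.none` means nil (Optional.none)
    chunkingStrategy: TextChunkingStrategy.none
)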
let result = try await tts.generate(
    text: "Generate this as a single chunk.",
    voice: "ryan",
    options: options
)

Custom Chunk Sizes

let options = GenerationOptions(
    targetChunkSize: 100,
    minChunkSize: 20
)
let result = try await tts.generate(
    text: "A very long piece of text that will be split into chunks...",
    voice: "ryan",
    options: options
)
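
To give a feel for how targetChunkSize and minChunkSize interact, here is an illustrative greedy sentence chunker. It is only a sketch under stated assumptions: whitespace-separated words stand in for tokens, sentences are split on ".", "!" and "?", and the function chunkSentences is hypothetical. The library counts real tokenizer tokens and its chunker may behave differently.

```swift
import Foundation

// Illustrative only; not the library's TextChunker.
func chunkSentences(_ text: String, targetChunkSize: Int, minChunkSize: Int) -> [String] {
    // Split into sentences after ".", "!" or "?".
    var sentences: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if ".!?".contains(character) {
            let trimmed = current.trimmingCharacters(in: .whitespacesAndNewlines)
            if !trimmed.isEmpty { sentences.append(trimmed) }
            current = ""
        }
    }
    let tail = current.trimmingCharacters(in: .whitespacesAndNewlines)
    if !tail.isEmpty { sentences.append(tail) }

    // Word count is a rough stand-in for a token count.
    func size(_ s: String) -> Int { s.split(separator: " ").count }

    // Greedily pack sentences into a chunk until the target size would be exceeded.
    var chunks: [String] = []
    for sentence in sentences {
        if let last = chunks.last, size(last) + size(sentence) <= targetChunkSize {
            chunks[chunks.count - 1] = last + " " + sentence
        } else {
            chunks.append(sentence)
        }
    }

    // Fold a trailing chunk that is below the minimum into its predecessor.
    if chunks.count > 1, let last = chunks.last, size(last) < minChunkSize {
        chunks.removeLast()
        chunks[chunks.count - 1] += " " + last
    }
    return chunks
}

let text = "One two three. Four five. Six."
let splitChunks = chunkSentences(text, targetChunkSize: 5, minChunkSize: 1)
let mergedChunks = chunkSentences(text, targetChunkSize: 5, minChunkSize: 2)
```

In this sketch a larger minChunkSize trades a slightly oversized final chunk for avoiding a very short one, which tends to matter for audio because very short chunks carry little context.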

Parallel Generation

let options = GenerationOptions(
    concurrentWorkerCount: 4  // Up to 4 chunks in parallel
)
let result = try await tts.generate(
    text: "Long text with multiple sentences. Each sentence becomes a chunk. They generate in parallel.",
    voice: "ryan",
    options: options
)