Speech Engines

Modern speech recognition and synthesis models are powerful but heavy: large to download, slow to load, and resource-intensive to run. Every application that wants voice interaction has to implement its own model loading, optimization, and lifecycle management. Loading a model for a few voice commands and then discarding it is wasteful.

Dictare solves this by keeping speech-to-text (STT) and text-to-speech (TTS) engines loaded in memory as a background service. The models are loaded once at startup, optimized for your hardware, and stay ready. Any application can use them instantly through the OpenVIP protocol — no model loading, no ML dependencies, no GPU management. Just speak and listen.

All engines run 100% locally. No audio ever leaves your machine.

Speech-to-Text (STT)

Engine Selection

Dictare automatically selects the best STT engine for your platform:

Model config              Engine                Runtime        Platform
tiny to large-v3-turbo    MLXWhisperEngine      MLX            macOS Apple Silicon
tiny to large-v3-turbo    FasterWhisperEngine   CTranslate2    Linux / Intel Mac
parakeet-v3               ParakeetEngine        ONNX Runtime   Any
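The dispatch in the table above can be sketched as a small function. This is an illustrative sketch only — the function name and the exact checks are assumptions, not Dictare's actual code:

```python
def select_stt_engine(model: str, system: str, machine: str) -> str:
    """Illustrative dispatch mirroring the engine-selection table.

    `system` and `machine` stand in for platform.system() and
    platform.machine(); the rules here are an assumption.
    """
    if model == "parakeet-v3":
        return "ParakeetEngine"        # ONNX Runtime runs anywhere
    if system == "Darwin" and machine == "arm64":
        return "MLXWhisperEngine"      # MLX on Apple Silicon
    return "FasterWhisperEngine"       # CTranslate2 on Linux / Intel Mac
```

For example, a Whisper model on an Apple Silicon Mac resolves to the MLX backend, while the same model on Linux resolves to CTranslate2.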

Models

Model            Size      Speed      Accuracy   Best for
tiny             ~75 MB    Fastest    Lower      Testing, low-resource
base             ~140 MB   Fast       Moderate   Quick dictation
small            ~460 MB   Moderate   Good       General use
medium           ~1.5 GB   Slower     High       Accuracy-focused
large-v3         ~3 GB     Slowest    Highest    Best accuracy
large-v3-turbo   ~1.6 GB   Fast       High       Recommended
parakeet-v3      ~600 MB   Fast       High       Cross-platform

Configuration

[stt]
model = "large-v3-turbo"    # Recommended default
language = "auto"            # Auto-detect, or set "en", "it", etc.
translate = false            # If true, translate speech to English
hw_accel = true              # Use GPU/NPU acceleration

[stt.advanced]
device = "auto"              # auto, cpu, cuda, mlx
compute_type = "float16"     # int8 (faster), float16 (balanced), float32 (precise)
hotwords = ""                # Help recognition: "Dictare, OpenVIP, Claude"

Model Management

dictare models               # List available models
dictare models download      # Download a model
dictare models delete        # Remove a cached model

Hotwords

Improve recognition of technical terms and names:

[stt.advanced]
hotwords = "Dictare, OpenVIP, Claude Code, Codex, pytest"

Text-to-Speech (TTS)

Available Engines

Engine    Platform   Quality   Speed      Notes
espeak    Any        Basic     Instant    Built-in, no download
say       macOS      Good      Instant    Uses macOS system voices
piper     Any        Good      Fast       ONNX-based, many voices
kokoro    Any        High      Moderate   Neural, natural-sounding
outetts   Any        High      Slower     Neural TTS

Configuration

Your default engine is set in config.toml:

[tts]
engine = "say"       # Your preferred engine (always loaded, fastest)
language = "en"      # Voice accent
voice = ""           # Engine-specific speaker name

Using Multiple Engines

You can use any installed engine on the fly — even if it's not your default. Pass --engine to dictare speak:

dictare speak "Build complete"                         # Uses default engine
dictare speak "Test passed" --engine kokoro             # Uses kokoro for this request
dictare speak "Deploying" --engine piper                # Uses piper for this request

The first time you use a non-default engine, it takes a moment to load. After that, the audio is cached — same text + engine + language + voice = instant playback from cache.
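The caching rule can be pictured as a deterministic key over the four request parameters. The hashing scheme below is a sketch of the idea, not Dictare's actual key derivation:

```python
import hashlib

def tts_cache_key(text: str, engine: str, language: str, voice: str) -> str:
    """Sketch of a deterministic TTS cache key: identical
    (text, engine, language, voice) tuples map to the same entry.
    The exact scheme is an assumption, not Dictare's implementation."""
    # NUL-join the fields so "ab"+"c" can't collide with "a"+"bc".
    payload = "\x00".join([text, engine, language, voice])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Changing any one field — say, switching the engine from kokoro to piper — yields a different key, so the cached audio is never reused across engines or voices.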

dictare speak --list-engines     # Show all available engines
dictare speak --list-voices      # Show voices for current engine

TTS Worker Isolation

Kokoro, Piper, and OuteTTS engines run in an isolated subprocess worker. This prevents their dependencies from interfering with the main Dictare process. The worker is managed automatically.
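The isolation pattern can be sketched as a parent process talking to a child interpreter over pipes. This is a minimal, self-contained illustration of the idea — the real worker protocol and its request format are Dictare internals, and the echo-style worker body here is a stand-in for an actual synthesis call:

```python
import json
import subprocess
import sys

# Stand-in worker body: a real worker would import the heavy TTS package
# here, keeping those dependencies out of the parent interpreter.
WORKER = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    # A real worker would synthesize audio at this point.
    print(json.dumps({"ok": True, "engine": req["engine"],
                      "chars": len(req["text"])}))
    sys.stdout.flush()
"""

def synthesize_in_worker(text: str, engine: str) -> dict:
    """Run one TTS request in an isolated child process."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    out, _ = proc.communicate(json.dumps({"text": text, "engine": engine}) + "\n")
    return json.loads(out)
```

Because the engine lives in its own process, a crash or dependency conflict in the worker cannot take down the main service; the parent can simply restart the child.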

Performance Tips

  • macOS Apple Silicon: Use large-v3-turbo with MLX for the best speed/accuracy balance
  • Linux with NVIDIA GPU: Use large-v3-turbo with CUDA (device = "cuda")
  • CPU-only machines: Use small or parakeet-v3 for reasonable speed
  • Low memory: Use tiny or base; set compute_type = "int8" to reduce memory usage
  • Hotwords: Always set hotwords for domain-specific terms you use frequently
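The memory-related tips above can be condensed into a rough heuristic. The thresholds below are illustrative assumptions, not values Dictare uses:

```python
def pick_compute_type(free_ram_gb: float) -> str:
    """Rough heuristic matching the tips above: int8 on tight machines,
    float16 as the balanced default, float32 when memory is plentiful.
    The RAM thresholds are illustrative, not Dictare's actual logic."""
    if free_ram_gb < 4:
        return "int8"      # roughly halves model memory vs float16
    if free_ram_gb < 16:
        return "float16"   # balanced default
    return "float32"       # most precise, largest footprint
```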