Overview
Dictare is an open-source voice layer that brings voice interaction to any application — with a focus on AI coding agents.
We believe voice changes the paradigm. When you can speak to your coding agent instead of typing, you unlock a level of efficiency that wasn't possible before. You explain complex ideas faster. You iterate without switching context. You keep your hands on the keyboard for code while your voice drives the conversation.
The problem
Modern speech recognition and synthesis models are powerful but heavy. Every application that wants voice interaction has to download models, manage GPU resources, and build its own speech pipeline. For most tools, that's not worth the effort — so they don't.
The solution
Dictare runs as a background service. STT and TTS engines are loaded once at startup, optimized for your hardware, and stay ready in memory. Any application can use them instantly through the OpenVIP protocol — no model loading, no ML dependencies, no GPU management.
Speak, and your words reach the right application. No window focus required.
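The load-once idea described above can be illustrated with a minimal sketch. This is not Dictare's implementation: `SpeechService` and `fake_model_loader` are hypothetical names, and the "model" is a stand-in function rather than a real STT engine. The sketch only shows the pattern of paying the loading cost once at startup and reusing the resident model for every request.

```python
from typing import Callable

# Hypothetical type: a loaded "model" is just a function from audio to text.
Model = Callable[[bytes], str]


class SpeechService:
    """Illustrative resident service: loads its model once, then reuses it."""

    def __init__(self, loader: Callable[[], Model]) -> None:
        self.load_count = 0
        self._loader = loader
        # The expensive step happens here, at startup, not per request.
        self._model = self._load()

    def _load(self) -> Model:
        self.load_count += 1
        return self._loader()

    def transcribe(self, audio: bytes) -> str:
        # Every request reuses the model that is already in memory.
        return self._model(audio)


def fake_model_loader() -> Model:
    """Stand-in for an expensive model load (assumption, not a real engine)."""
    def model(audio: bytes) -> str:
        return f"<{len(audio)} bytes transcribed>"
    return model


service = SpeechService(fake_model_loader)
first = service.transcribe(b"abc")
second = service.transcribe(b"hello")
```

However many clients call `transcribe`, `load_count` stays at 1. That is the property a shared background service gives to applications that would otherwise each load their own models.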
Designed for coding agents
Dictare is coding-agent-first. Out of the box, it works with Claude Code, Codex, Gemini CLI, Aider, and Pi. But the protocol is open — any tool can connect.
Key principles
- 100% local — all processing happens on your machine. No audio leaves your computer. No cloud, no API keys, no subscription.
- Always ready — models are preloaded. Zero cold-start latency.
- No focus required — your voice reaches the agent even when its window is in the background.
- Open protocol — OpenVIP is an open standard. Anyone can build a client.
- Open source — MIT licensed. Free forever.
How it works
Microphone
│
▼
STT Engine     Whisper (MLX / CTranslate2) or Parakeet (ONNX)
│              loaded once, always ready
▼
Pipeline       submit detection, mute control, agent switching
│
▼
OpenVIP        HTTP / SSE, open protocol
│
▼
Application    coding agent, CLI tool, or any OpenVIP client
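Because the last hop runs over HTTP and Server-Sent Events (SSE), a client mostly needs to read an event stream. The sketch below parses the standard SSE wire format from an iterable of text lines. The event name `transcription` and the JSON payload shape are assumptions made for illustration, not part of the documented OpenVIP protocol; consult the OpenVIP Protocol page for the real event types and endpoints.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per Server-Sent Event found in an iterable of lines."""
    event_type = "message"  # default event type in the SSE format
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events with no data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical stream: the event name and payload are illustrative only.
sample_stream = [
    ": keep-alive\n",
    "event: transcription\n",
    'data: {"text": "run the tests"}\n',
    "\n",
]

events = list(parse_sse(sample_stream))
payload = json.loads(events[0]["data"])
```

In a real client, the lines would come from a streaming HTTP response instead of a list. The parsing logic stays the same.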
Next steps
- Getting Started — install and run your first voice session
- Basic Usage — voice commands, hotkey actions, and the status bar
- Agents — multi-agent setup and custom agent profiles
- OpenVIP Protocol — build your own integration