Basic Usage¶
Launching an Agent¶
Start an agent with voice support:
dictare agent freddie
This launches the default coding agent (Claude Code) in a dictare-enabled session with a status bar at the bottom showing the current state. The name "freddie" is your session name — pick whatever you like.
To use a different coding agent, pass the --profile option:
dictare agent freddie --profile claude # Claude Code (default)
dictare agent freddie --profile codex # OpenAI Codex
dictare agent freddie --profile gemini # Google Gemini CLI
dictare agent freddie --profile aider # Aider
dictare agent freddie --profile pi # Pi
Options¶
dictare agent freddie --verbose # Show debug output (-v)
dictare agent freddie --no-status-bar # Hide the status bar (pure pass-through)
dictare agent freddie --continue # Continue previous session
dictare agent freddie --live-dangerously # Skip all agent permission prompts (YOLO mode)
The --live-dangerously flag is a convenience that tells the coding agent to skip interactive permission prompts. It maps to --dangerously-skip-permissions for Claude Code, --dangerously-bypass-approvals-and-sandbox for Codex, --yolo for Gemini, etc.
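The mapping described above can be pictured as a small lookup table. The sketch below is illustrative only: the names `SKIP_PERMISSION_FLAGS` and `agent_args` are hypothetical and not taken from dictare's source, and only the three flags named above are listed.

```python
# Hypothetical sketch of how --live-dangerously could translate per profile.
# Only the flags documented above are included; other profiles are omitted.
SKIP_PERMISSION_FLAGS = {
    "claude": "--dangerously-skip-permissions",
    "codex": "--dangerously-bypass-approvals-and-sandbox",
    "gemini": "--yolo",
}


def agent_args(profile: str, live_dangerously: bool) -> list[str]:
    """Return the extra arguments to pass to the underlying agent CLI."""
    if not live_dangerously:
        return []
    flag = SKIP_PERMISSION_FLAGS.get(profile)
    # In this sketch, profiles without a known flag get no extra argument.
    return [flag] if flag else []
```

For example, `agent_args("codex", True)` yields the Codex bypass flag, while any profile with `live_dangerously=False` yields an empty list.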
Hotkey Actions¶
| Action | What it does |
|---|---|
| Single tap | Toggle the microphone on or off (status bar: green when on, gray when off) |
| Double-tap | Submit the current input (like pressing Enter) |
| Right Alt + hotkey | Toggle output mode (agents/keyboard) |
Recording¶
When the microphone is enabled (status bar green), speak naturally. Dictare uses Voice Activity Detection (VAD) to detect speech boundaries automatically.
Recording stops when:
- Silence exceeds the threshold (default: 850ms)
- Maximum duration is reached (default: 60s)
The transcription is delivered to the focused agent. If you speak for longer than 60 seconds, dictare simply sends what you've said so far and keeps listening — you can speak continuously without interruption.
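The two stop conditions above amount to a simple check run while audio is being captured. The following is a minimal sketch under stated assumptions: `RecordingLimits` and `should_flush` are hypothetical names, and the real engine's VAD logic is not shown here. Only the default values (850 ms and 60 s) come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingLimits:
    """Defaults taken from the documentation above."""
    silence_threshold_ms: int = 850
    max_duration_s: float = 60.0


def should_flush(silence_ms: float, elapsed_s: float,
                 limits: RecordingLimits = RecordingLimits()) -> bool:
    """Return True when the current segment should be sent for transcription.

    silence_ms: how long the speaker has been silent (hypothetical VAD output).
    elapsed_s:  how long the current segment has been recording.
    """
    silence_exceeded = silence_ms > limits.silence_threshold_ms
    max_reached = elapsed_s >= limits.max_duration_s
    return silence_exceeded or max_reached
```

In this sketch, hitting the maximum duration only flushes the current segment. The caller would start a new segment straight away, which matches the "keeps listening" behavior described above.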
Submitting¶
As you speak, transcriptions are injected into the coding agent as if typed on the keyboard — the text accumulates in the agent's input. When you're happy with what you've said, double-tap the hotkey to submit (equivalent to pressing Enter).
You can also press Enter directly if the agent's terminal has focus, but double-tap is the recommended workflow — it works regardless of which window is focused.
Voice Commands¶
Dictare recognizes voice commands embedded in your speech. This lets you work entirely hands-free: submit your input, mute the mic, or switch agents while you are away from the keyboard.
All trigger words are fully configurable, but the defaults have been chosen empirically to minimize false triggers during normal speech.
Submit¶
Say "OK send" or "OK submit" at the end of your dictation to submit the input without touching the keyboard.
Mute / Unmute¶
Say "OK mute" or "mate, hold on" to mute the microphone, and "OK listen" or "buddy, listen up" to unmute it. The status bar updates to show the muted state.
Agent Switching¶
In a multi-agent setup, say "agent john" or "agent ringo" to switch which agent receives your voice input. The name matches the session name you used when launching the agent.
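To make the default triggers above concrete, here is a rough sketch of how a transcript might be checked for a command. It is an assumption-laden illustration, not dictare's actual matcher: `TRIGGERS`, `normalize`, and `parse_command` are hypothetical names, matching is plain string comparison on normalized text, and the returned remainder is lowercased with punctuation removed.

```python
import re

# Default trigger phrases from the documentation, in normalized form.
TRIGGERS = {
    "ok send": "submit",
    "ok submit": "submit",
    "ok mute": "mute",
    "mate hold on": "mute",
    "ok listen": "unmute",
    "buddy listen up": "unmute",
}


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'OK, send.' matches 'ok send'."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def parse_command(transcript: str):
    """Return (action, remaining_text, agent_name) for one transcript."""
    words = normalize(transcript)
    # "agent <name>" as the whole utterance switches the target agent.
    match = re.fullmatch(r"agent (\w+)", words)
    if match:
        return ("switch", "", match.group(1))
    # Other triggers are recognized at the end of the utterance.
    for phrase, action in TRIGGERS.items():
        if words == phrase or words.endswith(" " + phrase):
            remaining = words[: len(words) - len(phrase)].strip()
            return (action, remaining, None)
    return (None, words, None)
```

Calling `parse_command("Refactor the parser, OK send.")` separates the dictated text from the trailing submit trigger, and `parse_command("agent ringo")` reports a switch to the session named "ringo".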
Status Bar¶
When running via dictare agent, a status bar at the bottom of the terminal shows one of the following states:
- listening (green) — microphone enabled, waiting for speech
- recording (red) — actively recording your voice
- muted — voice commands only; engine is listening for reactivation triggers ("OK listen")
- off (gray) — microphone disabled
- disconnected — engine is not running
Web Dashboard¶
Dictare includes a web-based settings dashboard at:
http://localhost:8770/ui
Use it to adjust audio, STT, TTS, hotkey, and pipeline settings without editing config files manually.
Output Modes¶
Dictare supports two output modes:
- agents (default) — delivers transcriptions to connected agents via the OpenVIP protocol
- keyboard — types transcriptions directly as keyboard input (for non-OpenVIP-enabled applications)
Toggle between modes with Right Alt + hotkey, or set it in config:
[output]
mode = "agents" # or "keyboard"
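The two mode values and the toggle behavior can be modeled as a two-member enumeration. This is only a sketch: `OutputMode` and `toggle` are hypothetical names, and the only facts taken from the text are the two valid strings, "agents" and "keyboard".

```python
from enum import Enum


class OutputMode(Enum):
    AGENTS = "agents"      # deliver transcriptions to connected agents
    KEYBOARD = "keyboard"  # type transcriptions as keyboard input


def toggle(mode: OutputMode) -> OutputMode:
    """Flip between the two modes, as the hotkey combination does."""
    if mode is OutputMode.AGENTS:
        return OutputMode.KEYBOARD
    return OutputMode.AGENTS
```

Constructing the enum from a config string, as in `OutputMode("keyboard")`, raises `ValueError` for any value other than the two documented modes.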
Audio Feedback¶
Dictare plays subtle audio cues for events like recording start/stop, transcription received, and submit. Sounds are focus-gated by default: they stay silent when your agent's terminal is focused (so they don't interrupt your flow) and play when the terminal is in the background, so you know what's happening without looking.
Each event's sound is configured separately in config.toml, including whether it is enabled, its volume, and whether it is focus-gated:
[audio.sounds.start]
enabled = true
volume = 0.3
focus_gated = false
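The focus-gating rule described above reduces to a short decision. The sketch below assumes hypothetical names (`SoundConfig`, `playback_volume`) and a hypothetical `terminal_focused` signal. The field names mirror the config keys shown above, but the default values for `enabled` and `volume` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SoundConfig:
    enabled: bool = True      # assumed default
    volume: float = 0.3       # assumed default, taken from the example above
    focus_gated: bool = True  # sounds are focus-gated by default


def playback_volume(cfg: SoundConfig, terminal_focused: bool) -> float:
    """Return the volume to play at, or 0.0 if the sound should stay silent."""
    if not cfg.enabled:
        return 0.0
    # A focus-gated sound stays silent while the agent's terminal has focus.
    if cfg.focus_gated and terminal_focused:
        return 0.0
    return cfg.volume
```

With `focus_gated = false`, as in the config example above, the start sound would play at volume 0.3 whether or not the terminal is focused.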