Pipeline

Dictare processes transcriptions through a pipeline of filters and executors before delivering text to agents. The pipeline enables voice commands like "OK send", "OK mute", and "agent claude" to be intercepted and acted upon.

Architecture

The pipeline has two stages:

  1. Filters — analyze and modify transcriptions, extract commands
  2. Executors — execute the actions detected by filters

Filters and executors are loaded via PipelineLoader using dependency injection.
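
The two-stage flow can be pictured with a minimal sketch. The names Command, Filter, Executor, and Pipeline below are hypothetical and chosen for illustration only; they are not taken from Dictare's source, and the real PipelineLoader may wire things differently.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Command:
    """An action extracted by a filter (hypothetical shape)."""
    name: str
    payload: str = ""


class Filter(Protocol):
    def apply(self, text: str) -> tuple[str, list[Command]]: ...


class Executor(Protocol):
    handles: str  # name of the command this executor acts on

    def execute(self, command: Command) -> None: ...


class Pipeline:
    """Runs every filter in order, then dispatches the collected commands."""

    def __init__(self, filters: list[Filter], executors: list[Executor]) -> None:
        # Filters and executors are injected rather than constructed here.
        self.filters = filters
        self.executors = {e.handles: e for e in executors}

    def process(self, text: str) -> str:
        commands: list[Command] = []
        for f in self.filters:  # stage 1: filters may rewrite text and emit commands
            text, found = f.apply(text)
            commands.extend(found)
        for c in commands:  # stage 2: executors act on the emitted commands
            executor = self.executors.get(c.name)
            if executor is not None:
                executor.execute(c)
        return text
```

Passing the filters and executors in through the constructor keeps each piece testable on its own, which is the usual motivation for dependency injection.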

Filters

Submit Filter

Detects submit triggers at the end of a transcription and flags the input for submission to the agent.

[pipeline.submit_filter]
triggers = ["send", "submit", "enter"]
confidence_threshold = 0.8

When you say "OK send" or "OK submit" at the end of your dictation, the filter strips the trigger phrase and submits the remaining text.

The confidence_threshold (0.0 to 1.0) controls how strictly the trigger must match. Lower values are more lenient; higher values reduce false positives.
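
To see how a threshold of this kind behaves, here is a rough sketch built on the standard library's difflib. It is an assumption about how such matching could work, not Dictare's actual scoring: split_submit is a hypothetical helper, the "ok" prefix is hard-coded, and only single-word triggers are handled.

```python
from difflib import SequenceMatcher


def split_submit(text, triggers, threshold=0.8, prefix="ok"):
    """Return (remaining_text, submitted) for a trailing '<prefix> <trigger>'."""
    words = text.split()
    if len(words) < 2:
        return text, False
    tail_prefix = words[-2].lower().strip(".,!?")
    tail_word = words[-1].lower().strip(".,!?")
    if tail_prefix != prefix:
        return text, False
    # Score the last word against every trigger and keep the best similarity.
    best = max(
        (SequenceMatcher(None, tail_word, t).ratio() for t in triggers),
        default=0.0,
    )
    if best >= threshold:
        return " ".join(words[:-2]), True
    return text, False


triggers = ["send", "submit", "enter"]
print(split_submit("Add a test OK send", triggers))        # ('Add a test', True)
print(split_submit("Add a test OK sent", triggers, 0.8))   # ('Add a test OK sent', False)
print(split_submit("Add a test OK sent", triggers, 0.7))   # ('Add a test', True)
```

In this sketch the misheard word "sent" scores 0.75 against "send", so it is rejected at 0.8 and accepted at 0.7.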

Mute Filter

Detects mute/unmute commands to control the microphone.

[pipeline.mute_filter]
triggers = ["mute", "listen"]
phrases = ["OK mute", "OK listen"]

Say "OK mute" to silence the microphone and "OK listen" to resume.

Agent Filter

Detects agent switching commands in multi-agent setups.

[pipeline.agent_filter]
triggers = ["agent"]
match_threshold = 0.7

Say "agent claude" or "agent codex" to switch which agent receives input. The match_threshold controls fuzzy matching tolerance for agent names.
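
As an illustration of fuzzy agent-name matching, the sketch below uses difflib.get_close_matches with its cutoff set to the match_threshold value. The resolve_agent helper and the agent list are hypothetical, and Dictare's own matcher may score names differently.

```python
from difflib import get_close_matches


def resolve_agent(text, agents, trigger="agent", match_threshold=0.7):
    """Return the agent named in '<trigger> <name>', or None if nothing matches."""
    words = text.lower().split()
    if len(words) != 2 or words[0] != trigger:
        return None
    # cutoff plays the role of match_threshold: candidates scoring lower are dropped.
    matches = get_close_matches(words[1], agents, n=1, cutoff=match_threshold)
    return matches[0] if matches else None


agents = ["claude", "codex"]
print(resolve_agent("agent claude", agents))  # claude
print(resolve_agent("agent zebra", agents))   # None
```

A tolerance like this helps when speech-to-text mishears a name slightly, at the cost of occasionally accepting a near miss.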

Executors

Executors are the action handlers that run after filters extract commands:

Executor             Triggered by    Action
InputExecutor        Submit filter   Delivers text + submit keystroke to agent
MuteExecutor         Mute filter     Toggles microphone state
AgentSwitchExecutor  Agent filter    Switches active agent

Pattern Matching

Filters share a common text matching module (pipeline/filters/_text.py) that handles:

  • Case-insensitive matching
  • Fuzzy matching with configurable thresholds
  • Trigger phrase extraction and stripping
  • Position-aware matching (triggers at end of text for submit)
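
The case-insensitive and position-aware parts of that list can be sketched with a regular expression. The helper below is hypothetical and is not the contents of pipeline/filters/_text.py; it only shows one way to match a phrase at the end of the text and strip it.

```python
import re


def strip_trailing_phrase(text, phrase):
    """Return (text_without_phrase, matched) for a phrase at the end of text."""
    pattern = re.compile(
        r"[\s,]*\b" + re.escape(phrase) + r"[\s.!?]*$",  # anchored to the end
        re.IGNORECASE,                                    # "OK Send" == "ok send"
    )
    match = pattern.search(text)
    if match is None:
        return text, False
    return text[: match.start()], True


print(strip_trailing_phrase("Add a test, OK Send.", "ok send"))  # ('Add a test', True)
print(strip_trailing_phrase("OK send the file", "ok send"))      # ('OK send the file', False)
```

Anchoring the pattern with `$` is what keeps "OK send the file" from being treated as a submit command.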

How It Works

  1. You speak: "Add a test for the login function OK send"
  2. STT transcribes the audio
  3. Submit filter detects "OK send" at the end
  4. Filter strips "OK send", passes "Add a test for the login function" forward
  5. InputExecutor delivers the text to the active agent and presses Enter
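
Steps 3 to 5 can be traced with a small simulation. FakeAgent and handle_transcription are hypothetical stand-ins for the real agent connection and for the submit filter plus InputExecutor. Delivering text without Enter when no trigger is present is an assumption made for this sketch.

```python
class FakeAgent:
    """Records what would be typed into the agent (illustration only)."""

    def __init__(self):
        self.received = []

    def send_text(self, text):
        self.received.append(text)

    def press_enter(self):
        self.received.append("<ENTER>")


def handle_transcription(transcription, agent, phrase="ok send"):
    """Strip a trailing submit phrase, deliver the text, and press Enter."""
    stripped = transcription.rstrip(" .!?")
    # The leading space ensures the phrase starts on a word boundary.
    if (" " + stripped.lower()).endswith(" " + phrase):
        body = stripped[: len(stripped) - len(phrase)].rstrip(" ,")
        agent.send_text(body)
        agent.press_enter()
        return True
    agent.send_text(transcription)  # assumption: no trigger means no Enter
    return False


agent = FakeAgent()
handle_transcription("Add a test for the login function OK send", agent)
print(agent.received)  # ['Add a test for the login function', '<ENTER>']
```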

Customizing Triggers

You can customize the trigger phrases to match your speaking style:

[pipeline.submit_filter]
triggers = ["go", "send", "submit", "do it"]

[pipeline.mute_filter]
triggers = ["mute", "listen", "quiet", "resume"]
phrases = ["OK quiet", "OK resume", "OK mute", "OK listen"]

Disabling Filters

To disable a filter, set its triggers to an empty list:

[pipeline.submit_filter]
triggers = []
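
One way to read this rule is that a filter with no triggers has nothing to match and is skipped. The sketch below assumes the [pipeline.*] tables have already been parsed into a dictionary; enabled_filters is a hypothetical helper, not part of Dictare.

```python
def enabled_filters(pipeline_config):
    """Return the names of filters whose trigger list is non-empty."""
    return [
        name
        for name, settings in pipeline_config.items()
        if settings.get("triggers")  # an empty or missing list counts as disabled
    ]


config = {
    "submit_filter": {"triggers": []},
    "mute_filter": {"triggers": ["mute", "listen"], "phrases": ["OK mute", "OK listen"]},
    "agent_filter": {"triggers": ["agent"], "match_threshold": 0.7},
}
print(enabled_filters(config))  # ['mute_filter', 'agent_filter']
```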