OpenVIP Protocol

Dictare is the reference implementation of OpenVIP (Open Voice Interaction Protocol) — an open standard for delivering voice input to applications.

Rationale

Voice interaction should be simple to implement. OpenVIP is intentionally minimal: a small set of HTTP endpoints using Server-Sent Events (SSE) for real-time delivery. The protocol is designed so that even low-power devices can implement it — a Raspberry Pi, an ESP32, or a simple web app.
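To illustrate why an SSE-based design is cheap to implement, here is a minimal sketch of a parser for the SSE wire format (`data:` fields, `:` comment lines, and a blank line ending each event). The function name `iter_sse_data` and the sample payload are hypothetical and are not taken from the OpenVIP specification; the actual message schema is defined at openvip.dev.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event found in a stream of SSE lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, commonly used as a keep-alive; ignore it.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # A single leading space after the colon is not part of the value.
            value = value[1:]
        if field == "data":
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical sample stream; the JSON shape is an assumption for illustration.
sample = [
    ": keep-alive\n",
    'data: {"type": "transcription", "text": "hello world"}\n',
    "\n",
]
events = [json.loads(payload) for payload in iter_sse_data(sample)]
print(events[0]["text"])
```

In practice the SDK described below does this parsing for you; the sketch only shows how little a constrained device would need.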

The full protocol specification is at openvip.dev.

Python SDK

The recommended way to interact with OpenVIP is through the official SDK. It handles connection management, reconnection, and message parsing, and it fills in required message fields (message IDs, timestamps, protocol version) automatically.

pip install openvip

Subscribe to Transcriptions

from openvip import Client

client = Client("http://localhost:8770/openvip")

for message in client.subscribe(agent_id="elvis"):
    if message.type == "transcription":
        print(message.text)

Check Engine Status

from openvip import Client

client = Client("http://localhost:8770/openvip")
status = client.status()
print(status.state)  # one of "listening", "recording", "off", "muted"

Request Speech (TTS)

from openvip import Client

client = Client("http://localhost:8770/openvip")
client.speech(text="Build complete", engine="kokoro")

dictare transcribe

The dictare transcribe command is a ready-made OpenVIP client. It registers as an agent, receives transcriptions, and prints them to stdout.

dictare transcribe                # Accumulate, print on submit, exit
dictare transcribe --auto-submit  # Print first transcription and exit

Pipe-friendly:

dictare transcribe | llm | dictare speak   # Voice → LLM → Speech
dictare transcribe >> voice-notes.txt      # Voice → file
dictare transcribe | python process.py     # Voice → custom processing
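
The last pipeline assumes a script that reads transcriptions from standard input. A minimal sketch of such a `process.py` follows. It assumes, without confirmation from the Dictare documentation, that `dictare transcribe` writes plain text lines to stdout; the `normalize` helper is purely illustrative and stands in for whatever processing you need.

```python
import sys
from typing import Iterable, Iterator


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Collapse runs of whitespace and drop empty lines (illustrative only)."""
    for line in lines:
        text = " ".join(line.split())
        if text:
            yield text


def main() -> None:
    # Read transcriptions line by line as they arrive on the pipe.
    for text in normalize(sys.stdin):
        # Flush so downstream commands see each line immediately.
        print(text, flush=True)


if __name__ == "__main__":
    main()
```

Keeping the transformation in a plain function makes it easy to test without a microphone or a running engine.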

Building Custom Integrations

Any application can receive voice input from Dictare by using the OpenVIP SDK.

from openvip import Client

client = Client("http://localhost:8770/openvip")

for message in client.subscribe(agent_id="my-editor-plugin"):
    if message.type == "transcription":
        # insert_at_cursor() and execute_command() stand for your application's own hooks
        insert_at_cursor(message.text)

        # Check if the user requested submit; x_input may be absent, hence getattr
        x_input = getattr(message, "x_input", None)
        if x_input and "submit" in (getattr(x_input, "ops", None) or []):
            execute_command()
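
The submit check in the loop above can be factored into a small helper, which also makes it testable without a running engine. This is only a sketch: `wants_submit` is a hypothetical name, and the stand-in messages built with `SimpleNamespace` merely mimic the `x_input.ops` attributes used in the example, not the SDK's real message class.

```python
from types import SimpleNamespace


def wants_submit(message) -> bool:
    """Return True when the message carries a "submit" op in x_input.ops."""
    x_input = getattr(message, "x_input", None)
    # Both x_input and ops may be missing or None; treat that as "no ops".
    ops = getattr(x_input, "ops", None) or []
    return "submit" in ops


# Stand-in messages for illustration (not real SDK objects).
with_submit = SimpleNamespace(
    type="transcription",
    text="run tests",
    x_input=SimpleNamespace(ops=["submit"]),
)
without_ext = SimpleNamespace(type="transcription", text="hello")

print(wants_submit(with_submit), wants_submit(without_ext))
```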

Always use the SDK rather than raw HTTP calls. The SDK ensures all required fields (message IDs, timestamps, protocol version) are correctly set. Raw calls risk producing non-compliant messages that the engine will reject.

Endpoints Reference

For reference, these are the OpenVIP endpoints. Use the SDK instead of calling them directly.

Method  Path                    Description
POST    /openvip/subscribe      Subscribe to events (SSE stream)
POST    /openvip/input          Send text input
POST    /openvip/speech         Request TTS
POST    /openvip/speech/stop    Stop TTS playback
POST    /openvip/status         Engine status
GET     /openvip/openapi.json   OpenAPI specification
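
Because the engine publishes an OpenAPI document, the endpoint list can be checked programmatically rather than by hand. The following is a minimal sketch, assuming the engine listens at the `http://localhost:8770/openvip` address used in the SDK examples and that `openapi.json` is a standard OpenAPI document with a top-level `paths` object. The helper names `fetch_spec` and `list_operations` and the sample document are hypothetical.

```python
import json
from typing import List, Tuple
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_spec(base_url: str = "http://localhost:8770/openvip") -> dict:
    """Download the OpenAPI document (read-only GET; base URL is an assumption)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs for every operation in an OpenAPI document."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append((method.upper(), path))
    return operations


# Hypothetical, hand-written sample so the sketch runs without an engine.
sample_spec = {
    "openapi": "3.1.0",
    "paths": {
        "/openvip/speech": {"post": {"summary": "Request TTS"}},
        "/openvip/openapi.json": {
            "get": {"summary": "OpenAPI specification"},
            "parameters": [],
        },
    },
}
for method, path in list_operations(sample_spec):
    print(method, path)

# Against a running engine, replace sample_spec with fetch_spec().
```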