Voice AI has a unique observability challenge: a single real-time conversation spans multiple services (speech-to-text, the LLM, text-to-speech), so a fault in any one of them degrades the whole experience.
## Our Stack
- Traces: OpenTelemetry with custom span attributes for audio quality and model confidence
- Metrics: Prometheus with Grafana tracking p50/p95/p99 by service and region
- Logs: Structured JSON with conversation ID as correlation key
- Alerts: PagerDuty with custom runbooks per alert type
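The conversation-ID correlation key above can be sketched with stdlib-only structured JSON logging; the field names (`conversation_id`, `service`) and the example values are illustrative, not taken from our actual schema.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object so log pipelines can parse it."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Correlation key: every service logs the same conversation ID,
            # so one conversation's logs can be joined across services.
            "conversation_id": getattr(record, "conversation_id", None),
            "service": getattr(record, "service", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("voice")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# extra= attaches the correlation fields to this specific record.
logger.info("transcription complete",
            extra={"conversation_id": "conv-123", "service": "stt"})
```

Because every line is a single JSON object keyed by `conversation_id`, a log query for one ID reconstructs the full cross-service timeline of that conversation.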
## What We Monitor
- End-to-end conversation latency (target: <500ms)
- Speech-to-text accuracy by language
- LLM response quality via automated evaluation
- TTS naturalness scores from user feedback
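The p50/p95/p99 latency tracking above reduces to a percentile computation over raw samples; a minimal stdlib sketch, where the sample values are made up and only the 500 ms target comes from the list:

```python
import statistics

def latency_percentiles(latency_ms: list[float]) -> dict[str, float]:
    """Return p50/p95/p99 cut points from raw latency samples (milliseconds)."""
    # quantiles(n=100) yields the 1st..99th percentile cut points in order.
    q = statistics.quantiles(latency_ms, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# Hypothetical end-to-end latency samples for one service/region bucket.
samples = [120, 180, 240, 310, 90, 460, 520, 200, 330, 410,
           150, 275, 390, 430, 505, 220, 185, 360, 295, 480]
p = latency_percentiles(samples)

# Alert on the tail, not the median: a healthy p50 can hide a breached
# p99 against the <500 ms end-to-end target.
breach = p["p99"] > 500
```

In production this aggregation lives in Prometheus histograms rather than in-process lists, but the tail-versus-median logic the alert rule encodes is the same.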