Advance our real-time voice stack with robust evaluation, analytics, and model insights.
You’ll analyze latency/quality trade-offs, create perceptual metrics, and partner with engineering to optimize speech recognition/synthesis loops for live agent experiences.
Role Overview
- Design offline and live evals for speech latency, stability, and quality.
- Analyze user sessions to uncover friction and drive improvements.
- Develop metrics and dashboards that reflect perceived quality.
- Prototype data pipelines for training/fine-tuning where needed.
- Collaborate with research & infra to iterate quickly and safely.
Requirements
- 3+ years in DS/ML for speech, audio, or real-time systems.
- Strong Python skills, fluency with data tooling (pandas, NumPy), and experimentation rigor.
- Experience defining and validating perceptual metrics.
- Ability to communicate insights clearly to cross-functional partners.
Bonus
- Streaming ASR/TTS experience, WebRTC, or VAD/diarization know-how.
- Familiarity with eval harnesses and human-in-the-loop labeling.