Help us build, ship, and monitor production agent systems that orchestrate language models, tools, and memory—reliably and at scale.
You’ll design and implement core runtime components for multi-tool, multi-turn agents, improve latency and reliability across our inference pathways, and collaborate with product to turn cutting-edge research into dependable experiences.
Role Overview
- Own critical paths in the agent runtime (tooling, orchestration, memory, evals); see the sketch after this list.
- Ship code across the stack (TypeScript/Node, Python, Postgres/Redis, queues).
- Instrument, monitor, and harden systems for reliability, performance, and cost.
- Create evaluation harnesses (offline & online) to measure quality and catch regressions.
- Partner with product and research to take features from concept to production.
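For a taste of the runtime work, here is a minimal sketch of a multi-turn tool-calling loop in TypeScript, the stack's primary language. Every name in it (ModelClient, Tool, ModelTurn, runAgent) is a hypothetical stand-in for illustration, not our production API.

```ts
// Minimal sketch of a multi-turn tool-calling agent loop.
// All types and names here are illustrative, not our runtime's actual API.

type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; name: string; content: string };

interface ToolCall { name: string; args: Record<string, unknown> }

// A model turn either finishes with text or requests tool calls.
type ModelTurn =
  | { kind: "text"; content: string }
  | { kind: "tool_calls"; calls: ToolCall[] };

interface ModelClient {
  complete(messages: Message[]): Promise<ModelTurn>;
}

interface Tool {
  name: string;
  run(args: Record<string, unknown>): Promise<string>;
}

async function runAgent(
  client: ModelClient,
  tools: Map<string, Tool>,
  userInput: string,
  maxTurns = 8, // hard cap so a looping model can't run forever
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userInput }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await client.complete(messages);

    if (result.kind === "text") {
      return result.content; // model is done; surface the final answer
    }

    // Record the model's tool requests in the transcript (simplified).
    messages.push({ role: "assistant", content: JSON.stringify(result.calls) });

    // Execute each requested tool and feed results back to the model.
    for (const call of result.calls) {
      const tool = tools.get(call.name);
      const output = tool
        ? await tool.run(call.args).catch((e: Error) => `error: ${e.message}`)
        : `error: unknown tool "${call.name}"`;
      messages.push({ role: "tool", name: call.name, content: output });
    }
  }

  throw new Error(`agent exceeded ${maxTurns} turns without a final answer`);
}
```

The hard turn cap and per-call error capture are examples of the reliability details this role owns.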
Requirements
- 5+ years building backend systems at scale (queues, workers, services, APIs).
- Deep experience with TypeScript/Node and/or Python.
- Hands-on experience with LLM tooling (function/tool calling, RAG, vector databases, evals).
- Operational mindset: logging, tracing, on-call, SLOs, postmortems.
- Clear communication and product sense; bias to ship.
Bonus
- Experience with OpenAI or Anthropic tool use APIs, serverless, or WebRTC/voice.
- MLOps or infrastructure-as-code (Terraform) in multi-environment setups.