Building an Intent Engine with LLMs: Lessons from Instacart
Source & attribution
This article is a Lussent-authored summary and commentary on “Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs” by Yuanzheng Zhu and colleagues at Instacart Engineering.
Why query understanding matters so much
Instacart’s search box has to understand messy, shorthand queries like “bread no gluten” or “x large zip lock” and still return the right products. The team treats this layer as an “intent engine”: if it misreads what you meant, every downstream system suffers. Traditional machine learning models handled common, well-structured searches, but they struggled with sparse data, vague queries, and the long tail of highly specific requests.
Over time, Instacart accumulated separate models for things like query classification and query rewrites. Each had its own data pipeline and infra, making the overall system complex and slow to evolve.
Why LLMs are a better backbone
Large language models offer something the legacy stack didn’t: broad world knowledge and strong reasoning. A model that already knows the relationship between “Italian parsley”, “flat parsley” and “curly parsley” needs far less bespoke feature engineering to perform well on grocery search.
Instacart’s strategy is to use LLMs as a unified backbone instead of a collection of narrow models. They focus on enriching the model with Instacart-specific context and then compressing that knowledge into smaller, efficient models for real-time use.
The three-layer strategy for LLM-powered intent
- Context-engineering with RAG. Data pipelines retrieve Instacart-specific information (conversion history, catalog details, taxonomies) and inject it into prompts. This grounds the LLM’s answers in live business reality instead of generic web knowledge.
- Post-processing guardrails. Validation layers check that the model’s outputs align with Instacart’s product taxonomy and filter out obvious hallucinations or off-topic suggestions; a minimal sketch of these first two layers follows the list.
- Fine-tuning for deep expertise. For the hardest problems, they fine-tune smaller open-source models on proprietary data, baking domain knowledge directly into the weights.
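To make the first two layers concrete, here is a minimal, self-contained Python sketch. The data, prompt wording, and names are illustrative stand-ins for what Instacart’s pipelines would supply, not their actual internals.

```python
# Minimal sketch of layers 1 and 2: retrieve Instacart-specific context into
# the prompt, then validate the model's answer against the catalog taxonomy.
# The data, prompt wording, and names below are illustrative stand-ins, not
# Instacart's actual pipelines or prompts.

# Stand-in retrieval sources; real pipelines would back these with conversion
# logs and the live catalog.
CONVERSION_HISTORY = {
    "bread no gluten": ["gluten-free sandwich bread", "gluten-free bagels"],
}
TAXONOMY = {
    "Bakery > Bread > Gluten-Free Bread",
    "Bakery > Bread",
    "Snacks > Crackers",
}


def build_enriched_prompt(query: str) -> str:
    """Layer 1: context engineering, RAG-style."""
    converted = CONVERSION_HISTORY.get(query, [])
    return (
        "You interpret grocery search queries for an online marketplace.\n"
        f"Query: {query}\n"
        f"Products shoppers bought for this query: {converted}\n"
        f"Known category paths: {sorted(TAXONOMY)}\n"
        "Return the single best category path."
    )


def apply_guardrail(llm_output: str) -> str | None:
    """Layer 2: post-processing guardrail against the real taxonomy."""
    path = llm_output.strip()
    return path if path in TAXONOMY else None  # reject hallucinated paths
```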
Use case 1: Better query category classification
Instacart’s catalog is a deep hierarchy, from departments like “Meat” down to specific leaf categories. Putting each query into the right path is critical: it drives recall and ranking. The legacy approach treated this as a huge flat classification problem trained on noisy conversion data, which often misclassified nuanced or new queries.
The new system first retrieves top candidate categories from historical conversions, then uses an LLM — enriched with Instacart context — to re-rank them. A final semantic similarity check between the original query and the predicted category path filters out mismatches. This three-step flow significantly improves precision for difficult queries.
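A rough sketch of that three-step flow, with the retrieval, re-ranking, and embedding functions left as injected callables because the article does not describe their implementations; the similarity threshold is invented for illustration.

```python
# Hypothetical sketch of the three-step classification flow. The retrieval,
# re-ranking, and embedding functions are passed in as callables; the 0.6
# threshold is invented for illustration.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_query(query, retrieve_candidates, rerank_with_llm, embed, threshold=0.6):
    # Step 1: candidate categories from historical conversions.
    candidates = retrieve_candidates(query)
    # Step 2: the LLM, enriched with Instacart context, re-ranks the candidates.
    ranked = rerank_with_llm(query, candidates)
    if not ranked:
        return None
    # Step 3: semantic similarity check between query and predicted path.
    best = ranked[0]
    if cosine(embed(query), embed(best)) < threshold:
        return None  # mismatch: better to fall back than force a bad category
    return best
```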
Use case 2: Structured query rewrites
Query rewrites help when the original search would return weak results. Earlier, Instacart mined rewrites from user sessions, which covered only about half of search traffic. A naive “generate some rewrites” prompt with a single LLM also turned out to be too vague: it produced synonyms that weren’t helpful for discovery.
The team redesigned this as three specialized rewrite types — substitutes, broader queries, and synonyms — each with tailored prompts, instructions, and few-shot examples. Guardrails enforce semantic relevance. With this structured approach, rewrite coverage jumps to over 95% while maintaining high precision.
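Hypothetical prompt scaffolding for the three rewrite types might look like the sketch below; the actual prompts, instructions, and few-shot examples Instacart uses are not public.

```python
# Illustrative prompt scaffolding for the three rewrite types. The wording and
# few-shot examples are invented for this sketch.
REWRITE_PROMPTS = {
    "substitute": (
        "Suggest close substitutes a shopper could buy instead.\n"
        "Example: 'oat milk creamer' -> 'oat milk'\n"
        "Query: {query}\nRewrites:"
    ),
    "broader": (
        "Generalize the query so more relevant products can be retrieved.\n"
        "Example: '1 inch bamboo skewers' -> 'bamboo skewers'\n"
        "Query: {query}\nRewrites:"
    ),
    "synonym": (
        "Rewrite the query using equivalent terms shoppers also use.\n"
        "Example: 'zip lock bags' -> 'resealable storage bags'\n"
        "Query: {query}\nRewrites:"
    ),
}


def generate_rewrites(query: str, call_llm) -> dict[str, str]:
    # One tailored prompt per rewrite type, instead of a single vague
    # "generate some rewrites" prompt; guardrails would filter the outputs.
    return {
        kind: call_llm(template.format(query=query))
        for kind, template in REWRITE_PROMPTS.items()
    }
```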
Use case 3: Semantic Role Labeling with a teacher–student setup
Semantic Role Labeling (SRL) extracts structured tags from a query — like product, brand, and attributes — which power retrieval, ranking, ads, and filters. The catch: it’s impossible to precompute tags for every long-tail query without blowing up cost.
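As a rough illustration, the structured tags for a single query might look something like this; the exact schema Instacart uses is not published.

```python
# Hypothetical example of SRL output for one query.
srl_tags = {
    "query": "organic honeycrisp apples 3 lb bag",
    "product": "apples",
    "brand": None,
    "attributes": ["organic", "honeycrisp", "3 lb bag"],
}
```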
Instacart built a hybrid system:
- An offline “teacher” pipeline uses RAG-style context enrichment plus a powerful LLM to produce high-quality tags and populate a cache for head queries.
- A real-time “student” model, a smaller language model (e.g., Llama-3-8B) fine-tuned on the teacher’s labeled data, serves tail queries at low latency.
This setup gives Instacart near-frontier quality for almost any query, while keeping costs and latency in check.
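A simplified sketch of that serving path: the cache stands in for the store the offline teacher pipeline populates for head queries, and the student model is any low-latency fine-tuned model. All names are invented for illustration.

```python
# Simplified sketch of the hybrid serving path. TEACHER_CACHE stands in for the
# store the offline teacher pipeline populates; student_model is any
# low-latency fine-tuned model.
TEACHER_CACHE: dict[str, dict] = {}  # filled offline with teacher-quality tags


def get_srl_tags(query: str, student_model) -> dict:
    cached = TEACHER_CACHE.get(query)
    if cached is not None:
        return cached                    # head query: precomputed, cheap lookup
    return student_model.predict(query)  # tail query: real-time student model
```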
Fine-tuning and production performance
The fine-tuned 8B “student” model reaches nearly the same F1 score as a much larger foundation model, with slightly higher precision and similar overall accuracy. That turns the fine-tuning pipeline into a reusable pattern for other LLM-powered systems at Instacart.
Serving that model in production required serious optimization. By merging LoRA adapter weights back into the base model, upgrading GPUs, exploring quantization, and adding GPU autoscaling, the team brought latency down from ~700ms to the low 300ms range while managing cost. Only a small fraction of traffic needs real-time inference thanks to caching.
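The LoRA merge step is a standard pattern; with the Hugging Face peft library it might look like the sketch below. The article does not name Instacart’s tooling, and the model ID and adapter path here are placeholders.

```python
# One common way to fold LoRA adapter weights back into the base model, using
# the Hugging Face peft library. The model ID and adapter path are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = PeftModel.from_pretrained(base, "path/to/srl-lora-adapter")

# Merging removes the adapter's extra matrix multiplications at inference time,
# which is one of the levers for cutting serving latency.
merged = model.merge_and_unload()
merged.save_pretrained("srl-student-merged")
```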
Impact on search quality
With the new SRL-powered tagging for tail queries, Instacart reports that users find items faster — average scroll depth drops by about 6% — and customer complaints about poor results on obscure queries are cut roughly in half. The system now serves millions of cold-start queries every week.
Key lessons for enterprise LLM systems
- Context is the real moat. General-purpose LLMs are becoming commodities; proprietary context — engagement data, catalog metadata, operational constraints — is where defensibility lives.
- Start offline, then move to real-time. Prove value and generate high-quality labels with offline pipelines before investing in low-latency inference.
- Simplify the stack. A single LLM backbone can often replace a zoo of task-specific models, reducing maintenance overhead.
- Model quality isn’t enough. You also need caching, autoscaling, latency tuning, and guardrails before an LLM system delivers consistent value in production.
How this maps to Lussent’s world
For Lussent, this architecture is a strong blueprint for any agentic AI workflow that needs deep intent understanding — whether that’s enterprise support, document search, or vertical marketplaces. The pattern is clear: enrich a powerful model with domain context, create an offline “teacher” pipeline, distill that into smaller models for real-time use, and surround everything with guardrails and observability.