Module 6: Advanced RAG
Query Transformation & Expansion
The single biggest weakness of naive RAG: the user's question is rarely phrased like the relevant document. Query transformation bridges this gap.
The Vocabulary Mismatch Problem
User query: "Why does my API keep returning 429?"
Relevant document: "Rate limiting enforces request quotas to prevent service abuse."
The embedding similarity between these is surprisingly low — different vocabulary, same meaning. Simple RAG fails here.
Query Rewriting
Use an LLM to rewrite the user's query to be more retrieval-friendly:
Original: "Can you help with my issue logging into the system?"
Rewritten: "Authentication failure troubleshooting, login error resolution, password reset process"
Rewriting expands vocabulary and removes conversational filler that doesn't match document language.
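A minimal sketch of query rewriting. `call_llm`, `REWRITE_PROMPT`, and the stub below are illustrative assumptions, not a specific library's API — in practice `call_llm` would wrap your provider's chat-completion client:

```python
# Hypothetical rewrite prompt; tune wording for your document corpus.
REWRITE_PROMPT = (
    "Rewrite the user's question as a dense, keyword-rich search query, "
    "using the vocabulary of technical documentation. Return only the query.\n\n"
    "Question: {question}"
)

def rewrite_query(question: str, call_llm) -> str:
    """Return a retrieval-friendly rewrite of a conversational question."""
    return call_llm(REWRITE_PROMPT.format(question=question)).strip()

# Deterministic stub so the example runs without an API key:
stub_llm = lambda prompt: "authentication failure troubleshooting, login error resolution"
print(rewrite_query("Can you help with my issue logging into the system?", stub_llm))
# → authentication failure troubleshooting, login error resolution
```

The rewritten query is then embedded and used for retrieval exactly like the original would have been.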
Hypothetical Document Embedding (HyDE)
Key insight: A hypothetical answer to the question is more likely to match real documents than the question itself.
- Use an LLM to generate a hypothetical answer to the query
- Embed the hypothetical answer (not the original query)
- Use that embedding for retrieval
Why it works: The hypothetical answer shares vocabulary and structure with real documents, leading to better semantic match.
Query: "What are the limits of RAG?"
↓
HyDE: "RAG has several limitations: it depends on retrieval quality,
has latency overhead, struggles with multi-hop reasoning..."
↓
Embed the HyDE response → retrieve similar documents
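The pipeline above can be sketched end to end. A real system would use a neural embedding model and a vector index; here a bag-of-words set with Jaccard overlap stands in so the example is self-contained, and `call_llm` is a deterministic stub for a real LLM call:

```python
import re

def embed(text):
    """Toy 'embedding': the set of lowercased words. Stand-in for a real model."""
    return set(re.findall(r"\w+", text.lower()))

def similarity(a, b):
    """Jaccard overlap as a stand-in for cosine similarity."""
    return len(a & b) / max(len(a | b), 1)

docs = [
    "RAG has several limitations: it depends on retrieval quality, "
    "has latency overhead, and struggles with multi-hop reasoning.",
    "Transformers apply self-attention over sequences of tokens.",
]

def retrieve(vec, k=1):
    return sorted(docs, key=lambda d: -similarity(vec, embed(d)))[:k]

def call_llm(prompt):  # stub: a real call would generate this text
    return "RAG's limits include retrieval quality, latency overhead, and weak multi-hop reasoning."

query = "What are the limits of RAG?"
hypothetical = call_llm("Write a short passage that answers: " + query)
print(retrieve(embed(hypothetical)))  # the RAG-limitations document wins
```

Note that embedding the raw query here overlaps with the right document on only one word ("RAG"), while the hypothetical answer shares most of its vocabulary with it — exactly the gap HyDE closes.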
Multi-Query Retrieval
A single query may miss relevant documents due to perspective or phrasing. Generate multiple queries and merge results:
Original: "How do transformers handle long documents?"
↓
Generated queries:
1. "Transformer context window length limitations"
2. "Long document processing with attention"
3. "Memory efficient transformers for long sequences"
↓
Retrieve for each → deduplicate → merge → rank
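The retrieve → deduplicate → merge step can be sketched as follows. `gen_queries` and `retrieve` are hypothetical hooks (an LLM call for query variants and a vector-store search); the stubs below make the example runnable:

```python
def multi_query_retrieve(query, gen_queries, retrieve, k=5):
    """Retrieve for the original query plus generated variants; dedupe and merge."""
    merged, seen = [], set()
    for q in [query] + gen_queries(query):
        for doc in retrieve(q, k):
            if doc not in seen:       # deduplicate across result lists
                seen.add(doc)
                merged.append(doc)    # order = first-seen rank
    return merged

# Stubs so the example runs without an LLM or vector store:
gen = lambda q: ["transformer context window limits", "long document attention"]
results = {  # pretend per-query retrieval results
    "How do transformers handle long documents?": ["doc_a", "doc_b"],
    "transformer context window limits": ["doc_b", "doc_c"],
    "long document attention": ["doc_c", "doc_d"],
}
ret = lambda q, k: results[q][:k]
print(multi_query_retrieve("How do transformers handle long documents?", gen, ret))
# → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

First-seen order is the simplest merge policy; production systems often rerank the merged list instead (e.g. by score fusion or a cross-encoder).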
Step-Back Prompting
For complex queries, first abstract to a higher-level question, then retrieve:
- Step back: "What general principles explain this specific question?"
- Retrieve with the general question (broader, more documents match)
- Combine: Use both specific and general context to answer
Works especially well for multi-hop reasoning where the answer requires synthesizing concepts.
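The three steps above can be sketched as one function. `call_llm` and `retrieve` are hypothetical hooks for an LLM API and a vector store; the deterministic stubs below stand in for both:

```python
def step_back_answer(question, call_llm, retrieve, k=3):
    # 1. Step back: abstract to a higher-level question
    general = call_llm("What general question underlies this one?\n" + question)
    # 2. Retrieve with both the specific and the general question
    context = retrieve(question, k) + retrieve(general, k)
    # 3. Answer using the combined context
    joined = "\n".join(context)
    return call_llm(f"Context:\n{joined}\n\nQuestion: {question}")

def fake_llm(prompt):  # stub: routes on prompt prefix
    if prompt.startswith("What general"):
        return "How does attention cost scale with sequence length?"
    return "Answer grounded in both specific and general context."

fake_retrieve = lambda q, k: [f"passage about: {q}"]
print(step_back_answer("How do transformers handle long documents?", fake_llm, fake_retrieve))
```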
Query Decomposition
For complex multi-part questions, decompose into simpler sub-questions:
Complex: "Compare RAG and fine-tuning in terms of cost, quality, and update speed"
↓
Sub-questions:
1. "What is the cost of RAG vs fine-tuning?"
2. "How does retrieval quality compare to fine-tuned model quality?"
3. "How quickly can RAG vs fine-tuning incorporate new information?"
↓
Retrieve and answer each → synthesize final answer
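The decompose → answer → synthesize loop can be sketched as below. Again `call_llm` and `retrieve` are hypothetical hooks, stubbed so the example runs without external services:

```python
def decompose_and_answer(question, call_llm, retrieve):
    # 1. Decompose the complex question into sub-questions (one per line)
    subs = call_llm("Split into simple sub-questions, one per line:\n" + question)
    partials = []
    for sub in subs.splitlines():
        # 2. Retrieve and answer each sub-question independently
        context = "\n".join(retrieve(sub))
        partials.append(f"{sub}: " + call_llm(f"Context:\n{context}\nAnswer: {sub}"))
    # 3. Synthesize the final answer from the partial answers
    return call_llm("Combine into one answer:\n" + "\n".join(partials))

def fake_llm(prompt):  # stub: routes on prompt prefix
    if prompt.startswith("Split"):
        return ("What is the cost of RAG vs fine-tuning?\n"
                "How quickly can each incorporate new information?")
    if prompt.startswith("Combine"):
        return "synthesized comparison"
    return "partial answer"

fake_retrieve = lambda q: ["relevant passage for: " + q]
print(decompose_and_answer(
    "Compare RAG and fine-tuning in terms of cost and update speed",
    fake_llm, fake_retrieve))
# → synthesized comparison
```

Each sub-question gets its own retrieval pass, so documents relevant to only one facet of the comparison are not crowded out by the others.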