Module 6: Advanced RAG
Query Transformation & Expansion
The single biggest weakness of naive RAG: the user's question is rarely phrased like the relevant document. Query transformation bridges this gap.
The Vocabulary Mismatch Problem
User query: "Why does my API keep returning 429?"
Relevant document: "Rate limiting enforces request quotas to prevent service abuse."
The embedding similarity between these is surprisingly low — different vocabulary, same meaning. Simple RAG fails here.
Query Rewriting
Use an LLM to rewrite the user's query to be more retrieval-friendly:
Original: "Can you help with my issue logging into the system?"
Rewritten: "Authentication failure troubleshooting, login error resolution, password reset process"
Rewriting expands vocabulary and removes conversational filler that doesn't match document language.
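A minimal sketch of query rewriting. `call_llm`, `REWRITE_PROMPT`, and the stub below are illustrative assumptions, not a specific library's API — in practice `call_llm` would wrap your provider's chat-completion client:

```python
# Hypothetical rewrite prompt; tune wording for your document corpus.
REWRITE_PROMPT = (
    "Rewrite the user's question as a dense, keyword-rich search query, "
    "using the vocabulary of technical documentation. Return only the query.\n\n"
    "Question: {question}"
)

def rewrite_query(question: str, call_llm) -> str:
    """Return a retrieval-friendly rewrite of a conversational question."""
    return call_llm(REWRITE_PROMPT.format(question=question)).strip()

# Deterministic stub so the example runs without an API key:
stub_llm = lambda prompt: "authentication failure troubleshooting, login error resolution"
print(rewrite_query("Can you help with my issue logging into the system?", stub_llm))
# → authentication failure troubleshooting, login error resolution
```

The rewritten query is then embedded and used for retrieval exactly like the original would have been.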
Hypothetical Document Embedding (HyDE)
Key insight: A hypothetical answer to the question is more likely to match real documents than the question itself.
- Use an LLM to generate a hypothetical answer to the query
- Embed the hypothetical answer (not the original query)
- Use that embedding for retrieval
Why it works: The hypothetical answer shares vocabulary and structure with real documents, leading to better semantic match.
Query: "What are the limits of RAG?"
↓
HyDE: "RAG has several limitations: it depends on retrieval quality,
has latency overhead, struggles with multi-hop reasoning..."
↓
Embed the HyDE response → retrieve similar documents
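The pipeline above can be sketched end to end. A real system would use a neural embedding model and a vector index; here a bag-of-words set with Jaccard overlap stands in so the example is self-contained, and `call_llm` is a deterministic stub for a real LLM call:

```python
import re

def embed(text):
    """Toy 'embedding': the set of lowercased words. Stand-in for a real model."""
    return set(re.findall(r"\w+", text.lower()))

def similarity(a, b):
    """Jaccard overlap as a stand-in for cosine similarity."""
    return len(a & b) / max(len(a | b), 1)

docs = [
    "RAG has several limitations: it depends on retrieval quality, "
    "has latency overhead, and struggles with multi-hop reasoning.",
    "Transformers apply self-attention over sequences of tokens.",
]

def retrieve(vec, k=1):
    return sorted(docs, key=lambda d: -similarity(vec, embed(d)))[:k]

def call_llm(prompt):  # stub: a real call would generate this text
    return "RAG's limits include retrieval quality, latency overhead, and weak multi-hop reasoning."

query = "What are the limits of RAG?"
hypothetical = call_llm("Write a short passage that answers: " + query)
print(retrieve(embed(hypothetical)))  # the RAG-limitations document wins
```

Note that embedding the raw query here overlaps with the right document on only one word ("RAG"), while the hypothetical answer shares most of its vocabulary with it — exactly the gap HyDE closes.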
Multi-Query Retrieval
A single query may miss relevant documents due to perspective or phrasing. Generate multiple queries and merge results:
Original: "How do transformers handle long documents?"
↓
Generated queries:
1. "Transformer context window length limitations"
2. "Long document processing with attention"
3. "Memory efficient transformers for long sequences"
↓
Retrieve for each → deduplicate → merge → rank
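The retrieve → deduplicate → merge step can be sketched as follows. `gen_queries` and `retrieve` are hypothetical hooks (an LLM call for query variants and a vector-store search); the stubs below make the example runnable:

```python
def multi_query_retrieve(query, gen_queries, retrieve, k=5):
    """Retrieve for the original query plus generated variants; dedupe and merge."""
    merged, seen = [], set()
    for q in [query] + gen_queries(query):
        for doc in retrieve(q, k):
            if doc not in seen:       # deduplicate across result lists
                seen.add(doc)
                merged.append(doc)    # order = first-seen rank
    return merged

# Stubs so the example runs without an LLM or vector store:
gen = lambda q: ["transformer context window limits", "long document attention"]
results = {  # pretend per-query retrieval results
    "How do transformers handle long documents?": ["doc_a", "doc_b"],
    "transformer context window limits": ["doc_b", "doc_c"],
    "long document attention": ["doc_c", "doc_d"],
}
ret = lambda q, k: results[q][:k]
print(multi_query_retrieve("How do transformers handle long documents?", gen, ret))
# → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

First-seen order is the simplest merge policy; production systems often rerank the merged list instead (e.g. by score fusion or a cross-encoder).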
Step-Back Prompting
For complex queries, first abstract to a higher-level question, then retrieve:
- Step back: "What general principles explain this specific question?"
- Retrieve with the general question (broader, more documents match)
- Combine: Use both specific and general context to answer
Works especially well for multi-hop reasoning where the answer requires synthesizing concepts.
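The three steps above can be sketched as one function. `call_llm` and `retrieve` are hypothetical hooks for an LLM API and a vector store; the deterministic stubs below stand in for both:

```python
def step_back_answer(question, call_llm, retrieve, k=3):
    # 1. Step back: abstract to a higher-level question
    general = call_llm("What general question underlies this one?\n" + question)
    # 2. Retrieve with both the specific and the general question
    context = retrieve(question, k) + retrieve(general, k)
    # 3. Answer using the combined context
    joined = "\n".join(context)
    return call_llm(f"Context:\n{joined}\n\nQuestion: {question}")

def fake_llm(prompt):  # stub: routes on prompt prefix
    if prompt.startswith("What general"):
        return "How does attention cost scale with sequence length?"
    return "Answer grounded in both specific and general context."

fake_retrieve = lambda q, k: [f"passage about: {q}"]
print(step_back_answer("How do transformers handle long documents?", fake_llm, fake_retrieve))
```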
Query Decomposition
For complex multi-part questions, decompose into simpler sub-questions:
Complex: "Compare RAG and fine-tuning in terms of cost, quality, and update speed"
↓
Sub-questions:
1. "What is the cost of RAG vs fine-tuning?"
2. "How does retrieval quality compare to fine-tuned model quality?"
3. "How quickly can RAG vs fine-tuning incorporate new information?"
↓
Retrieve and answer each → synthesize final answer
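The decompose → answer → synthesize loop can be sketched as below. Again `call_llm` and `retrieve` are hypothetical hooks, stubbed so the example runs without external services:

```python
def decompose_and_answer(question, call_llm, retrieve):
    # 1. Decompose the complex question into sub-questions (one per line)
    subs = call_llm("Split into simple sub-questions, one per line:\n" + question)
    partials = []
    for sub in subs.splitlines():
        # 2. Retrieve and answer each sub-question independently
        context = "\n".join(retrieve(sub))
        partials.append(f"{sub}: " + call_llm(f"Context:\n{context}\nAnswer: {sub}"))
    # 3. Synthesize the final answer from the partial answers
    return call_llm("Combine into one answer:\n" + "\n".join(partials))

def fake_llm(prompt):  # stub: routes on prompt prefix
    if prompt.startswith("Split"):
        return ("What is the cost of RAG vs fine-tuning?\n"
                "How quickly can each incorporate new information?")
    if prompt.startswith("Combine"):
        return "synthesized comparison"
    return "partial answer"

fake_retrieve = lambda q: ["relevant passage for: " + q]
print(decompose_and_answer(
    "Compare RAG and fine-tuning in terms of cost and update speed",
    fake_llm, fake_retrieve))
# → synthesized comparison
```

Each sub-question gets its own retrieval pass, so documents relevant to only one facet of the comparison are not crowded out by the others.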