Module 5: Core RAG
The RAG Architecture
"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" — Lewis et al., 2020 (Meta AI)
RAG solves a fundamental problem: LLMs have a knowledge cutoff and can hallucinate facts. By pairing them with a retrieval system, we ground responses in real, verifiable, up-to-date documents.
The Core Problem: LLM Hallucination
LLMs generate plausible-sounding text — but plausibility ≠ accuracy. Without grounding, a model confidently fabricates:
- Wrong citations and statistics
- Outdated facts
- Non-existent APIs and functions
RAG Architecture Overview
User Query
↓
[Embedding Model] → query vector
↓
[Vector DB] → top-k relevant chunks
↓
[Context Construction] → system prompt + retrieved docs + query
↓
[LLM] → grounded response
↓
Answer (with source citations)
Two distinct phases:
Indexing (offline, once)
- Load documents (PDF, HTML, markdown, databases)
- Chunk documents into segments
- Embed each chunk with an embedding model
- Store vectors + metadata in a vector database
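The indexing steps above can be sketched in a few lines. This is a minimal, self-contained illustration: a hash-based bag-of-words embedding stands in for a real embedding model, and a plain Python list stands in for the vector database — both are assumptions for the sake of a runnable example, not what you'd ship.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hash embedding (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]          # unit-normalize for cosine search

def chunk(text: str, size: int = 50) -> list[str]:
    """Fixed-size word chunking (smarter strategies come in Lesson 2)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents: list[str]) -> list[dict]:
    """Chunk and embed each document; store vector + metadata per chunk."""
    index = []
    for doc_id, doc in enumerate(documents):
        for chunk_id, c in enumerate(chunk(doc)):
            index.append({"doc_id": doc_id, "chunk_id": chunk_id,
                          "text": c, "vector": embed(c)})
    return index
```

In a production system, `build_index` would write to a vector database (pgvector, Chroma, Pinecone, etc.) so the index persists and can be searched at scale — the offline/online split is the key idea, not the storage choice.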
Retrieval + Generation (online, per query)
- Embed the user query
- Search the vector DB for top-k similar chunks
- Construct a prompt with retrieved context
- Call the LLM with context
- Return the grounded response
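The online phase can be sketched the same way. Again this is illustrative: the hash embedding and the in-memory index are toy stand-ins, and the final LLM call is omitted — the sketch stops at the constructed prompt, which is what you would send to the model.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Same toy hash embedding used at indexing time (this must match!)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        token = token.strip(".,?!")
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, index: list[dict], k: int = 2) -> list[dict]:
    """Top-k chunks by cosine similarity (dot product of unit vectors)."""
    qv = embed(query)
    scored = sorted(index,
                    key=lambda e: -sum(a * b for a, b in zip(qv, e["vector"])))
    return scored[:k]

def build_prompt(query: str, chunks: list[dict]) -> str:
    """Assemble instructions + numbered context + question for the LLM."""
    context = "\n\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks))
    return ("Answer using only the context below; cite sources by number.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

index = [{"text": t, "vector": embed(t)} for t in [
    "RAG grounds LLM answers in retrieved documents.",
    "Fine-tuning bakes knowledge into model weights.",
    "Vector databases store embeddings for similarity search.",
]]
top = retrieve("How does RAG ground answers?", index, k=2)
prompt = build_prompt("How does RAG ground answers?", top)
# `prompt` is then sent to the LLM; its response is returned with citations.
```

Note the comment in `embed`: the query must be embedded with the exact same model used at indexing time, or the similarity search is meaningless. The numbered `[1]`, `[2]` markers in the context are what lets the model emit the source citations shown at the bottom of the diagram.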
Why RAG Beats Pure Fine-Tuning for Knowledge
| Approach | Knowledge Update | Cost | Hallucination Risk |
|---|---|---|---|
| RAG | Add docs to DB (minutes) | Low | Low (grounded) |
| Fine-tuning | Retrain model (hours-days) | High | Medium |
| In-context (no retrieval) | Paste docs in prompt | Medium | High |
RAG is especially powerful for:
- Private/proprietary knowledge (internal docs, customer data)
- Frequently updated information (news, documentation, pricing)
- Long-tail queries where the LLM's training data is sparse
The Faithfulness Challenge
RAG reduces but doesn't eliminate hallucination. The model may:
- Ignore retrieved context and generate from memory
- Contradict the retrieved context
- Misinterpret retrieved passages
Key metrics:
- Faithfulness: Is the answer entailed by the retrieved context? (vs. hallucinated)
- Answer Relevance: Does the answer actually address the question?
- Context Relevance: Are the retrieved chunks actually relevant?
The RAGAS framework provides automated metrics for all three.
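RAGAS computes these metrics with LLM judges — for faithfulness it decomposes the answer into statements and checks whether each is supported by the retrieved context. As a purely illustrative stand-in (not RAGAS's actual method), here is a toy lexical proxy: the fraction of content words in the answer that appear in the context.

```python
def faithfulness_proxy(answer: str, context: str) -> float:
    """Toy lexical proxy for faithfulness: share of answer content words
    found in the context. (RAGAS instead uses LLM-judged entailment.)"""
    stop = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "has"}
    ans = {w.strip(".,?!").lower() for w in answer.split()} - stop
    ctx = {w.strip(".,?!").lower() for w in context.split()}
    if not ans:
        return 1.0
    return len(ans & ctx) / len(ans)

ctx = "Paris is the capital of France."
print(faithfulness_proxy("Paris is the capital of France", ctx))   # 1.0
print(faithfulness_proxy("Paris welcomed ten million visitors", ctx))
```

The second answer scores low because most of its content words have no support in the context — exactly the "generate from memory" failure mode listed above. A lexical overlap check misses paraphrases and negations, which is why RAGAS relies on an LLM judge rather than string matching.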