Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
2020
Large pre-trained language models have been shown to store factual knowledge implicitly in their parameters. However, this knowledge is static, opaque, and can be factually incorrect. We propose RAG — Retrieval-Augmented Generation — which combines parametric and non-parametric memory for language generation tasks.
RAG: Retrieval-Augmented Generation
RAG combines a parametric memory (a pre-trained seq2seq model) with a non-parametric memory (a dense vector index of Wikipedia passages accessed by a neural retriever) through a differentiable retrieval mechanism.
Architecture
RAG-Sequence Model: $$p_{\text{RAG-Seq}}(y|x) \approx \sum_{z \in \text{top-}k} p_\eta(z|x) \prod_i^N p_\theta(y_i | x, z, y_{1:i-1})$$
RAG-Token Model: $$p_{\text{RAG-Token}}(y|x) \approx \prod_i^N \sum_{z \in \text{top-}k} p_\eta(z|x) \, p_\theta(y_i|x,z,y_{1:i-1})$$
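The difference between the two models is where the marginalization over retrieved documents happens: RAG-Sequence sums over documents once per sequence, while RAG-Token sums at every token position. A toy numpy sketch of both marginalizations, with all model outputs replaced by random placeholder distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, V = 3, 4, 10          # top-k documents, output length, vocab size (toy)

# Hypothetical model outputs (random placeholders, each row sums to 1):
p_doc = rng.dirichlet(np.ones(K))            # p_eta(z|x) over the top-k docs
p_tok = rng.dirichlet(np.ones(V), (K, N))    # p_theta(y_i|x,z,y_{1:i-1}) per doc/step
y = rng.integers(0, V, N)                    # a candidate output sequence

# Per-document probability of the chosen token at each step: shape (K, N)
tok_probs = p_tok[:, np.arange(N), y]

# RAG-Sequence: marginalize over documents at the sequence level.
p_rag_seq = (p_doc * tok_probs.prod(axis=1)).sum()

# RAG-Token: marginalize over documents at every token position.
p_rag_token = (p_doc[:, None] * tok_probs).sum(axis=0).prod()

print(p_rag_seq, p_rag_token)
```

Note that for a fixed target sequence the two quantities generally differ; they coincide only when a single document is retrieved (k = 1).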
Retriever: Dense Passage Retrieval (DPR)
Uses bi-encoder architecture:
- Query encoder: $E_Q(x)$ (BERT-base)
- Document encoder: $E_D(z)$ (BERT-base, separate weights)
- Similarity: $p_\eta(z|x) \propto \exp(E_Q(x)^T E_D(z))$
- FAISS index for approximate nearest-neighbor search over 21M Wikipedia passages
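The retrieval step reduces to maximum inner product search over precomputed passage embeddings. A minimal numpy sketch of that scoring (FAISS performs the search approximately over 21M passages; here it is done exactly, by brute force, on a random toy corpus):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_docs, k = 8, 100, 5     # embedding dim, corpus size, top-k (toy scale)

doc_embs = rng.standard_normal((n_docs, d))   # E_D(z): precomputed, frozen index
query_emb = rng.standard_normal(d)            # E_Q(x): output of the query encoder

# Maximum inner product search: score every passage against the query.
scores = doc_embs @ query_emb
top_k = np.argsort(scores)[::-1][:k]          # indices of the k best passages

# p_eta(z|x) ∝ exp(E_Q(x)^T E_D(z)), renormalized over the retrieved top-k.
logits = scores[top_k]
p_eta = np.exp(logits - logits.max())
p_eta /= p_eta.sum()
print(top_k, p_eta)
```

Subtracting `logits.max()` before exponentiating is the standard numerically stable softmax; it does not change the resulting distribution.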
Generator: BART
A seq2seq generator (BART-large) conditioned on the input concatenated with a retrieved passage: $$p_\theta(y_i | x, z, y_{1:i-1}) = \text{BART}([x; z], y_{1:i-1}; \theta)$$
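The paper simply concatenates the retrieved passage with the query to form the generator input. A sketch of that formatting step; the `format_generator_input` helper and the separator string are illustrative assumptions, not the paper's exact template:

```python
# Hypothetical input formatting for the BART generator: passage title and text
# are prepended to the query (the separator " // " is an assumption).
def format_generator_input(query: str, passage_title: str, passage_text: str) -> str:
    return f"{passage_title} // {passage_text} // {query}"

x = "who wrote the play Hamlet?"
z_title = "Hamlet"
z_text = "The Tragedy of Hamlet, Prince of Denmark, is a tragedy by William Shakespeare."
print(format_generator_input(x, z_title, z_text))
```

Because the generator sees the passage as plain text, any seq2seq model can serve as the parametric memory without architectural changes.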
Key Results
- Natural Questions: 44.5 EM (state of the art)
- TriviaQA: 56.8 / 68.0 EM (state of the art)
- MS-MARCO (abstractive QA): outperforms a BART baseline (evaluated with Bleu/Rouge rather than EM)
- End-to-end trainable: the retriever's query encoder and the generator are fine-tuned jointly
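Joint training works because $p_\eta(z|x)$ is a softmax over inner products between a trainable query embedding and a fixed document index, so minimizing the negative log marginal likelihood sends gradients into the query encoder while the document encoder and FAISS index stay frozen. A toy numpy sketch of the RAG-Sequence objective (all model outputs are random placeholders; the gradient flow itself is only described in comments):

```python
import numpy as np

rng = np.random.default_rng(2)
K, N, V, d = 3, 4, 10, 8     # top-k docs, output length, vocab, embedding dim

q = rng.standard_normal(d)            # E_Q(x): trainable during fine-tuning
docs = rng.standard_normal((K, d))    # E_D(z): kept fixed (index is not rebuilt)

# Retrieval distribution: softmax over inner products. Gradients of the loss
# reach the query encoder through these logits.
logits = docs @ q
p_doc = np.exp(logits - logits.max())
p_doc /= p_doc.sum()

p_tok = rng.dirichlet(np.ones(V), (K, N))   # generator outputs (placeholder)
y = rng.integers(0, V, N)                   # target sequence

# RAG-Sequence NLL: -log sum_z p_eta(z|x) prod_i p_theta(y_i|x,z,y_{1:i-1})
seq_lik = p_tok[:, np.arange(N), y].prod(axis=1)
nll = -np.log((p_doc * seq_lik).sum())
print(nll)
```

In the paper this avoids the expensive periodic index refresh used by REALM: only the query side of the retriever moves during fine-tuning.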
References (3)
- Dense Passage Retrieval for Open-Domain Question Answering. Karpukhin et al. · 2020
- BART: Denoising Sequence-to-Sequence Pre-training. Lewis et al. · 2020
- FAISS: A library for efficient similarity search. Johnson et al. · 2019