Tags: llm · open-source · rlhf · llama · fine-tuning
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin Stone et al.
2023
We develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested.
Llama 2 Architecture and Training
Llama 2 builds on the original Llama architecture with key improvements:
Architecture Changes
- Grouped-Query Attention (GQA): the 70B model uses GQA with 8 key-value heads shared across its 64 query heads, reducing KV cache memory by 8×
- SwiGLU Activation: the FFN uses SwiGLU (carried over from Llama 1) in place of the original Transformer's ReLU: $\text{SwiGLU}(x, W, V, b, c) = \text{Swish}(xW + b) \odot (xV + c)$
- RoPE Positional Embeddings: Rotary Position Embeddings for better length generalization
- Context Length: Extended from 2048 to 4096 tokens
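The SwiGLU feed-forward activation above can be sketched directly from the formula; this is a minimal numpy illustration (toy shapes, random weights — not the paper's actual dimensions):

```python
import numpy as np

def swish(x):
    # Swish / SiLU: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V, b, c):
    # SwiGLU(x, W, V, b, c) = Swish(xW + b) ⊙ (xV + c)
    # The Swish branch acts as an elementwise gate on the linear branch.
    return swish(x @ W + b) * (x @ V + c)

# Toy example: batch of 2, d_model=4, d_ff=6
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W, V = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
b, c = np.zeros(6), np.zeros(6)
out = swiglu(x, W, V, b, c)
print(out.shape)  # (2, 6)
```

Note that Llama's actual FFN omits the bias terms $b$ and $c$; they are kept here to match the formula as written.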
RLHF Training Pipeline
- Supervised Fine-Tuning (SFT): 27,540 high-quality instruction samples
- Reward Model Training: Two reward models (helpfulness + safety) trained on 1.4M human preference annotations
- Rejection Sampling + PPO: iterative refinement alternating rejection-sampling fine-tuning with Proximal Policy Optimization (PPO)
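The reward models in step 2 are trained with a binary ranking loss over preference pairs; the Llama 2 paper adds a margin term that scales with how strongly annotators preferred one response. A minimal sketch of that loss (scalar toy scores, not a real reward model):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected, margin=0.0):
    # Binary ranking loss on reward-model scores for one preference pair:
    #   L = -log(sigmoid(r_chosen - r_rejected - margin))
    # The margin (as in the Llama 2 paper) demands a larger score gap
    # for pairs the annotators rated as clearly different.
    z = r_chosen - r_rejected - margin
    return -np.log(1.0 / (1.0 + np.exp(-z)))

# Toy scores from a hypothetical reward model
low = preference_loss(1.2, -0.3)               # chosen scored higher: small loss
high = preference_loss(-0.5, 0.8, margin=1.0)  # wrong ordering: large loss
print(low < high)  # True
```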
Safety Innovations
- Ghost Attention (GAtt): Condition generation on system prompt throughout conversation
- Safety-helpfulness balance: red-teaming by a team of over 350 people probing the models with adversarial prompts across risk categories
Benchmark Results
| Model | MMLU (5-shot) | HumanEval (pass@1) | GSM8K (8-shot) | TruthfulQA |
|---|---|---|---|---|
| Llama-2-7B | 45.3 | 12.8 | 14.6 | 33.3 |
| Llama-2-13B | 54.8 | 18.3 | 28.7 | 41.9 |
| Llama-2-70B | 68.9 | 29.9 | 56.8 | 44.9 |
| GPT-3.5 | 70.0 | 48.1 | 57.1 | 47.0 |
Key Equations
Rotary Position Embedding (RoPE) — position-aware query/key encoding
$f_{q,k}(x_m, m) = (W_{q,k} x_m) e^{im\theta}$
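The complex-exponential form above rotates each pair of query/key dimensions by a position-dependent angle, so attention scores depend only on relative positions. A real-valued numpy sketch, using the half-split pairing convention (some implementations interleave dimensions instead):

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotary position embedding for x of shape (seq_len, d).
    # Pairs dim i with dim i + d/2 and rotates the pair at position m
    # by angle m * theta_i, where theta_i = base^(-2i/d) — the
    # real-valued equivalent of multiplying by e^{i m theta}.
    seq_len, d = x.shape
    half = d // 2
    inv_freq = base ** (-np.arange(half) * 2.0 / d)    # (d/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)    # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

x = np.ones((8, 4))
out = rope(x)
print(out.shape)  # (8, 4)
```

Position 0 is left unrotated (all angles are zero), which is a quick sanity check on any RoPE implementation.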
Grouped Query Attention — reduces KV cache memory footprint
$\text{GQA}(Q, K, V) = \text{Concat}_{g=1}^{G}\,\text{Attention}(Q_g, K_g, V_g)$
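In GQA, each group of query heads attends against a single shared key/value head, which is what shrinks the KV cache. A minimal numpy sketch with toy shapes (a per-head loop for clarity; real implementations batch this):

```python
import numpy as np

def gqa(Q, K, V, n_groups):
    # Q: (n_heads, seq, d); K, V: (n_groups, seq, d).
    # Each block of n_heads // n_groups query heads shares one KV head,
    # shrinking the KV cache by a factor of n_heads / n_groups.
    n_heads, seq, d = Q.shape
    per_group = n_heads // n_groups
    out = np.empty_like(Q)
    for h in range(n_heads):
        g = h // per_group                      # KV group for this head
        scores = Q[h] @ K[g].T / np.sqrt(d)     # scaled dot-product
        scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ V[g]
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 5, 16))  # 8 query heads
K = rng.standard_normal((2, 5, 16))  # 2 shared KV groups
V = rng.standard_normal((2, 5, 16))
print(gqa(Q, K, V, n_groups=2).shape)  # (8, 5, 16)
```

With 8 query heads and 2 KV groups this toy setup caches 4× fewer K/V tensors; Llama 2 70B's 64 query heads over 8 KV heads give the 8× reduction.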
References (4)
- Ouyang et al., 2022. Training language models to follow instructions with human feedback.
- Bai et al., 2022. Constitutional AI: Harmlessness from AI Feedback.
- Schulman et al., 2017. Proximal Policy Optimization Algorithms.
- Su et al., 2021. RoFormer: Enhanced Transformer with Rotary Position Embedding.