Module 7: Fine-Tuning & Alignment
Lessons: 1. Fine-Tuning Decision Framework & Dataset Creation · 2. PEFT Methods: LoRA, QLoRA & Quantization · 3. Alignment: RLHF, DPO, Constitutional AI & Evaluation
Fine-Tuning Decision Framework & Dataset Creation
The Three-Way Choice
Before writing a single line of training code, understand the full option space:
| Approach | When to Use | Cost | Latency | Data Required |
|---|---|---|---|---|
| Prompt Engineering | General tasks, fast iteration | $ | Low | None |
| RAG | Knowledge-intensive, facts change frequently | $$ | Medium | Document corpus |
| Fine-Tuning | Style/format consistency, domain vocabulary, max performance | $$$ | Low (short prompts) | 500–50K labeled examples |
| RAG + Fine-Tuning | Complex domain with changing knowledge | $$$$ | Medium | Both |
Decision Checklist
Before fine-tuning, verify ALL of the following:
- Prompt engineering has plateaued — You've tried chain-of-thought, few-shot (10+ examples), system prompts, and output format specifications.
- Data exists — You have at least 500 high-quality labeled examples (or can create them).
- Task is well-defined — The input-output mapping is consistent and unambiguous.
- Evaluation is ready — You can measure success automatically (not just "vibes").
- Budget allows — GPU time, labeling cost, and ongoing maintenance are acceptable.
- Knowledge is stable — If the domain changes weekly, RAG may be better.
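The checklist above can be sketched as a simple pre-flight gate. This is an illustrative helper, not part of any library; the function and parameter names are invented for this sketch.

```python
def should_fine_tune(
    prompting_plateaued: bool,
    labeled_examples: int,
    task_is_well_defined: bool,
    has_automatic_eval: bool,
    budget_approved: bool,
    knowledge_is_stable: bool,
) -> tuple[bool, list[str]]:
    """Return (go/no-go, list of failed checks) for the decision checklist."""
    checks = {
        "prompt engineering has plateaued": prompting_plateaued,
        "at least 500 labeled examples": labeled_examples >= 500,
        "task is well-defined": task_is_well_defined,
        "automatic evaluation exists": has_automatic_eval,
        "budget allows": budget_approved,
        "domain knowledge is stable (else prefer RAG)": knowledge_is_stable,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (len(failed) == 0, failed)

# Example: everything checks out except knowledge stability -> no-go,
# and the failed-check list points you toward RAG instead.
go, failed = should_fine_tune(True, 1200, True, True, True, False)
```

The point of returning the failed checks rather than a bare boolean: each "no" maps to a cheaper alternative (more prompting, RAG, or deferring until data exists).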
Fine-Tuning Objectives
| Objective | Description | Example |
|---|---|---|
| Instruction Following | Teach model to follow a format | JSON output, structured reports |
| Domain Adaptation | Inject domain vocabulary/knowledge | Medical, legal, finance |
| Style Transfer | Match a specific writing style | Brand voice, persona consistency |
| Task Specialization | Optimize for one task | SQL generation, code review |
| Safety Alignment | Reduce harmful outputs | Constitutional AI, RLAIF |
| Distillation | Transfer large model knowledge to small | Teacher-student training |
Catastrophic Forgetting
Full fine-tuning updates all weights — the model can forget general capabilities it learned during pre-training. Mitigations:
- PEFT (train <1% of weights) — Almost eliminates forgetting
- Replay — Mix original pre-training data into fine-tuning batches
- EWC (Elastic Weight Consolidation) — Penalize changes to weights important for old tasks
- Lower learning rate — 1e-5 vs 1e-3 reduces magnitude of weight changes
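The replay mitigation is easy to sketch: blend a fraction of original pre-training examples into the fine-tuning stream. A minimal, framework-agnostic version (function name and the 10% default ratio are illustrative assumptions, not a standard):

```python
import random

def replay_mixture(finetune_data, pretrain_data, replay_ratio=0.1, seed=0):
    """Mix pre-training examples into fine-tuning data to reduce
    catastrophic forgetting.

    replay_ratio is the fraction of the *final* mixture drawn from
    pre-training data: 0.1 means roughly 1 replay example for every
    9 fine-tuning examples.
    """
    rng = random.Random(seed)
    # Solve n_replay / (len(finetune) + n_replay) = replay_ratio.
    n_replay = round(len(finetune_data) * replay_ratio / (1 - replay_ratio))
    mixed = list(finetune_data) + rng.choices(pretrain_data, k=n_replay)
    rng.shuffle(mixed)  # interleave so batches see both distributions
    return mixed
```

In practice the "pre-training data" is whatever general corpus you have access to; even a generic instruction dataset helps preserve broad capabilities.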
Compute Requirements
| Model Size | Full FT (FP32) | Full FT (BF16) | QLoRA (4-bit) |
|---|---|---|---|
| 7B | 112 GB | 56 GB | ~12 GB |
| 13B | 208 GB | 104 GB | ~20 GB |
| 70B | ~1.1 TB | ~560 GB | ~48 GB |
| 405B | — | — | ~280 GB |
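The full fine-tuning columns follow a simple back-of-envelope rule: weights + gradients + Adam optimizer states. A sketch that reproduces the table (activations and framework overhead are deliberately excluded, so real usage runs higher; the BF16 column implies reduced-precision optimizer states):

```python
def full_ft_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough full fine-tuning memory estimate.

    FP32: 4 (weights) + 4 (gradients) + 8 (Adam m, v) = 16 bytes/param.
    BF16 with reduced-precision optimizer states: 2 + 2 + 4 = 8 bytes/param.
    Excludes activations, KV caches, and framework overhead.
    """
    return n_params * bytes_per_param / 1e9

full_ft_memory_gb(7e9, 16)   # 112.0 GB — matches the 7B FP32 cell
full_ft_memory_gb(13e9, 8)   # 104.0 GB — matches the 13B BF16 cell
```

QLoRA doesn't reduce to one multiplier: 4-bit weights (~0.5 bytes/param) plus full-precision adapter parameters, optimizer states, and activations, which is why those cells are approximate.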
Dataset Creation: The LIMA Principle
LIMA (Less Is More for Alignment, Zhou et al., 2023): 1,000 carefully curated examples outperformed models trained on 52,000+ examples. Quality > Quantity.
Key insight: The model learns how to respond from fine-tuning data, but what to respond relies on pre-trained knowledge. Focus on format, style, and structure — not cramming facts.
Dataset Formats
Alpaca-style (instruction + optional input + output):

```json
{
  "instruction": "Summarize the key findings",
  "input": "The study examined 500 patients...",
  "output": "Key findings: (1) 73% showed improvement..."
}
```

ShareGPT-style (multi-turn conversation):

```json
{
  "conversations": [
    {"from": "human", "value": "Explain gradient descent"},
    {"from": "gpt", "value": "Gradient descent is an optimization..."},
    {"from": "human", "value": "What about momentum?"},
    {"from": "gpt", "value": "Momentum builds on gradient descent by..."}
  ]
}
```
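Converting between the two formats is a common preprocessing step. A minimal sketch: one Alpaca record becomes a single-turn ShareGPT conversation. Appending the optional "input" field to the instruction is a common convention, but not universal, so treat that choice as an assumption:

```python
def alpaca_to_sharegpt(record: dict) -> dict:
    """Convert one Alpaca-style record into a ShareGPT-style
    single-turn conversation."""
    prompt = record["instruction"]
    if record.get("input"):
        # Convention (assumed here): fold the optional input into the
        # human turn, separated by a blank line.
        prompt += "\n\n" + record["input"]
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": record["output"]},
        ]
    }
```

Going the other direction is lossy: a multi-turn conversation has no clean instruction/input split.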
Data Sourcing Strategies
| Strategy | Description | Quality | Scale | Cost |
|---|---|---|---|---|
| Human annotation | Expert contractors write examples | ★★★★★ | Low | High |
| Web scraping | Filter high-quality web text | ★★★ | High | Low |
| Self-Instruct | LLM generates instruction-response pairs | ★★★★ | High | Medium |
| Evol-Instruct | LLM rewrites simple instructions into harder ones | ★★★★ | High | Medium |
| Synthetic from teacher | GPT-4/Claude generates training data | ★★★★ | High | Medium |
| Existing datasets | Alpaca, FLAN, Dolly, OpenHermes | ★★★ | High | Free |
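The Self-Instruct row deserves a concrete shape. The core loop samples existing seed instructions into a bootstrapping prompt and asks an LLM to produce a new one; the sketch below builds only the prompt, since the LLM call and the novelty/quality filtering depend on your stack. The template wording is illustrative, not taken from the Self-Instruct paper:

```python
import random

SELF_INSTRUCT_TEMPLATE = (
    "You are generating diverse task instructions.\n"
    "Here are {n} examples:\n"
    "{examples}\n"
    "Write one new instruction different from all of the above:"
)

def build_self_instruct_prompt(seed_tasks, n_examples=3, seed=0):
    """Sample seed instructions and format a Self-Instruct-style
    bootstrapping prompt. Sending it to an LLM and filtering the
    responses for novelty and quality are omitted."""
    rng = random.Random(seed)
    sampled = rng.sample(seed_tasks, n_examples)
    examples = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(sampled))
    return SELF_INSTRUCT_TEMPLATE.format(n=n_examples, examples=examples)
```

Each accepted generation gets added back to the seed pool, which is what lets the dataset grow beyond the original human-written seeds.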
Quality Filters (Non-Negotiable)
Before training, filter your dataset:
- Length filter — Remove too-short (<50 tokens) or too-long (>4096 tokens) examples
- Deduplication — MinHash or embedding-based similarity (removes up to 30% of typical datasets)
- Quality classifier — Train a classifier on human-labeled "good vs bad" examples
- Toxicity filter — Remove harmful content (Perspective API, Llama Guard)
- Format validation — Ensure consistent structure (check JSON validity, etc.)
- Human review — Sample 5% randomly and manually inspect
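The first two filters are mechanical enough to sketch end to end. This simplified version approximates token counts by whitespace split and deduplicates by exact hash; a production pipeline would use a real tokenizer and near-duplicate detection (MinHash or embeddings) instead:

```python
import hashlib

def basic_quality_filter(records, min_tokens=50, max_tokens=4096):
    """Length filter + exact deduplication for Alpaca-style records.

    Simplifications (by assumption): whitespace-split "tokens" stand in
    for tokenizer counts, and MD5 of the exact text stands in for
    MinHash/embedding near-duplicate detection.
    """
    seen = set()
    kept = []
    for rec in records:
        text = rec.get("instruction", "") + " " + rec.get("output", "")
        n_tokens = len(text.split())
        if not (min_tokens <= n_tokens <= max_tokens):
            continue  # too short or too long
        digest = hashlib.md5(text.encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate
        seen.add(digest)
        kept.append(rec)
    return kept
```

Run this before the more expensive steps (quality classifier, toxicity API calls) so you don't pay to score examples you'd discard anyway.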