QLoRA: Fine-Tuning on Consumer Hardware
QLoRA (Quantized LoRA) extends LoRA to work with 4-bit quantized base models, making fine-tuning possible on a single consumer GPU.
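As a sketch of what this looks like with the Hugging Face stack (`transformers`, `peft`, `bitsandbytes`), the two key pieces are a 4-bit quantization config for the base model and a LoRA config for the adapters. The hyperparameter values below are illustrative, not prescribed by this lesson:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization settings for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # dtype used during matmuls
)

# LoRA adapter settings (r and alpha here are example values)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

These configs would then be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and `peft.get_peft_model` respectively.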
How QLoRA Works
- 4-bit NormalFloat (NF4): Quantize base-model weights to 4 bits using a data type that is information-theoretically optimal for normally distributed weights
- Double Quantization: Quantize the quantization constants themselves (saves ~0.37 bits/parameter)
- Paged Optimizers: Use NVIDIA unified memory to handle memory spikes during training
- LoRA on top: Train only the LoRA adapters, kept in higher precision (bfloat16), while the 4-bit base model stays frozen
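The "~0.37 bits/parameter" figure for double quantization falls out of simple arithmetic, assuming the block sizes used in the QLoRA paper (64 weights per quantization block, 256 quantization constants per second-level block):

```python
# Overhead of storing quantization constants, in bits per parameter.
FIRST_BLOCK = 64    # weights sharing one absmax quantization constant
SECOND_BLOCK = 256  # first-level constants sharing one second-level constant

# Without double quantization: one fp32 (32-bit) constant per 64 weights.
naive_overhead = 32 / FIRST_BLOCK  # 0.5 bits/param

# With double quantization: constants stored as 8-bit values, plus one
# fp32 second-level constant per 256 of them.
dq_overhead = 8 / FIRST_BLOCK + 32 / (FIRST_BLOCK * SECOND_BLOCK)

savings = naive_overhead - dq_overhead
print(f"overhead without double quant: {naive_overhead:.3f} bits/param")
print(f"overhead with double quant:    {dq_overhead:.3f} bits/param")
print(f"savings:                       {savings:.3f} bits/param")  # ~0.373
```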
Memory Requirements
| Model | Full FT | LoRA | QLoRA |
|---|---|---|---|
| 7B | 112 GB | 48 GB | 6 GB |
| 13B | 208 GB | 88 GB | 10 GB |
| 70B | 1120 GB | 360 GB | 48 GB |
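The "Full FT" column follows from a back-of-the-envelope rule of roughly 16 bytes per parameter for mixed-precision Adam training (this accounting is a sketch and ignores activation memory):

```python
# Rough full fine-tuning memory: bf16 weights + bf16 grads
# + fp32 master weights + fp32 Adam first and second moments.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes/param

for billions in (7, 13, 70):
    gb = billions * BYTES_PER_PARAM  # 1e9 params x bytes/param ~ GB
    print(f"{billions}B model: ~{gb} GB for full fine-tuning")
```

By contrast, QLoRA stores base weights at 4 bits (0.5 bytes/param, ~3.5 GB for a 7B model); the remainder of the QLoRA column is adapters, optimizer state, and activations.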