LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis et al.
2022
We propose Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.
Core Hypothesis
For pre-trained over-parameterized models, weight updates during fine-tuning have a low intrinsic rank. This motivates learning rank decomposition matrices instead of full weight updates.
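The hypothesis can be illustrated numerically (a hypothetical sketch, not from the paper): a weight update that is the product of two thin matrices has rank at most $r$, so a rank-$r$ SVD truncation recovers it exactly.

```python
import numpy as np

# Illustrative only: construct an update that is, by design, rank <= r,
# then confirm a rank-r SVD truncation reconstructs it exactly.
rng = np.random.default_rng(0)
d, k, r = 64, 48, 4
delta_w = rng.normal(size=(d, r)) @ rng.normal(size=(r, k))  # rank <= r

u, s, vt = np.linalg.svd(delta_w)
effective_rank = int(np.sum(s > 1e-10))
low_rank = (u[:, :r] * s[:r]) @ vt[:r, :]
print(effective_rank)                      # 4
print(np.allclose(low_rank, delta_w))      # True
```

If fine-tuning updates really do have low intrinsic rank, parameterizing $\Delta W$ directly as $BA$ loses little expressiveness while shrinking the trainable parameter count dramatically.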
Method
For a pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, constrain the update:
$$W_0 + \Delta W = W_0 + BA$$
Where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$.
The forward pass becomes: $$h = W_0 x + \Delta W x = W_0 x + BAx$$
Initialization
- $A$ initialized with random Gaussian
- $B$ initialized to zero (so $\Delta W = 0$ at start)
- Scaling: $\frac{\alpha}{r}$ applied to $\Delta W$
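The method and initialization above can be sketched as a small NumPy class (illustrative; the name `LoRALinear` and the hyperparameter values are our own, not from the paper):

```python
import numpy as np

# Minimal sketch of a LoRA-adapted linear layer.
class LoRALinear:
    def __init__(self, w0, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d, k = w0.shape
        self.w0 = w0                                  # frozen pre-trained weight W0
        self.a = rng.normal(scale=0.01, size=(r, k))  # A: random Gaussian init
        self.b = np.zeros((d, r))                     # B: zeros, so Delta W = 0 at start
        self.scale = alpha / r                        # alpha/r scaling on the update

    def __call__(self, x):
        # h = W0 x + (alpha / r) * B A x
        return self.w0 @ x + self.scale * (self.b @ (self.a @ x))

layer = LoRALinear(np.eye(6), r=2)
x = np.arange(6.0)
print(np.allclose(layer(x), x))  # True: with B = 0 the layer starts exactly as W0
```

Because $B$ starts at zero, the adapted model is identical to the pre-trained model at step 0, and only $A$ and $B$ receive gradients.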
Parameter Efficiency
For GPT-3 175B with $r=1$ applied to $W_q$ and $W_v$:
- Full fine-tuning: 175B parameters
- LoRA trainable params: ~4.7M (0.003% of original)
- Performance: Matches or exceeds full fine-tuning on GLUE, WikiSQL, SAMSum
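The ~4.7M figure follows from back-of-the-envelope arithmetic (assuming GPT-3's published shape, $d_{model} = 12288$ and 96 layers): each adapted $d \times d$ matrix gains $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times r}$, i.e. $2dr$ parameters.

```python
# Parameter count when adapting W_q and W_v in every layer with rank r.
d_model, n_layers, r = 12288, 96, 1
per_matrix = 2 * d_model * r            # A (r x d) plus B (d x r)
trainable = per_matrix * 2 * n_layers   # two adapted matrices per layer
print(trainable)                        # 4718592, i.e. ~4.7M
print(f"{trainable / 175e9:.4%}")       # ~0.0027% of 175B
```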
Practical Considerations
- Applied to $W_q$ and $W_v$ in attention layers only (empirically sufficient)
- LoRA adapters for different tasks can be swapped cheaply at deployment by subtracting one $BA$ and adding another
- No inference latency: $W = W_0 + BA$ merged before deployment
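The merge and swap operations above can be verified in a few lines (a sketch under the same $\frac{\alpha}{r}$ scaling; all variable names are our own):

```python
import numpy as np

# Merging: W = W0 + (alpha / r) * B A collapses the adapter into one dense
# matrix, so serving uses a single matmul, same as the unadapted model.
rng = np.random.default_rng(1)
d, k, r, alpha = 16, 16, 4, 8
w0 = rng.normal(size=(d, k))
a, b = rng.normal(size=(r, k)), rng.normal(size=(d, r))
scale = alpha / r

w_merged = w0 + scale * (b @ a)
x = rng.normal(size=k)
print(np.allclose(w_merged @ x, w0 @ x + scale * (b @ (a @ x))))  # True

# Swapping tasks: subtract this adapter's BA to recover W0, then a
# different adapter's BA can be added in its place.
w_restored = w_merged - scale * (b @ a)
print(np.allclose(w_restored, w0))  # True
```

This is why LoRA adds no inference latency: after merging, the deployed weight matrix has the same shape and cost as the original $W_0$.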
References (3)
- Brown et al., 2020. Language Models are Few-Shot Learners (GPT-3)
- Houlsby et al., 2019. Parameter-Efficient Transfer Learning for NLP
- Dettmers et al., 2023. QLoRA: Efficient Finetuning of Quantized LLMs