Microsoft's open-source LoRA library implements low-rank adaptation for large language models, significantly reducing the number of trainable parameters during fine-tuning.

MIT License | Python | 12.1k stars | microsoft/LoRA | Last Updated: 2024-12-17

Microsoft LoRA Project Detailed Introduction

Project Overview

Microsoft LoRA is an open-source Python library that implements the technique from "LoRA: Low-Rank Adaptation of Large Language Models." It provides a parameter-efficient solution for fine-tuning large language models.

Project Address: https://github.com/microsoft/LoRA

Paper Address: https://arxiv.org/abs/2106.09685

Core Technical Principles

Introduction to LoRA Technology

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. Its core ideas are:

  • Freeze Original Pre-trained Weights: Do not modify the original model parameters.
  • Add Low-Rank Decomposition Matrices: Achieve model adaptation by learning a pair of low-rank matrices (illustrated in the sketch after this list).
  • Significantly Reduce Training Parameters: Only train the newly added low-rank matrix parameters.
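
Conceptually, a frozen weight W is adapted as W' = W + B·A, where B and A are small low-rank matrices and only they are trained. The following minimal sketch illustrates this idea applied to a frozen nn.Linear layer; it is not the loralib implementation, and LowRankAdapter, r, and alpha are placeholder names:

import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Illustrative LoRA-style wrapper: output = frozen W x + scaling * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # freeze the pre-trained weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(d_out, r))         # low-rank factor B, zero-init
        self.scaling = alpha / r

    def forward(self, x):
        # Only A and B receive gradients; the base layer output is left untouched
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scaling

# Example: adapt a 768x768 projection with rank 8
adapted = LowRankAdapter(nn.Linear(768, 768), r=8)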

Technical Advantages

  1. Extremely High Parameter Efficiency
  • Compared to full fine-tuning of GPT-3 175B, LoRA reduces the number of trainable parameters by a factor of 10,000.
  • GPU memory requirements are reduced by a factor of 3.
  2. Excellent Performance Retention
  • Performs on par with or better than full fine-tuning on RoBERTa, DeBERTa, GPT-2, and GPT-3.
  • Achieves excellent results on the GLUE benchmark.
  3. Deployment Friendly
  • No additional inference latency.
  • Supports efficient task switching.
  • Significantly reduced storage requirements.

Performance

GLUE Benchmark Results

LoRA demonstrates excellent performance in the GLUE benchmark:

Model                              Trainable Parameters   MNLI (Acc.)   SST-2 (Acc.)   MRPC (Acc.)
RoBERTa Base (Full Fine-tuning)    125M                   87.6          94.8           90.2
RoBERTa Base (LoRA)                0.8M                   87.6±.1       95.1±.2        89.7±.7
DeBERTa XXL (Full Fine-tuning)     1.5B                   91.1          96.8           92.9
DeBERTa XXL (LoRA)                 4.7M                   91.9±.1       96.9±.2        92.6±.6

GPT-2 Text Generation Tasks

LoRA also performs excellently on text generation tasks such as E2E, DART, and WebNLG:

Method                        Trainable Parameters   E2E (BLEU)   DART (BLEU)   WebNLG (BLEU)
GPT-2 M (Full Fine-tuning)    354.92M                68.2         46.0          47.6
GPT-2 M (LoRA)                0.35M                  70.4±.1      47.1±.2       55.3±.2

Project Structure

microsoft/LoRA/
├── loralib/           # Core library source code
├── examples/
│   ├── NLG/          # GPT-2 Natural Language Generation Example
│   └── NLU/          # RoBERTa/DeBERTa Natural Language Understanding Example
├── README.md
└── setup.py

Installation and Usage

Installation Method

pip install loralib

# Or install from source
pip install git+https://github.com/microsoft/LoRA

Basic Usage Example

1. Replace Linear Layers

# ===== Before Modification =====
# import torch.nn as nn
# layer = nn.Linear(in_features, out_features)

# ===== After Modification =====
import loralib as lora
# Drop-in replacement: adds a pair of rank-16 low-rank matrices alongside the original weight
layer = lora.Linear(in_features, out_features, r=16)

2. Mark Trainable Parameters

import loralib as lora

model = BigModel()  # any nn.Module built with lora.* layers
# Freeze everything except parameters whose names contain "lora_"
lora.mark_only_lora_as_trainable(model)

# Training loop: only the LoRA parameters receive gradient updates
for batch in dataloader:
    # Normal training process
    pass

3. Save and Load Checkpoints

import torch
import loralib as lora

# Save only the LoRA parameters (a small checkpoint instead of the full model)
torch.save(lora.lora_state_dict(model), 'lora_checkpoint.pt')

# Load the pre-trained model first
model.load_state_dict(torch.load('pretrained.pt'), strict=False)
# Then load the LoRA parameters on top
model.load_state_dict(torch.load('lora_checkpoint.pt'), strict=False)

Supported Layer Types

Currently, the LoRA library supports the following layer types (a brief usage sketch follows the list):

  • nn.Linear → lora.Linear
  • nn.Embedding → lora.Embedding
  • nn.Conv2d → lora.Conv2d
  • lora.MergedLinear (for merged linear layers such as QKV projections)
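
As a brief sketch of the drop-in pattern (the dimensions below are placeholders, and the keyword arguments are assumed to mirror the corresponding torch.nn constructors plus a rank argument r):

import loralib as lora

# Each lora.* layer is used like its torch.nn counterpart, with an extra rank r
emb = lora.Embedding(num_embeddings=50000, embedding_dim=768, r=8)       # instead of nn.Embedding
conv = lora.Conv2d(in_channels=3, out_channels=64, kernel_size=3, r=4)   # instead of nn.Conv2d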

Advanced Features

1. Merged Linear Layer Support

# For scenarios such as QKV projections
qkv_proj = lora.MergedLinear(
    d_model, 3*d_model, 
    r=8, 
    enable_lora=[True, False, True]  # Apply LoRA only to Q and V
)

2. Bias Vector Training

# Train LoRA-related biases
lora.mark_only_lora_as_trainable(model, bias='lora_only')

# Train all biases
lora.mark_only_lora_as_trainable(model, bias='all')

3. Weight Merging at Inference Time

# Evaluation mode: the LoRA update is merged into the frozen weight
# (for layers created with merge_weights=True), so inference adds no extra latency
model.eval()

# Training mode: the merged weight is split back into the frozen weight and the LoRA update
model.train()

Application Scenarios

  1. Large Model Fine-tuning: Significantly reduce fine-tuning costs.
  2. Multi-task Learning: Efficient task switching by swapping small LoRA checkpoints (see the sketch after this list).
  3. Resource-Constrained Environments: Reduce memory and storage requirements.
  4. Rapid Prototyping: Accelerate model adaptation process.
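
A hedged sketch of that task-switching workflow, where BigModel and the checkpoint file names are placeholders (the frozen base weights are loaded once; only the small per-task LoRA checkpoints change):

import torch

model = BigModel()  # placeholder for a model built with lora.* layers
# Load the shared, frozen pre-trained weights once
model.load_state_dict(torch.load('pretrained.pt'), strict=False)

for task_ckpt in ['lora_task_a.pt', 'lora_task_b.pt']:  # hypothetical per-task checkpoints
    # Re-enter training mode first so a previously merged LoRA update
    # (from layers created with merge_weights=True) is removed before swapping
    model.train()
    model.load_state_dict(torch.load(task_ckpt), strict=False)
    model.eval()  # merge the new task's LoRA update for latency-free inference
    # ... run inference for the current task ...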

Ecosystem Integration

  • Hugging Face PEFT: Now integrated into HF's PEFT library.
  • PyTorch Ecosystem: Fully compatible with the PyTorch framework.
  • Pre-trained Models: Supports various mainstream pre-trained models.

Summary

The Microsoft LoRA project provides a breakthrough approach to the efficient fine-tuning of large language models. Through low-rank adaptation, it maintains strong model performance while significantly reducing computational and storage costs. The project has important academic value, offers a feasible technical path for industrial applications, and stands as a milestone in parameter-efficient fine-tuning.

[Star History Chart]