Microsoft's open-source LoRA library implements low-rank adaptation for large language models, significantly reducing the number of trainable parameters during fine-tuning.

MIT License | Python | 12.1k stars | microsoft/LoRA | Last Updated: 2024-12-17

Microsoft LoRA Project Detailed Introduction

Project Overview

Microsoft LoRA is an open-source Python library that implements the technique from "LoRA: Low-Rank Adaptation of Large Language Models." It provides a parameter-efficient solution for fine-tuning large language models.

Project Address: https://github.com/microsoft/LoRA

Paper Address: https://arxiv.org/abs/2106.09685

Core Technical Principles

Introduction to LoRA Technology

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. Its core ideas are:

  • Freeze Original Pre-trained Weights: Do not modify the original model parameters.
  • Add Low-Rank Decomposition Matrices: Achieve model adaptation by learning a pair of low-rank matrices (illustrated in the sketch after this list).
  • Significantly Reduce Training Parameters: Only train the newly added low-rank matrix parameters.
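
Conceptually, a frozen weight W is adapted as W' = W + B·A, where B and A are small low-rank matrices and only they are trained. The following minimal sketch illustrates this idea applied to a frozen nn.Linear layer; it is not the loralib implementation, and LowRankAdapter, r, and alpha are placeholder names:

import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Illustrative LoRA-style wrapper: output = frozen W x + scaling * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # freeze the pre-trained weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(d_out, r))         # low-rank factor B, zero-init
        self.scaling = alpha / r

    def forward(self, x):
        # Only A and B receive gradients; the base layer output is left untouched
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scaling

# Example: adapt a 768x768 projection with rank 8
adapted = LowRankAdapter(nn.Linear(768, 768), r=8)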

Technical Advantages

  1. Extremely High Parameter Efficiency
  • Compared to full fine-tuning of GPT-3 175B, LoRA reduces the number of trainable parameters by a factor of 10,000.
  • GPU memory requirements are reduced by a factor of 3.
  2. Excellent Performance Retention
  • Performs on par with or better than full fine-tuning on RoBERTa, DeBERTa, GPT-2, and GPT-3.
  • Achieves excellent results on the GLUE benchmark.
  3. Deployment Friendly
  • No additional inference latency.
  • Supports efficient task switching.
  • Significantly reduced storage requirements.

Performance

GLUE Benchmark Results

LoRA demonstrates excellent performance in the GLUE benchmark:

Model                              Trainable Parameters   MNLI (Acc.)   SST-2 (Acc.)   MRPC (Acc.)
RoBERTa Base (Full Fine-tuning)    125M                   87.6          94.8           90.2
RoBERTa Base (LoRA)                0.8M                   87.6±.1       95.1±.2        89.7±.7
DeBERTa XXL (Full Fine-tuning)     1.5B                   91.1          96.8           92.9
DeBERTa XXL (LoRA)                 4.7M                   91.9±.1       96.9±.2        92.6±.6

GPT-2 Text Generation Tasks

LoRA also performs excellently on text generation tasks such as E2E, DART, and WebNLG:

Method                        Trainable Parameters   E2E (BLEU)   DART (BLEU)   WebNLG (BLEU)
GPT-2 M (Full Fine-tuning)    354.92M                68.2         46.0          47.6
GPT-2 M (LoRA)                0.35M                  70.4±.1      47.1±.2       55.3±.2

Project Structure

microsoft/LoRA/
├── loralib/           # Core library source code
├── examples/
│   ├── NLG/          # GPT-2 Natural Language Generation Example
│   └── NLU/          # RoBERTa/DeBERTa Natural Language Understanding Example
├── README.md
└── setup.py

Installation and Usage

Installation Method

pip install loralib

# Or install from source
pip install git+https://github.com/microsoft/LoRA

Basic Usage Example

1. Replace Linear Layers

# ===== Before Modification =====
# import torch.nn as nn
# layer = nn.Linear(in_features, out_features)

# ===== After Modification =====
import loralib as lora
# Drop-in replacement: adds a pair of rank-16 low-rank matrices alongside the original weight
layer = lora.Linear(in_features, out_features, r=16)

2. Mark Trainable Parameters

import loralib as lora

model = BigModel()  # any nn.Module built with lora.* layers
# Freeze everything except parameters whose names contain "lora_"
lora.mark_only_lora_as_trainable(model)

# Training loop: only the LoRA parameters receive gradient updates
for batch in dataloader:
    # Normal training process
    pass

3. Save and Load Checkpoints

import torch
import loralib as lora

# Save only the LoRA parameters (a small checkpoint instead of the full model)
torch.save(lora.lora_state_dict(model), 'lora_checkpoint.pt')

# Load the pre-trained model first
model.load_state_dict(torch.load('pretrained.pt'), strict=False)
# Then load the LoRA parameters on top
model.load_state_dict(torch.load('lora_checkpoint.pt'), strict=False)

Supported Layer Types

Currently, the LoRA library supports the following layer types (a brief usage sketch follows the list):

  • nn.Linear → lora.Linear
  • nn.Embedding → lora.Embedding
  • nn.Conv2d → lora.Conv2d
  • lora.MergedLinear (for merged linear layers such as QKV projections)
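
As a brief sketch of the drop-in pattern (the dimensions below are placeholders, and the keyword arguments are assumed to mirror the corresponding torch.nn constructors plus a rank argument r):

import loralib as lora

# Each lora.* layer is used like its torch.nn counterpart, with an extra rank r
emb = lora.Embedding(num_embeddings=50000, embedding_dim=768, r=8)       # instead of nn.Embedding
conv = lora.Conv2d(in_channels=3, out_channels=64, kernel_size=3, r=4)   # instead of nn.Conv2d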

Advanced Features

1. Merged Linear Layer Support

# For scenarios such as QKV projections
qkv_proj = lora.MergedLinear(
    d_model, 3*d_model, 
    r=8, 
    enable_lora=[True, False, True]  # Apply LoRA only to Q and V
)

2. Bias Vector Training

# Train LoRA-related biases
lora.mark_only_lora_as_trainable(model, bias='lora_only')

# Train all biases
lora.mark_only_lora_as_trainable(model, bias='all')

3. Weight Merging at Inference Time

# Evaluation mode: the LoRA update is merged into the frozen weight
# (for layers created with merge_weights=True), so inference adds no extra latency
model.eval()

# Training mode: the merged weight is split back into the frozen weight and the LoRA update
model.train()

Application Scenarios

  1. Large Model Fine-tuning: Significantly reduce fine-tuning costs.
  2. Multi-task Learning: Efficient task switching by swapping small LoRA checkpoints (see the sketch after this list).
  3. Resource-Constrained Environments: Reduce memory and storage requirements.
  4. Rapid Prototyping: Accelerate model adaptation process.
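
A hedged sketch of that task-switching workflow, where BigModel and the checkpoint file names are placeholders (the frozen base weights are loaded once; only the small per-task LoRA checkpoints change):

import torch

model = BigModel()  # placeholder for a model built with lora.* layers
# Load the shared, frozen pre-trained weights once
model.load_state_dict(torch.load('pretrained.pt'), strict=False)

for task_ckpt in ['lora_task_a.pt', 'lora_task_b.pt']:  # hypothetical per-task checkpoints
    # Re-enter training mode first so a previously merged LoRA update
    # (from layers created with merge_weights=True) is removed before swapping
    model.train()
    model.load_state_dict(torch.load(task_ckpt), strict=False)
    model.eval()  # merge the new task's LoRA update for latency-free inference
    # ... run inference for the current task ...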

Ecosystem Integration

  • Hugging Face PEFT: Now integrated into HF's PEFT library.
  • PyTorch Ecosystem: Fully compatible with the PyTorch framework.
  • Pre-trained Models: Supports various mainstream pre-trained models.

Summary

The Microsoft LoRA project provides a breakthrough approach to the efficient fine-tuning of large language models. Through low-rank adaptation, it maintains strong model performance while significantly reducing computational and storage costs. The project has important academic value, offers a feasible technical path for industrial applications, and stands as a milestone in parameter-efficient fine-tuning.

[Star History Chart]