🤗 PEFT: A Detailed Introduction to the Parameter-Efficient Fine-Tuning Library
Project Overview
PEFT (Parameter-Efficient Fine-Tuning) is a parameter-efficient fine-tuning library developed by Hugging Face. It addresses the high computational cost and large storage footprint of fine-tuning large pre-trained models by training only a small number of additional parameters while keeping the base model frozen.
GitHub Address: https://github.com/huggingface/peft
Core Values and Advantages
1. Cost-Effectiveness
- Significantly Reduced Computational Costs: Compared to traditional full-parameter fine-tuning, PEFT methods train only a small fraction of the model's parameters.
- Significantly Reduced Storage Requirements: Fine-tuned checkpoints contain only the adapter weights and are typically a few MB instead of many GB.
- Optimized Memory Usage: Larger models can be fine-tuned on the same hardware.
2. Performance Retention
- Comparable to Full-Parameter Fine-Tuning: Achieves performance comparable to full fine-tuning on most tasks.
- Avoids Catastrophic Forgetting: Because the base model's weights remain frozen, its original knowledge is preserved and the risk of overfitting is reduced.
3. Flexibility and Convenience
- Multi-Task Adaptation: Multiple lightweight adapters can be trained for different tasks and swapped on a single base model (see the sketch after this list).
- Seamless Integration: Perfectly integrated with ecosystems such as Transformers, Diffusers, and Accelerate.
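To illustrate the multi-task point, here is a minimal sketch of attaching several adapters to one base model and switching between them. The adapter paths and names are hypothetical placeholders; PeftModel.from_pretrained, load_adapter, and set_adapter are part of PEFT's API.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model once
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Attach a first task-specific adapter (hypothetical local path)
model = PeftModel.from_pretrained(base_model, "./sst2-lora", adapter_name="sst2")

# Attach a second adapter to the same base model (also a placeholder path)
model.load_adapter("./summarization-lora", adapter_name="summarization")

# Switch the active adapter without reloading the multi-GB base weights
model.set_adapter("summarization")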
Supported Fine-Tuning Methods
Main PEFT Techniques
LoRA (Low-Rank Adaptation)
- The most popular parameter-efficient fine-tuning method.
- Keeps the pretrained weight matrix frozen and learns a low-rank update, so only two small factor matrices are trained (sketched below).
- Typically only requires training 0.1%-1% of the original parameters.
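The core idea in plain PyTorch, as a minimal self-contained sketch (an illustration of the math, not PEFT's actual implementation): the frozen weight W is augmented with a trainable low-rank update BA scaled by alpha/r.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, in_features, out_features, r=8, alpha=32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weight stays frozen
        self.lora_A = nn.Linear(in_features, r, bias=False)   # down-projection to rank r
        self.lora_B = nn.Linear(r, out_features, bias=False)  # up-projection back
        nn.init.zeros_(self.lora_B.weight)  # update starts at zero, so training begins at the base model
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

With in_features = out_features = 1024 and r = 8, the update adds 2 × 1024 × 8 = 16,384 trainable parameters against roughly one million frozen ones, about 1.6%; at full model scale this is how the 0.1%-1% figures arise.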
AdaLoRA
- An improved version of LoRA.
- Adaptively allocates the rank budget across weight matrices during training, pruning less important directions for further efficiency (see the configuration sketch below).
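A configuration sketch using PEFT's AdaLoraConfig. The field names follow the PEFT docs, but required arguments and defaults vary between library versions, so treat the values as illustrative.

from peft import AdaLoraConfig, TaskType

adalora_config = AdaLoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    init_r=12,        # starting rank before adaptive pruning
    target_r=8,       # average rank budget after adaptation
    total_step=1000,  # total training steps, used to schedule the rank allocation
    lora_alpha=32,
)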
Prefix Tuning
- Prepends trainable continuous prefix vectors to the keys and values of every attention layer, rather than to the raw input text (configuration example below).
- Suitable for generation tasks.
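A minimal configuration sketch with PEFT's PrefixTuningConfig; num_virtual_tokens controls the prefix length, and the value here is illustrative.

from peft import PrefixTuningConfig, TaskType

prefix_config = PrefixTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=20,  # length of the trainable prefix
)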
P-Tuning v2
- An improved prompt tuning method.
- Inserts learnable prompt parameters at multiple layers rather than only at the input embedding (see the sketch below).
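In PEFT, the closely related P-tuning method is configured via PromptEncoderConfig, which trains virtual prompt tokens through a small encoder network; this sketch follows the PEFT docs, with illustrative values.

from peft import PromptEncoderConfig, TaskType

ptuning_config = PromptEncoderConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=20,    # number of trainable virtual prompt tokens
    encoder_hidden_size=128,  # hidden size of the prompt encoder
)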
IA³ (Infused Adapter by Inhibiting and Amplifying Inner Activations)
- Adapts the model by learning small vectors that rescale (inhibit or amplify) internal activations, typically the keys, values, and feed-forward activations, training even fewer parameters than LoRA (sketch below).
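A minimal IA3Config sketch; for architectures PEFT does not recognize out of the box, target_modules and feedforward_modules would need to be set explicitly.

from peft import IA3Config, TaskType

ia3_config = IA3Config(
    task_type=TaskType.SEQ_2_SEQ_LM,
    # target_modules / feedforward_modules can be set for custom architectures
)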
Practical Application Effects
Memory Usage Comparison (A100 80GB GPU)
| Model | Full-Parameter Fine-Tuning | PEFT-LoRA | PEFT-LoRA + DeepSpeed CPU Offload |
| --- | --- | --- | --- |
| T0_3B (3 billion params) | 47.14GB GPU / 2.96GB CPU | 14.4GB GPU / 2.96GB CPU | 9.8GB GPU / 17.8GB CPU |
| mt0-xxl (12 billion params) | Out of Memory | 56GB GPU / 3GB CPU | 22GB GPU / 52GB CPU |
| bloomz-7b1 (7 billion params) | Out of Memory | 32GB GPU / 3.8GB CPU | 18.1GB GPU / 35GB CPU |
Performance
Accuracy comparison on the Twitter Complaints classification task (from the RAFT benchmark):
- Human Baseline: 89.7%
- Flan-T5: 89.2%
- LoRA-T0-3B: 86.3%
Installation and Quick Start
Installation
pip install peft
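To get the latest unreleased changes, the README also documents installing from source:

pip install git+https://github.com/huggingface/peft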
Basic Usage Example
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig, TaskType
# Configure PEFT
model_name_or_path = "bigscience/mt0-large"
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
# Load and wrap the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
# View trainable parameters
model.print_trainable_parameters()
# Output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19
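After training, only the adapter weights need to be saved, which is what keeps checkpoints in the MB range; a short sketch (the directory name is a placeholder):

# Save only the adapter weights, not the multi-GB base model
model.save_pretrained("mt0-large-lora")
# The directory holds adapter_config.json plus the adapter weights and can be
# reloaded onto the base model with PeftModel.from_pretrained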
Inference Usage
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
# Load the fine-tuned model
model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
# Perform inference
model.eval()
inputs = tokenizer("Your input text", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
Ecosystem Integration
1. Transformers Integration
- Supports various pre-trained model architectures.
- Seamless training and inference workflows.
- Automatic model configuration and optimization.
2. Diffusers Integration
- Supports efficient fine-tuning of diffusion models (see the sketch after this list).
- Suitable for image generation, editing, and other tasks.
- Significantly reduces training memory requirements.
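As a sketch of the integration (the LoRA checkpoint path is a placeholder; load_lora_weights is the Diffusers API for loading PEFT-format LoRA weights):

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load a LoRA adapter trained with PEFT (placeholder path)
pipe.load_lora_weights("path/to/your-lora")
image = pipe("a watercolor painting of a lighthouse").images[0]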
3. Accelerate Integration
- Supports distributed training.
- Multi-GPU, TPU training optimization.
- Consumer-grade hardware friendly.
4. TRL (Transformer Reinforcement Learning) Integration
- Supports RLHF (Reinforcement Learning from Human Feedback).
- DPO (Direct Preference Optimization).
- Large model alignment training (a trainer sketch follows).
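As a sketch of how PEFT plugs into TRL, a LoraConfig can be passed straight to a TRL trainer, which applies it internally; the dataset is a placeholder and the exact SFTTrainer arguments vary across trl versions.

from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig

dataset = load_dataset("imdb", split="train")  # placeholder dataset

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",  # in newer trl versions this is set via SFTConfig
    peft_config=LoraConfig(r=8, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()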
Application Scenarios
1. Large Language Model Fine-Tuning
- Instruction fine-tuning.
- Dialogue system optimization.
- Specific domain adaptation.
2. Multimodal Models
- Visual-language model fine-tuning.
- Audio processing model adaptation.
3. Diffusion Models
- Stable Diffusion personalization.
- DreamBooth training.
- Style transfer.
4. Reinforcement Learning
- Policy model fine-tuning.
- Reward model training.
- Human preference alignment.
Technical Advantages and Innovations
1. Parameter Efficiency
- Only train 0.1%-1% of the original parameters.
- Maintains performance close to full fine-tuning on most tasks.
- Checkpoint files shrink from GB to MB, often by two or more orders of magnitude.
2. Memory Optimization
- Significantly reduces GPU memory requirements.
- Supports training large models on consumer-grade hardware.
- Gradient checkpointing further optimizes memory.
3. Quantization Compatibility
- Works with 8-bit and 4-bit quantized base models (via bitsandbytes).
- Supports the QLoRA approach of training LoRA adapters on a 4-bit quantized base model (see the sketch after this list).
- Further reduces the hardware threshold.
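A sketch of a QLoRA-style setup that combines 4-bit quantization, gradient checkpointing, and LoRA; the model name is a placeholder and the bitsandbytes options reflect common usage rather than a single required recipe.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model with 4-bit quantized weights
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=bnb_config)

# Enables gradient checkpointing and prepares the quantized layers for training
model = prepare_model_for_kbit_training(model)

model = get_peft_model(model, LoraConfig(r=8, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()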
4. Modular Design
- Supports multiple PEFT methods.
- Flexible configuration options.
- Easy to extend new methods.
Community and Ecosystem
Official Resources
- GitHub repository: https://github.com/huggingface/peft
- Documentation: https://huggingface.co/docs/peft
Summary
🤗 PEFT is a parameter-efficient fine-tuning library that tackles the cost of fine-tuning large models while preserving strong performance. For researchers and industrial developers alike, it offers a cost-effective way to customize large models, helping to democratize AI technology.