XTuner - An Efficient Toolkit for Fine-tuning Large Language Models
Project Overview
XTuner is an efficient, flexible, and full-featured toolkit for fine-tuning large language models, developed by the InternLM team. The project aims to give users an easy-to-use yet powerful tool for fine-tuning a wide range of LLMs, including mainstream models such as InternLM, Llama, Qwen, ChatGLM, and Baichuan.
Core Features
1. Efficiency
- Low Resource Requirements: Supports fine-tuning 7B parameter large language models on a single 8GB GPU.
- Multi-node Scaling: Supports multi-node fine-tuning of models with over 70B parameters.
- Performance Optimization: Automatically dispatches high-performance operators, such as FlashAttention and Triton kernels, to increase training throughput.
- DeepSpeed Integration: Compatible with the DeepSpeed framework, making it easy to apply various ZeRO optimization strategies (a launch sketch follows this list).
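As a concrete illustration of combining the multi-node and DeepSpeed features above, the sketch below launches a distributed run with a ZeRO-3 preset. The environment variables (NPROC_PER_NODE, NNODES, NODE_RANK, ADDR, PORT) and the deepspeed_zero3 preset name follow the pattern used in XTuner's distributed-training examples, but treat them as assumptions to verify against the version you install.
# Single node, 8 GPUs: shard parameters, gradients, and optimizer states with ZeRO-3
NPROC_PER_NODE=8 xtuner train ${CONFIG_NAME_OR_PATH} --deepspeed deepspeed_zero3
# Two nodes, 8 GPUs each: run on every node with its own NODE_RANK;
# ADDR/PORT point at the rank-0 node
NPROC_PER_NODE=8 NNODES=2 NODE_RANK=0 ADDR=10.0.0.1 PORT=29500 xtuner train ${CONFIG_NAME_OR_PATH} --deepspeed deepspeed_zero3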
2. Flexibility
- Multi-Model Support: Supports a variety of large language models.
- InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3)
- Meta Llama series (Llama 2, Llama 3)
- Other mainstream models: Mixtral-8x7B, ChatGLM, Qwen, Baichuan, Gemma, DeepSeek, etc.
- Multi-Modal Support: Supports vision-language models (VLMs), especially models based on the LLaVA architecture.
- Data Pipeline: A well-designed data pipeline that accepts datasets in a variety of formats (a minimal dataset sketch follows this list).
- Multiple Training Algorithms: Supports training strategies such as QLoRA, LoRA, and full-parameter fine-tuning.
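To make the data-pipeline point concrete, here is a minimal, hedged sketch of a single-turn instruction-tuning record in a messages-style JSON Lines layout. This is one common convention that XTuner's dataset mapping functions can handle, but the exact schema a given recipe expects depends on its dataset map function, so treat the field names as illustrative.
# Write one illustrative training record in a messages-style JSONL file
cat > ./my_dataset.jsonl << 'EOF'
{"messages": [{"role": "user", "content": "What is XTuner?"}, {"role": "assistant", "content": "XTuner is a toolkit for fine-tuning large language models."}]}
EOF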
3. Full-featured
- Multiple Training Modes:
- Continual Pre-training
- Instruction Fine-tuning
- Agent Fine-tuning
- Dialogue Functionality: Supports dialogue with large models using predefined templates.
- Seamless Integration: Output models can be seamlessly integrated with deployment and serving toolkits (LMDeploy) and large-scale evaluation toolkits (OpenCompass, VLMEvalKit).
Supported Models
XTuner supports a wide range of model families, including but not limited to:
Model Series | Specific Models | Features
InternLM | InternLM, InternLM2, InternLM2.5, InternLM3 | Chinese-optimized, excellent performance
Llama | Llama 2, Llama 3 | Meta open-source model
Qwen | Qwen 1.5, etc. | Alibaba open-source model
ChatGLM | ChatGLM3-6B, etc. | Tsinghua University open-source model
Baichuan | Baichuan2, etc. | Baichuan Intelligence open-source model
Mixtral | Mixtral 8x7B | Mistral AI's Mixture of Experts model
Other | Gemma, DeepSeek, MiniCPM, etc. | Open-source models from various companies
Multi-Modal Capabilities
XTuner excels in the multi-modal domain, especially with vision-language models:
- LLaVA Architecture Support: Fully supports pre-training and fine-tuning of the LLaVA-v1.5 architecture (see the config lookup example after this list).
- Excellent Performance: The LLaVA-InternLM2-20B model has outstanding performance.
- Multiple Combinations: Supports various combinations of visual encoders and language models.
- Latest Releases:
- LLaVA-Llama-3-8B
- LLaVA-Llama-3-8B-v1.1
- LLaVA-Phi-3-mini
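A quick way to see which LLaVA recipes ship with XTuner is to filter the built-in config list by name; the -p pattern flag is assumed here to perform substring filtering, as in the upstream examples.
# List built-in configs whose names contain "llava"
xtuner list-cfg -p llava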
Installation and Usage
Environment Preparation
# Create a Python 3.10 virtual environment
conda create --name xtuner-env python=3.10 -y
conda activate xtuner-env
Installation Methods
Method 1: Install via pip
pip install -U xtuner
Method 2: Install with DeepSpeed support
pip install -U 'xtuner[deepspeed]'
Method 3: Install from source
git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
Quick Start
1. Prepare Configuration File
# View all available configurations
xtuner list-cfg
# Copy the configuration file for customization
xtuner copy-cfg ${CONFIG_NAME} ${SAVE_PATH}
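As a concrete, hedged example of this step, the commands below locate an InternLM2.5 QLoRA recipe and copy it into the current directory for editing. The _copy suffix on the copied file and the variable names in the comment (pretrained_model_name_or_path, data_path) follow common XTuner conventions, but check them against the file you actually copy.
# Find a recipe, then copy it locally; the copy typically gets a "_copy" suffix
xtuner list-cfg -p internlm2_5
xtuner copy-cfg internlm2_5_chat_7b_qlora_oasst1_e3 .
# Edit internlm2_5_chat_7b_qlora_oasst1_e3_copy.py so that
# pretrained_model_name_or_path and data_path point at your model and dataset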
2. Start Fine-tuning
# Single GPU fine-tuning
xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
# Multi-GPU fine-tuning
NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_5_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
3. Model Conversion
# Convert PTH model to Hugging Face format
xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}
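For LoRA/QLoRA runs, the converted output is an adapter rather than a standalone model. If you need a single set of Hugging Face weights, XTuner's convert merge subcommand merges the adapter into the base model; the argument order below follows the upstream README and is worth double-checking against your installed version.
# Merge a LoRA/QLoRA adapter into the base model (optional)
xtuner convert merge ${NAME_OR_PATH_TO_LLM} ${NAME_OR_PATH_TO_ADAPTER} ${SAVE_PATH} --max-shard-size 2GB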
4. Dialogue Testing
# Dialogue with the fine-tuned model
xtuner chat ${NAME_OR_PATH_TO_LLM} --adapter ${NAME_OR_PATH_TO_ADAPTER}
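A more concrete, hedged variant of the chat command: the --prompt-template flag selects the conversation template matching the base model (internlm2_chat for InternLM2/2.5 chat models, per the upstream examples). The Hugging Face model ID below is an assumption; substitute your own base model and adapter.
# Chat with InternLM2.5-7B-Chat plus a fine-tuned adapter, using its chat template
xtuner chat internlm/internlm2_5-7b-chat --adapter ${NAME_OR_PATH_TO_ADAPTER} --prompt-template internlm2_chat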
Advanced Features
1. Sequence Parallelism
- Splits each training sequence across multiple GPUs, enabling training on extremely long sequences
- Scales efficiently as sequence length and GPU count grow
- Suited to scenarios that require processing long texts
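A sketch of the assumed workflow, based on XTuner's sequence-parallel documentation: set a sequence_parallel_size value in the training config (the variable name is an assumption to verify), then launch on a GPU count divisible by it; the remaining factor becomes the data-parallel size.
# Assumes the copied config sets sequence_parallel_size = 4;
# with 8 GPUs this gives a data-parallel size of 8 / 4 = 2
NPROC_PER_NODE=8 xtuner train ${CONFIG_NAME_OR_PATH} --deepspeed deepspeed_zero2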
2. DPO/ORPO Training
- Supports Direct Preference Optimization (DPO)
- Supports Odds Ratio Preference Optimization (ORPO)
- Supports Reward Model training
- Supports packed data and sequence parallelism
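These preference-optimization recipes are launched the same way as supervised fine-tuning. The name-pattern filters below are a hedged way to discover the shipped DPO, ORPO, and reward-model configs; ${DPO_CONFIG_NAME} is a placeholder for whichever recipe you pick.
# Discover preference-optimization recipes by name pattern, then launch one
xtuner list-cfg -p dpo
xtuner list-cfg -p reward
NPROC_PER_NODE=8 xtuner train ${DPO_CONFIG_NAME} --deepspeed deepspeed_zero2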
3. Mathematical Reasoning Optimization
- Supports OREAL (Outcome REwArd-based reinforcement Learning), a recent reinforcement learning method
- Specifically optimized for mathematical reasoning tasks
Performance
Training Speed
- Llama 2 7B: Strong training throughput on a single GPU.
- Llama 2 70B: Supports multi-GPU parallel training with strong throughput.
- DeepSeek V2: Training is roughly 2x faster than with earlier XTuner releases.
Memory Efficiency
- Low Memory Requirements: QLoRA fine-tuning runs within roughly 20GB of GPU memory (the exact requirement depends on model size).
- Full-Parameter Fine-tuning: Full-parameter fine-tuning is possible on a node with 4x 80GB GPUs.
- Memory Optimization: Significantly reduces memory usage through various optimization techniques.
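When GPU memory is the limiting factor, one common lever is a more aggressive DeepSpeed preset. The deepspeed_zero3_offload preset name below is an assumption based on the presets XTuner ships; it trades training speed for lower GPU memory by offloading optimizer states and parameters to CPU.
# Trade speed for memory: ZeRO-3 with CPU offload (preset name assumed)
xtuner train ${CONFIG_NAME_OR_PATH} --deepspeed deepspeed_zero3_offload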
Ecosystem Integration
As a core component of the InternLM ecosystem, XTuner integrates tightly with the following tools:
- LMDeploy: Model deployment and serving toolkit (see the serving example after this list).
- OpenCompass: Large-scale evaluation toolkit.
- VLMEvalKit: Visual language model evaluation toolkit.
- Lagent: Agent framework.
- AgentLego: Versatile tool API library.
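As a hedged end-to-end example of the LMDeploy integration: once a fine-tuned model has been converted (and, for LoRA/QLoRA, merged) into Hugging Face format, it can be served with LMDeploy's OpenAI-compatible API server. The flags below reflect LMDeploy's documented CLI; verify them against the LMDeploy version you install.
pip install lmdeploy
# Serve the merged model with an OpenAI-compatible HTTP API
lmdeploy serve api_server ${SAVE_PATH} --server-port 23333
# Or chat with it interactively in the terminal
lmdeploy chat ${SAVE_PATH}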
Application Scenarios
1. Academic Research
- Large language model fine-tuning research
- Multi-modal model development
- New algorithm verification
2. Industrial Applications
- Customized chatbots
- Domain-specific model development
- Enterprise-level AI assistants
3. Education and Training
- AI course teaching
- Experimental environment setup
- Skills training
Conclusion
XTuner is a comprehensive, high-performance toolkit for fine-tuning large language models. It supports a wide range of models and training algorithms and provides a complete toolchain from data preparation to model deployment, giving users a one-stop solution. Whether for academic research or industrial applications, XTuner meets the needs of diverse scenarios and is a strong choice for fine-tuning large language models.