LMFlow - Large Foundation Model Fine-tuning and Inference Toolkit
Project Overview
LMFlow, developed by the OptimalScale team, is an open-source, scalable, convenient, and efficient toolkit for fine-tuning large foundation models. Designed for user-friendliness, speed, and reliability, the project aims to make large language model technology accessible to the entire community, realizing the vision of "making large models serve everyone."
Project Address: https://github.com/OptimalScale/LMFlow
Core Features
1. Diverse Training Methods Supported
- Full Parameter Fine-tuning: Updates all parameters to fine-tune the language model.
- LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning algorithm that is more efficient than full parameter fine-tuning.
- LISA (Layerwise Importance Sampling): A memory-efficient fine-tuning algorithm that can train 7B models in 24GB of VRAM without offloading.
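The core idea behind LoRA can be shown in a few lines. The sketch below is a toy illustration of the low-rank adaptation math, not LMFlow's implementation: the frozen weight W is augmented with a product B @ A of two small matrices, so only r * (d_in + d_out) parameters need training instead of d_in * d_out.

```python
# Toy illustration of the LoRA idea (not the LMFlow implementation):
# the frozen weight W is adapted by a low-rank product B @ A, so only
# r * (d_in + d_out) parameters are trained instead of d_in * d_out.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, w, b, a, alpha=1, r=1):
    """y = x @ (W + (alpha / r) * B @ A), computed as two separate paths."""
    base = matmul(x, w)                  # frozen path: x @ W
    delta = matmul(matmul(x, b), a)      # low-rank adapter path: (x @ B) @ A
    scale = alpha / r
    return [[base[i][j] + scale * delta[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]

x = [[1.0, 2.0]]                 # one input row, d_in = 2
w = [[1.0, 0.0], [0.0, 1.0]]     # frozen d_in x d_out weight
a = [[0.5, 0.5]]                 # trainable r x d_out factor, r = 1
b_zero = [[0.0], [0.0]]          # B starts at zero: the adapter is a no-op
b_trained = [[1.0], [1.0]]       # B after some hypothetical training
```

Because B is initialized to zero, the adapted model reproduces the frozen base model exactly until training begins, which is what makes LoRA safe to bolt onto a pretrained checkpoint.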
2. Broad Model Support
Supports a variety of mainstream large language models, including:
- DeepSeek Series: deepseek, deepseek_v2, deepseek_r1, etc.
- LLaMA Series: llama2, llama3, llama3_for_tool
- Qwen Series: qwen2, qwen2_for_tool, qwen2_5, etc.
- Gemma, Phi, Yi, InternLM2 and other model architectures.
3. Performance Optimization Techniques
Memory Optimization
- FlashAttention-2: Supports the latest FlashAttention technology, significantly improving training and inference speed.
- Gradient Checkpointing: Optimizes memory usage through a strategy of trading computation for memory.
- DeepSpeed Zero-3: Supports distributed training of large-scale models.
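The compute-for-memory trade behind gradient checkpointing can be sketched without any deep learning framework. The toy below (an illustration, not the library's mechanism) caches only every k-th layer input and recomputes the rest from the nearest checkpoint when they are needed again, shrinking the activation cache roughly k-fold:

```python
# Toy illustration of gradient checkpointing: cache only every k-th
# activation during the forward pass and recompute the ones in between
# from the nearest checkpoint, trading extra compute for less memory.

def forward_full(x, layers):
    """Plain forward pass: cache the input of every layer."""
    cache = []
    for f in layers:
        cache.append(x)
        x = f(x)
    return x, cache

def forward_checkpointed(x, layers, k=2):
    """Forward pass that caches only every k-th layer input."""
    checkpoints = {}
    for i, f in enumerate(layers):
        if i % k == 0:
            checkpoints[i] = x
        x = f(x)
    return x, checkpoints

def recompute_input(i, layers, checkpoints):
    """Rebuild layer i's input from the nearest earlier checkpoint."""
    j = max(c for c in checkpoints if c <= i)
    x = checkpoints[j]
    for f in layers[j:i]:
        x = f(x)
    return x

layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
out_full, cache = forward_full(5, layers)
out_ckpt, checkpoints = forward_checkpointed(5, layers, k=2)
```

Both passes produce the same output, but the checkpointed one stores half as many activations; the backward pass pays for that by re-running short forward segments.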
Inference Acceleration
- vLLM Integration: Supports fast and easy-to-use LLM inference and serving.
- Speculative Decoding: Supports speculative decoding technology to accelerate inference.
- CPU Inference: Supports running LLaMA models on CPUs (through 4-bit quantization).
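The idea that makes CPU inference feasible is weight quantization. The sketch below shows generic absmax quantization to a 4-bit integer range, which is an illustration of the concept rather than the exact storage format the toolkit uses: each block of weights is reduced to 16 integer levels plus one shared scale.

```python
# Generic sketch of 4-bit absmax quantization (not the exact on-disk
# format used for CPU inference): weights are mapped to integers in
# [-7, 7] plus one shared scale, cutting memory roughly 4x versus fp16.

def quantize_4bit(weights):
    """Map floats to integers in [-7, 7] with a shared absmax scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [3.2, -7.0, 1.5, 0.0]
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)
```

The round trip loses at most half a quantization step per weight, which is the accuracy cost paid for the memory savings.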
4. Rich Feature Set
Dialogue Template Support
- Ships with presets for the latest Llama-3 and Phi-3 dialogue templates.
- Supports various common templates such as chatml.
- Customizable dialogue templates for better performance.
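As an example of what a dialogue template does, the sketch below renders a conversation in the chatml format mentioned above. The exact special tokens can differ between models, so treat this as a minimal illustration rather than the toolkit's renderer:

```python
# Minimal sketch of the chatml conversation template (special-token
# details vary between models; a real toolkit applies the template that
# matches the tokenizer's training data).

def format_chatml(messages):
    """Render a list of {'role', 'content'} dicts as a chatml prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")   # cue the model to respond
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is LoRA?"},
])
```

Getting the template exactly right matters: a model fine-tuned on one delimiter convention degrades noticeably when prompted with another, which is why customizable templates are exposed.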
Multi-modal Support
- Supports multi-modal input of images and text.
- Provides multi-modal chatbot functionality.
- Online demo service available.
Long Context Handling
- Supports position interpolation (linear and NTK scaling) for LLaMA models.
- Extends the model's context handling capabilities.
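Linear position interpolation is simple enough to show directly. The sketch below illustrates the core trick (compressing position indices so they stay inside the range the model saw during training); NTK scaling instead adjusts the rotary embedding base, which is not shown here:

```python
# Sketch of linear position interpolation: to run a model trained on
# context length L at length s * L, position indices are divided by the
# factor s so they stay inside the range seen during training.

def interpolate_positions(seq_len, trained_len):
    """Rescale positions 0..seq_len-1 into the trained range [0, trained_len)."""
    scale = max(seq_len / trained_len, 1.0)   # only shrink, never stretch
    return [i / scale for i in range(seq_len)]

# Extending a 4-token training context to an 8-token window halves
# every position index.
positions = interpolate_positions(8, 4)
```

Sequences shorter than the trained length are left untouched, since the guard keeps the scale factor at 1.0.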
5. Evaluation and Benchmarking
LMFlow Benchmark is an automated evaluation framework designed for open-source large language models. It uses negative log-likelihood (NLL) as its metric to evaluate a model's capabilities in the following areas:
- Chitchat conversation
- Commonsense reasoning
- Instruction following
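The metric itself is straightforward: average the negative log of the probability the model assigned to each reference token, so lower is better. A minimal sketch of the computation:

```python
import math

# Sketch of the negative log-likelihood (NLL) metric: average the
# negative log of the probability the model assigned to each reference
# token. Lower values mean the model found the reference text more likely.

def negative_log_likelihood(token_probs):
    """Mean NLL over a sequence of per-token model probabilities."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# A model that is more confident about the reference answer scores lower.
confident = negative_log_likelihood([0.9, 0.8, 0.95])
uncertain = negative_log_likelihood([0.2, 0.3, 0.1])
```

Because NLL needs only the model's own token probabilities on reference text, it avoids relying on another LLM as a judge, which is what makes the benchmark automated.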
Technical Innovation
RAFT Algorithm
The project proposes a new alignment algorithm: Reward rAnked FineTuning (RAFT), which is more efficient than traditional PPO-based RLHF.
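The core RAFT loop can be sketched in a few lines. This is a simplified illustration of the reward-ranked selection step, not the paper's full algorithm: sample several candidate responses per prompt, score them with a reward model, and keep only the best-ranked ones as supervised fine-tuning data.

```python
# Simplified sketch of the RAFT selection step: generate k candidate
# responses per prompt, rank them with a reward model, and keep the top
# ones as supervised fine-tuning data for the next training round.

def raft_select(prompts, generate, reward, k=4, keep=1):
    """Build a fine-tuning batch from the top-`keep` of k samples per prompt."""
    batch = []
    for p in prompts:
        candidates = [generate(p) for _ in range(k)]
        ranked = sorted(candidates, key=reward, reverse=True)
        batch.extend((p, c) for c in ranked[:keep])
    return batch

# Toy stand-ins for the policy and reward model (assumptions made only
# for this sketch): the "reward" simply prefers longer responses.
responses = iter(["ok", "a detailed answer", "no", "hmm"])
batch = raft_select(["q1"], lambda p: next(responses), reward=len, k=4)
```

Because the selected pairs are trained with ordinary supervised fine-tuning, RAFT sidesteps the instability and extra machinery (value networks, KL controllers) of PPO-based RLHF.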
Custom Optimizers
Supports training with a variety of custom optimizers, including:
- RMSprop, LION-32bit, Adam, AdamW
- AdaFactor, Adan, RAdam and over 20 other optimizers
The most suitable optimization strategy can be selected based on the specific task.
Real-world Application Cases
Breakthroughs in the Medical Field
Models trained with LMFlow perform exceptionally well in the medical field: task-tuned models outperform ChatGPT on medical tasks, demonstrating significant potential in vertical domains.
Robin Model Series
The project has released several high-performance Robin models:
- Robin-33B-V2: Achieved an excellent score of 64.1 on the Huggingface LLM leaderboard.
- Provides checkpoints in multiple sizes: 7B, 13B, 33B, 65B, etc.
Installation and Usage
Environment Requirements
- Primarily tested on Linux OS (Ubuntu 20.04).
- Supports CUDA versions 10.3-11.7.
- Python 3.9 environment.
Quick Installation
git clone -b v0.0.9 https://github.com/OptimalScale/LMFlow.git
cd LMFlow
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
pip install -e .
PyPI Installation
pip install lmflow-finetune
Technical Impact
The LMFlow project has had a significant impact in both academia and industry:
- Related papers published in top academic conferences.
- Gained significant attention and usage on GitHub.
- Made significant contributions to the open-source large language model ecosystem.
Summary
LMFlow, as a comprehensive large language model toolkit, not only provides complete model training and inference solutions but also innovates in memory optimization, performance acceleration, model evaluation, and other aspects. It lowers the barrier to entry for using large language models, allowing more researchers and developers to easily build and deploy their own language models, truly realizing the goal of "making large models serve everyone."