A no-code, low-code large language model fine-tuning and deployment framework that supports unified and efficient fine-tuning of 100+ LLMs/VLMs.
💡 LLaMA‑Factory Project Explained
I. Project Overview
LLaMA‑Factory is an open-source platform focused on fine-tuning, training, and deploying large language models and vision-language models (LLMs/VLMs). Introduced by Yaowei Zheng et al. in a system demonstration paper at ACL 2024 (also available on arXiv), the project highlights the following features:
- Supports 100+ Models: Including mainstream and emerging models such as LLaMA, LLaVA, Mistral, Qwen, ChatGLM, and Phi.
- Zero-Code + Low-Code Interface: CLI and Web UI (LlamaBoard) modes, covering common training workflows with a very low technical barrier.
- Integrates Multiple Efficient Fine-Tuning Methods: Supports LoRA, QLoRA (2/4/8 bit), freezing, 16-bit full parameter, FlashAttention‑2, Unsloth, RoPE scaling, etc.
- Rich Tuning Algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture‑of‑Depths, LoRA+, LoftQ, PiSSA, etc.
- Multiple Training Methods: Pre-training, Supervised Fine-Tuning (SFT), Reward Modeling (RM), PPO-based reinforcement learning, and preference optimization methods such as DPO/KTO/ORPO.
- Multiple Experiment Monitoring Tools: Supports LlamaBoard, TensorBoard, Wandb, MLflow, SwanLab, etc.
- Inference and Deployment Compatibility: Supports OpenAI API-style deployment, vLLM concurrent inference, Gradio UI, and other rich inference options.
II. Core Feature Highlights
1. Wide Range of Model Support
Covers over a hundred models, including various sizes and architectures, from LLaMA and Phi to Qwen2-VL, Gemma, and DeepSeek.
2. Efficient Fine-Tuning Techniques
- LoRA / QLoRA: Low-rank adaptation with optional low-bit (2/4/8-bit) quantization; 4-bit QLoRA substantially lowers GPU memory requirements compared with full-parameter fine-tuning.
- Optimization Operators: FlashAttention-2 and Unsloth improve training speed and memory utilization.
- RoPE Scaling: Extends context length capabilities.
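The idea behind LoRA, which underlies several of the options above, can be sketched in a few lines of NumPy: the frozen base weight W is never updated; training learns only a low-rank pair (A, B), and the effective weight becomes W + (alpha/r)·B·A. This is an illustrative sketch under made-up dimensions, not LLaMA-Factory's actual implementation:

```python
import numpy as np

# Illustrative LoRA sketch (not LLaMA-Factory's internal code):
# only the low-rank factors A (r x d_in) and B (d_out x r) are trained.
d_in, d_out, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    """y = W x + (alpha / r) * B A x  -- base path plus low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted output equals the base output,
# so fine-tuning starts exactly from the pretrained behavior.
assert np.allclose(lora_forward(x), W @ x)

# Only r*(d_in + d_out) parameters are trained instead of d_in*d_out.
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "full")
```

This also shows why LoRA combines well with quantization (QLoRA): W can be stored in 4-bit precision since it is never updated, while the small A and B stay in higher precision.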
3. Training and Reinforcement Learning
Integrates common training processes: from pre-training and SFT to reward model training and PPO/DPO reinforcement learning.
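As one concrete example of the preference-optimization stages listed above, the DPO objective can be computed directly from per-response log-probabilities under the policy and a frozen reference model. A minimal sketch (the log-probability values below are made-up numbers, not from a real model):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Made-up log-probs: the policy favors the chosen answer more strongly
# than the reference model does (positive margin), so the loss falls
# below log(2), the value at indifference.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
assert loss < math.log(2)
```

At a zero margin the loss equals log(2) (= -log 0.5); training pushes the margin positive, which is how DPO encodes the preference signal without an explicit reward model.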
4. Visualization Monitoring
Real-time viewing of training progress, metrics, and logs through Web UI (LLaMABoard), TensorBoard, Wandb, etc.
5. Inference and Deployment Capabilities
Supports serving fine-tuned models behind an OpenAI-style API, high-throughput concurrent inference via vLLM, and a Gradio-based web frontend; merged models can also be exported for downstream use.
III. Usage Flow & Quick Start
Installation / Startup
pip install llamafactory # Or clone the GitHub repo and `pip install -e .`
CLI Mode:
llamafactory-cli train \
  --stage sft \
  --model_name_or_path meta-llama/Llama-2-13b-hf \
  --dataset mydata \
  --finetuning_type lora \
  --output_dir saves/my-lora
# Refer to the official documentation for more parameters
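Newer releases also accept a YAML config file in place of command-line flags (e.g. `llamafactory-cli train my_sft.yaml`). A sketch of such a config, modeled on the samples shipped in the repository's `examples/` directory (all field values below are placeholders):

```yaml
# Sketch of a LoRA SFT config; see examples/ in the repo for real ones.
model_name_or_path: meta-llama/Llama-2-13b-hf
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: mydata
template: llama2
output_dir: saves/llama2-13b-lora
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```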
Web UI Mode:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui # Older releases: python src/train_web.py
Start LlamaBoard for one-stop configuration of training hyperparameters.
Data Preparation
The project comes with 60+ datasets (data directory) and also supports custom JSON files, uniformly managed in dataset_info.json.
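A custom dataset in the common alpaca-style format, together with its registration entry in dataset_info.json, might look like this (the file name, dataset name, and record content are examples, not shipped with the project):

```python
import json

# Example record in the alpaca-style format LLaMA-Factory accepts
# (instruction / input / output fields); content is made up.
records = [
    {
        "instruction": "Translate to French.",
        "input": "Good morning",
        "output": "Bonjour",
    }
]

# Matching entry to add to data/dataset_info.json so that
# `--dataset mydata` resolves to the file above.
dataset_info_entry = {
    "mydata": {
        "file_name": "mydata.json"
    }
}

print(json.dumps(records, indent=2))
print(json.dumps(dataset_info_entry, indent=2))
```

Once the JSON file is placed in the data directory and the entry is added to dataset_info.json, the dataset can be selected by name in both the CLI and LlamaBoard.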
Monitoring and Evaluation
Automatically supports TensorBoard and Wandb display during training; can also be connected to MLflow, SwanLab, and other monitoring backends.
Inference and Deployment
After training, merge and export the fine-tuned model via the CLI export command, then serve it with concurrent inference (vLLM) or a Gradio demo.
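Once the model is served behind the OpenAI-style API, a client talks to it with an ordinary chat-completions payload. A sketch using only the standard library (the host, port, and model name below are assumptions to adjust for your deployment):

```python
import json
import urllib.request

# Assumed endpoint of a locally served model; adjust host/port/model
# to match your own deployment.
API_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-finetuned-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarize LoRA in one sentence."}
    ],
    "temperature": 0.7,
}

def build_request(url, body):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(API_URL, payload)
# urllib.request.urlopen(req) would send it once a server is running.
print(req.full_url, req.get_method())
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client SDKs can also be pointed at it by overriding their base URL.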
IV. Summary
LLaMA‑Factory is a feature-rich, easy-to-use, and technically up-to-date LLM fine-tuning framework. Whether you are a researcher or an engineer, it lets you customize, train, and deploy a wide range of open-source models without writing complex code, making it a practical entry point into LLM fine-tuning.