
A no-code, low-code large language model fine-tuning and deployment framework that supports unified and efficient fine-tuning of 100+ LLMs/VLMs.

Apache-2.0 · Python · 52.6k stars · hiyouga · Last Updated: 2025-06-18

💡 LLaMA‑Factory Project Explained

I. Project Overview

LLaMA‑Factory is an open-source platform focused on fine-tuning, training, and deploying large language models and vision-language models (LLMs/VLMs). Presented by Yaowei Zheng et al. at ACL 2024 and available on arXiv ([gitee.com][1]), the project highlights the following features:

  • Supports 100+ Models: Including mainstream and emerging models such as LLaMA, LLaVA, Mistral, Qwen, ChatGLM, and Phi.
  • Zero-Code + Low-Code Interface: CLI and Web UI (LLaMA Board) modes cover the common training workflows with a very low technical barrier.
  • Integrates Multiple Efficient Fine-Tuning Methods: Supports LoRA, QLoRA (2/4/8 bit), freezing, 16-bit full parameter, FlashAttention‑2, Unsloth, RoPE scaling, etc.
  • Rich Tuning Algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture‑of‑Depths, LoRA+, LoftQ, PiSSA, etc.
  • Multiple Training Methods: Pre-training, Supervised Fine-Tuning (SFT), Reward Modeling (RM), and reinforcement learning / preference optimization methods such as PPO, DPO, KTO, and ORPO.
  • Multiple Experiment Monitoring Tools: Supports LLaMA Board, TensorBoard, Wandb, MLflow, SwanLab, etc.
  • Inference and Deployment Compatibility: Supports OpenAI API-style deployment, vLLM concurrent inference, Gradio UI, and other rich inference options.

II. Core Feature Highlights

1. Wide Range of Model Support

Covers over a hundred models, including various sizes and architectures, from LLaMA and Phi to Qwen2-VL, Gemma, and DeepSeek.

2. Efficient Fine-Tuning Techniques

  • LoRA / QLoRA: Low-bit (2/4/8-bit) quantized adapter fine-tuning; 4-bit QLoRA substantially reduces GPU memory requirements compared with conventional 16-bit fine-tuning.
  • Optimization Operators: FlashAttention-2 and Unsloth improve training speed and memory utilization.
  • RoPE Scaling: Extends the usable context length; see the sketch after this list for how these options combine on the command line.
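
The following is a minimal sketch of how these options are typically combined in a single training command, assuming a LoRA SFT run; the model path, dataset name, output directory, and template are illustrative, and the flag names should be verified against the official documentation for your installed version.

    # Hedged example: 4-bit QLoRA + FlashAttention-2 + linear RoPE scaling
    llamafactory-cli train \
      --stage sft \
      --do_train \
      --model_name_or_path meta-llama/Llama-2-13b-hf \
      --dataset mydata \
      --template default \
      --finetuning_type lora \
      --quantization_bit 4 \
      --flash_attn fa2 \
      --rope_scaling linear \
      --output_dir saves/llama2-13b-qlora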

3. Training and Reinforcement Learning

Integrates common training processes: from pre-training and SFT to reward model training and PPO/DPO reinforcement learning.
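
As a concrete illustration, switching from SFT to a preference-optimization stage mainly changes the --stage value and requires a preference-format dataset; my_preference_data below is a hypothetical name that would first have to be registered in data/dataset_info.json.

    # Hedged example: DPO fine-tuning of a LoRA adapter
    # (supported --stage values include pt, sft, rm, ppo, dpo, kto)
    llamafactory-cli train \
      --stage dpo \
      --do_train \
      --model_name_or_path meta-llama/Llama-2-13b-hf \
      --dataset my_preference_data \
      --template default \
      --finetuning_type lora \
      --output_dir saves/llama2-13b-dpo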

4. Visualization Monitoring

View training progress, metrics, and logs in real time through the Web UI (LLaMA Board), TensorBoard, Wandb, and other monitoring tools.

5. Inference and Deployment Capabilities

Supports exporting fine-tuned models and serving them behind an OpenAI API-compatible endpoint, with concurrent inference via vLLM or an interactive Gradio frontend.
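
For illustration, serving and querying such an endpoint might look like the sketch below; the API_PORT value, model path, and template are assumptions rather than fixed defaults.

    # Hedged example: start an OpenAI-compatible API server with the vLLM backend
    API_PORT=8000 llamafactory-cli api \
      --model_name_or_path meta-llama/Llama-2-13b-hf \
      --template default \
      --infer_backend vllm

    # Query it from another shell with a standard chat-completions request
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "default", "messages": [{"role": "user", "content": "Hello!"}]}'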


III. Usage Flow & Quick Start

Installation / Startup

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git && cd LLaMA-Factory
pip install -e ".[torch,metrics]"   # editable install from source with the recommended extras
  • CLI Mode:

    llamafactory-cli train \
      --stage sft \
      --do_train \
      --model_name_or_path meta-llama/Llama-2-13b-hf \
      --dataset mydata \
      --template default \
      --finetuning_type lora \
      --output_dir saves/llama2-13b-lora
    # Refer to the official documentation for more parameters; mydata must be registered in data/dataset_info.json
    
  • Web UI Mode:

    CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui
    

    Starts LLaMA Board for one-stop configuration of training hyperparameters and monitoring of runs.


Data Preparation

The project ships with 60+ built-in datasets (in the data directory) and also supports custom JSON files; all datasets are registered centrally in data/dataset_info.json.
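
For reference, a custom alpaca-style dataset is typically registered by adding an entry such as the following to data/dataset_info.json (a minimal sketch; the dataset name mydata, the file mydata.json, and the column mapping are placeholders to adapt to your own data, so check the project's dataset documentation for the full schema):

    {
      "mydata": {
        "file_name": "mydata.json",
        "columns": {
          "prompt": "instruction",
          "query": "input",
          "response": "output"
        }
      }
    }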


Monitoring and Evaluation

TensorBoard and Wandb logging are supported out of the box during training; MLflow, SwanLab, and other monitoring backends can also be connected.
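
As an example, a tracker can be selected with the standard Hugging Face --report_to argument on the training command, and TensorBoard logs can then be inspected locally; the log directory below assumes the output_dir from the quick-start example.

    # Hedged example: pick a tracker at training time, then open the dashboard
    #   llamafactory-cli train ... --report_to wandb   (or: tensorboard, mlflow, none)
    tensorboard --logdir saves/llama2-13b-lora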


Inference and Deployment

After training, the fine-tuned model can be exported via the CLI (e.g., with LoRA weights merged into the base model) and then served with concurrent inference or a Gradio chat frontend.
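
A minimal sketch of this flow, assuming the LoRA adapter produced by the quick-start example above; the paths and template name are illustrative.

    # Hedged example: merge the LoRA adapter into the base model, then chat with it
    llamafactory-cli export \
      --model_name_or_path meta-llama/Llama-2-13b-hf \
      --adapter_name_or_path saves/llama2-13b-lora \
      --template default \
      --finetuning_type lora \
      --export_dir merged-llama2-13b

    # Terminal chat; use llamafactory-cli webchat for a Gradio frontend instead
    llamafactory-cli chat \
      --model_name_or_path merged-llama2-13b \
      --template default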

IV. Summary

LLaMA‑Factory is a feature-rich, easy-to-use, and technically advanced LLM fine-tuning framework. Whether you are a researcher or an engineer, it lets you quickly customize, train, and deploy a wide range of open-source models without writing complex code, making it a powerful entry point into the field of LLM fine-tuning.