An open-source tool for post-training AI models, supporting various training methods such as fine-tuning, LoRA, and QLoRA.

Apache-2.0 · Python · 9.7k · axolotl-ai-cloud · Last Updated: 2025-06-19

Axolotl - Powerful AI Model Post-Training Tool

Project Overview

Axolotl is an open-source tool designed to simplify post-training for AI models. Post-training refers to any modification or additional training applied to a pre-trained model, including full fine-tuning, parameter-efficient tuning (such as LoRA and QLoRA), supervised fine-tuning (SFT), instruction tuning, and alignment techniques. The tool supports a wide range of model architectures and training configurations, making these advanced techniques straightforward to adopt.

Core Features

Model Support

  • Diverse Model Architectures: Supports training a wide range of Hugging Face models, including mainstream large language models such as LLaMA, Pythia, Falcon, MPT, Mistral, and Mixtral.
  • Flexible Training Methods: Supports full fine-tuning, LoRA, QLoRA, ReLoRA, and GPTQ; the sketch below shows how a method is selected.
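
The training method is selected inside the YAML config with a handful of keys. A minimal, hedged sketch, assuming the key names used in Axolotl's published example configs:

# LoRA: attach a low-rank adapter to the base model
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

# QLoRA: the same LoRA keys on top of a 4-bit quantized base model
# adapter: qlora
# load_in_4bit: true

# Full fine-tuning: omit the adapter key entirely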

Configuration Management

  • YAML Configuration Files: A single YAML file holds everything needed for dataset preprocessing, model training/fine-tuning, inference, and evaluation (a minimal example follows this list).
  • CLI Overrides: Individual settings in the configuration file can be overridden with command-line arguments.
  • Flexible Configuration: Training parameters and model settings can be customized as needed.
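
A minimal sketch of such a file, with placeholder model and dataset identifiers (key names follow the examples shipped via axolotl fetch examples; consult the docs for the full schema):

base_model: NousResearch/Llama-3.2-1B      # placeholder model ID
adapter: lora
datasets:
  - path: mhenrichsen/alpaca_2k_test       # placeholder dataset
    type: alpaca
val_set_size: 0.05
num_epochs: 1
learning_rate: 0.0002
output_dir: ./outputs/lora-out

Any of these keys can then be overridden at launch time; for example, a flag such as --learning_rate 0.0001 on the train command is expected to take precedence over the file, but confirm the exact override syntax with axolotl train --help for your installed version.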

Data Processing Capabilities

  • Multi-Format Datasets: Supports loading local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.
  • Custom Formats: Can use custom formats or directly import pre-tokenized datasets.
  • Dataset Preprocessing: Includes built-in dataset preprocessing; a sample datasets section follows this list.
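
A hedged sketch of a datasets section mixing the three source types (the local path, Hub dataset ID, and bucket name are placeholders; alpaca is one of Axolotl's built-in prompt formats):

datasets:
  # Local file
  - path: ./data/train.jsonl
    type: alpaca
  # Hugging Face Hub dataset
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
  # Cloud object storage (S3 shown; Azure, GCP, and OCI use their own URI schemes)
  - path: s3://my-bucket/datasets/train.jsonl
    type: alpaca

dataset_prepared_path: ./last_run_prepared   # cache directory for preprocessed data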

Performance Optimization

  • Advanced Optimization Techniques: Integrates xformers, Flash Attention, Liger Kernel, RoPE scaling, and multipacking (sample packing).
  • Multi-GPU Support: Supports single- or multi-GPU training through FSDP or DeepSpeed.
  • Efficient Training: Optimized for NVIDIA GPUs (Ampere or newer, with bf16 and Flash Attention) and AMD GPUs; a sample of the relevant config keys follows this list.
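
A hedged sketch of the performance-related keys (names follow Axolotl's example configs; the DeepSpeed path assumes the files pulled by axolotl fetch deepspeed_configs):

bf16: auto                   # use bfloat16 where the GPU supports it
flash_attention: true        # Ampere-or-newer NVIDIA GPUs
sample_packing: true         # pack several short samples into one sequence
pad_to_sequence_len: true
gradient_checkpointing: true

# Multi-GPU: pick either DeepSpeed or FSDP, not both
deepspeed: deepspeed_configs/zero2.json
# fsdp:
#   - full_shard
#   - auto_wrap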

Deployment and Monitoring

  • Cloud-Ready: Provides Docker images and PyPI packages for use on cloud platforms and local hardware.
  • Result Logging: Supports logging results and checkpoints to Weights & Biases (wandb), MLflow, or Comet.
  • Monitoring Support: Integrates with these experiment-tracking tools so runs can be monitored while training; a sample logging configuration follows this list.
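
A hedged sketch of experiment-logging keys (the wandb_* names appear in Axolotl's example configs; the MLflow keys are an assumption to verify against the docs, and all values are placeholders):

wandb_project: my-finetune          # placeholder project
wandb_entity: my-team               # placeholder team/entity
wandb_name: llama3-lora-run1        # placeholder run name

# MLflow alternative (assumed key names)
# mlflow_tracking_uri: http://localhost:5000
# mlflow_experiment_name: my-experiment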

Technical Requirements

Hardware Requirements

  • NVIDIA GPU (Ampere or newer, for bf16 and Flash Attention) or AMD GPU.
  • Sufficient GPU memory for model training.

Software Requirements

  • Python 3.11
  • PyTorch ≥2.4.1
  • Supporting Python dependencies (installed automatically via pip).

Installation Instructions

Quick Installation

pip3 install -U packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation axolotl[flash-attn,deepspeed]

# Download example configuration files
axolotl fetch examples
axolotl fetch deepspeed_configs  # Optional

Installation from Source

git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl
pip3 install -U packaging setuptools wheel ninja
pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]'

Docker Installation

docker run --gpus '"all"' --rm -it axolotlai/axolotl:main-latest

Usage Instructions

Basic Usage Flow

  1. Fetch Example Configuration:

    axolotl fetch examples
    
  2. Train Model:

    axolotl train examples/llama-3/lora-1b.yml
    
  3. Customize Configuration: Modify parameters in the YAML configuration file as needed.

Configuration File Structure

Axolotl uses YAML configuration files to control the entire training process, covering the areas below; a partial skeleton follows the list:

  • Model selection and parameters
  • Dataset configuration and preprocessing
  • Training hyperparameters
  • Optimizer settings
  • Monitoring and logging
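
A hedged skeleton of the hyperparameter and optimizer portion of such a file (values are illustrative only; key names follow Axolotl's example configs):

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
warmup_steps: 10
logging_steps: 1
evals_per_epoch: 2
saves_per_epoch: 1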

Supported Model Compatibility Matrix

Axolotl maintains a per-model compatibility matrix covering fp16/fp32 training, LoRA, QLoRA, GPTQ, Flash Attention, and xformers for LLaMA, Mistral, Mixtral-MoE, Pythia, Falcon, Qwen, and Gemma, with each combination marked as supported (✅), not supported (❌), or untested (❓); see the upstream project README for the current per-cell status.

Application Scenarios

Research Field

  • Large language model fine-tuning research
  • Parameter-efficient training method experiments
  • Model alignment and safety research

Industrial Applications

  • Enterprise-level model customization
  • Domain-specific model training
  • Model optimization for product features

Education and Training

  • Teaching AI/ML courses
  • Practical project development
  • Technical skill enhancement

Project Advantages

  1. Ease of Use: Controls complex training processes through simple YAML configuration files.
  2. Flexibility: Supports various model architectures and training methods.
  3. Performance: Integrates modern optimization techniques such as Flash Attention and sample packing for efficient training.
  4. Scalability: Supports various training scales from single GPU to multi-node.
  5. Open Source: Apache 2.0 license, completely open source and free to use.