Axolotl is an open-source tool designed to streamline post-training for AI models. Post-training refers to any further training applied to a pre-trained base model, including full fine-tuning, parameter-efficient methods such as LoRA and QLoRA, supervised fine-tuning (SFT), instruction tuning, and alignment techniques. Axolotl supports a wide range of model architectures and training configurations, making it easy to get started with these techniques.
Install from PyPI (Axolotl expects an existing PyTorch installation):
pip3 install -U packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation axolotl[flash-attn,deepspeed]
# Download example configuration files
axolotl fetch examples
axolotl fetch deepspeed_configs # Optional
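After installation, a quick sanity check confirms the package imports and the `axolotl` entry point landed on your PATH (neither command changes anything):

```bash
# Verify the Python package and the CLI entry point
python3 -c "import axolotl"
axolotl --help
```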
Install from source:
git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl
pip3 install -U packaging setuptools wheel ninja
pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]'
Run via Docker:
docker run --gpus '"all"' --rm -it axolotlai/axolotl:main-latest
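To keep configs and checkpoints after the container exits, you can bind-mount a host directory into the container; the mount point below is illustrative, not a path required by the image:

```bash
# Same image as above, with the current directory mounted into the container
# (host path and container mount point are illustrative)
docker run --gpus '"all"' --rm -it \
  -v "$(pwd)":/workspace/mounted \
  axolotlai/axolotl:main-latest
```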
Fetch Example Configuration:
axolotl fetch examples
Train Model:
axolotl train examples/llama-3/lora-1b.yml
Customize Configuration: Adjust parameters in the YAML configuration file as needed; a sketch of the most common fields follows.
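For orientation, here is a minimal LoRA-style configuration. The key names follow Axolotl's documented options, but the model, dataset, and hyperparameter values are illustrative placeholders rather than recommendations:

```yaml
base_model: NousResearch/Meta-Llama-3.1-8B  # placeholder; any supported base model
load_in_8bit: true          # quantize the frozen base weights for LoRA training

datasets:
  - path: teknium/GPT4-LLM-Cleaned  # placeholder dataset on the HF Hub
    type: alpaca                    # prompt format used to template each example

adapter: lora               # parameter-efficient tuning instead of a full fine-tune
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true    # attach LoRA to all linear layers

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_bnb_8bit
output_dir: ./outputs/lora-out
```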
Axolotl uses YAML configuration files like the one above to control the entire training process, from model and dataset selection through adapter settings and training hyperparameters.

Supported model architectures and features:
| Model | fp16/fp32 | LoRA | QLoRA | GPTQ | Flash Attn | xformers |
|---|---|---|---|---|---|---|
| LLaMA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Mistral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Mixtral-MoE | ✅ | ✅ | ✅ | ❓ | ❓ | ❓ |
| Pythia | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Falcon | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Qwen | ✅ | ✅ | ✅ | ❓ | ❓ | ❓ |
| Gemma | ✅ | ✅ | ✅ | ❓ | ❓ | ✅ |

✅: Supported ❌: Not Supported ❓: Untested
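Once training finishes, the Axolotl CLI can also run the tuned adapter interactively. The `inference` subcommand takes the same config plus a `--lora-model-dir` pointing at the adapter weights; the path below assumes the `output_dir` from the sketch above:

```bash
# Chat with the freshly trained LoRA adapter
# (adjust --lora-model-dir to match output_dir in your config)
axolotl inference examples/llama-3/lora-1b.yml \
  --lora-model-dir="./outputs/lora-out"
```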