Unsloth Project Detailed Introduction
Project Overview
Unsloth is an open-source tool focused on fine-tuning and reinforcement learning for large language models (LLMs). It delivers up to 2x faster training and roughly 70% lower VRAM usage for models such as Qwen3, Llama 4, DeepSeek-R1, Gemma 3, and TTS models. The project aims to make AI technology more accessible and easier to use, providing researchers and developers with efficient model training solutions.
Key Features
- High-Performance Optimization: 2x faster training speed, 70% reduction in VRAM usage
- Zero Precision Loss: No approximation methods used, ensuring training accuracy
- Broad Compatibility: Supports various mainstream LLM models and training methods
- User-Friendly: Provides beginner-friendly notebooks and detailed documentation
Core Functions and Features
1. Model Support
Unsloth supports a wide range of mainstream large language models, including:
- Llama Series: Llama 4, Llama 3.3 (70B), Llama 3.2, Llama 3.1
- Qwen Series: Qwen 3 (14B), Qwen 2.5 (including Coder models)
- Gemma Series: Gemma 3, Gemma 2 (9B/27B)
- Other Models: Phi-4 (14B), Mistral Small (22B), DeepSeek-R1, etc.
2. Training Methods
Fine-tuning:
- Supports full-parameter fine-tuning and pre-training
- 4-bit, 8-bit, 16-bit quantized training
- QLoRA and LoRA fine-tuning (see the loading sketch after this list)
- Dynamic 4-bit quantization technology
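To make the QLoRA/LoRA distinction concrete, here is a minimal loading sketch; the model name and parameter values are placeholders rather than a prescribed configuration. The LoRA adapters themselves are attached with FastLanguageModel.get_peft_model, as shown in the usage example later in this document.
from unsloth import FastLanguageModel

# QLoRA: quantize the frozen base weights to 4-bit, then train LoRA adapters on top.
# Setting load_in_4bit = False gives a standard 16-bit LoRA setup instead.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-7B",  # placeholder model name
    max_seq_length = 2048,
    load_in_4bit = True,
)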
Reinforcement Learning:
- DPO (Direct Preference Optimization), illustrated in the sketch after this list
- GRPO (Group Relative Policy Optimization) for long-context reasoning
- PPO (Proximal Policy Optimization)
- Reward model training
- Online DPO
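As a sketch of how the preference-optimization side fits together, the snippet below pairs an Unsloth-loaded model with TRL's DPOTrainer. The model name, hyperparameters, and preference_dataset are placeholders, and the exact trainer keyword names can differ between TRL releases.
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/zephyr-sft-bnb-4bit",  # placeholder SFT checkpoint
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,
)

trainer = DPOTrainer(
    model = model,
    ref_model = None,  # with LoRA, the frozen base model serves as the implicit reference
    args = DPOConfig(output_dir = "dpo-out", beta = 0.1, per_device_train_batch_size = 2),
    train_dataset = preference_dataset,  # placeholder: rows with prompt / chosen / rejected
    tokenizer = tokenizer,  # newer TRL versions may expect processing_class instead
)
trainer.train()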
3. Technical Advantages
Performance Optimization:
- All kernels are written in OpenAI's Triton language with a manual backpropagation engine
- 0% precision loss - no approximation methods - all exact calculations
- Supports long context training (up to 342K context)
Memory Optimization:
- Dynamic 4-bit quantization technology, improving accuracy while only increasing VRAM usage by <10%
- Gradient checkpointing optimization, further reducing memory usage by 30%
- Supports 4x longer context windows
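As a rough back-of-the-envelope check on where the weight-memory savings come from (ignoring optimizer states, activations, and the LoRA parameters themselves), consider an 8B-parameter model:
params = 8e9                   # 8B parameters
gb_16bit = params * 2 / 1e9    # 16-bit weights: ~16 GB
gb_4bit  = params * 0.5 / 1e9  # 4-bit weights:  ~4 GB
print(f"16-bit: {gb_16bit:.0f} GB, 4-bit: {gb_4bit:.0f} GB")
# Quantizing the frozen base weights alone saves ~75% of weight memory; the
# headline ~70% VRAM figure also reflects gradient checkpointing and activation savings.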
4. Hardware Compatibility
- GPU Requirements: NVIDIA GPUs from 2018 onward, with a minimum CUDA compute capability of 7.0 (see the check after this list)
- Supported GPUs: V100, T4, Titan V, RTX 20/30/40 series, A100, H100, L40, etc.
- Operating Systems: Linux and Windows
- Special Support: GTX 1070 and 1080 cards work, but run more slowly
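A quick way to confirm whether a GPU meets the CUDA compute capability 7.0 floor is through PyTorch:
import torch

major, minor = torch.cuda.get_device_capability(0)  # e.g. (7, 0) on a V100, (7, 5) on a T4
print(f"Compute capability: {major}.{minor}")
if (major, minor) >= (7, 0):
    print("Meets Unsloth's recommended minimum")
else:
    print("Below 7.0 (e.g. GTX 1070/1080): may still run, but slower")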
5. Integration and Ecosystem
Framework Integration:
- Officially supported by the 🤗 Hugging Face TRL library
- Supports Trainer, Seq2SeqTrainer
- Compatible with native PyTorch code
Deployment Options:
- Export to GGUF format (see the sketch after this list)
- Supports Ollama, vLLM deployment
- Hugging Face Model Hub integration
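Given a trained model and tokenizer (as produced in the usage example later in this document), the Unsloth docs describe export helpers along the lines of the sketch below; the method names, quantization options, and repository ID here are assumptions that may vary between Unsloth versions.
# Export the merged model to GGUF for llama.cpp / Ollama
# (method name and quantization_method value assumed from the Unsloth docs)
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")

# Or push a merged 16-bit checkpoint to the Hugging Face Hub (e.g. for vLLM)
model.push_to_hub_merged("your-username/your-model", tokenizer, save_method = "merged_16bit")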
Installation and Usage
Quick Installation
For Linux devices, it is recommended to install using pip:
pip install unsloth
Basic Usage Example
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer, SFTConfig

# Load the model with 4-bit quantized base weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-3-4B-it",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,
    use_gradient_checkpointing = "unsloth",
)

# Start training (dataset must be a prepared 🤗 Dataset of training examples)
trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(output_dir = "outputs"),  # set other training parameters here
)
trainer.train()
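Once training finishes, the same objects can be used for generation. A minimal sketch (the prompt and generation settings are arbitrary):
# Switch Unsloth into its faster inference mode, then generate
FastLanguageModel.for_inference(model)

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))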
Performance Benchmarks
Training Efficiency Comparison
| Model | VRAM | Unsloth Speed | VRAM Reduction | Context Length | Hugging Face + FA2 |
| --- | --- | --- | --- | --- | --- |
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
Long Context Support
Under the same hardware conditions, Unsloth supports significantly longer context lengths than traditional methods:
- 8GB GPU: Unsloth supports 2,972 tokens, traditional methods OOM
- 24GB GPU: Unsloth supports 78,475 tokens, traditional methods only 5,789 tokens
- 80GB GPU: Unsloth supports 342,733 tokens, traditional methods only 28,454 tokens
Latest Feature Updates
Recent Important Updates
- Llama 4 Support: Meta's latest Scout and Maverick models
- Full Support: full fine-tuning (FFT), all models (Mixtral, MoE, Cohere, Mamba), and all training algorithms
- Vision Models: supports Llama 3.2 Vision, Qwen 2.5 VL, Pixtral, etc. (see the sketch after this list)
- Inference Optimization: 2x faster inference speed
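For vision models, Unsloth exposes a parallel FastVisionModel entry point; the sketch below assumes that API and uses a placeholder model name, so treat the details as illustrative rather than exact.
from unsloth import FastVisionModel

# Load a vision-language model with 4-bit quantized weights (placeholder model name)
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit = True,
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True,     # also adapt the vision tower
    finetune_language_layers = True,   # adapt the language backbone
    r = 16,
    lora_alpha = 16,
)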
Special Features
- Chat Interface: Provides an interactive chat interface
- Gradient Accumulation Fix: Discovered and fixed a gradient accumulation bug
- Cut Cross Entropy: memory-saving loss computation added in collaboration with Apple
- Multilingual Continued Pre-training: supports Korean and other languages
Community and Ecosystem
Documentation and Support
- Official Documentation: docs.unsloth.ai
- GitHub Repository: Active open-source community
- Social Media: Twitter/X official account
- Community Forum: Reddit page for discussion
Learning Resources
- Beginner-friendly Colab notebooks
- Detailed installation and usage guides
- Kaggle competition-specific notebooks
- Complete API documentation
Summary
Unsloth is one of the best open-source LLM fine-tuning tools currently available, achieving significant improvements in training speed and memory efficiency through its hand-written Triton kernels and memory optimizations. Both researchers and industry developers can benefit from its efficient training capabilities, and the project's continuous updates and active community support make it an important choice in the LLM fine-tuning field.