A lightweight GPT training framework that lets you train a 26M-parameter language model from scratch in just 2 hours.
MiniMind - Ultra-Lightweight GPT Training Framework
Project Overview
MiniMind is an ultra-lightweight large language model (LLM) training framework whose smallest model is only about 1/7000 the size of GPT-3, making rapid training possible on an ordinary personal GPU. The project lets you train a 26M-parameter GPT model completely from scratch in about 2 hours.
Core Features
🚀 Ultra-Lightweight Design
- Minimal Parameter Count: The smallest model needs only 26M (0.02B) parameters to hold a fluent conversation.
- Efficient Training: A full training run completes on a single RTX 3090 in about 2 hours.
- Resource-Friendly: Runs on ordinary consumer GPUs, greatly lowering the barrier to entry for training.
🧠 Complete Training Ecosystem
The project open-sources a minimalist LLM architecture together with the full training pipeline, covering the following core stages:
- Pretraining - Training a foundational language model from scratch.
- Supervised Fine-Tuning (SFT) - Supervised instruction fine-tuning.
- LoRA Fine-Tuning - Low-Rank Adaptation for parameter-efficient fine-tuning (a sketch follows this list).
- DPO Algorithm - Direct Preference Optimization algorithm.
- Model Distillation - Knowledge distillation algorithm.
- MoE Expansion - Mixture of Experts model support.
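To make the LoRA item above concrete, here is a minimal sketch of a low-rank adapter wrapped around a frozen nn.Linear layer in plain PyTorch. The class name, rank, and scaling factor are illustrative and not taken from MiniMind's actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W0 x + (alpha / r) * B(A(x)), with W0 frozen."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, r, bias=False)   # down-projection A
        self.lora_b = nn.Linear(r, base.out_features, bias=False)  # up-projection B
        nn.init.normal_(self.lora_a.weight, std=0.02)
        nn.init.zeros_(self.lora_b.weight)               # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Usage: wrap a projection layer and train only the small adapter matrices.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 16, 512))
```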
🎯 Technical Architecture
Framework Support
- Native PyTorch: Built on the native PyTorch framework, supporting multi-GPU acceleration.
- Strong Compatibility: Compatible with mainstream frameworks such as transformers, accelerate, trl, and peft.
- Flexible Deployment: Supports single-GPU and multi-GPU training setups (DDP, DeepSpeed); a minimal DDP setup sketch follows.
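The sketch below shows how a training script can initialize distributed data parallelism when launched with torchrun. It uses only standard PyTorch APIs and a stand-in model; it is not MiniMind's actual training code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp():
    """Initialize the process group using the environment variables set by torchrun."""
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank

local_rank = setup_ddp()
model = torch.nn.Linear(512, 512).cuda(local_rank)   # stand-in for the GPT model
model = DDP(model, device_ids=[local_rank])           # gradients sync across ranks

# ... training loop ...
dist.destroy_process_group()
```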
Training Features
- Checkpointing: Training can be stopped and resumed at any time (see the sketch after this list).
- Multi-GPU Training: Supports DDP distributed training, scalable to multi-machine, multi-GPU clusters.
- Monitoring Integration: Supports recording and visualizing training runs with wandb.
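Resumable training of this kind is usually implemented by saving the model, optimizer, and step counter together. The sketch below shows the standard PyTorch pattern; the file name and checkpoint fields are illustrative, not MiniMind's exact format.

```python
import os
import torch

CKPT_PATH = "checkpoint.pt"  # illustrative path

def save_checkpoint(model, optimizer, step):
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, CKPT_PATH)

def load_checkpoint(model, optimizer):
    """Resume from the last checkpoint if one exists; otherwise start at step 0."""
    if not os.path.exists(CKPT_PATH):
        return 0
    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```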
🌟 Multimodal Expansion
MiniMind-V Visual Multimodal Version
- Visual Understanding: Extended into a vision-language model (VLM), MiniMind-V.
- Unified Architecture: Built on the MiniMind language model, with a visual encoder added on top; a rough sketch of this pattern follows.
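As a rough illustration of how a visual encoder can be attached to a small language model (the general connector pattern, not MiniMind-V's actual code; the module name, dimensions, and patch count below are assumptions):

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Map image-encoder features into the LM's token embedding space."""

    def __init__(self, vision_dim: int = 768, lm_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from a frozen vision encoder
        return self.proj(image_features)  # (batch, num_patches, lm_dim) "image tokens"

# The projected image tokens are concatenated with text token embeddings
# before being fed into the language model's transformer blocks.
projector = VisionProjector()
image_tokens = projector(torch.randn(1, 196, 768))
text_embeds = torch.randn(1, 32, 512)
lm_input = torch.cat([image_tokens, text_embeds], dim=1)
```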
📊 Model Capabilities
MiniMind can generate coherent text from a given prompt or context, hold multi-turn dialogues, and answer questions on a variety of topics.
Main Functions
- Text Generation: Generates coherent text content based on prompts.
- Dialogue Interaction: Supports multi-turn conversations and question answering.
- Knowledge Q&A: Possesses a modest level of factual question-answering ability.
- Code Understanding: Supports basic code generation and understanding.
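Because the framework is compatible with transformers (see Technical Architecture above), a trained checkpoint can be used for dialogue through the standard generation API. The model path below is a placeholder, not an official repo id; substitute the directory or Hugging Face repo of your own trained model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/your-minimind-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_id)
# custom architectures may additionally require trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Please introduce yourself."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```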
🎓 Educational Value
The goal of this project is to lower the barrier to learning about LLMs, so that anyone can train a very small language model while understanding every line of code involved. The project follows the idea of "building an airplane out of building blocks": users see the underlying implementation of an LLM directly, rather than being kept at a distance by layers of high-level abstraction.
💻 Usage
Environment Requirements
- PyTorch 2.1.2+
- CUDA 12.2+
- Flash Attention 2
- RTX 3090 or higher performance GPU (recommended)
Quick Start
```bash
# Clone the project
git clone https://github.com/jingyaogong/minimind.git
cd minimind

# Install dependencies
pip install -r requirements.txt

# Single-GPU training
python train.py

# Multi-GPU training (N = number of GPUs, N > 1)
torchrun --nproc_per_node N train.py
```
Training Configuration
```bash
# Enable wandb monitoring
wandb login
python train.py --use_wandb

# Specify the project and run name
python train.py --wandb_project "my_minimind" --wandb_run_name "experiment_1"
```
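The flags above suggest that train.py wires wandb in through command-line arguments. A hedged sketch of that wiring is shown below; the argument names match the commands above, but the actual script may differ.

```python
import argparse
import wandb

parser = argparse.ArgumentParser()
parser.add_argument("--use_wandb", action="store_true")
parser.add_argument("--wandb_project", type=str, default="minimind")
parser.add_argument("--wandb_run_name", type=str, default=None)
args = parser.parse_args()

if args.use_wandb:
    wandb.init(project=args.wandb_project, name=args.wandb_run_name)

for step in range(100):                 # stand-in for the real training loop
    loss = 1.0 / (step + 1)             # dummy loss value for illustration
    if args.use_wandb:
        wandb.log({"train/loss": loss, "step": step})
```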
🔄 Training Process
- Data Preparation: Dataset cleaning and preprocessing.
- Pretraining: Unsupervised training on large-scale text data.
- Instruction Fine-Tuning: Supervised fine-tuning using instruction data.
- Preference Optimization: Aligning model outputs with human preferences via the DPO algorithm (a loss sketch follows this list).
- Model Evaluation: Testing performance on benchmarks such as C-Eval.
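For the preference-optimization step, the DPO objective compares the policy's log-probabilities on chosen versus rejected responses against a frozen reference model. The function below is a generic sketch of that loss, not MiniMind's exact implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss from summed per-response log-probs."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with dummy log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```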
📈 Performance
- Training Speed: Measured on an RTX 3090 GPU with PyTorch 2.1.2, CUDA 12.2, and Flash Attention 2.
- Dialogue Quality: Fluent dialogue can be achieved with only 26M parameters.
- Resource Consumption: Low memory footprint, suitable for individual developers.
🌍 Community Ecosystem
- Open Source and Free: Completely open source, with all core algorithm code publicly available.
- Comprehensive Documentation: Provides detailed documentation in both Chinese and English.
- Continuous Updates: Active development community, continuous feature iteration.
- Education-Friendly: Suitable for learning and teaching.
🔗 Related Projects
- Main Project: minimind
- Multimodal Version: minimind-v
- MoE Version: minimind-v1-moe
Summary
MiniMind is a groundbreaking lightweight LLM training framework that demonstrates the possibility of training language models with practical conversational abilities under limited computing resources. The project not only provides a complete training toolchain but, more importantly, offers an excellent platform for AI learners and researchers to understand the internal mechanisms of LLMs. Through the concept of "starting from scratch and understanding every line of code," MiniMind is democratizing artificial intelligence technology, allowing more people to participate in the development and research of large models.