Stage 4: Deep Learning and Neural Networks
A complete tutorial on building large language models from scratch: implementing the GPT architecture step by step in PyTorch and covering the entire process of pretraining, finetuning, and deployment.
LLMs-from-scratch: Detailed Course Description
Project Overview
LLMs-from-scratch is a comprehensive learning resource created by Sebastian Raschka, designed to teach how to build large language models (LLMs) from scratch. This project serves as the official code repository for the book "Build a Large Language Model (From Scratch)."
Core Features
📚 Learning Objectives
- Understand the inner workings of large language models
- Gradually build your own LLM through coding
- Learn the training and development methods behind foundational models like ChatGPT
- Master techniques for loading and finetuning pre-trained model weights
🎯 Teaching Methodology
- Code from Scratch: Implement using PyTorch from the ground up, without relying on external LLM libraries
- Step-by-Step: Each stage is clearly explained with text, diagrams, and examples
- Highly Practical: Create small yet fully functional educational models
- Rich Supplementary Resources: Includes 17 hours and 15 minutes of video lessons
Course Structure
Chapter Content
Chapter 1: Understanding large language models
- Introduction to LLM fundamental concepts
- Model architecture overview
Chapter 2: Working with text data
- Main code: ch02.ipynb, dataloader.ipynb - Text preprocessing and data loading
- Exercise solutions: exercise-solutions.ipynb
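To give a feel for the chapter's approach, here is a minimal sketch (not the book's exact code) of a sliding-window dataset: the raw text is BPE-tokenized with tiktoken, and each training example pairs a token window with the same window shifted one position to the right.

```python
import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Turns one long token stream into (input, target) windows."""
    def __init__(self, text, tokenizer, max_length=256, stride=128):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        # The target is the input shifted by one token (next-token prediction).
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

tokenizer = tiktoken.get_encoding("gpt2")  # GPT-2's BPE vocabulary
# Usage: loader = DataLoader(SlidingWindowDataset(raw_text, tokenizer),
#                            batch_size=8, shuffle=True)
```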
Chapter 3: Coding attention mechanisms
- Main code: ch03.ipynb, multihead-attention.ipynb - Implementation of self-attention mechanisms
- Detailed explanation of multi-head attention
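The core computation the chapter builds up to is scaled dot-product attention with a causal mask. The single-head sketch below illustrates the idea; class and dimension names are illustrative assumptions, and the book develops this step by step before extending it to multiple heads.

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        # Upper-triangular mask blocks attention to future positions.
        mask = torch.triu(torch.ones(context_length, context_length), diagonal=1)
        self.register_buffer("mask", mask.bool())

    def forward(self, x):  # x: (batch, num_tokens, d_in)
        queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = queries @ keys.transpose(1, 2) / keys.shape[-1] ** 0.5
        n = x.shape[1]
        scores = scores.masked_fill(self.mask[:n, :n], float("-inf"))
        weights = torch.softmax(scores, dim=-1)  # each row sums to 1
        return weights @ values
```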
Chapter 4: Implementing a GPT model from scratch
- Main code: ch04.ipynb, gpt.py - Complete GPT architecture implementation
- Detailed explanation of model components
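As a rough illustration of what gpt.py assembles, here is a pre-norm transformer block. Note one substitution: the book implements its own multi-head attention class, whereas this sketch uses PyTorch's built-in nn.MultiheadAttention for brevity.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, drop_rate=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.att = nn.MultiheadAttention(d_model, n_heads, dropout=drop_rate,
                                         batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(                 # position-wise feed-forward
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.drop = nn.Dropout(drop_rate)

    def forward(self, x):  # x: (batch, num_tokens, d_model)
        n = x.shape[1]
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device),
                            diagonal=1)
        h = self.norm1(x)
        h, _ = self.att(h, h, h, attn_mask=causal, need_weights=False)
        x = x + self.drop(h)                       # residual around attention
        x = x + self.drop(self.ff(self.norm2(x)))  # residual around feed-forward
        return x
```

The full model stacks a number of these blocks between token/positional embedding layers and a linear output head over the vocabulary.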
Chapter 5: Pretraining on unlabeled data
- Main code: ch05.ipynb, gpt_train.py, gpt_generate.py - Pretraining workflow
- Text generation implementation
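Generation in this chapter reduces to an autoregressive loop. The sketch below shows greedy decoding in the spirit of the chapter's generation utility; the function name and signature are assumptions, not the repository's exact API.

```python
import torch

def generate_greedy(model, idx, max_new_tokens, context_size):
    """idx: (batch, num_tokens) tensor of token ids; model returns logits."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_size:]  # crop to the supported context length
        with torch.no_grad():
            logits = model(idx_cond)       # (batch, num_tokens, vocab_size)
        # Pick the most likely next token and append it to the running sequence.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

The chapter also covers temperature scaling and top-k sampling as alternatives to the greedy pick.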
Chapter 6: Finetuning for classification
- Main code: ch06.ipynb, gpt_class_finetune.py - Adapting models for specific classification tasks
- Finetuning techniques and strategies
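The key move in classification finetuning is replacing the vocabulary-sized output layer with a small classification head and freezing most of the backbone. A minimal sketch, with the attribute name out_head and the emb_dim argument as assumptions rather than the repository's exact API:

```python
import torch.nn as nn

def convert_to_classifier(model, emb_dim, num_classes=2):
    """Freeze a pretrained GPT and attach a small trainable classification head."""
    for param in model.parameters():
        param.requires_grad = False                   # freeze the pretrained backbone
    model.out_head = nn.Linear(emb_dim, num_classes)  # new head, trainable by default
    return model
```

The chapter also explores which layers are worth unfreezing, such as the final transformer block and last layer norm.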
Chapter 7: Finetuning to follow instructions
- Main code: ch07.ipynb, gpt_instruction_finetuning.py - Instruction finetuning methods
- Model evaluation: ollama_evaluate.py
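Chapter 7 formats instruction data with an Alpaca-style prompt template along these lines (paraphrased sketch; see ch07.ipynb for the exact version used in the book):

```python
def format_input(entry):
    """Render one instruction/input pair in Alpaca-style prompt format."""
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # The input field is optional; omit the section when it is empty.
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text

entry = {"instruction": "Rewrite the sentence in passive voice.",
         "input": "The chef cooked the meal."}
print(format_input(entry) + "\n\n### Response:\n")
```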
Appendix Content
Appendix A: Introduction to PyTorch
- Code: code-part1.ipynb, code-part2.ipynb - Quick start to PyTorch fundamentals
- Distributed Data Parallel (DDP) training: DDP-script.py
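For orientation, a bare-bones DDP skeleton of the kind DDP-script.py demonstrates might look as follows; the stand-in model and the omitted training loop are placeholders, and the script is meant to be launched with torchrun (e.g. torchrun --nproc_per_node=2 script.py):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(10, 2).to(local_rank)  # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])    # synchronizes gradients across ranks
    # ... training loop; use a DistributedSampler so each rank sees a distinct shard
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```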
Appendices B-E
- Appendix B: References and further reading
- Appendix C: Summary of exercise solutions
- Appendix D: Adding additional features
- Appendix E: Parameter-efficient finetuning
Bonus Materials
Chapter 5 Additional Resources
- Alternative Weight Loading Methods: Different techniques for loading model weights
- Pretraining on Project Gutenberg Dataset: Training on a large text corpus
- Training Loop Optimizations: Adding various improvements
- Learning Rate Schedulers: Optimizing the training process (a scheduler sketch follows after this list)
- Hyperparameter Tuning: Pretraining hyperparameter optimization
- Building a User Interface: UI for interacting with a pre-trained LLM
- Model Conversion:
- GPT to Llama conversion
- Llama 3.2 from scratch implementation
- Qwen3 Dense and Mixture-of-Experts (MoE) models
- Gemma 3 from scratch implementation
- Memory-Efficient Weight Loading: Optimizing model loading
- Tiktoken BPE Tokenizer Extension: Adding new tokens
- PyTorch Performance Optimization Tips: Accelerating LLM training
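As an example of the training-loop improvements listed above (see Learning Rate Schedulers), a warmup-plus-cosine learning-rate schedule fits in a few lines; the hyperparameter values here are illustrative assumptions:

```python
import math

def lr_at_step(step, max_lr=5e-4, min_lr=1e-5, warmup_steps=100, total_steps=10_000):
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps         # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))     # decays from 1 to 0
    return min_lr + (max_lr - min_lr) * cosine
```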
Chapter 6 Additional Resources
- Advanced techniques for classification finetuning
Chapter 7 Additional Resources
- Dataset Tools: Finding approximate duplicates and creating passive voice entries
- Response Evaluation: Evaluating instruction responses using OpenAI API and Ollama
- Dataset Generation: Generating datasets for instruction finetuning
- Dataset Improvement: Enhancing the quality of instruction finetuning datasets
- Preference Dataset Generation: Using Llama 3.1 70B and Ollama
- DPO Alignment: Direct Preference Optimization implementation (a loss sketch follows after this list)
- User Interface: Interacting with instruction-finetuned GPT models
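For reference, the core DPO objective (following Rafailov et al., 2023) fits in a few lines; the inputs are per-sequence log probabilities, assumed to be computed elsewhere for the trainable policy and a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Margin of the chosen over the rejected response, measured relative to
    # the frozen reference model; beta controls the strength of the implicit
    # KL constraint keeping the policy close to the reference.
    logits = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    return -F.logsigmoid(beta * logits).mean()
```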
Reasoning Model Resources (from the reasoning-from-scratch repository)
- Qwen3 base model
- Model evaluation methods
Technical Requirements
Prerequisites
- ✅ Required: Strong foundation in Python programming
- ✅ Helpful: Basic knowledge of deep neural networks
- ✅ Helpful: PyTorch basics (Appendix A provides a quick start)
Hardware Requirements
- 💻 Standard Laptop Sufficient: The main chapter code is designed to run on a regular laptop
- 🚀 Automatic GPU Acceleration: The code automatically uses a GPU if one is available (see the snippet below)
- ⚡ No Specialized Hardware Needed: This keeps the material accessible to a broad range of learners
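The device selection behind the automatic GPU acceleration point is typically a few lines like the following (whether every script in the repository also checks Apple Silicon's MPS backend is an assumption):

```python
import torch

# Prefer CUDA, fall back to Apple Silicon's MPS backend, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Then: model.to(device), and move each batch with inputs.to(device)
```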
Software Environment
- Python 3.x
- PyTorch
- Other dependencies detailed in the setup directory
Accompanying Resources
Video Course
- 📹 17 hours and 15 minutes of complete video lessons
- Chapter-by-chapter coding demonstrations
- Can be used as a standalone learning resource or alongside the book
- Manning platform: Master and Build Large Language Models
Follow-up Book
"Build A Reasoning Model (From Scratch)"
- Can be considered a sequel
- Starts with pre-trained models
- Implements different reasoning methods:
- Inference-time scaling
- Reinforcement learning
- Distillation techniques
- Improves model reasoning capabilities
- GitHub repository: reasoning-from-scratch
Testing Resources
Free 170-page PDF: "Test Yourself On Build a Large Language Model (From Scratch)"
- Approximately 30 quiz questions and answers per chapter
- Helps test understanding
- Free download from Manning website
Exercise Solutions
- Each chapter includes multiple exercises
- Solutions are summarized in Appendix C
- Corresponding code notebooks are in each chapter's folder
- Example: ./ch02/01_main-chapter-code/exercise-solutions.ipynb
Project Access
Download Methods
Method 1: Direct ZIP Download
# Click the Download ZIP button on the GitHub page
Method 2: Git Clone
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
Code Organization
- Each chapter has its own folder: ch02/, ch03/, etc.
- Main code is in the 01_main-chapter-code/ subfolder
- Additional resources are in their corresponding numbered folders
Suggested Learning Path
Mental Model
The book provides a clear mental map summarizing all covered content:
- Understanding LLM fundamentals
- Text data processing
- Attention mechanisms
- GPT architecture implementation
- Pretraining techniques
- Finetuning methods
- Practical application deployment
Learning Recommendations
- Beginners: Start from Chapter 1, follow the sequence, and complete exercises for each chapter.
- Experienced Learners: Can skip familiar chapters and focus on specific topics.
- Practitioners: Use bonus materials to explore advanced topics.
- Researchers: Refer to the citation format for referencing this resource in your research.
Community and Support
Feedback Channels
- 💬 Manning Forum: Official Forum
- 💭 GitHub Discussions: Discussion Area
- 🤝 All forms of feedback, questions, and idea exchanges are welcome.
Contribution Guidelines
- Because the repository accompanies a print book, the main chapter code is kept stable.
- Contributions that extend the main chapter content are therefore not currently accepted.
- This keeps the code consistent with the physical book and ensures a smooth learning experience.
Citation Information
Chicago Format
Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.
BibTeX Format
@book{build-llms-from-scratch-book,
author = {Sebastian Raschka},
title = {Build A Large Language Model (From Scratch)},
publisher = {Manning},
year = {2024},
isbn = {978-1633437166},
url = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
github = {https://github.com/rasbt/LLMs-from-scratch}
}
Key Links
- 📖 GitHub Repository: https://github.com/rasbt/LLMs-from-scratch
- 🛒 Manning Publisher: http://mng.bz/orYv
- 🛒 Amazon Purchase: https://www.amazon.com/gp/product/1633437167
- 📹 Video Course: https://www.manning.com/livevideo/master-and-build-large-language-models
- 🧠 Reasoning Model Follow-up: https://github.com/rasbt/reasoning-from-scratch
- 📄 Setup Documentation: setup/README.md
Summary
This is a comprehensive and systematic LLM learning resource, suitable for:
- 🎓 Learners who wish to deeply understand the working principles of LLMs
- 👨‍💻 Developers who want hands-on experience implementing GPT-like models
- 🔬 Researchers engaged in NLP and deep learning research
- 🚀 Tech enthusiasts interested in AI and machine learning
Through this project, you will gain the complete ability to build, train, and finetune large language models from scratch.