Stage 4: Deep Learning and Neural Networks

A complete tutorial on building large language models from scratch: it implements the GPT architecture step by step in PyTorch and walks through the entire process of pretraining, finetuning, and deployment.

Tags: LLM · Transformer · PyTorch · GitHub · Text · Free · English

LLMs-from-scratch: Detailed Course Description

Project Overview

LLMs-from-scratch is a comprehensive learning resource created by Sebastian Raschka, designed to teach how to build large language models (LLMs) from scratch. This project serves as the official code repository for the book "Build a Large Language Model (From Scratch)."

Core Features

📚 Learning Objectives

  • Understand the inner workings of large language models
  • Gradually build your own LLM through coding
  • Learn the training and development methods behind foundational models like ChatGPT
  • Master techniques for loading and finetuning pre-trained model weights

🎯 Teaching Methodology

  • Code from Scratch: Everything is implemented in PyTorch from the ground up, without relying on external LLM libraries
  • Step-by-Step: Each stage is clearly explained with text, diagrams, and examples
  • Highly Practical: You build small yet fully functional educational models
  • Rich Supplementary Resources: Includes 17 hours and 15 minutes of video lessons

Course Structure

Chapter Content

Chapter 1: Understanding large language models

  • Introduction to LLM fundamental concepts
  • Model architecture overview

Chapter 2: Working with text data

  • Main code: ch02.ipynb, dataloader.ipynb
  • Text preprocessing and data loading
  • Exercise solutions: exercise-solutions.ipynb
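
To give a flavor of what Chapter 2 builds, here is a minimal sketch of a sliding-window dataset for next-token prediction, assuming the GPT-2 BPE tokenizer from tiktoken (which the book also uses). Class names and parameter values here are illustrative; the full version lives in ch02.ipynb and dataloader.ipynb.

import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Each sample is an (input_ids, target_ids) pair shifted by one token."""

    def __init__(self, text, tokenizer, max_length=256, stride=128):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

tokenizer = tiktoken.get_encoding("gpt2")          # GPT-2 BPE tokenizer
raw_text = "In the beginning ..."                  # placeholder; use a real corpus in practice
dataset = SlidingWindowDataset(raw_text, tokenizer, max_length=32, stride=16)
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)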

Chapter 3: Coding attention mechanisms

  • Main code: ch03.ipynb, multihead-attention.ipynb
  • Implementation of self-attention mechanisms
  • Detailed explanation of multi-head attention
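
The following is a compact sketch of causal multi-head self-attention in the spirit of Chapter 3. For brevity it fuses the query/key/value projections into one linear layer, which differs slightly from the book's step-by-step implementation in ch03.ipynb and multihead-attention.ipynb.

import torch
import torch.nn as nn

class CausalMultiHeadAttention(nn.Module):
    """Scaled dot-product self-attention with a causal mask, split across several heads."""

    def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.0):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.qkv = nn.Linear(d_in, 3 * d_out, bias=False)   # fused query/key/value projection
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):                                    # x: (batch, num_tokens, d_in)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each projection to (batch, num_heads, num_tokens, head_dim)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))  # hide future tokens
        weights = self.dropout(torch.softmax(scores, dim=-1))
        context = (weights @ v).transpose(1, 2).reshape(b, t, -1)      # merge heads
        return self.out_proj(context)

attn = CausalMultiHeadAttention(d_in=768, d_out=768, context_length=1024, num_heads=12)
out = attn(torch.randn(2, 10, 768))                          # -> (2, 10, 768)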

Chapter 4: Implementing a GPT model from scratch

  • Main code: ch04.ipynb, gpt.py
  • Complete GPT architecture implementation
  • Detailed explanation of model components
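
As a rough illustration of the architecture Chapter 4 assembles, here is a minimal GPT-style model. The configuration mirrors the GPT-2 "small" (124M) settings; to stay short, this sketch uses PyTorch's built-in nn.MultiheadAttention instead of the from-scratch attention the book implements, so treat it as an outline rather than the book's code.

import torch
import torch.nn as nn

GPT_CONFIG_124M = {                      # GPT-2 "small"-sized settings
    "vocab_size": 50257, "context_length": 1024, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12, "drop_rate": 0.1,
}

class TransformerBlock(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.attn = nn.MultiheadAttention(cfg["emb_dim"], cfg["n_heads"],
                                          dropout=cfg["drop_rate"], batch_first=True)
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.ff = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]), nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )

    def forward(self, x):
        t = x.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                          # residual connection around attention
        x = x + self.ff(self.norm2(x))            # residual connection around feed-forward
        return x

class GPTModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
        self.drop = nn.Dropout(cfg["drop_rate"])
        self.blocks = nn.Sequential(*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        self.final_norm = nn.LayerNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, idx):                       # idx: (batch, seq_len) token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        return self.out_head(self.final_norm(self.blocks(x)))   # logits: (batch, seq_len, vocab)

model = GPTModel(GPT_CONFIG_124M)
logits = model(torch.randint(0, 50257, (2, 8)))   # -> (2, 8, 50257)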

Chapter 5: Pretraining on unlabeled data

  • Main code: ch05.ipynb, gpt_train.py, gpt_generate.py
  • Pretraining workflow
  • Text generation implementation
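
Below is a hedged sketch of the two pieces Chapter 5 revolves around: a single pretraining step (next-token cross-entropy) and greedy text generation. The repository's gpt_train.py and gpt_generate.py additionally handle evaluation, checkpointing, and temperature/top-k sampling, which are omitted here.

import torch
import torch.nn.functional as F

def train_step(model, input_batch, target_batch, optimizer, device):
    """One pretraining step: next-token cross-entropy over the whole sequence."""
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)                              # (batch, seq_len, vocab)
    loss = F.cross_entropy(logits.flatten(0, 1), target_batch.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def generate_greedy(model, idx, max_new_tokens, context_length):
    """Append the most probable next token one step at a time."""
    for _ in range(max_new_tokens):
        logits = model(idx[:, -context_length:])             # crop to the supported context
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_token], dim=1)
    return idx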

Chapter 6: Finetuning for classification

  • Main code: ch06.ipynb, gpt_class_finetune.py
  • Adapting models for specific classification tasks
  • Finetuning techniques and strategies
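
A minimal sketch of the core idea in Chapter 6: swap the language-model head for a small classification head and finetune only selected parts of the network. It assumes a GPT-style model with out_head, final_norm, and blocks attributes, as in the Chapter 4 sketch above; gpt_class_finetune.py is the authoritative version.

import torch
import torch.nn as nn
import torch.nn.functional as F

def prepare_for_classification(model, emb_dim, num_classes=2):
    """Freeze the pretrained weights, then attach and unfreeze a classifier head.

    num_classes=2 would correspond to a binary task such as spam vs. not-spam.
    """
    for param in model.parameters():
        param.requires_grad = False
    model.out_head = nn.Linear(emb_dim, num_classes)          # new, randomly initialized head
    # Unfreeze the new head plus the final LayerNorm and last transformer block.
    for module in (model.out_head, model.final_norm, model.blocks[-1]):
        for param in module.parameters():
            param.requires_grad = True
    return model

def classification_loss(model, input_ids, labels):
    """Use the logits of the last token to represent the whole sequence."""
    logits = model(input_ids)[:, -1, :]
    return F.cross_entropy(logits, labels)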

Chapter 7: Finetuning to follow instructions

  • Main code: ch07.ipynb, gpt_instruction_finetuning.py
  • Instruction finetuning methods
  • Model evaluation: ollama_evaluate.py
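
As a small illustration of the data side of Chapter 7, here is an Alpaca-style prompt-formatting sketch for instruction finetuning; field names and wording are illustrative and may not match ch07.ipynb exactly.

def format_input(entry):
    """Format one instruction-dataset entry into an Alpaca-style prompt string."""
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry.get("input") else ""
    return instruction_text + input_text

entry = {"instruction": "Rewrite the sentence in passive voice.",
         "input": "The chef cooked the meal.",
         "output": "The meal was cooked by the chef."}
prompt = format_input(entry)
full_text = prompt + f"\n\n### Response:\n{entry['output']}"   # training text includes the response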

Appendix Content

Appendix A: Introduction to PyTorch

  • Code: code-part1.ipynb, code-part2.ipynb
  • Distributed Data Parallel (DDP) training: DDP-script.py
  • Quick start to PyTorch fundamentals
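
The following is a minimal Distributed Data Parallel sketch, assuming a multi-GPU machine and a launch via torchrun (which sets RANK, LOCAL_RANK, and WORLD_SIZE). The repository's DDP-script.py wraps a full training loop around this; only initialization and gradient synchronization are shown here.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launched with: torchrun --nproc_per_node=N this_script.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).to(local_rank)        # placeholder model
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(8, 10, device=local_rank)
    loss = model(x).sum()
    loss.backward()                                       # gradients are all-reduced across ranks

    dist.destroy_process_group()

if __name__ == "__main__":
    main()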

Appendices B-E

  • Appendix B: References and further reading
  • Appendix C: Summary of exercise solutions
  • Appendix D: Adding bells and whistles to the training loop
  • Appendix E: Parameter-efficient finetuning with LoRA

Bonus Materials

Chapter 5 Additional Resources

  • Alternative Weight Loading Methods: Different techniques for loading model weights
  • Pretraining on Project Gutenberg Dataset: Training on a large text corpus
  • Training Loop Optimizations: Adding various improvements
  • Learning Rate Schedulers: Optimizing the training process (a warmup + cosine-decay sketch follows this list)
  • Hyperparameter Tuning: Pretraining hyperparameter optimization
  • Building a User Interface: UI for interacting with a pre-trained LLM
  • Model Conversion:
    • GPT to Llama conversion
    • Llama 3.2 from scratch implementation
    • Qwen3 Dense and Mixture-of-Experts (MoE) models
    • Gemma 3 from scratch implementation
  • Memory-Efficient Weight Loading: Optimizing model loading
  • Tiktoken BPE Tokenizer Extension: Adding new tokens
  • PyTorch Performance Optimization Tips: Accelerating LLM training
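
As referenced above under Learning Rate Schedulers, here is a hedged sketch of linear warmup followed by cosine decay, a common schedule for LLM pretraining. The values below are placeholders for illustration, not the repository's settings.

import math
import torch

peak_lr, min_lr, warmup_steps, total_steps = 5e-4, 1e-5, 100, 1000   # placeholder values

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=peak_lr)

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))

for step in range(total_steps):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    # ... forward pass, backward pass, and optimizer.step() would go here ...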

Chapter 6 Additional Resources

  • Advanced techniques for classification finetuning

Chapter 7 Additional Resources

  • Dataset Tools: Finding approximate duplicates and creating passive voice entries
  • Response Evaluation: Evaluating instruction responses using OpenAI API and Ollama
  • Dataset Generation: Generating datasets for instruction finetuning
  • Dataset Improvement: Enhancing the quality of instruction finetuning datasets
  • Preference Dataset Generation: Using Llama 3.1 70B and Ollama
  • DPO Alignment: Direct Preference Optimization implementation
  • User Interface: Interacting with instruction-finetuned GPT models

Reasoning Model Resources (from the reasoning-from-scratch repository)

  • Qwen3 base
  • Model evaluation methods

Technical Requirements

Prerequisites

  • Required: Strong foundation in Python programming
  • Helpful: Basic knowledge of deep neural networks
  • Helpful: PyTorch basics (Appendix A provides a quick start)

Hardware Requirements

  • 💻 Standard Laptop Sufficient: Main chapter code designed to run on a regular laptop
  • 🚀 Automatic GPU Acceleration: Code automatically uses GPU if available
  • No Specialized Hardware Needed: Ensures broad accessibility for learners

Software Environment

  • Python 3.x
  • PyTorch
  • Other dependencies detailed in the setup directory

Accompanying Resources

Video Course

  • 📹 17 hours and 15 minutes of complete video lessons
  • Chapter-by-chapter coding demonstrations
  • Can be used as a standalone learning resource or alongside the book
  • Manning platform: Master and Build Large Language Models

Follow-up Book

"Build A Reasoning Model (From Scratch)"

  • Can be considered a sequel
  • Starts with pre-trained models
  • Implements different reasoning methods:
    • Inference-time scaling
    • Reinforcement learning
    • Distillation techniques
  • Improves model reasoning capabilities
  • GitHub repository: reasoning-from-scratch

Testing Resources

Free 170-page PDF: "Test Yourself On Build a Large Language Model (From Scratch)"

  • Approximately 30 quiz questions and answers per chapter
  • Helps test understanding
  • Free download from Manning website

Exercise Solutions

  • Each chapter includes multiple exercises
  • Solutions are summarized in Appendix C
  • Corresponding code notebooks are in each chapter's folder
    • Example: ./ch02/01_main-chapter-code/exercise-solutions.ipynb

Project Access

Download Methods

Method 1: Direct ZIP Download

Click the Download ZIP button on the GitHub repository page.

Method 2: Git Clone

git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git

Code Organization

  • Each chapter has its own folder: ch02/, ch03/, etc.
  • Main code is in the 01_main-chapter-code/ subfolder
  • Additional resources are in their corresponding numbered folders

Suggested Learning Path

Mental Model

The book provides a clear mental model summarizing everything it covers:

  1. Understanding LLM fundamentals
  2. Text data processing
  3. Attention mechanisms
  4. GPT architecture implementation
  5. Pretraining techniques
  6. Finetuning methods
  7. Practical application deployment

Learning Recommendations

  1. Beginners: Start from Chapter 1, follow the sequence, and complete exercises for each chapter.
  2. Experienced Learners: Can skip familiar chapters and focus on specific topics.
  3. Practitioners: Use bonus materials to explore advanced topics.
  4. Researchers: Refer to the citation format for referencing this resource in your research.

Community and Support

Feedback Channels

  • 💬 Manning Forum: Official Forum
  • 💭 GitHub Discussions: Discussion Area
  • 🤝 All forms of feedback, questions, and idea exchanges are welcome.

Contribution Guidelines

  • Because the repository accompanies a print book, the main chapter code is kept in sync with the published text.
  • Contributions that extend or modify the main chapter content are therefore not accepted at this time.
  • This keeps the code consistent with the physical book and provides a smooth learning experience.

Citation Information

Chicago Format

Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.

BibTeX Format

@book{build-llms-from-scratch-book,
  author = {Sebastian Raschka},
  title = {Build A Large Language Model (From Scratch)},
  publisher = {Manning},
  year = {2024},
  isbn = {978-1633437166},
  url = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  github = {https://github.com/rasbt/LLMs-from-scratch}
}

Summary

This is a comprehensive and systematic LLM learning resource, suitable for:

  • 🎓 Learners who wish to deeply understand the working principles of LLMs
  • 👨‍💻 Developers who want to practically implement GPT-like models
  • 🔬 Researchers engaged in NLP and deep learning research
  • 🚀 Tech enthusiasts interested in AI and machine learning

Working through this project, you will gain the end-to-end skills to build, train, and finetune large language models from scratch.