Stage 4: Deep Learning and Neural Networks

A complete tutorial on building large language models from scratch: it implements the GPT architecture step by step in PyTorch and walks through the entire process of pretraining, finetuning, and deployment.

Tags: LLM · Transformer · PyTorch · GitHub · Text · Free · English

LLMs-from-scratch: Detailed Course Description

Project Overview

LLMs-from-scratch is a comprehensive learning resource created by Sebastian Raschka, designed to teach how to build large language models (LLMs) from scratch. This project serves as the official code repository for the book "Build a Large Language Model (From Scratch)."

Core Features

📚 Learning Objectives

  • Understand the inner workings of large language models
  • Gradually build your own LLM through coding
  • Learn the training and development methods behind foundational models like ChatGPT
  • Master techniques for loading and finetuning pre-trained model weights

🎯 Teaching Methodology

  • Code from Scratch: Everything is implemented in PyTorch from the ground up, without relying on external LLM libraries
  • Step-by-Step: Each stage is clearly explained with text, diagrams, and examples
  • Highly Practical: You build small yet fully functional educational models
  • Rich Supplementary Resources: Includes 17 hours and 15 minutes of video lessons

Course Structure

Chapter Content

Chapter 1: Understanding large language models

  • Introduction to LLM fundamental concepts
  • Model architecture overview

Chapter 2: Working with text data

  • Main code: ch02.ipynb, dataloader.ipynb
  • Text preprocessing and data loading
  • Exercise solutions: exercise-solutions.ipynb
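
To give a flavor of what Chapter 2 builds, here is a minimal sketch of a sliding-window dataset for next-token prediction, assuming the GPT-2 BPE tokenizer from tiktoken (which the book also uses). Class names and parameter values here are illustrative; the full version lives in ch02.ipynb and dataloader.ipynb.

import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Each sample is an (input_ids, target_ids) pair shifted by one token."""

    def __init__(self, text, tokenizer, max_length=256, stride=128):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

tokenizer = tiktoken.get_encoding("gpt2")          # GPT-2 BPE tokenizer
raw_text = "In the beginning ..."                  # placeholder; use a real corpus in practice
dataset = SlidingWindowDataset(raw_text, tokenizer, max_length=32, stride=16)
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)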

Chapter 3: Coding attention mechanisms

  • Main code: ch03.ipynb, multihead-attention.ipynb
  • Implementation of self-attention mechanisms
  • Detailed explanation of multi-head attention
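
The following is a compact sketch of causal multi-head self-attention in the spirit of Chapter 3. For brevity it fuses the query/key/value projections into one linear layer, which differs slightly from the book's step-by-step implementation in ch03.ipynb and multihead-attention.ipynb.

import torch
import torch.nn as nn

class CausalMultiHeadAttention(nn.Module):
    """Scaled dot-product self-attention with a causal mask, split across several heads."""

    def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.0):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.qkv = nn.Linear(d_in, 3 * d_out, bias=False)   # fused query/key/value projection
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):                                    # x: (batch, num_tokens, d_in)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each projection to (batch, num_heads, num_tokens, head_dim)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))  # hide future tokens
        weights = self.dropout(torch.softmax(scores, dim=-1))
        context = (weights @ v).transpose(1, 2).reshape(b, t, -1)      # merge heads
        return self.out_proj(context)

attn = CausalMultiHeadAttention(d_in=768, d_out=768, context_length=1024, num_heads=12)
out = attn(torch.randn(2, 10, 768))                          # -> (2, 10, 768)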

Chapter 4: Implementing a GPT model from scratch

  • Main code: ch04.ipynb, gpt.py
  • Complete GPT architecture implementation
  • Detailed explanation of model components
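
As a rough illustration of the architecture Chapter 4 assembles, here is a minimal GPT-style model. The configuration mirrors the GPT-2 "small" (124M) settings; to stay short, this sketch uses PyTorch's built-in nn.MultiheadAttention instead of the from-scratch attention the book implements, so treat it as an outline rather than the book's code.

import torch
import torch.nn as nn

GPT_CONFIG_124M = {                      # GPT-2 "small"-sized settings
    "vocab_size": 50257, "context_length": 1024, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12, "drop_rate": 0.1,
}

class TransformerBlock(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.attn = nn.MultiheadAttention(cfg["emb_dim"], cfg["n_heads"],
                                          dropout=cfg["drop_rate"], batch_first=True)
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.ff = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]), nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )

    def forward(self, x):
        t = x.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                          # residual connection around attention
        x = x + self.ff(self.norm2(x))            # residual connection around feed-forward
        return x

class GPTModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
        self.drop = nn.Dropout(cfg["drop_rate"])
        self.blocks = nn.Sequential(*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        self.final_norm = nn.LayerNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, idx):                       # idx: (batch, seq_len) token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        return self.out_head(self.final_norm(self.blocks(x)))   # logits: (batch, seq_len, vocab)

model = GPTModel(GPT_CONFIG_124M)
logits = model(torch.randint(0, 50257, (2, 8)))   # -> (2, 8, 50257)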

Chapter 5: Pretraining on unlabeled data

  • Main code: ch05.ipynb, gpt_train.py, gpt_generate.py
  • Pretraining workflow
  • Text generation implementation
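
Below is a hedged sketch of the two pieces Chapter 5 revolves around: a single pretraining step (next-token cross-entropy) and greedy text generation. The repository's gpt_train.py and gpt_generate.py additionally handle evaluation, checkpointing, and temperature/top-k sampling, which are omitted here.

import torch
import torch.nn.functional as F

def train_step(model, input_batch, target_batch, optimizer, device):
    """One pretraining step: next-token cross-entropy over the whole sequence."""
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)                              # (batch, seq_len, vocab)
    loss = F.cross_entropy(logits.flatten(0, 1), target_batch.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def generate_greedy(model, idx, max_new_tokens, context_length):
    """Append the most probable next token one step at a time."""
    for _ in range(max_new_tokens):
        logits = model(idx[:, -context_length:])             # crop to the supported context
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_token], dim=1)
    return idx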

Chapter 6: Finetuning for classification

  • Main code: ch06.ipynb, gpt_class_finetune.py
  • Adapting models for specific classification tasks
  • Finetuning techniques and strategies
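
A minimal sketch of the core idea in Chapter 6: swap the language-model head for a small classification head and finetune only selected parts of the network. It assumes a GPT-style model with out_head, final_norm, and blocks attributes, as in the Chapter 4 sketch above; gpt_class_finetune.py is the authoritative version.

import torch
import torch.nn as nn
import torch.nn.functional as F

def prepare_for_classification(model, emb_dim, num_classes=2):
    """Freeze the pretrained weights, then attach and unfreeze a classifier head.

    num_classes=2 would correspond to a binary task such as spam vs. not-spam.
    """
    for param in model.parameters():
        param.requires_grad = False
    model.out_head = nn.Linear(emb_dim, num_classes)          # new, randomly initialized head
    # Unfreeze the new head plus the final LayerNorm and last transformer block.
    for module in (model.out_head, model.final_norm, model.blocks[-1]):
        for param in module.parameters():
            param.requires_grad = True
    return model

def classification_loss(model, input_ids, labels):
    """Use the logits of the last token to represent the whole sequence."""
    logits = model(input_ids)[:, -1, :]
    return F.cross_entropy(logits, labels)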

Chapter 7: Finetuning to follow instructions

  • Main code: ch07.ipynb, gpt_instruction_finetuning.py
  • Instruction finetuning methods
  • Model evaluation: ollama_evaluate.py
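
As a small illustration of the data side of Chapter 7, here is an Alpaca-style prompt-formatting sketch for instruction finetuning; field names and wording are illustrative and may not match ch07.ipynb exactly.

def format_input(entry):
    """Format one instruction-dataset entry into an Alpaca-style prompt string."""
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry.get("input") else ""
    return instruction_text + input_text

entry = {"instruction": "Rewrite the sentence in passive voice.",
         "input": "The chef cooked the meal.",
         "output": "The meal was cooked by the chef."}
prompt = format_input(entry)
full_text = prompt + f"\n\n### Response:\n{entry['output']}"   # training text includes the response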

Appendix Content

Appendix A: Introduction to PyTorch

  • Code: code-part1.ipynb, code-part2.ipynb
  • Distributed Data Parallel (DDP) training: DDP-script.py
  • Quick start to PyTorch fundamentals
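
The following is a minimal Distributed Data Parallel sketch, assuming a multi-GPU machine and a launch via torchrun (which sets RANK, LOCAL_RANK, and WORLD_SIZE). The repository's DDP-script.py wraps a full training loop around this; only initialization and gradient synchronization are shown here.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launched with: torchrun --nproc_per_node=N this_script.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).to(local_rank)        # placeholder model
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(8, 10, device=local_rank)
    loss = model(x).sum()
    loss.backward()                                       # gradients are all-reduced across ranks

    dist.destroy_process_group()

if __name__ == "__main__":
    main()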

Appendices B-E

  • Appendix B: References and further reading
  • Appendix C: Summary of exercise solutions
  • Appendix D: Adding bells and whistles to the training loop
  • Appendix E: Parameter-efficient finetuning with LoRA

Bonus Materials

Chapter 5 Additional Resources

  • Alternative Weight Loading Methods: Different techniques for loading model weights
  • Pretraining on Project Gutenberg Dataset: Training on a large text corpus
  • Training Loop Optimizations: Adding various improvements
  • Learning Rate Schedulers: Optimizing the training process (a warmup + cosine-decay sketch follows this list)
  • Hyperparameter Tuning: Pretraining hyperparameter optimization
  • Building a User Interface: UI for interacting with a pre-trained LLM
  • Model Conversion:
    • GPT to Llama conversion
    • Llama 3.2 from scratch implementation
    • Qwen3 Dense and Mixture-of-Experts (MoE) models
    • Gemma 3 from scratch implementation
  • Memory-Efficient Weight Loading: Optimizing model loading
  • Tiktoken BPE Tokenizer Extension: Adding new tokens
  • PyTorch Performance Optimization Tips: Accelerating LLM training
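
As referenced above under Learning Rate Schedulers, here is a hedged sketch of linear warmup followed by cosine decay, a common schedule for LLM pretraining. The values below are placeholders for illustration, not the repository's settings.

import math
import torch

peak_lr, min_lr, warmup_steps, total_steps = 5e-4, 1e-5, 100, 1000   # placeholder values

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=peak_lr)

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))

for step in range(total_steps):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    # ... forward pass, backward pass, and optimizer.step() would go here ...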

Chapter 6 Additional Resources

  • Advanced techniques for classification finetuning

Chapter 7 Additional Resources

  • Dataset Tools: Finding approximate duplicates and creating passive voice entries
  • Response Evaluation: Evaluating instruction responses using OpenAI API and Ollama
  • Dataset Generation: Generating datasets for instruction finetuning
  • Dataset Improvement: Enhancing the quality of instruction finetuning datasets
  • Preference Dataset Generation: Using Llama 3.1 70B and Ollama
  • DPO Alignment: Direct Preference Optimization implementation
  • User Interface: Interacting with instruction-finetuned GPT models

Reasoning Model Resources (from the reasoning-from-scratch repository)

  • Qwen3 base
  • Model evaluation methods

Technical Requirements

Prerequisites

  • Required: Strong foundation in Python programming
  • Helpful: Basic knowledge of deep neural networks
  • Helpful: PyTorch basics (Appendix A provides a quick start)

Hardware Requirements

  • 💻 Standard Laptop Sufficient: Main chapter code designed to run on a regular laptop
  • 🚀 Automatic GPU Acceleration: Code automatically uses GPU if available
  • No Specialized Hardware Needed: Ensures broad accessibility for learners

Software Environment

  • Python 3.x
  • PyTorch
  • Other dependencies detailed in the setup directory

Accompanying Resources

Video Course

  • 📹 17 hours and 15 minutes of complete video lessons
  • Chapter-by-chapter coding demonstrations
  • Can be used as a standalone learning resource or alongside the book
  • Manning platform: Master and Build Large Language Models

Follow-up Book

"Build A Reasoning Model (From Scratch)"

  • Can be considered a sequel
  • Starts with pre-trained models
  • Implements different reasoning methods:
    • Inference-time scaling
    • Reinforcement learning
    • Distillation techniques
  • Improves model reasoning capabilities
  • GitHub repository: reasoning-from-scratch

Testing Resources

Free 170-page PDF: "Test Yourself On Build a Large Language Model (From Scratch)"

  • Approximately 30 quiz questions and answers per chapter
  • Helps test understanding
  • Free download from Manning website

Exercise Solutions

  • Each chapter includes multiple exercises
  • Solutions are summarized in Appendix C
  • Corresponding code notebooks are in each chapter's folder
    • Example: ./ch02/01_main-chapter-code/exercise-solutions.ipynb

Project Access

Download Methods

Method 1: Direct ZIP Download

Click the Download ZIP button on the GitHub repository page.

Method 2: Git Clone

git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git

Code Organization

  • Each chapter has its own folder: ch02/, ch03/, etc.
  • Main code is in the 01_main-chapter-code/ subfolder
  • Additional resources are in their corresponding numbered folders

Suggested Learning Path

Mental Model

The book provides a clear mental model summarizing everything it covers:

  1. Understanding LLM fundamentals
  2. Text data processing
  3. Attention mechanisms
  4. GPT architecture implementation
  5. Pretraining techniques
  6. Finetuning methods
  7. Practical application deployment

Learning Recommendations

  1. Beginners: Start from Chapter 1, follow the sequence, and complete exercises for each chapter.
  2. Experienced Learners: Can skip familiar chapters and focus on specific topics.
  3. Practitioners: Use bonus materials to explore advanced topics.
  4. Researchers: Refer to the citation format for referencing this resource in your research.

Community and Support

Feedback Channels

  • 💬 Manning Forum: Official Forum
  • 💭 GitHub Discussions: Discussion Area
  • 🤝 All forms of feedback, questions, and idea exchanges are welcome.

Contribution Guidelines

  • Because the repository accompanies a print book, the main chapter code is kept in sync with the published text.
  • Contributions that extend or modify the main chapter content are therefore not accepted at this time.
  • This keeps the code consistent with the physical book and provides a smooth learning experience.

Citation Information

Chicago Format

Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.

BibTeX Format

@book{build-llms-from-scratch-book,
  author = {Sebastian Raschka},
  title = {Build A Large Language Model (From Scratch)},
  publisher = {Manning},
  year = {2024},
  isbn = {978-1633437166},
  url = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  github = {https://github.com/rasbt/LLMs-from-scratch}
}

Summary

This is a comprehensive and systematic LLM learning resource, suitable for:

  • 🎓 Learners who wish to deeply understand the working principles of LLMs
  • 👨‍💻 Developers who want to practically implement GPT-like models
  • 🔬 Researchers engaged in NLP and deep learning research
  • 🚀 Tech enthusiasts interested in AI and machine learning

Working through this project, you will gain the end-to-end skills to build, train, and finetune large language models from scratch.