
Stage 2: Classic Machine Learning

An educational project that implements fundamental machine learning models and algorithms from scratch in Python, covering everything from linear regression to deep learning, with a focus on transparently demonstrating how each algorithm works internally.


A Detailed Introduction to the ML-From-Scratch Project

Project Overview

ML-From-Scratch is a project that implements fundamental machine learning models and algorithms from scratch using Python. The goal of this project is not to produce the most optimized and computationally efficient algorithms, but rather to demonstrate the internal workings of algorithms in a transparent and easy-to-understand manner.

Project Author

The project author, Erik Linder-Norén, is a Machine Learning Engineer at Apple, passionate about machine learning, basketball, and building things.

Project Features

1. Education-Oriented

  • Focuses on transparently demonstrating the internal working mechanisms of algorithms
  • Clear, readable implementations that favor comprehension over raw speed (the sketch after this list shows the flavor)
  • Provides rich examples and visualization results
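
To give a sense of that style, here is a minimal sketch of linear regression trained by batch gradient descent with plain NumPy. It is written in the spirit of the project but is not taken from its source; the class name and hyperparameters are illustrative.

import numpy as np

# Illustrative sketch, not code from the repository: every step of the
# model is spelled out with plain NumPy.
class LinearRegressionGD:
    def __init__(self, n_iterations=5000, learning_rate=0.05):
        self.n_iterations = n_iterations
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # Prepend a bias column of ones so the intercept is just another weight.
        X = np.insert(X, 0, 1, axis=1)
        self.w = np.zeros(X.shape[1])
        for _ in range(self.n_iterations):
            y_pred = X.dot(self.w)
            # Gradient of the mean squared error: (2/n) * X^T (y_pred - y)
            grad = 2 / X.shape[0] * X.T.dot(y_pred - y)
            self.w -= self.learning_rate * grad
        return self

    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return X.dot(self.w)

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])          # y = 2x + 1
model = LinearRegressionGD().fit(X, y)
print(model.predict(np.array([[5.0]])))     # approximately [11.]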

2. Technology Stack

  • Main Language: Python
  • Core Dependency: NumPy (for numerical computation)
  • Visualization: Matplotlib (for plotting and display)

3. Extensive Coverage

Covers a wide range of machine learning, from linear regression and clustering to deep learning and reinforcement learning.

Installation and Usage

Installation Steps

$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install

Running Examples

# Polynomial Regression Example
$ python mlfromscratch/examples/polynomial_regression.py

# Convolutional Neural Network Example
$ python mlfromscratch/examples/convolutional_neural_network.py

# DBSCAN Clustering Example
$ python mlfromscratch/examples/dbscan.py

Implemented Algorithm Categories

Supervised Learning Algorithms

  • AdaBoost (Adaptive Boosting)
  • Bayesian Regression
  • Decision Tree
  • Elastic Net Regression
  • Gradient Boosting
  • K-Nearest Neighbors (a minimal sketch follows this list)
  • Lasso Regression
  • Linear Discriminant Analysis
  • Linear Regression
  • Logistic Regression
  • Multi-class Linear Discriminant Analysis
  • Multilayer Perceptron
  • Naive Bayes
  • Neuroevolution
  • Particle Swarm Optimization of Neural Network
  • Perceptron
  • Polynomial Regression
  • Random Forest
  • Ridge Regression
  • Support Vector Machine
  • XGBoost (Extreme Gradient Boosting)
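
As an example of how compact these implementations can be, here is a hedged sketch of k-nearest neighbors classification. It illustrates the algorithm in the project's plain-NumPy style and is not the repository's own code.

import numpy as np
from collections import Counter

# Illustrative sketch, not the repository's implementation: classify each
# test point by majority vote among its k nearest training points.
def knn_predict(X_train, y_train, X_test, k=3):
    predictions = []
    for x in X_test:
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
        nearest = np.argsort(distances)[:k]              # indices of the k closest
        votes = Counter(y_train[nearest])
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.0]])))  # [0 1]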

Unsupervised Learning Algorithms

  • Apriori: Association Rule Mining
  • Autoencoder
  • DBSCAN: Density-Based Spatial Clustering of Applications with Noise
  • FP-Growth: Frequent Pattern Growth Algorithm
  • Gaussian Mixture Model
  • Generative Adversarial Network
  • Genetic Algorithm
  • K-Means Clustering (sketched below)
  • Partitioning Around Medoids
  • Principal Component Analysis
  • Restricted Boltzmann Machine
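
For the same reason, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain NumPy; the function and its defaults are illustrative rather than the project's actual code.

import numpy as np

# Illustrative sketch of K-Means (Lloyd's algorithm), not the repository's
# implementation: alternate between assigning points to the nearest centroid
# and moving each centroid to the mean of its assigned points.
def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance of every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments stopped changing: converged
        centroids = new_centroids
    return centroids, labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centroids, labels = kmeans(X, k=2)
print(labels)  # the two tight groups get different labels, e.g. [0 0 1 1]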

Deep Learning Components

  • Neural Network: the framework that ties layers into a trainable model
  • Various layer types (a sketch of the shared forward/backward layer pattern follows this list):
    • Activation Layer
    • Average Pooling Layer
    • Batch Normalization Layer
    • Constant Padding Layer
    • Convolutional Layer
    • Dropout Layer
    • Flatten Layer
    • Fully-Connected (Dense) Layer
    • Fully-Connected RNN Layer
    • Max Pooling Layer
    • Reshape Layer
    • Up Sampling Layer
    • Zero Padding Layer
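
All of these layers share a common pattern: a forward pass that transforms inputs and a backward pass that routes gradients back. The sketch below shows that pattern in miniature; the method names and the inline SGD update are illustrative assumptions, not necessarily the project's exact API.

import numpy as np

# Simplified sketch of the forward/backward layer pattern. Method names and
# the update rule are illustrative, not the project's exact API.
class ReLU:
    def forward(self, X):
        self.mask = X > 0                 # remember which inputs were positive
        return X * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask       # gradient flows only where input > 0

class Dense:
    def __init__(self, n_in, n_out, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out))
        self.b = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.X = X                        # cache input for the backward pass
        return X @ self.W + self.b

    def backward(self, grad_out):
        grad_in = grad_out @ self.W.T             # gradient w.r.t. this layer's input
        self.W -= self.lr * self.X.T @ grad_out   # plain SGD update
        self.b -= self.lr * grad_out.sum(axis=0)
        return grad_in

# A network is just layers chained: forward left-to-right, backward reversed.
layers = [Dense(8, 16), ReLU(), Dense(16, 2)]
x = np.ones((4, 8))
for layer in layers:
    x = layer.forward(x)
grad = np.ones_like(x)
for layer in reversed(layers):
    grad = layer.backward(grad)
print(x.shape, grad.shape)  # (4, 2) (4, 8)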

Example Run Results

1. Convolutional Neural Network Example

+---------+
| ConvNet |
+---------+
Input Shape: (1, 8, 8)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Conv2D               | 160        | (16, 8, 8)   |
| Activation (ReLU)    | 0          | (16, 8, 8)   |
| Dropout              | 0          | (16, 8, 8)   |
| BatchNormalization   | 2048       | (16, 8, 8)   |
| Conv2D               | 4640       | (32, 8, 8)   |
| Activation (ReLU)    | 0          | (32, 8, 8)   |
| Dropout              | 0          | (32, 8, 8)   |
| BatchNormalization   | 4096       | (32, 8, 8)   |
| Flatten              | 0          | (2048,)      |
| Dense                | 524544     | (256,)       |
| Activation (ReLU)    | 0          | (256,)       |
| Dropout              | 0          | (256,)       |
| BatchNormalization   | 512        | (256,)       |
| Dense                | 2570       | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 538570

Training: 100% [------------------------------------------------------------------------] Time: 0:01:55
Accuracy: 0.987465181058
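
The parameter counts in the summary can be verified by hand: a convolutional layer holds filters × (input channels × kernel height × kernel width + 1 bias), a dense layer holds inputs × outputs + outputs biases, and batch normalization learns a gamma and a beta per feature. A quick check, assuming 3×3 kernels (which the counts imply):

# Hand-check of the parameter counts above (3x3 kernels assumed).
conv1  = 16 * (1 * 3 * 3 + 1)       # 160
bn1    = 2 * (16 * 8 * 8)           # 2048: gamma + beta per feature
conv2  = 32 * (16 * 3 * 3 + 1)      # 4640
bn2    = 2 * (32 * 8 * 8)           # 4096
dense1 = 2048 * 256 + 256           # 524544
bn3    = 2 * 256                    # 512
dense2 = 256 * 10 + 10              # 2570
print(conv1 + bn1 + conv2 + bn2 + dense1 + bn3 + dense2)  # 538570

The same rule accounts for the Deep Q-Network below: 4 × 64 + 64 = 320 and 64 × 2 + 2 = 130, for 450 parameters in total.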

2. Deep Q-Network Example

+----------------+
| Deep Q-Network |
+----------------+
Input Shape: (4,)
+-------------------+------------+--------------+
| Layer Type        | Parameters | Output Shape |
+-------------------+------------+--------------+
| Dense             | 320        | (64,)        |
| Activation (ReLU) | 0          | (64,)        |
| Dense             | 130        | (2,)         |
+-------------------+------------+--------------+
Total Parameters: 450

3. Neuroevolution Example

+---------------+
| Model Summary |
+---------------+
Input Shape: (64,)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Dense                | 1040       | (16,)        |
| Activation (ReLU)    | 0          | (16,)        |
| Dense                | 170        | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 1210

Population Size: 100
Generations: 3000
Mutation Rate: 0.01
[0 Best Individual - Fitness: 3.08301, Accuracy: 10.5%]
[1 Best Individual - Fitness: 3.08746, Accuracy: 12.0%]
...
[2999 Best Individual - Fitness: 94.08513, Accuracy: 98.5%]
Test set accuracy: 96.7%
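
Neuroevolution trains the network without gradients: each generation evaluates a population, keeps the fittest individuals, and perturbs copies of their weights. The mutation step below is an illustrative sketch under that description, not the project's code; mutation_rate matches the 0.01 shown above.

import numpy as np

# Illustrative mutation step for neuroevolution, not the project's code:
# each weight is perturbed with Gaussian noise with probability mutation_rate.
def mutate(weights, mutation_rate=0.01, scale=0.1, rng=None):
    rng = rng or np.random.default_rng()
    child = []
    for W in weights:
        mask = rng.random(W.shape) < mutation_rate        # which entries mutate
        child.append(W + mask * rng.normal(0.0, scale, W.shape))
    return child

parent = [np.zeros((64, 16)), np.zeros((16, 10))]  # shapes match the summary's dense layers
child = mutate(parent)
print(sum(np.count_nonzero(W) for W in child))     # roughly 1% of the 1184 weights changed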

Learning Value

1. Deep Understanding of Algorithm Principles

  • Implementing algorithms from scratch builds a deep understanding of each algorithm's core ideas
  • The clear code structure exposes the concrete implementation details of each algorithm

2. Practical Programming Skills

  • Improves skills in using scientific computing libraries like NumPy
  • Teaches how to translate mathematical theory into working code

3. Foundation for Research and Development

  • Provides a foundation for further algorithm research and improvement
  • Helps in understanding the underlying implementations of existing machine learning frameworks

Target Audience

  • Machine learning beginners and intermediate learners
  • Researchers who wish to deeply understand the internal mechanisms of algorithms
  • Developers who want to improve their programming implementation skills
  • Students in machine learning-related majors

Project Advantages

  1. Highly Educational: Clear and easy-to-understand code, emphasizing teaching value
  2. Extensive Coverage: Includes everything from basic to advanced algorithms
  3. Highly Practical: Provides complete, runnable examples
  4. Continuously Updated: Active project, constantly adding new algorithm implementations

Notes

  • This project is intended for teaching and learning; it is not suitable for production use
  • Algorithm implementations prioritize readability over performance optimization
  • Best used alongside theoretical study of the underlying algorithms