Stage 6: AI Project Practice and Production Deployment

A comprehensive machine learning engineering course that teaches how to integrate machine learning with software engineering, covering the entire process from experimentation to production deployment.

MLOps · Machine Learning · Production ML · GitHub · Text · Free · English

Made With ML Project Details

Project Overview

Made With ML is an open-source project created by Goku Mohandas, focused on teaching how to combine machine learning with software engineering to design, develop, deploy, and iterate production-grade machine learning applications. The project has become one of the top machine learning repositories on GitHub, with over 40,000 developers following it.

Project Goals and Features

Core Philosophy

The course iteratively builds reliable production systems, progressing from the experimentation phase (design + development) to the production phase (deployment + iteration).

Key Features

  1. 💡 First Principles: Establish a first-principles understanding of each machine learning concept before diving into the code.
  2. 💻 Best Practices: Implement software engineering best practices when developing and deploying machine learning models.
  3. 📈 Scaling: Easily scale machine learning workloads (data, training, tuning, serving) in Python without learning a completely new language.
  4. ⚙️ MLOps: Connect MLOps components (tracking, testing, serving, orchestration, etc.) to build end-to-end machine learning systems.
  5. 🚀 Development to Production: Learn how to move from development to production quickly and reliably without changing code or infrastructure management.
  6. 🐙 CI/CD: Learn how to create mature CI/CD workflows to continuously train and deploy better models in a modular way.

Target Audience

The project is aimed at various types of learners:

  • 👩‍💻 All Developers: Whether you are a software/infrastructure engineer or a data scientist, machine learning is increasingly becoming a core part of product development.
  • 👩‍🎓 University Graduates: Learn the practical skills needed in industry and bridge the gap between university coursework and industry expectations.
  • 👩‍💼 Product/Leadership: Build the technical foundation needed to create amazing and reliable products powered by machine learning.

Project Structure and Content

Code Structure

The core code of the project is refactored into the following Python scripts:

madewithml
├── config.py
├── data.py
├── evaluate.py
├── models.py
├── predict.py
├── serve.py
├── train.py
├── tune.py
└── utils.py

Main Workflow

1. Environment Setup

The project supports multiple deployment environments:

  • Local Environment: Use a personal laptop as the cluster (see the sketch after this list)
  • Anyscale Platform: Use Anyscale Workspace for cloud development
  • Other Platforms: Support AWS, GCP, Kubernetes, local deployment, etc.
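
For the local option, the "cluster" is simply the current machine. A minimal sketch (assuming Ray is installed locally) of starting and inspecting such a cluster:

import ray

# Start Ray on the current machine; resources (CPUs/GPUs) are auto-detected,
# or can be overridden with num_cpus / num_gpus.
ray.init()

# Show what the local "cluster" has available.
print(ray.cluster_resources())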

2. Data and Model Training

export EXPERIMENT_NAME="llm"
export DATASET_LOC="https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/dataset.csv"
export TRAIN_LOOP_CONFIG='{"dropout_p": 0.5, "lr": 1e-4, "lr_factor": 0.8, "lr_patience": 3}'
python madewithml/train.py \
--experiment-name "$EXPERIMENT_NAME" \
--dataset-loc "$DATASET_LOC" \
--train-loop-config "$TRAIN_LOOP_CONFIG" \
--num-workers 1 \
--cpu-per-worker 3 \
--gpu-per-worker 1 \
--num-epochs 10 \
--batch-size 256 \
--results-fp results/training_results.json
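
The keys in TRAIN_LOOP_CONFIG correspond to standard training-loop components. As a hypothetical sketch of how such a config is typically consumed (the model here is a stand-in, not the project's architecture): dropout_p sets the dropout probability, lr the optimizer learning rate, and lr_factor/lr_patience a ReduceLROnPlateau-style scheduler.

import json, os
import torch

config = json.loads(os.environ["TRAIN_LOOP_CONFIG"])

# Stand-in model; the real project fine-tunes a transformer-based text classifier.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 128),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=config["dropout_p"]),   # dropout_p
    torch.nn.Linear(128, 4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])   # lr
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=config["lr_factor"], patience=config["lr_patience"]
)   # lr_factor / lr_patience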

3. Model Tuning

$INITIAL_PARAMS holds the starting hyperparameters for the search (analogous to $TRAIN_LOOP_CONFIG above) and must be exported before running:

python madewithml/tune.py \
--experiment-name "$EXPERIMENT_NAME" \
--dataset-loc "$DATASET_LOC" \
--initial-params "$INITIAL_PARAMS" \
--num-runs 2 \
--num-workers 1 \
--cpu-per-worker 3 \
--gpu-per-worker 1 \
--num-epochs 10 \
--batch-size 256 \
--results-fp results/tuning_results.json
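
Under the hood, tune.py runs a hyperparameter search with Ray Tune. A minimal, hypothetical sketch of that pattern (the trainable and search space here are illustrative, not the script's actual ones; reporting uses the Ray 2.x train.report API):

from ray import train, tune

def train_fn(config):
    # Stand-in trainable: the real script runs the full training loop
    # and reports the validation loss of each trial.
    val_loss = (config["lr"] - 1e-4) ** 2 + 0.01 * config["dropout_p"]
    train.report({"val_loss": val_loss})

tuner = tune.Tuner(
    train_fn,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-3),
        "dropout_p": tune.uniform(0.3, 0.7),
    },
    tune_config=tune.TuneConfig(metric="val_loss", mode="min", num_samples=2),
)
results = tuner.fit()
print(results.get_best_result().config)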

4. Model Evaluation

Evaluation selects the best run by lowest validation loss and scores it on a held-out split ($HOLDOUT_LOC should point to that holdout dataset):

export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)
python madewithml/evaluate.py \
--run-id $RUN_ID \
--dataset-loc $HOLDOUT_LOC \
--results-fp results/evaluation_results.json
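
The get-best-run-id step simply asks MLflow for the run with the lowest val_loss. A roughly equivalent, hypothetical query with the MLflow API (the tracking URI below is a placeholder; it should match the registry used during training):

import mlflow

mlflow.set_tracking_uri("file:///tmp/mlflow")  # placeholder; use the real registry path
runs = mlflow.search_runs(
    experiment_names=["llm"],
    order_by=["metrics.val_loss ASC"],  # ASC: lower validation loss is better
    max_results=1,
)
print(runs.iloc[0].run_id)  # best run id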

5. Model Prediction

python madewithml/predict.py predict \
--run-id $RUN_ID \
--title "Transfer learning with transformers" \
--description "Using transformers for transfer learning on text classification tasks."

6. Model Serving

python madewithml/serve.py --run_id $RUN_ID
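
Once the service is up, predictions can be requested over HTTP. A hypothetical client call (host, port, and route are assumptions; adjust them to the running Ray Serve/FastAPI application):

import json
import requests

payload = {
    "title": "Transfer learning with transformers",
    "description": "Using transformers for transfer learning on text classification tasks.",
}
# Assumed local endpoint; the actual route depends on the serve application.
response = requests.post("http://127.0.0.1:8000/predict", json=payload)
print(json.dumps(response.json(), indent=2))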

Experiment Tracking

The project uses MLflow for experiment tracking and model management:

export MODEL_REGISTRY=$(python -c "from madewithml import config; print(config.MODEL_REGISTRY)")
mlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY
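
With the tracking server running, each training run logs its parameters, metrics, and artifacts to the registry. A minimal sketch of the MLflow calls involved (URI and values are illustrative):

import mlflow

mlflow.set_tracking_uri("http://0.0.0.0:8080")  # the server started above
mlflow.set_experiment("llm")

with mlflow.start_run():
    mlflow.log_params({"dropout_p": 0.5, "lr": 1e-4})
    mlflow.log_metrics({"val_loss": 0.42}, step=1)  # placeholder metric value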

Testing Framework

The project includes a comprehensive test suite:

# Code testing
python3 -m pytest tests/code --verbose --disable-warnings

# Data testing
pytest --dataset-loc=$DATASET_LOC tests/data --verbose --disable-warnings

# Model testing
pytest --run-id=$RUN_ID tests/model --verbose --disable-warnings

# Coverage testing
python3 -m pytest tests/code --cov madewithml --cov-report html --disable-warnings
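
The tests/code suite exercises the library's Python utilities. A hypothetical example of the kind of pytest test it might contain (the function under test is illustrative, not from the repo):

# tests/code/test_example.py (illustrative)
import pytest

def clean_text(text: str) -> str:
    # Stand-in for a preprocessing utility such as those in madewithml/data.py.
    return " ".join(text.lower().split())

@pytest.mark.parametrize(
    "raw, expected",
    [("  Hello   World ", "hello world"), ("MLOps", "mlops")],
)
def test_clean_text(raw, expected):
    assert clean_text(raw) == expected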

Production Deployment

Anyscale Deployment

The project provides a complete Anyscale deployment solution:

  1. Cluster Environment Configuration:

export CLUSTER_ENV_NAME="madewithml-cluster-env"
anyscale cluster-env build deploy/cluster_env.yaml --name $CLUSTER_ENV_NAME

  2. Compute Configuration:

export CLUSTER_COMPUTE_NAME="madewithml-cluster-compute-g5.4xlarge"
anyscale cluster-compute create deploy/cluster_compute.yaml --name $CLUSTER_COMPUTE_NAME

  3. Job Submission:

anyscale job submit deploy/jobs/workloads.yaml

  4. Service Deployment:

anyscale service rollout -f deploy/services/serve_model.yaml

CI/CD Process

The project integrates GitHub Actions to implement automated deployment:

  1. Workflow Trigger: The workloads workflow runs when a pull request is created
  2. Model Training and Evaluation: Training and evaluation are executed automatically
  3. Result Feedback: Training and evaluation results are surfaced directly in the PR
  4. Automatic Deployment: Merging into the main branch automatically deploys to the production environment

Core Learning Points

Tech Stack

  • Python: Core programming language
  • Ray: Distributed computing framework
  • MLflow: Experiment tracking and model management
  • Transformers: Deep learning models
  • FastAPI: API service framework
  • pytest: Testing framework
  • GitHub Actions: CI/CD platform

Machine Learning Engineering Best Practices

  1. Code Organization: Modular project structure
  2. Experiment Management: Systematic experiment tracking
  3. Version Control: Version management of code and models
  4. Testing Strategy: Comprehensive test coverage
  5. Deployment Automation: CI/CD process integration
  6. Monitoring and Maintenance: Continuous monitoring of the production environment

Project Value

Machine learning is not a separate industry; it is a powerful way of thinking about data that is not reserved for any one type of person. This project provides a complete learning path, from basic concepts to production deployment, helping learners master the full range of modern machine learning engineering skills.

Continuous Improvement

The project emphasizes continuous improvement. With CI/CD workflows in place, you can focus on continuously improving the model and easily extend to scheduled runs (cron), data pipelines, drift detection and monitoring, online evaluation, and more.

This project provides machine learning practitioners with a comprehensive and practical learning resource, covering the entire process from conceptual understanding to production deployment, and is an excellent resource for learning modern MLOps.