🤗 Optimum is a machine learning model optimization library from Hugging Face that extends 🤗 Transformers and Diffusers. The project focuses on providing tools for maximum-efficiency model training and inference on a wide range of target hardware, while maintaining ease of use.
Project Address: https://github.com/huggingface/optimum
Optimum supports various mainstream hardware acceleration platforms, including ONNX Runtime, ExecuTorch, Intel Neural Compressor and OpenVINO, NVIDIA TensorRT-LLM, AMD hardware, AWS Trainium & Inferentia, Habana Gaudi, and FuriosaAI. It also provides optimized training wrappers, such as the GaudiTrainer shown later, that keep the familiar Transformers Trainer API.
Install the base package:
python -m pip install optimum
Choose the corresponding installation command based on the required hardware platform:
# ONNX Runtime
pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]
# ExecuTorch
pip install --upgrade --upgrade-strategy eager optimum[executorch]
# Intel Neural Compressor
pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
# OpenVINO
pip install --upgrade --upgrade-strategy eager optimum[openvino]
# NVIDIA TensorRT-LLM
docker run -it --gpus all --ipc host huggingface/optimum-nvidia
# AMD Hardware
pip install --upgrade --upgrade-strategy eager optimum[amd]
# AWS Trainium & Inferentia
pip install --upgrade --upgrade-strategy eager optimum[neuronx]
# Habana Gaudi
pip install --upgrade --upgrade-strategy eager optimum[habana]
# FuriosaAI
pip install --upgrade --upgrade-strategy eager optimum[furiosa]
To install the latest development version from source:
python -m pip install git+https://github.com/huggingface/optimum.git
ONNX Export Example:
# Install dependencies
pip install optimum[exporters,onnxruntime]
# Export model
optimum-cli export onnx --model bert-base-uncased ./bert-onnx/
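The same export can also be done from Python rather than the CLI; the following is a minimal sketch using the export=True argument of the ORT model classes (assuming the optimum[onnxruntime] extra is installed):
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# export=True converts the vanilla Transformers checkpoint to ONNX on load
model = ORTModelForSequenceClassification.from_pretrained("bert-base-uncased", export=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save the ONNX model and tokenizer to the same directory used by the CLI example
model.save_pretrained("./bert-onnx/")
tokenizer.save_pretrained("./bert-onnx/")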
ExecuTorch Export:
# Install dependencies
pip install optimum[executorch]
# Export model for edge devices
optimum-cli export executorch --model distilbert-base-uncased --output_dir ./distilbert-executorch/
TensorFlow Lite Export:
# Install dependencies
pip install optimum[exporters-tf]
# Export model (TFLite requires static input shapes, hence the fixed sequence length)
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 ./bert-tflite/
Using ONNX Runtime for optimized inference:
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
# Load the optimized model
model = ORTModelForSequenceClassification.from_pretrained("./bert-onnx/")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Perform inference
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)
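The exported model can also be used through the standard Transformers pipeline API; a minimal sketch reusing the model and tokenizer above (note that bert-base-uncased has no fine-tuned classification head, so the predicted labels are only illustrative):
from transformers import pipeline

# ORTModel instances are accepted directly by the pipeline API
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Hello world!"))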
Optimum also supports various quantization schemes through its backends, for example dynamic and static post-training quantization with ONNX Runtime, as well as quantization via Intel Neural Compressor and OpenVINO (NNCF).
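As an illustration, dynamic quantization with the ONNX Runtime backend looks roughly like this (a sketch assuming the ONNX model exported above and an AVX-512 VNNI capable CPU; other AutoQuantizationConfig presets exist for other targets):
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Create a quantizer from the previously exported ONNX model
quantizer = ORTQuantizer.from_pretrained("./bert-onnx/")

# Dynamic (weights-only) int8 quantization targeting AVX-512 VNNI CPUs
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

# Write the quantized model to a new directory
quantizer.quantize(save_dir="./bert-onnx-quantized/", quantization_config=qconfig)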
Using Habana Gaudi for optimized training:
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
# Configure training parameters
training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",
)
# Create the optimized trainer (model, train_dataset and eval_dataset are
# assumed to be defined as in a standard Transformers fine-tuning script)
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
# Start training
trainer.train()
The optimum-cli command-line tool simplifies all of these export and optimization operations. Hugging Face Optimum is a powerful and easy-to-use machine learning model optimization toolkit. It gives developers a complete solution for efficiently deploying AI models to various hardware platforms, making it an important tool for modern AI application development and deployment. Whether for edge-device deployment or large-scale cloud services, Optimum can deliver significant performance improvements and cost savings.