An advanced diffusion model library for image, video, and audio generation.
🤗 Diffusers: A Detailed Introduction
Project Overview
🤗 Diffusers is Hugging Face's library of state-of-the-art pretrained diffusion models for generating images, audio, and even 3D molecular structures. It is a modular toolbox that supports both simple inference and training your own diffusion models.
Repository: https://github.com/huggingface/diffusers
Core Features
Design Philosophy
- Usability over performance
- Simple over easy
- Customizability over abstractions
Three Core Components
Diffusion Pipelines
- Run inference in just a few lines of code
- Supports various generation tasks
Noise Schedulers
- Interchangeable, so one model can be paired with different schedulers
- Trade generation speed against output quality
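The speed and quality trade-off comes largely from how many denoising steps are run. A model may be trained over many timesteps (1000 is a common choice) yet sampled over far fewer. The plain-Python sketch below illustrates one simple way to pick an evenly spaced subset of timesteps. It is not the library's implementation, and `spaced_timesteps` is a hypothetical helper introduced only for illustration.

```python
def spaced_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list:
    """Pick evenly spaced timesteps, ordered from most noisy to least noisy.

    Illustrative assumption: a fixed integer stride between timesteps.
    Real schedulers offer several spacing strategies.
    """
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    # Walk the indices backwards so that sampling starts at the noisiest step.
    return [i * stride for i in reversed(range(num_inference_steps))]


fast = spaced_timesteps(1000, 50)   # fewer steps: faster, usually rougher output
slow = spaced_timesteps(1000, 250)  # more steps: slower, usually cleaner output
print(fast[:3], len(fast))
print(slow[:3], len(slow))
```

Fewer steps mean fewer forward passes through the model, which is where most of the generation time goes.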
Pretrained Models
- Can be used as building blocks
- Combined with schedulers to create end-to-end diffusion systems
Installation
PyTorch Version
# Official package
pip install --upgrade "diffusers[torch]"
# Community-maintained conda version
conda install -c conda-forge diffusers
Flax Version
pip install --upgrade "diffusers[flax]"
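After installing, it can be useful to confirm which packages actually ended up in the environment. The sketch below uses only the Python standard library; `installed_version` is a hypothetical helper name, not part of Diffusers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the library and the two backends mentioned in the install commands.
for name in ("diffusers", "torch", "flax"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```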
Quick Start
Text-to-Image Generation
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")
pipeline("An image of a squirrel in Picasso style").images[0]
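Running the snippet above twice generally gives two different images, because the starting noise is random. A minimal sketch of a reproducible variant follows, assuming a CUDA device and the same checkpoint. `generate_image` is a hypothetical wrapper, and the default of 30 inference steps is an arbitrary illustrative value.

```python
def generate_image(prompt, seed=0, num_inference_steps=30, device="cuda"):
    """Generate one image reproducibly and return it as a PIL image."""
    # Imported lazily so that defining this helper needs no GPU or download.
    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,  # half precision; intended for GPU use
    ).to(device)
    # A seeded generator fixes the initial noise, and therefore the output.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipeline(
        prompt,
        num_inference_steps=num_inference_steps,
        generator=generator,
    )
    return result.images[0]


# Example usage (downloads the checkpoint on first run):
# image = generate_image("An image of a squirrel in Picasso style", seed=42)
# image.save("squirrel.png")
```

Keeping the seed fixed while changing the prompt or the step count makes it easier to see what each setting contributes.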
Custom Diffusion System
from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
import torch
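# Load the scheduler and the UNet from the same checkpoint
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
# Denoise in 50 steps rather than the full training schedule
scheduler.set_timesteps(50)
# Start from pure Gaussian noise in the shape the model expects
sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
input = noise
for t in scheduler.timesteps:
    # Predict the noise residual without tracking gradients
    with torch.no_grad():
        noisy_residual = model(input, t).sample
    # Let the scheduler compute the slightly less noisy sample
    prev_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
    input = prev_noisy_sample
# Map from [-1, 1] to [0, 255] and convert to a PIL image
image = (input / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image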
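The final lines of the example convert model output, which lies roughly in the range [-1, 1], into 8-bit pixel values. The plain-Python sketch below mirrors that arithmetic for a single value so the mapping is easy to check by hand; `to_uint8` is a hypothetical helper, not a Diffusers function.

```python
def to_uint8(value: float) -> int:
    """Map a model output value from [-1, 1] to an integer pixel in [0, 255]."""
    # Same arithmetic as (x / 2 + 0.5).clamp(0, 1) in the example above.
    clamped = min(max(value / 2 + 0.5, 0.0), 1.0)
    # Scale to the 8-bit range and round to the nearest integer.
    return round(clamped * 255)


for v in (-1.0, 0.0, 1.0, 3.0):
    print(v, "->", to_uint8(v))
```

Values outside [-1, 1] are clamped rather than wrapped, which avoids overflow artifacts when casting to `uint8`.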
Supported Main Tasks and Models
Task | Pipeline | Recommended Model |
---|---|---|
Unconditional Image Generation | DDPMPipeline | google/ddpm-ema-church-256 |
Text-to-Image | StableDiffusionPipeline | stable-diffusion-v1-5/stable-diffusion-v1-5 |
Text-to-Image (unCLIP) | UnCLIPPipeline | kakaobrain/karlo-v1-alpha |
Text-to-Image (DeepFloyd IF) | IFPipeline | DeepFloyd/IF-I-XL-v1.0 |
Text-to-Image (Kandinsky) | KandinskyPipeline | kandinsky-community/kandinsky-2-2-decoder |
Controllable Generation | StableDiffusionControlNetPipeline | lllyasviel/sd-controlnet-canny |
Image Editing | StableDiffusionInstructPix2PixPipeline | timbrooks/instruct-pix2pix |
Image-to-Image | StableDiffusionImg2ImgPipeline | stable-diffusion-v1-5/stable-diffusion-v1-5 |
Image Inpainting | StableDiffusionInpaintPipeline | runwayml/stable-diffusion-inpainting |
Image Variation | StableDiffusionImageVariationPipeline | lambdalabs/sd-image-variations-diffusers |
Image Super-Resolution | StableDiffusionUpscalePipeline | stabilityai/stable-diffusion-x4-upscaler |
Latent Space Super-Resolution | StableDiffusionLatentUpscalePipeline | stabilityai/sd-x2-latent-upscaler |
Documentation Structure
Document Type | Learning Content |
---|---|
Tutorial | Learn basic skills of the library, such as using models and schedulers to build diffusion systems, and training your own diffusion models |
Loading | How to load and configure all components of the library (pipelines, models, and schedulers), and how to use different schedulers |
Pipelines for inference | How to use pipelines for different inference tasks, batch generation, controlling generation output, and randomness |
Optimization | How to optimize pipelines to run on memory-constrained hardware and accelerate inference |
Training | How to train your own diffusion models for different tasks |
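As a taste of what the Optimization guides cover, the sketch below loads a pipeline with two memory-saving options enabled. It assumes a CUDA GPU and that the `accelerate` package is installed, which model offloading relies on. `load_low_memory_pipeline` is a hypothetical helper name, and the savings depend on the model and hardware.

```python
def load_low_memory_pipeline(model_id="stable-diffusion-v1-5/stable-diffusion-v1-5"):
    """Load a pipeline configured to reduce peak GPU memory use."""
    # Imported lazily so that defining this helper needs no GPU or download.
    import torch
    from diffusers import DiffusionPipeline

    # Half-precision weights take roughly half the memory of float32 weights.
    pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    # Compute attention in slices: lower peak memory, somewhat slower.
    pipeline.enable_attention_slicing()
    # Keep sub-models on the CPU and move each to the GPU only while it runs.
    # Do not call pipeline.to("cuda") when offloading is enabled.
    pipeline.enable_model_cpu_offload()
    return pipeline


# Example usage (downloads the checkpoint on first run):
# pipeline = load_low_memory_pipeline()
# image = pipeline("An image of a squirrel in Picasso style").images[0]
```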
Community Ecosystem
Integrated Projects
- Microsoft TaskMatrix
- InvokeAI
- InstantID
- Apple ML Stable Diffusion
- Lama Cleaner
- Grounded Segment Anything
- Stable DreamFusion
- DeepFloyd IF
- BentoML
- Kohya_ss
Summary
🤗 Diffusers combines pretrained models, ready-made pipelines, and interchangeable schedulers with support for custom training and inference optimization. AI researchers, developers, and creators can use it either as a few-line inference tool or as a set of building blocks for their own generative AI applications.