huggingface/diffusers

A state-of-the-art diffusion model library for image, video, and audio generation

Apache-2.0 | Python | diffusers | huggingface | 30.2k | Last Updated: August 07, 2025

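🤗 Diffusers: Detailed Project Introduction

Project Overview

🤗 Diffusers is a state-of-the-art diffusion model library developed by Hugging Face for generating images, audio, and even 3D structures of molecules. Whether you are looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both.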

Project URL: https://github.com/huggingface/diffusers

Core Features

Design Philosophy

Usability over performance
Simple over easy
Customizability over abstractions

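Three Core Components

State-of-the-art diffusion pipelines
- Run inference with just a few lines of code
- Cover a range of generation tasks
Interchangeable noise schedulers
- Support different diffusion speeds
- Allow output quality to be tuned
Pretrained models
- Serve as building blocks
- Combine with schedulers to create end-to-end diffusion systems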
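
Much of the speed side of the scheduler trade-off comes from running fewer denoising steps than the model was trained with. The pure-Python sketch below shows one common way to pick an evenly spaced subset of timesteps. The function name `leading_timesteps` and the training length of 1000 steps are assumptions for illustration, and the schedulers in the library may space timesteps differently depending on their configuration.

```python
def leading_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list:
    """Return an evenly spaced, descending subset of training timesteps."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    # Integer stride between consecutive inference timesteps
    step_ratio = num_train_timesteps // num_inference_steps
    # Ascending multiples of the stride, reversed so that denoising runs
    # from the noisiest timestep down to 0
    return [i * step_ratio for i in range(num_inference_steps)][::-1]


# Hypothetical setup: 1000 training steps, 50 inference steps
timesteps = leading_timesteps(num_train_timesteps=1000, num_inference_steps=50)
print(timesteps[:3], timesteps[-1])  # [980, 960, 940] 0
```

Fewer inference steps mean fewer forward passes through the model, which is faster but usually costs some output quality.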

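Installation

PyTorch version

# Official package (quoted so that shells such as zsh do not expand the brackets)
pip install --upgrade "diffusers[torch]"

# Community-maintained conda package
conda install -c conda-forge diffusers

Flax version

pip install --upgrade "diffusers[flax]"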
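
After installing, it can help to confirm which versions ended up in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of Diffusers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages installed by the commands above
for name in ("diffusers", "torch"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```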

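Quick Start

Text-to-image generation

import torch
from diffusers import DiffusionPipeline

# Load the pretrained pipeline in half precision to reduce GPU memory use
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipeline.to("cuda")  # requires a CUDA-capable GPU

# The call returns an output object; .images is a list of PIL images
image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("squirrel.png")  # arbitrary output filename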
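
The snippet above hard-codes "cuda", so it fails on a machine without a CUDA GPU. One possible way to choose the device at run time is sketched below; the helper name `pick_device` is hypothetical, and the fallback policy is an assumption rather than anything prescribed by the library.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU backend can be used
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# The result could replace the hard-coded device, e.g. pipeline.to(device).
# Half precision (torch.float16) is mainly intended for GPU use, so on "cpu"
# one would typically omit the torch_dtype argument.
```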

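Building a custom diffusion system

import torch
from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image

# Load the scheduler and the denoising UNet from the same checkpoint
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
scheduler.set_timesteps(50)  # run 50 denoising steps at inference time

# Start from pure Gaussian noise with the shape the model expects
sample_size = model.config.sample_size
sample = torch.randn((1, 3, sample_size, sample_size), device="cuda")

for t in scheduler.timesteps:
    with torch.no_grad():
        # Predict the noise residual at timestep t
        noisy_residual = model(sample, t).sample
    # Let the scheduler compute the less noisy sample for the previous timestep
    sample = scheduler.step(noisy_residual, t, sample).prev_sample

# Rescale from [-1, 1] to [0, 1], move channels last, and convert to a PIL image
image = (sample / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image  # displays the image when run in a notebook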
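
The last three lines of the loop example convert the denoised tensor into pixel values. As a worked illustration of that arithmetic on single numbers, here is a plain-Python sketch; the helper name `to_uint8` is hypothetical, and it assumes, as the code above implies, that sample values nominally lie in [-1, 1].

```python
def to_uint8(value: float) -> int:
    """Map a sample value in [-1, 1] to an 8-bit pixel value in [0, 255]."""
    scaled = value / 2 + 0.5              # shift [-1, 1] to [0, 1]
    clamped = min(max(scaled, 0.0), 1.0)  # clip anything outside the range
    return int(round(clamped * 255))      # scale to the 8-bit range


for v in (-1.0, 0.5, 1.0, 1.7):
    print(v, "->", to_uint8(v))
# -1.0 -> 0
# 0.5 -> 191
# 1.0 -> 255
# 1.7 -> 255
```

Values outside [-1, 1], such as 1.7 here, are clipped rather than wrapped, which is why the clamp comes before the conversion to unsigned 8-bit integers.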

Main Supported Tasks and Models

Task	Pipeline	Recommended model
Unconditional image generation	DDPMPipeline	google/ddpm-ema-church-256
Text-to-image	StableDiffusionPipeline	stable-diffusion-v1-5/stable-diffusion-v1-5
Text-to-image (unCLIP)	UnCLIPPipeline	kakaobrain/karlo-v1-alpha
Text-to-image (DeepFloyd IF)	IFPipeline	DeepFloyd/IF-I-XL-v1.0
Text-to-image (Kandinsky)	KandinskyV22Pipeline	kandinsky-community/kandinsky-2-2-decoder
Controlled generation	StableDiffusionControlNetPipeline	lllyasviel/sd-controlnet-canny
Image editing	StableDiffusionInstructPix2PixPipeline	timbrooks/instruct-pix2pix
Image-to-image	StableDiffusionImg2ImgPipeline	stable-diffusion-v1-5/stable-diffusion-v1-5
Image inpainting	StableDiffusionInpaintPipeline	runwayml/stable-diffusion-inpainting
Image variation	StableDiffusionImageVariationPipeline	lambdalabs/sd-image-variations-diffusers
Image super-resolution	StableDiffusionUpscalePipeline	stabilityai/stable-diffusion-x4-upscaler
Latent-space super-resolution	StableDiffusionLatentUpscalePipeline	stabilityai/sd-x2-latent-upscaler

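Documentation Structure

Documentation type	What you learn
Tutorial	Basic skills for using the library, such as building a diffusion system from models and schedulers and training your own diffusion model
Loading	How to load and configure all components of the library (pipelines, models, and schedulers), and how to use different schedulers
Pipelines for inference	How to use pipelines for different inference tasks, batched generation, and controlling generated outputs and randomness
Optimization	How to optimize a pipeline to run on memory-constrained hardware and to speed up inference
Training	How to train your own diffusion model for different tasks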

Community Ecosystem

Integrated projects

Microsoft TaskMatrix
InvokeAI
InstantID
Apple ML Stable Diffusion
Lama Cleaner
Grounded Segment Anything
Stable DreamFusion
DeepFloyd IF
BentoML
Kohya_ss

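Summary

🤗 Diffusers is one of the most complete and easy-to-use diffusion model libraries currently available. It provides a large collection of pretrained models and pipelines and also supports custom training and optimization. AI researchers, developers, and creators alike can find the tools they need in it to build a wide range of generative AI applications.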