huggingface/diffusers

A state-of-the-art diffusion model library supporting image, video, and audio generation

Apache-2.0 · Python · 29.4k · huggingface/diffusers · Last Updated: 2025-06-23

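🤗 Diffusers: Detailed Project Introduction

Project Overview

🤗 Diffusers is Hugging Face's state-of-the-art diffusion model library for generating images, audio, and even 3D structures of molecules. Whether you are looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both.

Project repository: https://github.com/huggingface/diffusers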

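Core Features

Design Philosophy

Usability over performance
Simple over easy
Customizability over abstractions

Three Core Components

State-of-the-art diffusion pipelines
- Run inference in just a few lines of code
- Cover a range of generation tasks
Interchangeable noise schedulers
- Support different diffusion speeds
- Allow output quality to be tuned
Pretrained models
- Serve as building blocks
- Combine with schedulers to create end-to-end diffusion systems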
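The split between models and interchangeable schedulers can be illustrated with a toy sketch in plain Python. Nothing here is the Diffusers API: `toy_model`, `HalvingScheduler`, `LinearScheduler`, and `denoise` are hypothetical names, and the "noise" is a single number rather than an image. The point is only that the same model and the same loop can be paired with different schedulers that share a `timesteps` attribute and a `step` method.

```python
# Toy illustration only: these classes are hypothetical and are NOT part of diffusers.

def toy_model(sample: float, t: int) -> float:
    """Predict the noise in `sample`, assuming the clean signal is 0.0."""
    return sample


class HalvingScheduler:
    """Hypothetical scheduler that removes half of the predicted noise per step."""

    def __init__(self, num_steps: int) -> None:
        self.timesteps = list(range(num_steps - 1, -1, -1))

    def step(self, model_output: float, t: int, sample: float) -> float:
        return sample - 0.5 * model_output


class LinearScheduler:
    """Hypothetical scheduler that removes an equal share of the noise at each
    step, so the final step (t == 0) removes whatever noise is left."""

    def __init__(self, num_steps: int) -> None:
        self.timesteps = list(range(num_steps - 1, -1, -1))

    def step(self, model_output: float, t: int, sample: float) -> float:
        return sample - model_output / (t + 1)


def denoise(model, scheduler, sample: float) -> float:
    """Run the same denoising loop with whichever scheduler is plugged in."""
    for t in scheduler.timesteps:
        sample = scheduler.step(model(sample, t), t, sample)
    return sample


if __name__ == "__main__":
    print(denoise(toy_model, HalvingScheduler(3), 8.0))  # 8 -> 4 -> 2 -> 1
    print(denoise(toy_model, LinearScheduler(4), 8.0))   # 8 -> 6 -> 4 -> 2 -> 0
```

Swapping the scheduler changes how many steps are taken and how close the result gets to the clean signal, while the model and the loop stay untouched.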

Installation

PyTorch

# Official package
pip install --upgrade "diffusers[torch]"

# Community-maintained conda package
conda install -c conda-forge diffusers

Flax

pip install --upgrade "diffusers[flax]"
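After installing, it can be handy to confirm which packages are actually present in the current environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and the distribution names checked at the bottom are simply the ones mentioned in the install commands above.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the status of the packages referenced by the install commands.
    for name in ("diffusers", "torch", "flax"):
        print(name, installed_version(name) or "not installed")
```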

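Quick Start

Text-to-Image Generation

from diffusers import DiffusionPipeline
import torch

# Download the pretrained pipeline and load its weights in half precision
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
# Move the pipeline to the GPU (requires a CUDA-capable device)
pipeline.to("cuda")
# Generate one image from the prompt; the result is a PIL image
pipeline("An image of a squirrel in Picasso style").images[0]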
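The snippet above hardcodes `"cuda"` and `torch.float16`, which assumes an NVIDIA GPU. On other machines, a small helper along the following lines could choose the device and precision instead. This is a sketch under stated assumptions: `pick_device_and_dtype` is a hypothetical name, falling back to `float32` outside CUDA is a conservative choice rather than a library requirement, and the function returns plain strings so it runs even when PyTorch is not installed.

```python
def pick_device_and_dtype():
    """Return (device, dtype_name) for the best available backend.

    Hypothetical helper: half precision is chosen only on CUDA; every
    other backend falls back to float32 as a conservative default.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; report CPU defaults.
        return "cpu", "float32"

    if torch.cuda.is_available():
        return "cuda", "float16"

    # The MPS backend (Apple silicon) only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps", "float32"

    return "cpu", "float32"


if __name__ == "__main__":
    device, dtype_name = pick_device_and_dtype()
    print(device, dtype_name)
```

With PyTorch installed, the strings can be turned back into arguments with `torch_dtype=getattr(torch, dtype_name)` and `pipeline.to(device)`.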

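Building a Custom Diffusion System

from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
import torch

# Load the scheduler and the UNet from the same checkpoint
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
# Use 50 denoising steps at inference time
scheduler.set_timesteps(50)

# Start from pure Gaussian noise with the shape the model expects
sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
sample = noise

for t in scheduler.timesteps:
    # Predict the noise residual at timestep t
    with torch.no_grad():
        noisy_residual = model(sample, t).sample
    # Let the scheduler compute the less noisy sample for the previous timestep
    sample = scheduler.step(noisy_residual, t, sample).prev_sample

# Rescale from [-1, 1] to [0, 1], reorder to height x width x channels, and convert to an 8-bit image
image = (sample / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image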
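The post-processing at the end of the code above rescales model outputs from the range [-1, 1] to 8-bit pixel values. The arithmetic can be checked on single numbers with a small pure-Python sketch; `to_uint8` is a hypothetical helper that mirrors the `/ 2 + 0.5`, `clamp(0, 1)`, `* 255`, and `round()` steps for one value.

```python
def to_uint8(value: float) -> int:
    """Map one model output value from [-1, 1] to an 8-bit pixel in [0, 255]."""
    scaled = value / 2 + 0.5                 # [-1, 1] -> [0, 1]
    clamped = min(max(scaled, 0.0), 1.0)     # out-of-range values are clipped
    return round(clamped * 255)              # [0, 1] -> [0, 255]


if __name__ == "__main__":
    print([to_uint8(v) for v in (-1.0, -0.5, 0.0, 0.5, 1.0)])
    # → [0, 64, 128, 191, 255]
```

Values outside [-1, 1] are clipped rather than wrapped, so `to_uint8(3.0)` yields 255 and `to_uint8(-5.0)` yields 0.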

Supported Tasks and Models

Task	Pipeline	Recommended Model
Unconditional image generation	DDPMPipeline	google/ddpm-ema-church-256
Text-to-image	StableDiffusionPipeline	stable-diffusion-v1-5/stable-diffusion-v1-5
Text-to-image (unCLIP)	UnCLIPPipeline	kakaobrain/karlo-v1-alpha
Text-to-image (DeepFloyd IF)	IFPipeline	DeepFloyd/IF-I-XL-v1.0
Text-to-image (Kandinsky)	KandinskyPipeline	kandinsky-community/kandinsky-2-2-decoder
Controllable generation	StableDiffusionControlNetPipeline	lllyasviel/sd-controlnet-canny
Image editing	StableDiffusionInstructPix2PixPipeline	timbrooks/instruct-pix2pix
Image-to-image	StableDiffusionImg2ImgPipeline	stable-diffusion-v1-5/stable-diffusion-v1-5
Image inpainting	StableDiffusionInpaintPipeline	runwayml/stable-diffusion-inpainting
Image variation	StableDiffusionImageVariationPipeline	lambdalabs/sd-image-variations-diffusers
Image super-resolution	StableDiffusionUpscalePipeline	stabilityai/stable-diffusion-x4-upscaler
Latent-space super-resolution	StableDiffusionLatentUpscalePipeline	stabilityai/sd-x2-latent-upscaler
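A few rows of the table can be kept in code as a small lookup, which is one way to avoid scattering checkpoint IDs through a project. This is only a sketch: the short task keys and the `recommend` helper are hypothetical, while the pipeline class names and model IDs are copied from the table.

```python
# Pipeline class names and checkpoint IDs are taken from the table above;
# the short task keys are hypothetical labels chosen for this example.
RECOMMENDED = {
    "text-to-image": ("StableDiffusionPipeline", "stable-diffusion-v1-5/stable-diffusion-v1-5"),
    "image-to-image": ("StableDiffusionImg2ImgPipeline", "stable-diffusion-v1-5/stable-diffusion-v1-5"),
    "inpainting": ("StableDiffusionInpaintPipeline", "runwayml/stable-diffusion-inpainting"),
    "super-resolution": ("StableDiffusionUpscalePipeline", "stabilityai/stable-diffusion-x4-upscaler"),
}


def recommend(task: str):
    """Return (pipeline_class_name, model_id) for a task key, case-insensitively."""
    key = task.strip().lower()
    if key not in RECOMMENDED:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    return RECOMMENDED[key]


if __name__ == "__main__":
    class_name, model_id = recommend("Inpainting")
    print(class_name, model_id)
```

With the library installed, the class could then be resolved with `getattr(diffusers, class_name)` and loaded through its `from_pretrained(model_id)` method.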

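Documentation Structure

Documentation	What you learn
Tutorial	Basic skills for using the library, such as building a diffusion system from models and schedulers and training your own diffusion model
Loading	How to load and configure all components of the library (pipelines, models, and schedulers), and how to use different schedulers
Pipelines for inference	How to use pipelines for different inference tasks, batched generation, and controlling generated outputs and randomness
Optimization	How to optimize pipelines to run on memory-constrained hardware and to speed up inference
Training	How to train your own diffusion model for different tasks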

Community Ecosystem

Integrated Projects

Microsoft TaskMatrix
InvokeAI
InstantID
Apple ML Stable Diffusion
Lama Cleaner
Grounded Segment Anything
Stable DreamFusion
DeepFloyd IF
BentoML
Kohya_ss

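Summary

🤗 Diffusers is one of the most complete and approachable diffusion model libraries currently available. It provides a broad set of pretrained models and pipelines, and it also supports custom training and optimization. AI researchers, developers, and creators alike can find in it the tools they need to build a wide range of generative AI applications.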