Stability-AI/stablediffusion

A high-resolution text-to-image generation model based on latent diffusion models

MIT | Python | stablediffusion | Stability-AI | 41.8k | Last Updated: June 25, 2025

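Stable Diffusion: Detailed Project Introduction

Project Overview

Stable Diffusion is an open-source text-to-image generation model developed by Stability AI and built on Latent Diffusion Models. The project implements high-resolution image synthesis and generates high-quality images from text descriptions.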

Project URL: https://github.com/Stability-AI/stablediffusion

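Core Technical Features

1. Latent Diffusion Model Architecture

Runs the diffusion process in a latent space, which is more efficient than operating directly in pixel space
Uses a U-Net architecture as the denoising network
Integrates self-attention and cross-attention mechanisms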
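
To make the attention mechanisms above concrete, here is a minimal pure-Python sketch of scaled dot-product attention. It is a toy illustration, not the repository's implementation: the learned query/key/value projections and multi-head splitting are omitted, and the function names and sample vectors are hypothetical. In cross-attention the queries come from latent image positions while the keys and values come from text tokens; passing the same vectors for all three arguments gives self-attention.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: one vector per latent (image-side) position
    keys, values: one vector per text token
    Returns (outputs, weights); weights[i][j] is how strongly
    position i attends to token j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out = [
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy data (hypothetical): 2 latent positions attending over 3 text tokens.
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
token_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

outputs, weights = cross_attention(latent_queries, token_keys, token_values)
for position, row in enumerate(weights):
    print(position, [round(x, 3) for x in row])
```

Each row of `weights` sums to 1, so every latent position receives a convex combination of the token value vectors.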

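2. Text Encoder

Uses OpenCLIP ViT-H/14 as the text encoder
Supports complex text conditioning
Interprets detailed text descriptions and translates them into visual content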

3. Multi-Resolution Support

Stable Diffusion 2.1-v: 768x768 pixel output
Stable Diffusion 2.1-base: 512x512 pixel output
Supports training and inference at different resolutions

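Major Version History

Version 2.1 (December 7, 2022)

Introduced the v model at 768x768 resolution and the base model at 512x512 resolution
Both have the same number of parameters and the same architecture as Version 2.0
Fine-tuned from 2.0 on a dataset with less restrictive NSFW filtering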

Version 2.0 (November 24, 2022)

A new model at 768x768 resolution
Uses OpenCLIP-ViT/H as the text encoder
Trained from scratch as a v-prediction model
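
The v-prediction target mentioned above can be illustrated with a short worked example. The sketch below uses the standard formulation of the v-parameterization on scalars, assuming a variance-preserving schedule where alpha^2 + sigma^2 = 1; the function names and sample numbers are hypothetical and are not taken from the repository code.

```python
import math


def noisy_sample(x0, eps, alpha, sigma):
    """Forward process: mix clean signal x0 with noise eps."""
    return alpha * x0 + sigma * eps


def v_target(x0, eps, alpha, sigma):
    """Quantity a v-prediction network is trained to output."""
    return alpha * eps - sigma * x0


def recover_x0(x_t, v, alpha, sigma):
    """Clean signal implied by a v prediction (needs alpha^2 + sigma^2 = 1)."""
    return alpha * x_t - sigma * v


def recover_eps(x_t, v, alpha, sigma):
    """Noise implied by a v prediction (needs alpha^2 + sigma^2 = 1)."""
    return sigma * x_t + alpha * v


# Hypothetical values; cos/sin guarantee alpha^2 + sigma^2 = 1.
angle = 0.3
alpha, sigma = math.cos(angle), math.sin(angle)
x0, eps = 0.7, -1.2

x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)

# Both the clean signal and the noise can be recovered from (x_t, v).
print(abs(recover_x0(x_t, v, alpha, sigma) - x0) < 1e-12)
print(abs(recover_eps(x_t, v, alpha, sigma) - eps) < 1e-12)
```

Because both `x0` and `eps` can be recovered from `v`, a sampler can convert a v prediction into whichever quantity its update rule expects.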

Stable UnCLIP 2.1 (March 24, 2023)

Supports image variations and mixing operations
Fine-tuned from SD2.1-768
Available in two variants: Stable unCLIP-L and Stable unCLIP-H

Core Features

1. Text-to-Image Generation

Basic generation of an image from a text description:

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt <path/to/768model.ckpt/> --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
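
When the script is driven from another program, it can help to assemble the same command as an argument list rather than a shell string. The sketch below is one possible wrapper: the flags are the ones shown in the command above, while the function name `build_txt2img_command` and the checkpoint path are hypothetical placeholders.

```python
import shlex


def build_txt2img_command(
    prompt,
    ckpt,
    config="configs/stable-diffusion/v2-inference-v.yaml",
    height=768,
    width=768,
):
    """Return the txt2img invocation as an argv list (no shell quoting needed)."""
    return [
        "python", "scripts/txt2img.py",
        "--prompt", prompt,
        "--ckpt", ckpt,
        "--config", config,
        "--H", str(height),
        "--W", str(width),
    ]


# The checkpoint path below is a placeholder, not a real file.
cmd = build_txt2img_command(
    "a professional photograph of an astronaut riding a horse",
    "/path/to/768model.ckpt",
)

# Print a copy-pasteable shell form for inspection.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository root would launch the script; the list form avoids quoting problems with prompts that contain spaces or quotes.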

2. Image Inpainting

Supports local repair and editing of images:

python scripts/gradio/inpainting.py configs/stable-diffusion/v2-inpainting-inference.yaml <path-to-checkpoint>

3. Depth-Conditioned Image Generation

Generates images that preserve structure by conditioning on depth information:

python scripts/gradio/depth2img.py configs/stable-diffusion/v2-midas-inference.yaml <path-to-ckpt>

4. Image Super-Resolution

4x super-resolution upscaling:

python scripts/gradio/superresolution.py configs/stable-diffusion/x4-upscaling.yaml <path-to-checkpoint>

5. Image-to-Image Translation

The classic img2img workflow:

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8 --ckpt <path/to/model.ckpt>
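
The `--strength` value controls how far the input image is pushed toward noise before being denoised again. The sketch below shows one common interpretation, in which the number of noising/denoising steps is the strength multiplied by the total sampling steps. This mapping and the 50-step default are assumptions made for illustration and should be checked against the script itself; the function name is hypothetical.

```python
def encode_steps(strength, sampling_steps=50):
    """Number of sampler steps used to noise (and then denoise) the input image.

    Assumed mapping: strength 0.0 keeps the input unchanged,
    strength 1.0 discards it entirely.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0.0, 1.0]")
    return int(strength * sampling_steps)


for s in (0.0, 0.5, 0.8, 1.0):
    print(s, encode_steps(s))
```

Under this assumption, `--strength 0.8` with 50 sampling steps noises the input for 40 steps, so the result keeps only the coarse layout of the original image.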

Installation and Environment Setup

Base Environment

conda install pytorch==1.12.1 torchvision==0.13.1 -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
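
After installation, a quick sanity check is to compare the installed package versions with the pins above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the conda `pytorch` package registers itself under the distribution name `torch`.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {"torch": "1.12.1", "torchvision": "0.13.1", "transformers": "4.19.2"}
# Packages installed without a version pin.
UNPINNED = ["diffusers", "invisible-watermark"]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Map each package to (found_version, wanted_version, ok)."""
    report = {}
    for name, wanted in PINNED.items():
        found = installed_version(name)
        # Builds may carry a local tag such as "+cu113"; compare the base version.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, wanted, ok)
    for name in UNPINNED:
        found = installed_version(name)
        report[name] = (found, None, found is not None)
    return report


for name, (found, wanted, ok) in check_environment().items():
    status = "ok" if ok else "CHECK"
    print(f"{name}: found={found} wanted={wanted or 'any'} [{status}]")
```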

Performance Optimization (Recommended)

Install the xformers library to improve GPU performance:

export CUDA_HOME=/usr/local/cuda-11.4
conda install -c nvidia/label/cuda-11.4.0 cuda-nvcc
conda install -c conda-forge gcc
conda install -c conda-forge gxx_linux-64==9.5.0

cd ..
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -e .
cd ../stablediffusion
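
Once the build finishes, it is worth confirming that Python can actually find xformers before starting a long sampling run. The sketch below is a small standard-library check; the function names are hypothetical, and it only tests importability and the `CUDA_HOME` variable, not whether the compiled kernels work on the local GPU.

```python
import importlib.util
import os


def xformers_available():
    """True if an importable `xformers` package is on the current Python path."""
    return importlib.util.find_spec("xformers") is not None


def environment_summary():
    """Collect the two facts most relevant to the build steps above."""
    return {
        "xformers_importable": xformers_available(),
        "CUDA_HOME": os.environ.get("CUDA_HOME"),  # None if the variable is unset
    }


summary = environment_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```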

Intel CPU Optimization

Optimized setup for Intel CPUs:

apt-get install numactl libjemalloc-dev
pip install intel-openmp
pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable

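Technical Architecture Details

Model Components

Encoder-decoder architecture: an autoencoder with a downsampling factor of 8
U-Net: an 865M-parameter U-Net performs the diffusion process
Text encoder: OpenCLIP ViT-H/14 processes the text input
Samplers: supports multiple sampling methods, including DDIM, PLMS, and DPMSolver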
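
The downsampling factor of 8 determines the size of the latent tensor that the U-Net actually processes. The arithmetic can be sketched as follows; the factor comes from the component list above, while the 4 latent channels and the helper names are assumptions added for illustration.

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size.

    latent_channels=4 is an assumption, not stated in the text above.
    """
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (
        latent_channels,
        height // downsample_factor,
        width // downsample_factor,
    )


def compression_ratio(height, width, downsample_factor=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, downsample_factor, latent_channels)
    return (3 * height * width) / (c * h * w)


print(latent_shape(768, 768))       # (4, 96, 96)
print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(768, 768))  # 48.0
```

Under these assumptions, the diffusion process works on 48 times fewer values than the RGB image, which is where the efficiency gain over pixel-space diffusion comes from.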

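Memory Optimization

Memory-efficient attention is enabled automatically when xformers is installed
Supports xformers acceleration
Offers an FP16 precision option to reduce GPU memory usage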
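
A rough estimate shows why half precision matters. The sketch below counts only the memory needed to store the 865M U-Net weights, at 4 bytes per parameter for FP32 and 2 bytes for FP16. Activations, the autoencoder, and the text encoder are deliberately ignored, so real usage is higher; the helper name is hypothetical.

```python
UNET_PARAMS = 865_000_000  # U-Net parameter count cited in this document
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gib(num_params, precision):
    """Memory for storing the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024 ** 3


for precision in ("fp32", "fp16"):
    gib = weight_memory_gib(UNET_PARAMS, precision)
    print(f"{precision}: {gib:.2f} GiB")
```

Storing the U-Net weights takes roughly 3.2 GiB in FP32 and half of that in FP16.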

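Use Cases

1. Artistic Creation

Concept art design
Illustration generation
Style transfer

2. Content Production

Marketing material creation
Social media content
Product prototype design

3. Research Applications

Computer vision research
Generative model research
Multimodal learning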

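Ethical Considerations and Limitations

Data Bias

The model reflects the biases and misconceptions present in its training data
Direct use in commercial services without additional safety mechanisms is not recommended

Content Safety

A built-in invisible watermarking system helps identify AI-generated content
Efforts have been made to reduce explicit pornographic content, but the model should still be used with caution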

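Usage Restrictions

The weights are released as research artifacts and should be treated as such
The weights are distributed under the CreativeML Open RAIL++-M License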