Stability-AI/stablediffusion

High-resolution text-to-image generation model based on latent diffusion models

MIT | Python | 41.2k | Stability-AI/stablediffusion | Last Updated: 2024-10-10

Stable Diffusion: Detailed Project Introduction

Project Overview

Stable Diffusion is an open-source text-to-image generation model developed by Stability AI and based on latent diffusion models. The project implements high-resolution image synthesis and generates high-quality images from text descriptions.

Project URL: https://github.com/Stability-AI/stablediffusion

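Core Technical Features

1. Latent Diffusion Model Architecture

Runs the diffusion process in a compressed latent space, which is more efficient than operating directly in pixel space.
Uses a U-Net architecture as the denoising network.
Integrates self-attention and cross-attention mechanisms.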
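
As a rough illustration of the cross-attention mentioned above, the following pure-Python sketch computes scaled dot-product attention in which the queries stand for image latent positions and the keys and values stand for text tokens. The function name, the toy vectors, and the omission of learned projection matrices and multiple heads are simplifying assumptions; the repository's actual attention layers are not reproduced here.

```python
import math

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: list of d-dim vectors (e.g. one per latent position)
    keys, values: lists of vectors, one pair per text token
    """
    d = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over the text tokens.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs

# Toy example: two latent positions attend over two "text tokens".
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
text_keys = [[1.0, 0.0], [1.0, 0.0]]   # identical keys give uniform weights
text_values = [[2.0, 4.0], [4.0, 8.0]]
attended = cross_attention(latent_queries, text_keys, text_values)
```

Self-attention follows the same computation, except that the queries, keys, and values are all derived from the same latent features.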

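2. Text Encoder

Uses OpenCLIP ViT-H/14 as the text encoder.
Supports complex text conditioning.
Interprets detailed text descriptions and translates them into visual content.

3. Multi-Resolution Support

Stable Diffusion 2.1-v: 768x768 pixel output
Stable Diffusion 2.1-base: 512x512 pixel output
Supports training and inference at multiple resolutions.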

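Major Version History

Version 2.1 (December 7, 2022)

Released the v model at 768x768 resolution and the base model at 512x512 resolution
Both share the same parameter count and architecture as 2.0
Fine-tuned on a dataset with less restrictive NSFW filtering

Version 2.0 (November 24, 2022)

New model at 768x768 resolution
Uses OpenCLIP-ViT/H as the text encoder
Trained from scratch as a v-prediction model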
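
To make the v-prediction term concrete: in the v-parameterization, the network predicts v = alpha * eps - sigma * x0 instead of the noise eps, where the noised sample is x_t = alpha * x0 + sigma * eps and alpha^2 + sigma^2 = 1. The sketch below checks these identities on scalars. The function names and the scalar setting are assumptions made for illustration, not code from the repository.

```python
import math

def noisy_sample(x0, eps, alpha, sigma):
    """Forward diffusion at one timestep: x_t = alpha * x0 + sigma * eps."""
    return alpha * x0 + sigma * eps

def v_target(x0, eps, alpha, sigma):
    """Training target under v-prediction: v = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0

def x0_from_v(x_t, v, alpha, sigma):
    """Recover the clean sample: x0 = alpha * x_t - sigma * v."""
    return alpha * x_t - sigma * v

def eps_from_v(x_t, v, alpha, sigma):
    """Recover the noise: eps = sigma * x_t + alpha * v."""
    return sigma * x_t + alpha * v

# A variance-preserving schedule point: alpha^2 + sigma^2 = 1.
alpha, sigma = math.cos(0.3), math.sin(0.3)
x0, eps = 0.7, -1.2
x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
```

Because both x0 and eps can be recovered from x_t and v, a model trained to predict v can still be used with samplers that expect a noise or clean-sample estimate.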

Stable unCLIP 2.1 (March 24, 2023)

Supports image variations and mixing operations
Fine-tuned from SD 2.1-768
Available in two variants: Stable unCLIP-L and Stable unCLIP-H

Core Features

1. Text-to-Image Generation

Generates images from a text description:

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt <path/to/768model.ckpt/> --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
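
For scripted or batch use, the same invocation can be assembled programmatically. The sketch below only builds the argument list from the flags shown above; the helper name `build_txt2img_command` and the checkpoint path are hypothetical placeholders.

```python
import shlex

def build_txt2img_command(prompt, ckpt, config, height=768, width=768):
    """Return the argv list for scripts/txt2img.py using the flags shown above."""
    return [
        "python", "scripts/txt2img.py",
        "--prompt", prompt,
        "--ckpt", ckpt,
        "--config", config,
        "--H", str(height),
        "--W", str(width),
    ]

cmd = build_txt2img_command(
    prompt="a professional photograph of an astronaut riding a horse",
    ckpt="checkpoints/768model.ckpt",  # hypothetical path; substitute your own
    config="configs/stable-diffusion/v2-inference-v.yaml",
)
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run(cmd, check=True)` from the repository root avoids shell-quoting problems with prompts that contain spaces or quotes.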

2. Image Inpainting

Supports partial restoration and editing of an image:

python scripts/gradio/inpainting.py configs/stable-diffusion/v2-inpainting-inference.yaml <path-to-checkpoint>

3. Depth-Conditional Image Generation

Generates images that preserve structure based on depth information:

python scripts/gradio/depth2img.py configs/stable-diffusion/v2-midas-inference.yaml <path-to-ckpt>

4. Image Super-Resolution

4x upscaling:

python scripts/gradio/superresolution.py configs/stable-diffusion/x4-upscaling.yaml <path-to-checkpoint>

5. Image-to-Image Translation

Classic img2img:

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8 --ckpt <path/to/model.ckpt>
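
The --strength value controls how heavily the input image is noised before it is denoised again. The sketch below assumes that the number of denoising steps is the product of strength and the DDIM step count, truncated to an integer, and that the step count defaults to 50. Both the formula and the default are assumptions that should be verified against scripts/img2img.py, and the helper name is illustrative.

```python
def img2img_encode_steps(strength, ddim_steps=50):
    """Number of denoising steps applied to the noised input image.

    Assumption: steps = int(strength * ddim_steps), with strength in [0, 1].
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return int(strength * ddim_steps)

steps = img2img_encode_steps(0.8)  # 40 of 50 steps under the assumed default
```

Under this assumption, a strength near 1.0 discards most of the input image, while a small strength keeps the output close to the original.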

Installation and Environment Setup

Basic Environment

conda install pytorch==1.12.1 torchvision==0.13.1 -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .

Performance Optimization (Recommended)

Install the xformers library to improve GPU performance:

export CUDA_HOME=/usr/local/cuda-11.4
conda install -c nvidia/label/cuda-11.4.0 cuda-nvcc
conda install -c conda-forge gcc
conda install -c conda-forge gxx_linux-64==9.5.0

cd ..
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -e .
cd ../stablediffusion

Intel CPU Optimization

Optimized setup for Intel CPUs:

apt-get install numactl libjemalloc-dev
pip install intel-openmp
pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable

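Technical Architecture Details

Model Components

Encoder-decoder architecture: an autoencoder with a downsampling factor of 8
U-Net: an 865M-parameter U-Net for the diffusion process
Text encoder: OpenCLIP ViT-H/14 processes the text input
Samplers: supports multiple sampling methods, including DDIM, PLMS, and DPMSolver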
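
The downsampling factor of 8 determines the size of the latent tensor the U-Net operates on. A small sketch of that arithmetic follows; the 4 latent channels and the function name are assumptions not stated above.

```python
def latent_shape(height, width, factor=8, channels=4):
    """Shape (C, H/f, W/f) of the latent for an image of the given size."""
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by the downsampling factor")
    return (channels, height // factor, width // factor)

shape_768 = latent_shape(768, 768)  # (4, 96, 96)
shape_512 = latent_shape(512, 512)  # (4, 64, 64)

# Ratio of pixel values (3 RGB channels) to latent values at 768x768.
compression = (3 * 768 * 768) / (shape_768[0] * shape_768[1] * shape_768[2])
```

Under these assumptions the diffusion process handles 48 times fewer values than it would in pixel space, which is where the efficiency gain of latent diffusion comes from.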

Memory Optimization

Memory-efficient attention is enabled automatically
Supports xformers acceleration
Offers an FP16 precision option to reduce VRAM usage
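
The VRAM saving from FP16 can be estimated for the U-Net weights alone. This back-of-the-envelope sketch uses the 865M parameter count given above and ignores activations, the text encoder, and the autoencoder, so real usage will be higher. The function name is illustrative.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to hold the weights, in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30

UNET_PARAMS = 865_000_000  # parameter count quoted above

fp32_gib = weight_memory_gib(UNET_PARAMS, 4)  # about 3.22 GiB
fp16_gib = weight_memory_gib(UNET_PARAMS, 2)  # about 1.61 GiB
```

Halving the bytes per parameter halves the weight footprint, saving roughly 1.6 GiB for the U-Net under this estimate.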

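Application Areas

1. Artistic Creation

Concept art design
Illustration generation
Style transfer

2. Content Production

Marketing material creation
Social media content
Product prototype design

3. Research Applications

Computer vision research
Generative model research
Multimodal learning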

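Ethical Considerations and Limitations

Data Bias

The model reflects biases and misconceptions present in its training data.
Using it directly in a commercial service without additional safety mechanisms is not recommended.

Content Safety

A built-in invisible watermarking system helps identify AI-generated content.
Efforts were made to reduce explicit pornographic content, but the model should still be used with caution.

Usage Restrictions

The weights are released as research artifacts and should be treated as such.
The weights are subject to the CreativeML Open RAIL++-M License.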