ComfyUI wrapper for WanVideo, supporting Alibaba's Wan 2.1 series of AI video generation models.
A Detailed Introduction to the ComfyUI-WanVideoWrapper Project
Project Overview
ComfyUI-WanVideoWrapper is a wrapper plugin specifically developed for the ComfyUI platform, primarily designed to support WanVideo and related models. Developed and maintained by kijai, this project serves as an experimental "sandbox" environment for rapidly testing and implementing new AI video generation models and features.
Project Background
Due to the complexity of ComfyUI's core code and the developer's lack of extensive coding experience, it is often easier and faster to implement new models and features within a standalone wrapper than directly within the core system. This project was born from this very philosophy.
Design Philosophy
- Rapid Testing Platform: Serves as a quick validation environment for new features
- Personal Sandbox: An experimental platform open for everyone to use
- Avoid Compatibility Issues: Runs independently without affecting the stability of the main system
- Continuous Development: The code is always under development and may contain issues
Core Features
Supported WanVideo Model Series
This wrapper primarily supports Alibaba's open-source Wan 2.1 series, a family of advanced video generation models with leading performance:
Wan 2.1 Model Features:
- High Performance: Consistently outperforms existing open-source models and state-of-the-art commercial solutions in multiple benchmarks
- Bilingual Text Generation: The first video model able to render both Chinese and English text within generated video
- Multi-Resolution Support: Supports 480P and 720P video generation
- Physical Simulation: Generates videos that accurately simulate real-world physics and the interactions between objects
Model Specifications:
T2V-1.3B Model:
- Requires only 8.19 GB VRAM, compatible with almost all consumer-grade GPUs
- Can generate a 5-second 480P video in approximately 4 minutes on an RTX 4090
- Lightweight, suitable for general users
T2V-14B/I2V-14B Models:
- Achieves SOTA (state-of-the-art) performance among both open-source and closed-source models
- Supports complex visual scenes and motion patterns
- Suitable for professional-grade applications
Main Functional Modules
- Text-to-Video (T2V)
- Image-to-Video (I2V)
- Video Editing
- Text-to-Image
- Video-to-Audio
Technical Architecture
Core Technical Components
Wan 2.1 is built on the mainstream diffusion-transformer paradigm and achieves significant improvements in generation capability through a series of innovations:
- Wan-VAE: A novel 3D causal VAE architecture designed specifically for video generation; it improves spatio-temporal compression, reduces memory usage, and guarantees temporal causality (a minimal sketch follows this list)
- Scalable Training Strategy
- Large-scale Data Construction
- Automated Evaluation Metrics
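To make the temporal-causality point concrete, below is a minimal, illustrative PyTorch sketch of a causal 3D convolution, the basic building block such a VAE relies on. This is not Wan-VAE's actual code; the class name, kernel size, and padding scheme are assumptions for illustration only.

import torch
import torch.nn as nn

class CausalConv3d(nn.Module):
    # 3D convolution that is causal along the time axis: frame t can
    # only see frames <= t. Spatial dims use ordinary symmetric padding.
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.time_pad = kernel - 1  # pad only on the past side of time
        self.conv = nn.Conv3d(
            in_ch, out_ch, kernel_size=kernel,
            padding=(0, kernel // 2, kernel // 2),  # no built-in time padding
        )

    def forward(self, x):  # x: (batch, channels, time, height, width)
        # Left-pad the time dimension so no future frames leak backward.
        x = nn.functional.pad(x, (0, 0, 0, 0, self.time_pad, 0))
        return self.conv(x)

video = torch.randn(1, 3, 17, 64, 64)  # 17 frames of 64x64 RGB
print(CausalConv3d(3, 8)(video).shape)  # torch.Size([1, 8, 17, 64, 64])

Padding only on the past side of the time axis is exactly what enforces causality: output frame t is computed without ever reading frames t+1 and beyond.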
Performance Characteristics
- Memory Efficiency: Wan-VAE can encode and decode 1080P videos of unbounded length without losing historical temporal information (a streaming sketch follows this list)
- GPU Compatibility: Supports running on consumer-grade GPUs
- Processing Capability: Supports long video generation and complex scene processing
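The unbounded-length claim above follows from processing frames in chunks while carrying encoder state across chunk boundaries, instead of holding the whole clip in memory. Below is a rough sketch of that streaming pattern under stated assumptions: encode_chunk is a hypothetical stand-in for a causal encoder call, not an actual Wan-VAE API.

def encode_streaming(frames, encode_chunk, chunk_size=16):
    # Encode an arbitrarily long frame sequence chunk by chunk.
    # encode_chunk(chunk, state) returns (latents, new_state); the state
    # carries trailing temporal features, so no history is lost at chunk
    # boundaries and memory use stays constant per chunk.
    state, latents = None, []
    for i in range(0, len(frames), chunk_size):
        z, state = encode_chunk(frames[i:i + chunk_size], state)
        latents.append(z)
    return latents

# Toy usage: "latents" are chunk sums, state remembers the last frame.
print(len(encode_streaming(list(range(100)), lambda c, s: (sum(c), c[-1]))))  # 7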
Installation and Usage
Installation Steps
Clone the repository into ComfyUI's custom_nodes directory:
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
Install Dependencies:
pip install -r requirements.txt
For the portable version, run from the portable root folder:
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\requirements.txt
Model Download
Main model download addresses:
- Standard Models: https://huggingface.co/Kijai/WanVideo_comfy/tree/main
- FP8 Optimized Models (Recommended): https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled
Model File Structure
Place the downloaded model files in the corresponding ComfyUI directories (a download sketch follows the list):
- Text encoders → ComfyUI/models/text_encoders
- Clip vision → ComfyUI/models/clip_vision
- Transformer (main video model) → ComfyUI/models/diffusion_models
- VAE → ComfyUI/models/vae
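One convenient way to fetch a file straight into the right folder is the huggingface_hub client, which can download into a target directory. A minimal sketch; the filename below is a placeholder, so browse the repository pages above for the exact file you need.

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Kijai/WanVideo_comfy",
    filename="some_wanvideo_model.safetensors",  # placeholder, not a real file
    local_dir="ComfyUI/models/diffusion_models",  # match the folder list above
)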
Supported Extended Models
This wrapper also supports several related AI video generation models:
- SkyReels: A video generation model developed by Skywork
- WanVideoFun: An entertainment-oriented model developed by Alibaba PAI Team
- ReCamMaster: A camera-controlled video re-rendering model developed by Kuaishou's KwaiVGI team
- VACE: An all-in-one video creation and editing model from Alibaba Vision Lab
- Phantom: A multi-subject video generation model from ByteDance Research Institute
- ATI: A trajectory-instruction model for controllable video generation from ByteDance Research Institute
- Uni3C: A unified camera and human-motion control model for video generation from Alibaba DAMO Academy
- EchoShot: A multi-shot portrait video generation model
- MultiTalk: A multi-person dialogue video generation model
Application Cases and Examples
Long Video Generation Test
- 1025-Frame Test: Uses an 81-frame window with a 16-frame overlap (the window arithmetic is sketched after this list)
- 1.3B T2V Model: Used under 5 GB of VRAM on an RTX 5090, with a generation time of about 10 minutes
- Memory Optimization: Roughly 16 GB of memory used at 512x512x81, with 20 of 40 transformer blocks offloaded
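For reference, the window arithmetic behind the 1025-frame test is straightforward: each window advances by window minus overlap frames, and the final window is clamped to the end of the clip. A small sketch in plain Python (illustrative arithmetic, not the wrapper's actual scheduling code):

def sliding_windows(total_frames, window=81, overlap=16):
    # Yield (start, end) frame ranges; consecutive windows share
    # `overlap` frames so motion stays coherent across window seams.
    stride = window - overlap  # 65 new frames per window here
    start = 0
    while start + window < total_frames:
        yield (start, start + window)
        start += stride
    yield (max(0, total_frames - window), total_frames)  # clamped final window

print(list(sliding_windows(1025)))
# (0, 81), (65, 146), (130, 211), ..., (910, 991), (944, 1025)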
TeaCache Acceleration Optimization
- In the new version, threshold values should be set roughly 10x higher than before
- Recommended threshold values: 0.25-0.30
- Caching can start from step 0
- With more aggressive (higher) thresholds, start caching at a later step to avoid skipping early denoising steps (see the sketch below)
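Conceptually, TeaCache skips a sampling step when the model's modulated input has barely changed since the previous step, reusing the cached residual instead; the threshold bounds the accumulated relative change before a full recompute is forced. A simplified sketch of that decision logic, where the names and the polynomial rescaling are illustrative assumptions rather than the wrapper's exact implementation:

import torch

def should_skip(prev_x, cur_x, acc, thresh, coeffs):
    # Decide whether to reuse the cached residual for this step.
    # Returns (skip, new_accumulated_distance).
    rel = ((cur_x - prev_x).abs().mean() / prev_x.abs().mean()).item()
    # Rescale the raw distance with fitted polynomial coefficients
    # (highest power first), as TeaCache-style caching does.
    rescaled = sum(c * rel ** (len(coeffs) - 1 - i) for i, c in enumerate(coeffs))
    acc += abs(rescaled)
    if acc < thresh:
        return True, acc    # change is small: skip and keep accumulating
    return False, 0.0       # change is large: recompute and reset

# Toy usage: a nearly unchanged input stays under the threshold and is skipped.
a = torch.ones(8)
print(should_skip(a, a + 0.001, 0.0, 0.26, [1.0, 0.0]))  # (True, ~0.001)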
Technical Advantages
- Open-Source Ecosystem: Fully open-source, including source code and all models
- Leading Performance: Consistently outperforms existing open-source models and state-of-the-art commercial solutions in multiple internal and external benchmarks
- Comprehensive Coverage: Covers multiple downstream applications, including image-to-video, instruction-guided video editing, and personalized video generation, encompassing up to 8 tasks
- Consumer-Friendly: The 1.3B model demonstrates excellent resource efficiency, requiring only 8.19GB VRAM and compatible with a wide range of consumer-grade GPUs
Project Status and Development
Future Development
- Not intended to compete with, or serve as a replacement for, native ComfyUI workflows
- The ultimate goal is to help explore newly released models and features
- Some features may eventually be integrated into the ComfyUI core system
Usage Recommendations
Applicable Scenarios
- AI video generation research and experimentation
- Rapid testing and validation of new models
- Creative video content production
- Educational and learning purposes
Important Notes
- The code is under continuous development and may have stability issues
- Recommended for testing and use in an isolated environment
- Requires a certain level of technical background and GPU resources
Conclusion
ComfyUI-WanVideoWrapper is an innovative AI video generation tool wrapper that provides users with convenient access to the latest video generation technologies. Based on Alibaba's open-source Wan 2.1 series models, this project not only maintains technological leadership but also embodies the collaborative spirit of the open-source community. Although the project is still under continuous development, its powerful features and extensive model support make it an important tool in the field of AI video generation.