ComfyUI wrapper for WanVideo models, supporting Alibaba's Wan 2.1 series of AI video generation models.

Apache-2.0 · Python · ComfyUI-WanVideoWrapper · kijai · 4.4k stars · Last Updated: September 13, 2025

Detailed Introduction to ComfyUI-WanVideoWrapper Project

Project Overview

ComfyUI-WanVideoWrapper is a wrapper plugin specifically developed for the ComfyUI platform, primarily designed to support WanVideo and related models. Developed and maintained by kijai, this project serves as an experimental "sandbox" environment for rapidly testing and implementing new AI video generation models and features.

Project Background

Due to the complexity of ComfyUI's core code and the developer's lack of extensive coding experience, it is often easier and faster to implement new models and features within a standalone wrapper than directly within the core system. This project was born from this very philosophy.

Design Philosophy

  • Rapid Testing Platform: Serves as a quick validation environment for new features
  • Personal Sandbox: An experimental platform open for everyone to use
  • Avoid Compatibility Issues: Runs independently without affecting the stability of the main system
  • Continuous Development: The code is always under development and may contain issues

Core Features

Supported WanVideo Model Series

This wrapper primarily supports Alibaba's open-source Wan 2.1 series, a family of advanced video generation models with leading performance:

Wan 2.1 Model Features:

  • High Performance: Consistently outperforms existing open-source models and state-of-the-art commercial solutions in multiple benchmarks
  • Bilingual Text Generation: The first video model capable of generating both Chinese and English text, boasting powerful text generation capabilities
  • Multi-Resolution Support: Supports 480P and 720P video generation
  • Physical Simulation: Generates videos that accurately simulate real-world physics and the interactions between objects

Model Specifications:

  1. T2V-1.3B Model:

    • Requires only 8.19 GB VRAM, compatible with almost all consumer-grade GPUs
    • Can generate a 5-second 480P video in approximately 4 minutes on an RTX 4090
    • Lightweight, suitable for general users
  2. T2V-14B/I2V-14B Models:

    • Achieves SOTA (State-Of-The-Art) performance in both open-source and closed-source models
    • Supports complex visual scenes and motion patterns
    • Suitable for professional-grade applications

Main Functional Modules

  1. Text-to-Video (T2V)
  2. Image-to-Video (I2V)
  3. Video Editing
  4. Text-to-Image
  5. Video-to-Audio

Technical Architecture

Core Technical Components

Wan2.1 is designed based on the mainstream diffusion transformer paradigm, achieving significant improvements in generation capabilities through a series of innovations:

  1. Wan-VAE: A novel 3D causal VAE architecture specifically designed for video generation, improving spatio-temporal compression, reducing memory usage, and ensuring temporal causality through various strategies
  2. Scalable Training Strategy
  3. Large-scale Data Construction
  4. Automated Evaluation Metrics
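The temporal causality that Wan-VAE ensures can be illustrated with a toy example: a convolution along the time axis that pads only on the past side, so the output at frame t never depends on future frames. The sketch below is illustrative NumPy code, not the actual Wan-VAE architecture; `causal_temporal_conv` and its kernel are hypothetical.

```python
import numpy as np

def causal_temporal_conv(frames: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1D convolution along the time axis with causal (left-only) padding,
    so the output at frame t depends only on frames <= t."""
    k = len(kernel)
    # Pad only on the past side by repeating the first frame; symmetric
    # padding would leak information from future frames.
    padded = np.concatenate([np.repeat(frames[:1], k - 1, axis=0), frames], axis=0)
    out = np.empty_like(frames, dtype=float)
    for t in range(len(frames)):
        # padded[t:t+k] covers original frames [t-k+1, t]; kernel[-1]
        # weights the current frame, earlier entries weight the past.
        out[t] = np.tensordot(kernel, padded[t:t + k], axes=1)
    return out

# Toy "video": 6 frames of 2x2 pixels.
video = np.arange(24, dtype=float).reshape(6, 2, 2)
smoothed = causal_temporal_conv(video, np.array([0.25, 0.25, 0.5]))
```

Because the padding is one-sided, editing a later frame can never change the output for earlier frames, which is the property that lets a causal VAE stream video chunk by chunk.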

Performance Characteristics

  • Memory Efficiency: Wan-VAE can encode and decode 1080P videos of unlimited length without losing historical temporal information
  • GPU Compatibility: Supports running on consumer-grade GPUs
  • Processing Capability: Supports long video generation and complex scene processing

Installation and Usage

Installation Steps

  1. Clone Repository:

    git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
    
  2. Install Dependencies:

    pip install -r requirements.txt
    

    For portable installation:

    python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\requirements.txt
    

Model Download

Download links for the model files are listed in the project's README on GitHub.

Model File Structure

Place the downloaded model files in the corresponding ComfyUI directories:

  • Text encoders → ComfyUI/models/text_encoders
  • CLIP vision → ComfyUI/models/clip_vision
  • Transformer (main video model) → ComfyUI/models/diffusion_models
  • VAE → ComfyUI/models/vae
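A quick sanity check of this folder layout can be scripted. The helper below is a hypothetical convenience, not part of the wrapper; `check_model_layout` and the `comfyui_root` argument are assumed names.

```python
from pathlib import Path

# Expected subdirectories under the ComfyUI root, per the list above.
MODEL_DIRS = {
    "text_encoders": "models/text_encoders",
    "clip_vision": "models/clip_vision",
    "diffusion_models": "models/diffusion_models",  # transformer / main video model
    "vae": "models/vae",
}

def check_model_layout(comfyui_root: str) -> dict:
    """Report which expected model subdirectories exist under comfyui_root."""
    root = Path(comfyui_root)
    return {name: (root / rel).is_dir() for name, rel in MODEL_DIRS.items()}
```

Running it against your ComfyUI installation directory shows at a glance which folders still need to be created before dropping the downloaded weights in.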

Supported Extended Models

This wrapper also supports several related AI video generation models:

  1. SkyReels: A video generation model developed by Skywork
  2. Wan Fun (WanVideoFun): Control-capable Wan variants from Alibaba's PAI team
  3. ReCamMaster: A camera-controlled video re-rendering model from Kuaishou (KwaiVGI)
  4. VACE: An all-in-one video creation and editing model from Alibaba
  5. Phantom: A subject-consistent, multi-subject video generation model from ByteDance
  6. ATI: A trajectory-controlled video generation model from ByteDance
  7. Uni3C: A unified camera and human motion control model from Alibaba
  8. EchoShot: A multi-shot portrait video generation model
  9. MultiTalk: A multi-person dialogue video generation model

Application Cases and Examples

Long Video Generation Test

  • 1025-frame Test: Generated with an 81-frame window and 16 frames of overlap between windows
  • 1.3B T2V Model: Used less than 5 GB of VRAM on an RTX 5090, with a generation time of about 10 minutes
  • Memory Optimization: About 16 GB of VRAM at 512×512×81, with 20 of 40 transformer blocks offloaded
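The windowed approach in the long-video test above can be sketched generically: split the frame range into fixed-size windows with a fixed overlap, shifting the final window back so it ends exactly at the last frame. This is an illustrative scheduler under those assumptions, not the wrapper's actual implementation; `window_schedule` is a hypothetical name.

```python
def window_schedule(total_frames: int, window: int, overlap: int) -> list:
    """Start/end (exclusive) frame indices for overlapping generation windows.
    Successive windows share `overlap` frames; the last window is shifted
    back so it ends exactly at `total_frames` (so its overlap may be larger)."""
    stride = window - overlap
    if total_frames <= window:
        return [(0, total_frames)]
    starts = list(range(0, total_frames - window, stride))
    starts.append(total_frames - window)  # final window flush with the end
    return [(s, s + window) for s in starts]

# The 1025-frame test above: 81-frame windows with 16 frames of overlap.
windows = window_schedule(1025, 81, 16)
```

With these numbers the stride is 65 frames, so regular windows start at 0, 65, 130, … and the shared 16 frames let each window be conditioned on the tail of the previous one.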

TeaCache Acceleration Optimization

  • In the new version, the threshold should be set to roughly 10× the old value
  • Recommended coefficient range: 0.25–0.30
  • The start step can be 0
  • With more aggressive (higher) thresholds, start TeaCache at a later step to avoid skipping early denoising steps
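The idea behind TeaCache-style skipping can be sketched as follows: accumulate the relative change of the model input from step to step, reuse the cached model output while the accumulated change stays under the threshold, and recompute (resetting the accumulator) once it is exceeded. The `start_step` parameter mirrors the advice above to delay skipping. This is a simplified illustration, not the actual TeaCache code; `teacache_skip_plan` is a hypothetical name.

```python
def teacache_skip_plan(input_deltas: list, threshold: float, start_step: int = 0) -> list:
    """Decide per step whether to reuse the cached model output (True = skip).
    `input_deltas[t]` is the relative change of the model input at step t
    versus the previous step; changes accumulate until they exceed
    `threshold`, at which point the model is re-run and the accumulator resets."""
    skips = []
    acc = 0.0
    for t, delta in enumerate(input_deltas):
        if t < start_step:
            skips.append(False)  # never skip before the start step
            continue
        acc += delta
        if acc < threshold:
            skips.append(True)   # small accumulated change: reuse cache
        else:
            skips.append(False)  # recompute and reset the accumulator
            acc = 0.0
    return skips

# Hypothetical per-step input changes over 5 steps, threshold 0.25, start at step 1.
plan = teacache_skip_plan([0.30, 0.05, 0.08, 0.10, 0.40], threshold=0.25, start_step=1)
```

A higher threshold lets more change accumulate before a recompute, which is exactly why the note above recommends pairing aggressive thresholds with a later start step: the early steps change the latent the most and are the worst ones to skip.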

Technical Advantages

  1. Open-Source Ecosystem: Fully open-source, including source code and all models
  2. Leading Performance: Consistently outperforms existing open-source models and state-of-the-art commercial solutions in multiple internal and external benchmarks
  3. Comprehensive Coverage: Covers multiple downstream applications, including image-to-video, instruction-guided video editing, and personalized video generation, encompassing up to 8 tasks
  4. Consumer-Friendly: The 1.3B model demonstrates excellent resource efficiency, requiring only 8.19GB VRAM and compatible with a wide range of consumer-grade GPUs

Project Status and Development

Future Development

  • Not intended to compete with native workflows or provide alternatives
  • The ultimate goal is to help explore newly released models and features
  • Some features may eventually be integrated into the ComfyUI core system

Usage Recommendations

Applicable Scenarios

  • AI video generation research and experimentation
  • Rapid testing and validation of new models
  • Creative video content production
  • Educational and learning purposes

Important Notes

  • The code is under continuous development and may have stability issues
  • Recommended for testing and use in an isolated environment
  • Requires a certain level of technical background and GPU resources

Conclusion

ComfyUI-WanVideoWrapper is an innovative AI video generation tool wrapper that provides users with convenient access to the latest video generation technologies. Based on Alibaba's open-source Wan 2.1 series models, this project not only maintains technological leadership but also embodies the collaborative spirit of the open-source community. Although the project is still under continuous development, its powerful features and extensive model support make it an important tool in the field of AI video generation.
