# ComfyUI_wav2lip

## Wav2Lip Node for ComfyUI
The Wav2Lip node is a custom node for ComfyUI that allows you to perform lip-syncing on videos using the Wav2Lip model. It takes an input video and an audio file and generates a lip-synced output video.
## Features
- Lip-syncing of videos using the Wav2Lip model
- Support for various face detection models
- Audio path upload for the input audio file
## Inputs

- `images`: Input video frames (required)
- `audio`: Input audio file (required)
- `mode`: Processing mode, either "sequential" or "repetitive" (default: "sequential")
- `face_detect_batch`: Batch size for face detection (default: 8)
## Outputs

- `images`: Lip-synced output video frames
- `audio`: Output audio file
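As a rough illustration of this interface, here is a minimal sketch of how such a node is typically declared in ComfyUI. It follows ComfyUI's standard custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`); the class name, type strings, and method body are illustrative assumptions, not this node's actual source:

```python
# Illustrative sketch of a ComfyUI custom-node interface matching the
# inputs/outputs listed above. Names follow ComfyUI's standard conventions;
# this is not the node's actual source code.

class Wav2LipNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "images": ("IMAGE",),   # input video frames
                "audio": ("AUDIO",),    # input audio file
                "mode": (["sequential", "repetitive"], {"default": "sequential"}),
                "face_detect_batch": ("INT", {"default": 8, "min": 1}),
            }
        }

    RETURN_TYPES = ("IMAGE", "AUDIO")   # lip-synced frames, output audio
    FUNCTION = "process"
    CATEGORY = "video"

    def process(self, images, audio, mode, face_detect_batch):
        # The real node runs Wav2Lip inference here; see the repository source.
        ...
```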
## Installation

1. Clone the repository into your `custom_nodes` folder:

   ```
   git clone https://github.com/ShmuelRonen/ComfyUI_wav2lip.git
   ```

2. Install the required dependencies:

   ```
   pip install -r requirements.txt
   ```
## Model Setup

To use the Wav2Lip node, you need to download the required model separately:

1. Download the wav2lip model checkpoint (a `.pth` file).
2. Place the `.pth` model file in the `custom_nodes\ComfyUI_wav2lip\Wav2Lip\checkpoints` folder.
3. Start or restart ComfyUI.
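To confirm the checkpoint landed in the right place, a quick check along these lines can help. This is a minimal sketch, assuming you run it from the ComfyUI root; the folder path comes from the step above, and the glob simply looks for any `.pth` file:

```python
# Optional sanity check: list .pth checkpoints in the expected folder.
# Run from the ComfyUI root; the path is taken from the setup step above.
from pathlib import Path

ckpt_dir = Path("custom_nodes/ComfyUI_wav2lip/Wav2Lip/checkpoints")
found = sorted(ckpt_dir.glob("*.pth"))
if found:
    print("Found checkpoints:", [p.name for p in found])
else:
    print("No .pth file found in", ckpt_dir, "- place the model there first.")
```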
## Usage

1. Add the Wav2Lip node to your ComfyUI workflow.
2. Connect the input video frames and audio file to the corresponding inputs of the Wav2Lip node.
3. Adjust the node settings according to your requirements:
   - Set the `mode` to "sequential" or "repetitive" based on your video processing needs.
   - Adjust the `face_detect_batch` size if needed (see the sketch after this list).
4. Execute the ComfyUI workflow to generate the lip-synced output video.
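The `face_detect_batch` setting controls how many frames go through the face detector at once, trading memory for throughput. Below is a minimal sketch of that batching pattern; `run_detector` is a hypothetical stand-in for the real face-detection call, not this node's actual code:

```python
# Hedged sketch of fixed-size batching for face detection.
# run_detector is a placeholder, not the node's real detector call.

def run_detector(batch):
    # Placeholder: a real detector would return one bounding box per frame.
    return [None] * len(batch)

def detect_faces(frames, face_detect_batch=8):
    """Process frames through the detector in chunks of face_detect_batch."""
    boxes = []
    for i in range(0, len(frames), face_detect_batch):
        batch = frames[i:i + face_detect_batch]
        boxes.extend(run_detector(batch))
    return boxes
```

Larger batches generally run faster on a GPU but use more memory; lowering `face_detect_batch` is the usual fix for out-of-memory errors during face detection.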
## Acknowledgement
Thanks to ArtemM, Wav2Lip, PIRenderer, GFP-GAN, GPEN, ganimation_replicate, STIT for sharing their code.
## Related Work
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)