ReMask-Animate: Refined Character Image Animation Using Mask-Guided Adapters

Authors: Xunzhi Xiang, Haiwei Xue, Zonghong Dai, Di Wang, Minglei Li, Ye Yue, Fei Ma, Weijiang Yu, Heng Chang, Fei Richard Yu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method outperforms state-of-the-art methods on five metrics in public datasets. Additionally, qualitative evaluations highlight a significant improvement in the quality of generated videos, demonstrating our approach's superiority.
Researcher Affiliation | Collaboration | 1 Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; 2 01AI, Beijing, China; 3 Tsinghua University, Shenzhen, Guangdong, China; 4 Sun Yat-sen University, Guangzhou, Guangdong, China; 5 Shenzhen University, Shenzhen, Guangdong, China; 6 Carleton University, Canada. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes methods using text and mathematical formulations but does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper states that "Our method ... exclusively utilizes open-source datasets" but provides no concrete access information (a link or an explicit statement of release) for the source code of its own methodology.
Open Datasets | Yes | We propose a Mask-guided Human-Centric framework, ReMask-Animate, which exclusively utilizes open-source datasets to achieve character image animation and significantly enhances the quality of visual generation. Datasets: the TikTok dataset comprises 350 dance videos... In contrast, the Fashion dataset is characterized by a minimalistic, pure white background and limited motion variation...
Dataset Splits | Yes | The Fashion dataset... including 500 training videos and 100 testing videos...
Hardware Specification | Yes | We train our model using 4 NVIDIA A800 GPUs in a two-stage process.
Software Dependencies | No | The paper mentions freezing the CLIP image encoder and VAE but does not provide specific version numbers for any programming languages, libraries, or other software components used in the implementation.
Experiment Setup | Yes | In the initial stage... we randomly center-crop the input images to 768×768, use a batch size of 4, and train the model for 60,000 steps with a learning rate of 0.0001. In the subsequent stage, we randomly center-crop video frames to 512×512, use a batch size of 1, and train for an additional 20,000 steps while maintaining the same learning rate.
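The two-stage schedule reported in the Experiment Setup row can be summarized as a pair of config dictionaries. This is a minimal illustrative sketch: the field names (`crop`, `batch_size`, etc.) and the `total_steps` helper are assumptions for presentation, not taken from the authors' (unreleased) code; only the numeric values come from the paper.

```python
# Hedged sketch of the reported two-stage training schedule.
# Values are from the paper's experiment setup; key names are illustrative.
STAGES = [
    {   # Stage 1: image-level training
        "crop": (768, 768),      # random center-crop of input images
        "batch_size": 4,
        "steps": 60_000,
        "learning_rate": 1e-4,
    },
    {   # Stage 2: video-frame training
        "crop": (512, 512),      # random center-crop of video frames
        "batch_size": 1,
        "steps": 20_000,
        "learning_rate": 1e-4,   # same learning rate as stage 1
    },
]

def total_steps(stages):
    """Total optimizer steps across all stages."""
    return sum(s["steps"] for s in stages)

print(total_steps(STAGES))  # 80000
```

Summing the two stages gives 80,000 optimizer steps overall, with both stages sharing the 1e-4 learning rate.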