FlexiClip: Locality-Preserving Free-Form Character Animation

Authors: Anant Khandelwal

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments validate the effectiveness of Flexi Clip in generating animations that are not only smooth and natural but also structurally consistent across diverse clipart types, including humans and animals. By integrating spatial and temporal modeling with pretrained video diffusion models, Flexi Clip sets a new standard for high-quality clipart animation, offering robust performance across a wide range of visual content.
Researcher Affiliation Industry 1Search Technology Center India, Microsoft, IDC, Bengaluru, India (Bharat). Correspondence to: Anant Khandelwal <EMAIL, EMAIL>.
Pseudocode No The paper describes methods and mathematical formulations but does not present them in a structured pseudocode or algorithm block.
Open Source Code No Project Page: https://creativegen.github.io/flexiclip.github.io/ - While a project page is provided, it is not explicitly a source code repository link, nor does the text contain a clear statement about releasing source code for the methodology.
Open Datasets Yes We used 30 clipart images from AniClipart (Wu et al., 2024) and additional ones from Freepik across various categories (humans, animals, and objects), resized to 256×256 pixels.
Dataset Splits No The paper mentions using 30 clipart images from AniClipart and additional ones from Freepik, but it does not specify any training, validation, or test splits for these images.
Hardware Specification Yes Standard 24-frame animations were rendered on an NVIDIA V100 in 40 minutes using 26 GB of memory.
Software Dependencies No The paper mentions several tools and models such as DiffVG, ModelScope T2V, and the Adam optimizer, but does not provide specific version numbers for any software dependencies used in the implementation.
Experiment Setup Yes Motion trajectories with 8–11 control points were optimized for up to 700 steps using Adam (learning rate: 0.5). We applied ModelScope T2V (Wang et al., 2023) with a guidance parameter of 50 for the SDS loss. For spatial posing of the cubic Bézier control points, we use a 4-layer MLP with LeakyReLU activation, with the final layer being linear. Temporal Jacobians from the PF-ODE were predicted with a 3-layer MLP, while two attention networks with 32-dimensional keys/values and two heads modeled motion and deformation effectively.
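The spatial-posing network described above can be sketched as a plain 4-layer MLP over flattened Bézier control points. This is a minimal illustrative sketch, not the authors' implementation: the hidden width (128) and the choice of 10 control points are assumptions not stated in the text, and the paper's actual code is not released.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # LeakyReLU activation applied to every hidden layer
    return np.where(x > 0, x, alpha * x)

def posing_mlp(points, params):
    """4-layer MLP over flattened cubic Bezier control points:
    LeakyReLU on the first three layers, final layer linear (as stated)."""
    h = points
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:  # keep the last layer linear
            h = leaky_relu(h)
    return h

rng = np.random.default_rng(0)
n_points, hidden = 10, 128  # hypothetical: paper uses 8-11 points; width unstated
dims = [2 * n_points, hidden, hidden, hidden, 2 * n_points]
params = [(0.01 * rng.standard_normal((d_in, d_out)), np.zeros(d_out))
          for d_in, d_out in zip(dims[:-1], dims[1:])]

ctrl = rng.standard_normal(2 * n_points)  # flattened (x, y) control points
offsets = posing_mlp(ctrl, params)        # predicted per-coordinate displacements
```

In the paper's setup these predicted displacements would be optimized with Adam (learning rate 0.5) against the SDS loss from ModelScope T2V; that outer loop is omitted here since it requires the pretrained diffusion model.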