Capturing the Unseen: Vision-Free Facial Motion Capture Using Inertial Measurement Units

Authors: Youjia Wang, Yiwen Wu, Hengan Zhou, Hongyang Lin, Xingyue Peng, Jingyan Zhang, Yingsheng Zhu, YingWenQi Jiang, Yatu Zhang, Lan Xu, Jingya Wang, Jingyi Yu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that CAPUS reliably captures facial motion in conditions where visual-based methods struggle, including facial occlusions, rapid movements, and low-light environments.
Researcher Affiliation | Collaboration | ¹ShanghaiTech University, ²Lumi Ani Technology, ³Deemos Technology
Pseudocode | No | The paper describes the methodology in text and through a network architecture diagram (Fig. 3), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Both the dataset and the code are released to the community for comprehensive evaluations.
Open Datasets | Yes | CAPUS introduces the first facial IMU dataset, encompassing both IMU and visual signals from participants engaged in diverse activities such as multilingual speech, facial expressions, and emotionally intoned auditions. ... Both the dataset and the code are released to the community for comprehensive evaluations.
Dataset Splits | No | The paper mentions training and testing phases and refers to a "test set", but does not provide specific details about the dataset splits (e.g., percentages or counts for the training, validation, and test sets of its IMU-ARKit dataset). For an ablation study it states "Small Dataset: we train the network using 1/3 of the dataset.", but this is not a general split specification.
Hardware Specification | Yes | We train and evaluate CAPUS on a single NVIDIA RTX3090 GPU.
Software Dependencies | No | The paper mentions using Adam as the optimizer but does not specify any programming languages, libraries, or frameworks with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) used for implementation.
Experiment Setup | Yes | We use Adam as the optimizer with a learning rate of 2×10⁻⁴, β₁ = 0.9, β₂ = 0.999. We train and evaluate CAPUS on a single NVIDIA RTX3090 GPU. The training process takes 1 hour on all identities with paired data. ... T = 120 for both training and testing. To avoid jittering at inference, we set an overlap of 60 frames to ensure the network has sufficient prior information to accurately determine the initial state of the face within the time window.
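The windowing scheme quoted in the Experiment Setup row (windows of T = 120 frames with a 60-frame overlap, i.e. a 60-frame stride) can be sketched as below. This is a hypothetical reconstruction for illustration only; the function name and signature are not from the released CAPUS code.

```python
# Hypothetical sketch of the sliding-window scheme described above:
# windows of T = 120 frames, overlapping by 60 frames, so each window
# carries prior context into the next. Names are illustrative, not CAPUS's.

def sliding_windows(num_frames: int, window: int = 120, overlap: int = 60):
    """Return (start, end) frame-index pairs for overlapping windows."""
    stride = window - overlap  # 60-frame step between consecutive windows
    if num_frames < window:
        return [(0, num_frames)]  # short clip: single truncated window
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]

# The quoted optimizer settings would correspond to, e.g.:
# torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))

print(sliding_windows(240))  # -> [(0, 120), (60, 180), (120, 240)]
```

A 240-frame clip thus yields three windows, each sharing its first 60 frames with the previous window, which matches the stated goal of giving the network prior information about the face's initial state.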