Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Authors: Xingjian Wang, Li Chai

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on in-the-wild datasets have demonstrated that IFDD outperforms prior supervised DFER methods with higher recognition accuracy and comparable efficiency. |
| Researcher Affiliation | Academia | Xingjian Wang, Li Chai* The State Key Laboratory of Industrial Control Technology, Zhejiang University EMAIL |
| Pseudocode | No | The paper describes the methodology using text, mathematical equations, and diagrams, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/CyberPegasus/IFDD |
| Open Datasets | Yes | We conduct evaluation on three important in-the-wild DFER datasets including DFEW (Jiang et al. 2020) with 16,372 videos, FERV39k (Wang et al. 2022a) with 38,935 videos, and MAFW (Liu et al. 2022) with 10,045 videos. |
| Dataset Splits | Yes | DFEW and MAFW both provide 5-fold cross-validation settings, while FERV39k provides a train-test splitting setting. ... Training sets from aforementioned datasets are further divided into training and validation set at a ratio of 4:1. |
| Hardware Specification | Yes | IFDD is implemented by PyTorch and trained on NVIDIA RTX 3090 for 100 epochs. |
| Software Dependencies | No | IFDD is implemented by PyTorch. No specific version number for PyTorch or other libraries is provided. |
| Experiment Setup | Yes | IFDD is implemented by PyTorch and trained on NVIDIA RTX 3090 for 100 epochs. We utilize AdamW optimizer and cosine scheduler with 1e-4 initial learning rate and 1e-3 weight decay, where the former 10 epochs adopt warm-up strategy with 1e-6 learning rate. ... Fixed number T0 of frames are uniformly sampled from videos and resized into the size of H0 × W0 as clips for training and inference. {T0, H0, W0} are set to {16, 224, 224} for DFEW and FERV39k, and {32, 224, 224} for MAFW. Data augmentation methods including random horizontal flip and random crop are adopted. ... compressing factor 1/Dc of Conv(·) in Eq. 1 is set to 4 for IFDD-3DViT and 1 for IFDD-2DCNN respectively. Channel number dT of temporal tokens Z is set to 128 for the both. |
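
The learning-rate settings quoted under Experiment Setup can be illustrated with a short sketch. It is not taken from the IFDD repository. It assumes that the warm-up ramps linearly from 1e-6 to 1e-4 over the first 10 epochs and that the cosine decay ends at 0; the paper excerpt states neither detail, and the names `lr_at_epoch` and `MIN_LR` are hypothetical.

```python
import math

BASE_LR = 1e-4        # initial learning rate quoted in the setup
WARMUP_LR = 1e-6      # warm-up learning rate quoted in the setup
WARMUP_EPOCHS = 10    # "the former 10 epochs adopt warm-up strategy"
TOTAL_EPOCHS = 100    # total training epochs quoted in the setup
MIN_LR = 0.0          # assumption: the final cosine value is not stated


def lr_at_epoch(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay."""
    if epoch < WARMUP_EPOCHS:
        # Assumed shape: linear ramp from WARMUP_LR up to BASE_LR.
        frac = epoch / WARMUP_EPOCHS
        return WARMUP_LR + frac * (BASE_LR - WARMUP_LR)
    # Cosine decay from BASE_LR towards MIN_LR over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# One learning rate per epoch for the whole run.
schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

In a PyTorch training loop, values like these would typically be written into the optimizer's parameter groups once per epoch, or reproduced with a built-in or third-party scheduler that supports warm-up.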
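
The clip construction quoted under Experiment Setup (a fixed number T0 of frames uniformly sampled per video) can be sketched as follows. The exact sampling rule is not given in the excerpt, so this sketch assumes one common choice: take the frame at the centre of each of T0 equal-length segments, which repeats frames when a video is shorter than T0. The names `uniform_frame_indices` and `CLIP_SHAPES` are hypothetical.

```python
from typing import Dict, List, Tuple

# (T0, H0, W0) per dataset, as quoted in the setup.
CLIP_SHAPES: Dict[str, Tuple[int, int, int]] = {
    "DFEW": (16, 224, 224),
    "FERV39k": (16, 224, 224),
    "MAFW": (32, 224, 224),
}


def uniform_frame_indices(num_frames: int, t0: int) -> List[int]:
    """Pick t0 frame indices spread uniformly over a video of num_frames frames."""
    if num_frames <= 0 or t0 <= 0:
        raise ValueError("num_frames and t0 must be positive")
    # Assumed rule: centre of each of t0 equal segments, clamped to the last frame.
    return [min(num_frames - 1, int((i + 0.5) * num_frames / t0)) for i in range(t0)]


# Example: indices for a 64-frame DFEW video.
t0, _, _ = CLIP_SHAPES["DFEW"]
indices = uniform_frame_indices(64, t0)
```

The selected frames would then be resized (and, during training, randomly cropped and flipped) to H0 × W0 before being stacked into a clip.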