Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild
Authors: Xingjian Wang, Li Chai
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on in-the-wild datasets have demonstrated that IFDD outperforms prior supervised DFER methods with higher recognition accuracy and comparable efficiency. |
| Researcher Affiliation | Academia | Xingjian Wang, Li Chai*; The State Key Laboratory of Industrial Control Technology, Zhejiang University |
| Pseudocode | No | The paper describes the methodology using text, mathematical equations, and diagrams, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/CyberPegasus/IFDD |
| Open Datasets | Yes | We conduct evaluation on three important in-the-wild DFER datasets: DFEW (Jiang et al. 2020) with 16,372 videos, FERV39k (Wang et al. 2022a) with 38,935 videos, and MAFW (Liu et al. 2022) with 10,045 videos. |
| Dataset Splits | Yes | DFEW and MAFW both provide 5-fold cross-validation settings, while FERV39k provides a train-test splitting setting. ... Training sets from aforementioned datasets are further divided into training and validation set at a ratio of 4:1. |
| Hardware Specification | Yes | IFDD is implemented in PyTorch and trained on an NVIDIA RTX 3090 for 100 epochs. |
| Software Dependencies | No | IFDD is implemented in PyTorch, but no version number is given for PyTorch or any other library. |
| Experiment Setup | Yes | IFDD is implemented in PyTorch and trained on an NVIDIA RTX 3090 for 100 epochs. We utilize the AdamW optimizer and a cosine scheduler with a 1e-4 initial learning rate and 1e-3 weight decay, where the first 10 epochs adopt a warm-up strategy with a 1e-6 learning rate. ... A fixed number T0 of frames is uniformly sampled from each video and resized to H0 × W0 to form clips for training and inference. {T0, H0, W0} are set to {16, 224, 224} for DFEW and FERV39k, and {32, 224, 224} for MAFW. Data augmentation methods including random horizontal flip and random crop are adopted. ... The compressing factor 1/Dc of Conv(·) in Eq. 1 is set to 4 for IFDD-3DViT and 1 for IFDD-2DCNN, respectively. The channel number dT of temporal tokens Z is set to 128 for both. |
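The 4:1 train/validation split listed under Dataset Splits could be reproduced roughly as below. This is a minimal sketch, not the authors' code: the paper does not say how samples are assigned, so the seeded uniform shuffle, the `seed` parameter and the function name `split_train_val` are all assumptions.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_train_val(
    sample_ids: Sequence[Hashable], ratio: Tuple[int, int] = (4, 1), seed: int = 0
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split one fold's training IDs into train/val at ratio[0]:ratio[1].

    Assumption: a seeded uniform random shuffle; the paper does not
    specify the assignment procedure (e.g. whether it is subject-aware).
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # deterministic for a given seed
    n_train = len(ids) * ratio[0] // sum(ratio)
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train_ids, val_ids = split_train_val(range(100))
    print(len(train_ids), len(val_ids))  # 80 20
```

For DFEW and MAFW this would be applied separately to the training portion of each of the five official folds; FERV39k has a single train/test split, so it would be applied once.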
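The learning-rate schedule in the Experiment Setup row (10 warm-up epochs starting at 1e-6, then cosine decay from 1e-4 over 100 epochs) can be written as a per-epoch function. This is a hedged sketch: the linear shape of the warm-up, the decay floor `min_lr`, and the name `lr_at_epoch` are assumptions, since the paper only states the endpoints.

```python
import math


def lr_at_epoch(
    epoch: int,
    total_epochs: int = 100,
    warmup_epochs: int = 10,
    warmup_lr: float = 1e-6,
    base_lr: float = 1e-4,
    min_lr: float = 0.0,
) -> float:
    """Learning rate for a 0-indexed epoch under warm-up + cosine decay.

    Assumptions: the warm-up ramps linearly from warmup_lr to base_lr, and
    the cosine phase decays from base_lr toward min_lr over the remaining epochs.
    """
    if epoch < warmup_epochs:
        # Linear warm-up: warmup_lr at epoch 0, approaching base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine annealing: exactly base_lr at the first post-warm-up epoch.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 5, 10, 55, 99):
        print(e, f"{lr_at_epoch(e):.3e}")
```

In PyTorch this would pair with `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-3)`, writing `lr_at_epoch(epoch)` into each entry of `optimizer.param_groups` at the start of every epoch. The released code may instead update the rate per iteration.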
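Uniformly sampling a fixed number T0 of frames per video, as the Experiment Setup row describes, amounts to choosing evenly spaced frame indices. The sketch below assumes segment-centre sampling (cut the video into T0 equal segments and take the middle frame of each); the paper does not state the exact rule, and `uniform_frame_indices` and `CLIP_SETTINGS` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Clip settings {T0, H0, W0} as reported for each dataset.
CLIP_SETTINGS: Dict[str, Tuple[int, int, int]] = {
    "DFEW": (16, 224, 224),
    "FERV39k": (16, 224, 224),
    "MAFW": (32, 224, 224),
}


def uniform_frame_indices(num_frames: int, t0: int) -> List[int]:
    """Pick t0 frame indices spread evenly over a video of num_frames frames.

    Assumption: the video is cut into t0 equal segments and the centre frame
    of each segment is taken; videos shorter than t0 frames repeat frames.
    """
    if num_frames <= 0 or t0 <= 0:
        raise ValueError("num_frames and t0 must be positive")
    step = num_frames / t0
    return [min(num_frames - 1, int(step * i + step / 2)) for i in range(t0)]


if __name__ == "__main__":
    t0, h0, w0 = CLIP_SETTINGS["DFEW"]
    print(uniform_frame_indices(48, t0))
```

The selected frames would then be resized to H0 × W0 before augmentation.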
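The random horizontal flip and random crop augmentations are usually sampled once per clip and applied identically to every frame, so that facial motion across frames stays consistent. That clip-level consistency is an assumption here, as are the crop size and the helper names `sample_clip_params` and `augment_clip`; frames are plain nested lists only to keep the sketch dependency-free.

```python
import random
from typing import List, Optional, Tuple

Frame = List[List[float]]  # H x W grid of pixel values (single channel for brevity)


def sample_clip_params(
    height: int, width: int, crop_h: int, crop_w: int,
    flip_p: float = 0.5, rng: Optional[random.Random] = None,
) -> Tuple[int, int, bool]:
    """Draw one crop offset and one flip decision for a whole clip."""
    rng = rng or random.Random()
    top = rng.randint(0, height - crop_h)  # randint bounds are inclusive
    left = rng.randint(0, width - crop_w)
    flip = rng.random() < flip_p
    return top, left, flip


def augment_clip(
    clip: List[Frame], crop_h: int, crop_w: int,
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Apply the same random crop and horizontal flip to every frame."""
    height, width = len(clip[0]), len(clip[0][0])
    top, left, flip = sample_clip_params(height, width, crop_h, crop_w, rng=rng)
    out = []
    for frame in clip:
        cropped = [row[left:left + crop_w] for row in frame[top:top + crop_h]]
        # Reversing each row mirrors the frame left-to-right.
        out.append([row[::-1] for row in cropped] if flip else cropped)
    return out
```

With real tensors the same idea maps onto slicing a (T, C, H, W) clip once and flipping along the width axis.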