Navigating Label Ambiguity for Facial Expression Recognition in the Wild

Authors: JunGyu Lee, Yeji Choi, Haksub Kim, Ig-Jae Kim, Gi Pyo Nam

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that NLA outperforms existing methods in both overall and mean accuracy, confirming its robustness against noise and class imbalance. To the best of our knowledge, this is the first framework to address both problems simultaneously.
Researcher Affiliation | Academia | (1) Korea Institute of Science and Technology, Seoul, Korea; (2) AI-Robotics, KIST School, University of Science and Technology, Daejeon, Korea; (3) Yonsei University, Seoul, Korea
Pseudocode | No | The paper describes the method using mathematical formulations and textual explanations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link, an explicit code release statement, or a mention of code in supplementary materials.
Open Datasets | Yes | Datasets. RAF-DB (Li, Deng, and Du 2017) contains 30,000 face images labeled with 7 basic and compound expressions by 40 trained annotators. We use a subset with 7 basic expressions, comprising 12,271 training images and 3,068 test images. FERPlus (Barsoum et al. 2016), an extension of FER2013 (Goodfellow et al. 2013), provides 8 expression labels with the addition of Contempt, labeled by 10 annotators. For a fair comparison, we use the same 7 basic classes, totaling 28,709 training images and 3,589 test images. AffectNet (Mollahosseini, Hasani, and Mahoor 2017), the largest FER dataset, contains 450,000 face images with 7 basic expressions and a contempt label.
Dataset Splits | Yes | RAF-DB [...] comprising 12,271 training images and 3,068 test images. FERPlus [...] totaling 28,709 training images and 3,589 test images. AffectNet [...] which include 283,901 training images and 3,500 test images.
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using ResNet-18, the Adam optimizer, and the ExponentialLR scheduler, but it does not specify version numbers for these software components or any underlying frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We utilize ResNet-18 (He et al. 2016) pre-trained on MS-Celeb-1M (Guo et al. 2016), following previous works (Zhang et al. 2022a; Wu and Cui 2023), for a fair comparison. All face images are aligned and cropped based on three landmarks (Wang, Bo, and Fuxin 2019), and then resized to 224×224. For network training, we use the Adam optimizer (Kingma and Ba 2014) with a weight decay of 0.0001 and employ the ExponentialLR scheduler (Li and Arora 2019) with a gamma value of 0.9 and an initial learning rate of 0.0001. We set λ to 0.5, the batch size to 32, and the maximum training epoch to 60, with the best performance observed at epoch 40. We determine each Σ based on the diagonal element σ11 = 0.8 in both cases, with the ratio of the major to minor axis being 2:1 and 6:1 for the true and false cases, respectively.
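The optimizer and scheduler settings quoted above fully determine the learning-rate trajectory. As a minimal sketch (pure Python; the function name `exponential_lr` is ours, not from the paper), an exponential schedule with gamma 0.9 and initial rate 0.0001 decays as:

```python
def exponential_lr(initial_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` decay steps of an exponential schedule,
    matching the reported settings (initial lr 0.0001, gamma 0.9)."""
    return initial_lr * (gamma ** epoch)

# Rate at the start of training and at epoch 40,
# where the paper reports its best performance.
print(exponential_lr(1e-4, 0.9, 0))
print(exponential_lr(1e-4, 0.9, 40))
```

In PyTorch terms this corresponds to `torch.optim.lr_scheduler.ExponentialLR` with `gamma=0.9` stepped once per epoch, though the paper does not name its framework.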