Leveraging Group Classification with Descending Soft Labeling for Deep Imbalanced Regression

Authors: Ruizhi Pu, Gezheng Xu, Ruiyi Fang, Bing-Kun Bao, Charles Ling, Boyu Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on real-world datasets also validate the effectiveness of our method.
Researcher Affiliation Academia 1 Department of Computer Science, Western University 2 School of Computer Science, Nanjing University of Posts and Telecommunications
Pseudocode No The paper describes the methodology using mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Appendix https://github.com/RuizhiPu-CS/Group-DIR
Open Datasets Yes IMDB-WIKI-DIR is a large-scale real-world human facial dataset constructed by (Rothe, Timofte, and Van Gool 2018) and re-organized for imbalance tasks by (Yang et al. 2021); it contains 235K face images. AgeDB-DIR is another real-world human facial dataset, constructed by (Moschoglou et al. 2017) and also re-organized by (Yang et al. 2021). STS-B-DIR is a text similarity score dataset constructed by (Wang et al. 2018) and re-constructed by (Yang et al. 2021).
Dataset Splits Yes There are 191.5K imbalanced training images, 11K balanced validation images, and 11K balanced test images. It contains 12.2K training images, 2.1K validation images, and 2.1K test images. There are 5.2K pairs for training, 1K balanced pairs for validation, and 1K balanced pairs for testing. Following (Yang et al. 2021; Branco, Torgo, and Ribeiro 2017), the training distribution is always highly skewed while the test distribution is balanced.
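The reported split sizes can be summarized and sanity-checked in a few lines; the figures below are copied directly from the description above (an illustrative sketch, not from the paper's code):

```python
# Split sizes (number of samples) as reported for the three DIR benchmarks.
splits = {
    "IMDB-WIKI-DIR": {"train": 191_500, "val": 11_000, "test": 11_000},
    "AgeDB-DIR":     {"train": 12_200,  "val": 2_100,  "test": 2_100},
    "STS-B-DIR":     {"train": 5_200,   "val": 1_000,  "test": 1_000},
}

for name, s in splits.items():
    total = sum(s.values())
    # The training portion dominates; only val/test are balanced.
    print(f"{name}: {total} samples, train fraction {s['train'] / total:.1%}")
```

Note that only the training split is imbalanced; validation and test are balanced by construction in the DIR benchmarks.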
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions the use of ResNet-50 as a backbone.
Software Dependencies No The paper mentions "BiLSTM + GloVe word embeddings" but does not specify versions for any libraries, frameworks, or programming languages used.
Experiment Setup Yes Moreover, we follow the training procedures and hyperparameters (e.g., temperature t) of (Zha et al. 2023a); however, unlike (Zha et al. 2023a), which used only a sub-sample of both datasets (e.g., 32K for IMDB-WIKI-DIR), we stick to the setting of (Yang et al. 2021) and train on the full training set with a batch size of 128.
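As a rough illustration of the scale implied by this setup, a batch size of 128 over the full 191.5K IMDB-WIKI-DIR training set yields about 1.5K iterations per epoch. The sketch below is hypothetical (the function and variable names are not from the authors' code):

```python
def batch_ranges(n_samples, batch_size=128):
    """Yield (start, end) index ranges that together cover all n_samples."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

# Full IMDB-WIKI-DIR training set, as opposed to the 32K sub-sample
# used by (Zha et al. 2023a).
n_train = 191_500
n_batches = sum(1 for _ in batch_ranges(n_train))
print(f"{n_batches} batches per epoch")  # 1497 batches per epoch
```

The last batch is smaller than 128 (12 samples), which most training frameworks either keep or drop depending on a drop-last option.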