Rethinking the Bias of Foundation Model under Long-tailed Distribution

Authors: Jiahao Chen, Bin Qin, Jiangmeng Li, Hao Chen, Bing Su

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we examine how such imbalances from pre-training affect long-tailed downstream tasks. Specifically, we find the imbalance biases inherited in foundation models on downstream tasks as parameter imbalance and data imbalance. ... We achieve at least 1.5%, 1.5%, 2.0% performance gains on ImageNet-LT (Deng et al., 2009), Places365-LT (Liu et al., 2019), and iNaturalist2018 (Van Horn et al., 2018) compared with state-of-the-art methods.
Researcher Affiliation | Academia | 1Gaoling School of Artificial Intelligence, Renmin University of China; 2Beijing Key Laboratory of Research on Large Models and Intelligent Governance; 3Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE; 4Institute of Software, Chinese Academy of Sciences; 5University of Chinese Academy of Sciences; 6Electrical and Computer Engineering, Carnegie Mellon University. Correspondence to: Bing Su <EMAIL>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | We achieve at least 1.5%, 1.5%, 2.0% performance gains on ImageNet-LT (Deng et al., 2009), Places365-LT (Liu et al., 2019), and iNaturalist2018 (Van Horn et al., 2018) compared with state-of-the-art methods.
Dataset Splits | No | Following OLTR (Liu et al., 2019), we split the classes into three groups named D-Many, D-Medium, and D-Few relying on the number of samples. Similarly, for parameter imbalance, we split the classes into three groups named P-Many, P-Medium, and P-Few relying on P̂_P(Y). More details are in the Appendix Sec. A. ... Additionally, in Tab. 11, we provide results highlighting the performance under parameter imbalance.
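The OLTR-style grouping quoted above can be sketched in a few lines. This is an illustrative reconstruction, not code from the paper: the function name and interface are assumptions, and the thresholds (more than 100 training images = many-shot, 20-100 = medium-shot, fewer than 20 = few-shot) follow the convention from Liu et al., 2019.

```python
from collections import Counter

def split_classes(labels, many_thresh=100, few_thresh=20):
    """Group class ids into many/medium/few-shot sets by training-sample count.

    Thresholds follow the OLTR convention (Liu et al., 2019):
    > many_thresh samples -> many; [few_thresh, many_thresh] -> medium;
    < few_thresh -> few.
    """
    counts = Counter(labels)  # samples per class in the training set
    groups = {"many": [], "medium": [], "few": []}
    for cls, n in counts.items():
        if n > many_thresh:
            groups["many"].append(cls)
        elif n >= few_thresh:
            groups["medium"].append(cls)
        else:
            groups["few"].append(cls)
    return groups
```

The paper's P-Many/P-Medium/P-Few split would apply the same bucketing to a per-class statistic of the pre-trained model rather than to raw sample counts.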
Hardware Specification | Yes | For training resources, all experiments are conducted on Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz with a single RTX A40 GPU. Normally, a GPU with 24GB of memory is sufficient for the reproduction.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup | Yes | We present the details about the hyper-parameters of our experiments on different datasets in Tab. 9, where lr and epochs denote the initial learning rate and training epochs, respectively. We denote batch size in Tab. 9 as the training batch size during the fine-tuning phase. ... The learning rate, number of epochs, and parameter initialization strategies follow (Shi et al., 2024).