Making Reliable and Flexible Decisions in Long-tailed Classification

Authors: Bolian Li, Ruqi Zhang

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In empirical evaluation, we design a new metric, False Head Rate, to quantify tail-sensitivity risk, along with comprehensive experiments on multiple real-world tasks, including large-scale image classification and uncertainty quantification, to demonstrate the reliability and flexibility of our method. We conduct comprehensive experiments to demonstrate that RF-DLC significantly improves decision-making while maintaining or improving traditional metrics such as accuracy and calibration. |
| Researcher Affiliation | Academia | Bolian Li EMAIL, Ruqi Zhang EMAIL, Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA |
| Pseudocode | Yes | The proposed RF-DLC is summarized in Algorithm 1. |
| Open Source Code | Yes | https://github.com/lblaoke/RF-DLC |
| Open Datasets | Yes | We use CIFAR10/100-LT (Cui et al., 2019), ImageNet-LT (Liu et al., 2019b), and iNaturalist (Van Horn et al., 2018) as the long-tailed datasets... We also conduct a long-tailed medical image classification experiment in Table 5, where our RF-DLC successfully recognizes different disease types and outperforms the compared baselines. The experiments are based on ResNet32, and the number of particles is set to 3. The DermaMNIST dataset (Yang et al., 2023) is originally an imbalanced classification dataset with an imbalance ratio of around 60. |
| Dataset Splits | Yes | In long-tailed categorical data, the training and testing sets follow different distributions... Classes are equally split into three class regions (head, med and tail). For example, there are 33, 33 and 34 classes respectively in the head, med and tail regions of CIFAR100-LT. |
| Hardware Specification | Yes | We run all experiments on an NVIDIA RTX A6000 GPU (49 GB) and do not need multiple GPUs for one model. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Table 10 (hyper-parameter configurations), reproduced below. |

Table 10: Hyper-parameter configurations.

| Dataset | Base Model | Optimizer | Batch Size | Learning Rate | Training Epochs | Discrepancy Ratio | λ | τ | α |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR10-LT | ResNet32 | SGD | 128 | 0.1 | 200 | linear | 5e-4 | 40 | 0.002 |
| CIFAR100-LT | ResNet32 | SGD | 128 | 0.1 | 200 | linear | 5e-4 | 40 | 0.3 |
| ImageNet-LT | ResNet50 | SGD | 256 | 0.1 | 100 | linear | 2e-4 | 20 | 50 |
| iNaturalist | SlimResNet50 | SGD | 512 | 0.2 | 100 | linear | 2e-4 | 20 | 100 |
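The False Head Rate metric mentioned under Research Type could be sketched as below. This is a minimal illustration, assuming the metric measures the fraction of tail-region samples that the model predicts into head-region classes; the paper's exact formula may differ, and all names here are illustrative:

```python
import numpy as np

def false_head_rate(y_true, y_pred, head_classes, tail_classes):
    """Hypothetical sketch of a False-Head-Rate-style metric:
    the fraction of tail-class samples predicted as head classes.
    (Assumed definition; consult the paper for the exact one.)"""
    tail_mask = np.isin(y_true, list(tail_classes))
    if tail_mask.sum() == 0:
        return 0.0
    # among true tail samples, how many were routed to a head class?
    pred_head = np.isin(y_pred[tail_mask], list(head_classes))
    return float(pred_head.mean())

# toy example: classes {0, 1} are head, {2, 3} are tail
y_true = np.array([2, 3, 2, 0])
y_pred = np.array([0, 3, 1, 0])  # two of three tail samples routed to head
print(false_head_rate(y_true, y_pred, {0, 1}, {2, 3}))  # → 0.6666666666666666
```

A lower value would indicate that misclassified tail samples are at least not being absorbed into the over-represented head classes.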
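The head/med/tail split described under Dataset Splits can be sketched as follows. This assumes class indices are already sorted in descending order of training frequency, and the function name is illustrative:

```python
def class_regions(num_classes):
    """Split class indices (assumed sorted by descending training
    frequency) into head / med / tail regions of near-equal size,
    e.g. 33/33/34 classes for CIFAR100-LT as stated in the paper."""
    k = num_classes // 3
    classes = list(range(num_classes))
    # the remainder after integer division falls into the tail region
    return classes[:k], classes[k:2 * k], classes[2 * k:]

head, med, tail = class_regions(100)
print(len(head), len(med), len(tail))  # → 33 33 34
```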