An Analytical Model for Overparameterized Learning Under Class Imbalance
Authors: Eliav Mor, Yair Carmon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our theoretical findings on simulated data and imbalanced CIFAR10, MNIST and Fashion MNIST datasets. |
| Researcher Affiliation | Academia | Eliav Mor EMAIL Department of Computer Science Tel Aviv University Yair Carmon EMAIL Department of Computer Science Tel Aviv University |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It primarily uses mathematical notation and textual descriptions for its methods. |
| Open Source Code | No | The paper mentions using third-party tools like PyTorch, CVXPY, and the MOSEK solver, but does not provide a specific link or explicit statement about releasing the source code for their own methodology. |
| Open Datasets | Yes | We test our theoretical findings on simulated data and imbalanced CIFAR10, MNIST and Fashion MNIST datasets. |
| Dataset Splits | Yes | For each dataset used in our tests (CIFAR10, MNIST and Fashion MNIST), we sample 5, 8, 13, 23, 38, 64, 107, 179, 299, 500 and 5, 100, 120, 140, 160, 180, 200, 220, 300, 500 samples per class for the exponential and modified profiles, respectively. ... In addition, we featurize the standard test sets of each dataset and use them to test the learned predictors. |
| Hardware Specification | No | training is distributed across 4 GPUs. |
| Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) to run gradient descent... We find the MM, MA, CDT and LA predictors we consider by solving the corresponding margin maximization problems (defined in Section 2.2) using CVXPY (Diamond & Boyd, 2016) with the MOSEK solver (MOSEK ApS, 2023). |
| Experiment Setup | Yes | Fine-tuning is performed using PyTorch (Paszke et al., 2019) while training is conducted for 1000 epochs, employing the SGD optimizer with a batch size of 128, no momentum, no weight decay, and gradient clipping with global norm threshold of 1. The learning rate is set to 1e-4 with cosine learning rate scheduler, and the training is distributed across 4 GPUs. |
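The per-class sample counts quoted for the exponential profile grow geometrically from 5 to 500 across the 10 classes. A minimal sketch that reproduces those numbers, assuming counts are truncated to integers (the truncation rule is my inference, not stated in the quoted text):

```python
def exponential_profile(n_min=5, n_max=500, num_classes=10):
    """Per-class sample counts growing geometrically from n_min to n_max.

    Truncating each intermediate count to an integer is an assumption;
    it matches the counts quoted in the Dataset Splits row.
    """
    step = 1 / (num_classes - 1)
    return [int(n_min * (n_max / n_min) ** (k * step)) for k in range(num_classes)]

print(exponential_profile())  # [5, 8, 13, 23, 38, 64, 107, 179, 299, 500]
```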
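The quoted setup clips gradients to a global-norm threshold of 1 (PyTorch exposes this as `torch.nn.utils.clip_grad_norm_`). A pure-Python sketch of the operation on a flat list of gradient values, for illustration only:

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale gradients so their joint L2 norm is at most max_norm.

    Mirrors the "gradient clipping with global norm threshold of 1" in the
    quoted setup; real frameworks apply the same scale across all parameter
    tensors rather than a single flat list.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return grads  # already within the threshold; leave untouched
    scale = max_norm / total_norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0])  # original norm 5, rescaled to norm 1
```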