Principled Algorithms for Optimizing Generalized Metrics in Binary Classification

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report the results of experiments demonstrating the effectiveness of our methods compared to prior baselines. In this section, we present empirical results for our principled algorithms for optimizing generalized metrics on the CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) datasets.
Researcher Affiliation | Collaboration | 1Courant Institute of Mathematical Sciences, New York, NY; 2Google Research, New York, NY. Correspondence to: Anqi Mao <EMAIL>, Mehryar Mohri <EMAIL>, Yutao Zhong <EMAIL>.
Pseudocode | Yes | Algorithm 1: Binary search estimation of λ; Algorithm 2: Generalized metrics optimization algorithm; Algorithm 3: Generalized metrics optimization algorithm with cross-validation.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the public availability of source code for the described methodology.
Open Datasets | Yes | Our experiments use a three-hidden-layer CNN with ReLU activations (LeCun et al., 1995)... on the CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) datasets.
Dataset Splits | No | The paper mentions using the CIFAR-10, CIFAR-100, and SVHN datasets for training and extracting two classes for binary classification, but does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or an explicit splitting methodology).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications, or cloud instance types).
Software Dependencies | No | The paper mentions using a three-hidden-layer CNN with ReLU activations and Stochastic Gradient Descent (SGD) with Nesterov momentum, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | The initial learning rate, batch size, and weight decay were set to 0.02, 1,024, and 1 × 10⁻⁴, respectively. A cosine decay learning rate schedule (Loshchilov & Hutter, 2022) was used over the course of 100 epochs.
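The experiment-setup row quotes a cosine decay learning rate schedule (Loshchilov & Hutter) run over 100 epochs from an initial learning rate of 0.02. A minimal sketch of that schedule, assuming the standard cosine-annealing formula with no restarts and a minimum learning rate of zero (the paper excerpt does not spell out the exact variant):

```python
import math

def cosine_decay_lr(initial_lr: float, epoch: int, total_epochs: int) -> float:
    """Cosine-annealed learning rate at a given epoch (no restarts, eta_min = 0)."""
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Schedule matching the reported setup: initial lr 0.02 over 100 epochs.
# Decays smoothly from 0.02 at epoch 0 to 0 at epoch 100.
schedule = [cosine_decay_lr(0.02, t, 100) for t in range(101)]
```

In practice this is equivalent to a framework-provided scheduler (e.g., cosine annealing in PyTorch) attached to the SGD optimizer described in the software-dependencies row.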
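The pseudocode row lists Algorithm 1, a binary search estimation of λ. The paper's actual search criterion is not reproduced in this report, so the sketch below shows only the generic pattern such an algorithm follows: bisecting an interval for the root of a monotone criterion. The function `g`, the bracketing interval, and the tolerance are all illustrative assumptions, not the paper's definitions.

```python
def binary_search_lambda(g, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Find lambda in [lo, hi] with g(lambda) ~ 0, assuming g is
    non-increasing on the interval with g(lo) >= 0 >= g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)

# Toy example: g(x) = 1 - 2x is non-increasing with root at 0.5.
lam = binary_search_lambda(lambda x: 1.0 - 2.0 * x, 0.0, 1.0)
```

The bisection halves the interval each iteration, so reaching tolerance `tol` from an interval of width `w` takes about log2(w / tol) evaluations of `g`.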