Reweighting Improves Conditional Risk Bounds
Authors: Yikai Zhang, Jiahe Lin, Fengpei Li, Songzhu Zheng, Anant Raj, Anderson Schneider, Yuriy Nevmyvaka
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings are supported by evidence from synthetic data experiments. Empirically, we demonstrate that with a properly designed weight function, one can achieve superior performance in selected regions, respectively for heteroscedastic regression and classification tasks. |
| Researcher Affiliation | Industry | Yikai Zhang, Machine Learning Research, Morgan Stanley |
| Pseudocode | No | The paper describes methods using mathematical formulations and prose, without explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | Our findings are supported by evidence from synthetic data experiments. We present results from synthetic data experiments to support our theoretical claims, respectively for regression and classification settings. The true data-generating process is a univariate regression with x ∈ ℝ of the form y = f(x) + √(σ²(x))·ξ, where E(ξ) = 0 and Var(ξ) = 1. For classification, we consider a data-generating process, for illustration purposes, in which extremely noisy data points are present. The paper describes how it generated synthetic data, but provides no concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | For both experiments, the size of the training set is set at 2e4, to ensure that the algorithm has access to an adequate number of samples and to circumvent any potential issues due to lack of fit, although empirically the conclusion broadly holds even with much smaller sample sizes. Once f̂_ERM(x) and f̂^w_ERM(x) are obtained, on the test set, we evaluate their risk over a range of selective sets with varying coverage α ∈ [0, 1]. The cut-off q_α(σ²) is determined by the empirical quantile of the estimated σ² on the validation set. However, the paper only specifies the training-set size (2e4) and mentions test and validation sets without providing their explicit sizes or proportions relative to the total dataset, or how the split was performed (e.g., random seed). |
| Hardware Specification | No | The paper discusses synthetic data experiments but does not specify any hardware used for conducting these experiments (e.g., GPU models, CPU types, or cloud infrastructure). |
| Software Dependencies | No | The paper mentions that 'f(x) and σ²(x) are both parametrized by multi-layer perceptrons (MLP)' but does not provide specific software names or version numbers of any libraries, frameworks, or programming languages used. |
| Experiment Setup | No | We consider the following estimation procedure using ℓ₂ loss; f(x) and σ²(x) are both parametrized by multi-layer perceptrons (MLP). Similar to the case of regression, we consider a procedure that entails two steps, using the cross-entropy loss. However, the paper does not specify concrete hyperparameter values such as learning rates, batch sizes, optimizer details, or the architecture of the multi-layer perceptrons (e.g., number of layers, number of units per layer) used in the experiments. |
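The heteroscedastic-regression setup quoted in the table (y = f(x) + √(σ²(x))·ξ with selective evaluation at a variance-quantile cut-off q_α(σ²)) can be sketched as follows. The specific choices of f, σ², the input distribution, and the test-set size are illustrative assumptions, not the paper's configuration; for simplicity, the sketch also scores the true regression function rather than fitted estimators f̂_ERM and f̂^w_ERM, and takes the quantile on the test set rather than a separate validation set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mean and variance functions (assumptions, not from the paper).
f = lambda x: np.sin(2 * x)
sigma2 = lambda x: 0.1 + x**2  # heteroscedastic noise variance

# Data-generating process: y = f(x) + sqrt(sigma2(x)) * xi, E[xi]=0, Var[xi]=1.
n_train = 20_000  # training-set size reported in the paper (2e4)
x_train = rng.uniform(-1, 1, size=n_train)
y_train = f(x_train) + np.sqrt(sigma2(x_train)) * rng.standard_normal(n_train)

# Hypothetical test set for selective evaluation.
n_test = 5_000
x_test = rng.uniform(-1, 1, size=n_test)
y_test = f(x_test) + np.sqrt(sigma2(x_test)) * rng.standard_normal(n_test)

# Selective set with coverage alpha: keep points whose noise variance falls
# below the empirical quantile q_alpha(sigma^2) (here the true variance is
# used; the paper uses an estimated variance and a validation-set quantile).
alpha = 0.5
q_alpha = np.quantile(sigma2(x_test), alpha)
selected = sigma2(x_test) <= q_alpha

# Mean squared risk on the selected (low-noise) region vs. the full test set.
risk_selected = np.mean((y_test[selected] - f(x_test[selected])) ** 2)
risk_full = np.mean((y_test - f(x_test)) ** 2)
```

Because the selective set retains the lower-variance inputs, the risk on the selected region is smaller than on the full test set, which is the phenomenon the paper's reweighting analysis targets.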