Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning
Authors: Yiling Xie, Xiaoming Huo
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments, conducted and analyzed in Section 6, demonstrate the favorable practical performance of the adjusted estimator over the classic one. |
| Researcher Affiliation | Academia | Yiling Xie EMAIL Xiaoming Huo EMAIL School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, Georgia, USA |
| Pseudocode | No | The paper describes methodologies using mathematical formulations and prose (e.g., in Section 2 "Adjustment Technique" and Section 4 "Proposed Adjusted WDRO Estimator"), but it does not contain any explicitly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository for the described methodology. |
| Open Datasets | No | The paper states that data is generated for the numerical experiments: "Suppose X follows 2-dimensional standard normal distribution, and the response variable Y follows the Bernoulli distribution... Data is generated 5 times for each sample size n ∈ {500, 700, 1000, 1500, 1800, 2000}." (Section 6.1.1) and "Assume the feature variable X follows the 2-dimensional standard normal distribution, and the response variable Y follows the normal distribution... Data is generated 5 times for each sample size n ∈ {500, 700, 1000, 1500, 1800, 2000}." (Section 6.1.2). The authors synthesize data rather than using pre-existing public datasets, and no access information for any dataset is provided. |
| Dataset Splits | No | The paper describes synthetic data generation for different sample sizes, rather than providing details on how a pre-existing dataset is split into training, validation, or test sets. For example: "Data is generated 5 times for each sample size n ∈ {500, 700, 1000, 1500, 1800, 2000}." |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the numerical experiments. |
| Software Dependencies | No | The paper references an "iterative algorithm in Blanchet et al. (2022c)" for computing WDRO estimators but does not specify any software libraries, frameworks, or their version numbers used for implementation (e.g., Python, PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | Logistic Regression: "Per the iterative algorithm, we set the learning rate as 0.3 and the maximum number of iterations as 50000, respectively. Moreover, since the value of τ, which is the coefficient in the Wasserstein radius ρ_n = τ/√n, should be determined, we let τ ∈ {1.5, 2, 2.5, 3}." Linear Regression: "Per the iterative algorithm, we set the learning rate as 0.01 and the maximum number of iterations as 50000, respectively. Then, we set the value of τ as τ ∈ {1.5, 2, 2.5, 3}." |
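
The synthetic-data setup and radius schedule described in the table above can be sketched as follows. This is a minimal illustration, not the authors' code: the coefficient vector `beta`, the random seed, and the function names are hypothetical assumptions, since the paper does not release an implementation. Only the distributional assumptions (2-dimensional standard normal features, Bernoulli responses for logistic regression), the sample sizes, and the radius formula ρ_n = τ/√n come from the paper.

```python
import numpy as np

def generate_logistic_data(n, beta=np.array([1.0, -1.0]), seed=0):
    """Synthetic data per Section 6.1.1: 2-d standard normal X,
    Bernoulli Y. The coefficient vector `beta` is a hypothetical
    choice; the paper does not state the true parameter used."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))       # 2-dimensional standard normal features
    p = 1.0 / (1.0 + np.exp(-X @ beta))   # logistic link probabilities
    Y = rng.binomial(1, p)                # Bernoulli responses
    return X, Y

def wasserstein_radius(tau, n):
    """Wasserstein radius rho_n = tau / sqrt(n), with tau drawn
    from the grid {1.5, 2, 2.5, 3} reported in the paper."""
    return tau / np.sqrt(n)

# Sample sizes and tau grid reported in Section 6.
for n in [500, 700, 1000, 1500, 1800, 2000]:
    X, Y = generate_logistic_data(n)
    for tau in [1.5, 2, 2.5, 3]:
        rho = wasserstein_radius(tau, n)
        # ...fit the (adjusted) WDRO estimator with radius rho here...
```

The actual estimators would be computed with the iterative algorithm of Blanchet et al. (2022c), which the sketch omits.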