Stronger Neyman Regret Guarantees for Adaptive Experimental Design

Authors: Georgy Noarov, Riccardo Fogliato, Martin Andres Bertran, Aaron Roth

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first present the results for the non-contextual setting and then turn to the analysis of the performance for the contextual algorithm. Our code is available at the following link: https://github.com/amazon-science/adaptive-abtester. 5.1. Non-Contextual Experiments. Tasks: We compare our method Clip OGDSC with Clip OGD0 (Dai et al., 2023) on multiple tasks. Below, we show two key datasets (one synthetic and one real-world) used in our experiments, with full details in Appendix E. [...] Figure 1 shows the Neyman regret across these settings, matching our theoretical expectations: when σ = 0.1, the regret of Clip OGDSC drops to 0 quickly, but for larger σ, the regret remains high and converges later.
Researcher Affiliation | Collaboration | ¹Department of Computer and Information Science, University of Pennsylvania; ²Amazon Web Services. Correspondence to: Georgy Noarov <EMAIL>.
Pseudocode | Yes | Algorithm 1 Clip OGD (Dai et al., 2023) [...] Algorithm 2 AMGATE: Multigroup Adaptive Design [...] Algorithm 3 ASOLO: SOLO FTRL (Orabona & Pál, 2018) [...] Algorithm 4 ASOLO: Instantiation for scale-free sleeping experts [...] Algorithm 5 AOLO SE: Sleeping Experts to OLO Reduction (Orabona, 2024) [...] Algorithm 6 ASOLO SE: Sleeping Experts Algorithm [...] Algorithm 7 General Multigroup Adaptive Design
Open Source Code | Yes | Our code is available at the following link: https://github.com/amazon-science/adaptive-abtester.
Open Datasets | Yes | The second dataset comes from Egypt's largest microfinance organization (Groh & McKenzie, 2016), covering 2,961 clients. [...] We also present experiments on the ASOS Digital Experiments Dataset (Liu et al., 2021), and on question-answering tasks for large language models, including BIG-Bench (Srivastava et al., 2023), in the Appendix.
Dataset Splits | No | The ASOS Digital Experiments Dataset (Liu et al., 2021) [...] This structure naturally creates 4 subsets of rows (each with about 6,000 rows). We treat each subset as a separate dataset and feed each of these 4 pairs of treatment and control outcome sequences into Clip OGDSC and Clip OGD0. [...] The synthetic dataset is generated as follows: y_t(i) ~ N(μ_i, σ²) i.i.d. for t = 1, ..., T and i = 0, 1, with μ_0 = 1 and μ_1 = 2. [...] We vary σ_i ∈ ℝ₊ to showcase where our method succeeds and where it struggles.
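The synthetic generation described in this row can be sketched in a few lines. This is a minimal illustration, assuming the flattened formula reads y_t(i) ~ N(μ_i, σ²) i.i.d.; the function name and defaults (T = 10,000, σ = 0.1, the seed) are hypothetical and not taken from the paper.

```python
import random

def simulate_outcomes(T=10_000, mu=(1.0, 2.0), sigma=0.1, seed=0):
    """Draw i.i.d. potential outcomes y_t(i) ~ N(mu_i, sigma^2)
    for arms i = 0 (control) and i = 1 (treatment)."""
    rng = random.Random(seed)
    return [
        tuple(rng.gauss(m, sigma) for m in mu)  # (y_t(0), y_t(1))
        for _ in range(T)
    ]

outcomes = simulate_outcomes()
mean0 = sum(y[0] for y in outcomes) / len(outcomes)  # should be near mu_0 = 1
mean1 = sum(y[1] for y in outcomes) / len(outcomes)  # should be near mu_1 = 2
```

Increasing `sigma` in this sketch mirrors the paper's stress test: larger outcome variance makes the Neyman-optimal allocation harder to track, which is where the regret is reported to converge later.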
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments. It mentions 'simulation' but no GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducing the experiments.
Experiment Setup | Yes | Hyperparameter Choices: Throughout the experiments, we use the following hyperparameters. For our method, we set η_t = 2/t and δ_t = 1/h(t), where the clipping function is h(t) = exp((log(t + 2))^(1/4)). For Clip OGD0, we follow Dai et al. (2023) with a constant learning rate η_t = 1/√T and clipping rate δ_t = 0.5 t^(−1/5) log T.
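The hyperparameter schedules quoted above can be sketched directly. This is an illustrative sketch, assuming the flattened exponent in the clipping function reads h(t) = exp((log(t + 2))^(1/4)); the function names are hypothetical.

```python
import math

def eta(t):
    """Learning rate for the paper's method: eta_t = 2 / t."""
    return 2.0 / t

def h(t):
    """Clipping function h(t) = exp((log(t + 2))^(1/4))."""
    return math.exp(math.log(t + 2) ** 0.25)

def delta(t):
    """Clipping rate delta_t = 1 / h(t)."""
    return 1.0 / h(t)

# Both schedules decay in t: the step size shrinks quickly (2/t),
# while the clipping rate shrinks slowly, since h grows only as
# exp of a fourth root of log t.
rates = [(t, eta(t), delta(t)) for t in (1, 10, 100)]
```

Note the contrast with the Clip OGD0 baseline, whose learning rate is constant in t (it depends only on the horizon T).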