Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

Authors: Yu Zhang, Shanshan Zhao, Bokui Wan, Jinjuan Wang, Xiaodong Yan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. The experimental results indicate a significant improvement in A/B testing efficiency, highlighting the potential to reduce experimental costs while maintaining high statistical power. "In this section, we conduct detailed comparisons between the proposed method and other state-of-the-art methods via synthetic data (Section 5.1) and real-world data (Section 5.2)."
Researcher Affiliation: Collaboration. 1 Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, China; 2 School of Mathematics, Shandong University, Jinan, China; 3 Didi Chuxing, Beijing, China; 4 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China; 5 School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China.
Pseudocode: Yes. Algorithm 1, the Permuted WTAB algorithm. Input: data D = {(Xi, Yi, Ai), i = 1, . . . , n}; threshold τ; number of permutations B. Output: the aggregated p-value pa.
Open Source Code: No. The text contains no unambiguous statement that the authors are releasing the code for the work described in this paper, nor a direct link to a source-code repository.
Open Datasets: No. The application of the proposed method is demonstrated through an analysis of three real data sets obtained from a world-leading ride-sharing company; due to privacy considerations, they are referred to as data sets A, B, and C. Because real-world data distributions are often difficult to replicate using purely synthetic data, the authors additionally construct a semi-synthetic dataset based on real-world observations, following the approach proposed in (Kohavi et al., 2020).
Dataset Splits: Yes. In practice, the observation dataset D is divided into K equal subsets Dk. For each Dk, a LightGBM model is trained on the remaining data D \ Dk and applied to estimate counterfactual results for Dk. This procedure is repeated for all subsets Dk. The simulations in Section 5 show that good performance is achievable with K = 2.
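The cross-fitting step described above can be sketched as follows. This is a minimal, generic sketch: the function and variable names are illustrative, and `make_model` is a placeholder for any fit/predict estimator (the paper uses LightGBM, which plugs in via `lightgbm.LGBMRegressor`).

```python
import numpy as np

def cross_fit_counterfactuals(X, Y, make_model, K=2, seed=0):
    """K-fold cross-fitting: for each subset D_k, a fresh model is trained on
    the remaining data D \\ D_k and used to predict outcomes on D_k, so every
    prediction is out-of-fold. `make_model` must return an object exposing
    fit(X, y) and predict(X), e.g. lightgbm.LGBMRegressor."""
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)  # K equal (up to 1) subsets
    mu_hat = np.empty(n)
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        model = make_model()
        model.fit(X[train_idx], Y[train_idx])
        mu_hat[test_idx] = model.predict(X[test_idx])
    return mu_hat
```

Keeping the learner abstract mirrors the report's observation that LightGBM and XGBoost behave similarly here: either can be swapped in without changing the cross-fitting logic.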
Hardware Specification: No. The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No. Specifically, LightGBM (Ke et al., 2017), a state-of-the-art gradient boosting algorithm, is employed within the double machine learning (DML) framework (Chernozhukov et al., 2018). Additionally, XGBoost (Chen & Guestrin, 2016) is used to estimate m1(x), m0(x), and e(x), exhibiting performance similar to LightGBM's. The paper names these packages but does not provide specific version numbers for the libraries used in its experiments.
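To make the roles of m1(x), m0(x), and e(x) concrete: in a DML-style analysis these three nuisance estimates are typically combined into doubly robust (AIPW) pseudo-outcomes whose average estimates the treatment effect. The sketch below shows that standard construction under the assumption that the nuisance predictions have already been computed; it is not claimed to be the paper's exact estimator.

```python
import numpy as np

def aipw_scores(Y, A, mu1_hat, mu0_hat, e_hat, clip=1e-3):
    """Doubly robust (AIPW) pseudo-outcomes built from the three nuisance
    estimates: mu1_hat ~ m1(x) = E[Y|X,A=1], mu0_hat ~ m0(x) = E[Y|X,A=0],
    and e_hat ~ e(x) = P(A=1|X). The mean of the returned scores is an
    estimate of the average treatment effect."""
    p = np.clip(e_hat, clip, 1 - clip)  # guard against extreme propensities
    return (mu1_hat - mu0_hat
            + A * (Y - mu1_hat) / p
            - (1 - A) * (Y - mu0_hat) / (1 - p))
```

In practice `mu1_hat`, `mu0_hat`, and `e_hat` would come from cross-fitted LightGBM or XGBoost models, as the report notes.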
Experiment Setup: Yes. Specifically, first determine a number of permutations, denoted B. The sequence {1, 2, . . . , n} is then reordered by applying a mapping πb : {1, 2, . . . , n} → {1, 2, . . . , n}; for each element i in the original sequence, its position in the reordered sequence is πb(i). For b = 1, . . . , B, the mapping πb is applied to the counterfactual outcomes {µ̂i, i = 1, . . . , n}, yielding reordered samples {µ̂πb(i), i = 1, . . . , n}. In this paper, B is set to 25, as increasing B further was observed not to substantially improve statistical power. The simulations in Section 5 show that good performance is achievable with K = 2. The sample size is fixed at n = 20000. A threshold (typically 0.03) is selected to regulate the magnitude of λ, that is, to identify the largest λ satisfying λσ/((1 − λ)√n) ≤ 0.03.
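The permutation scheme above can be sketched as a generic permutation-test wrapper. This is an assumption-laden sketch: the WTAB test statistic itself is not reproduced here, so `statistic` is a placeholder for whatever scalar the algorithm computes from the (possibly reordered) counterfactual outcomes, and the p-value aggregation across permutations may differ from the paper's.

```python
import numpy as np

def permutation_pvalue(mu_hat, statistic, B=25, seed=0):
    """Recompute a test statistic on B uniformly random reorderings pi_b of
    the estimated counterfactual outcomes {mu_hat_i}, then compare against
    the observed value using the standard +1 finite-sample correction."""
    rng = np.random.default_rng(seed)
    t_obs = statistic(mu_hat)
    t_perm = np.array([statistic(rng.permutation(mu_hat)) for _ in range(B)])
    return (1 + np.count_nonzero(t_perm >= t_obs)) / (B + 1)
```

With B = 25 the smallest attainable p-value is 1/26 ≈ 0.038, which is consistent with the report's note that modest B already suffices for the power levels studied.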