reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis

Authors: Luofeng Liao, Christian Kroer, Sergei Leonenkov, Okke Schrijvers, Liang Shi, Nicolas Stier-Moses, Congshan Zhang

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of this approach on real experiments on advertising markets at Meta. Then, we formally study interference that derives from such experimental designs, using the first-price pacing equilibrium framework as our model of market equilibration. We propose a debiased surrogate that eliminates the first-order bias of FPPE, and derive a plug-in estimator for the surrogate and establish its asymptotic normality. We then provide an estimation procedure for submarket parallel budget-controlled A/B tests. Finally, we present numerical examples on semi-synthetic data, confirming that the debiasing technique achieves the desired coverage properties.
Researcher Affiliation	Collaboration	Luofeng Liao, Christian Kroer (Columbia University, EMAIL); Sergei Leonenkov (Ads Online Experimentation, Meta, EMAIL); Okke Schrijvers, Liang Shi, Nicolas Stier-Moses, Congshan Zhang (Central Applied Science, Meta, EMAIL)
Pseudocode	Yes	Algorithm 1: Debiasing procedure
Open Source Code	No	The paper does not provide concrete access to source code. It does not contain any explicit statements about releasing code, nor does it include links to a code repository.
Open Datasets	No	In the semi-synthetic experiments, we simulate 40 buyers and 10000 good items in two submarkets, with a varying number of bad items (up to 5000) in order to study the effect of the contamination parameter α. For each α, we randomly sample a budget for each buyer, and compute β and REV from the limit pure market M0 with a value function in each submarket. Both the budget and values are sampled from historical bidding data, making the budget and value distributions heavy-tailed as in the real-world applications. More specifically, we first sample a certain number of auctions. For each auction, we sample a given number of advertisers with their per-impression bids. Advertisers that are sampled across different auctions are treated as the same buyers, and their budgets are determined by aggregating their values over auctions up to a scalar calibrated to get the percentage of budget-constrained buyers equal to what was observed in the real-world auction market, along the same lines as the experiments of Conitzer et al. (2022b).
Dataset Splits	No	The paper describes generating semi-synthetic data by simulating buyers and items and sampling budgets/values from historical bidding data. It mentions running 100 simulations but does not provide training/validation/test splits of a static dataset in the traditional sense, nor references to predefined splits.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers (e.g., library names with versions like PyTorch 1.9, or programming language versions like Python 3.8) needed to replicate the experiment.
Experiment Setup	Yes	In the semi-synthetic experiments, we simulate 40 buyers and 10000 good items in two submarkets, with a varying number of bad items (up to 5000) in order to study the effect of the contamination parameter α. For each α, we randomly sample a budget for each buyer, and compute β and REV from the limit pure market M0 with a value function in each submarket. Both the budget and values are sampled from historical bidding data, making the budget and value distributions heavy-tailed as in the real-world applications. More specifically, we first sample a certain number of auctions. For each auction, we sample a given number of advertisers with their per-impression bids. Advertisers that are sampled across different auctions are treated as the same buyers, and their budgets are determined by aggregating their values over auctions up to a scalar calibrated to get the percentage of budget-constrained buyers equal to what was observed in the real-world auction market, along the same lines as the experiments of Conitzer et al. (2022b). Next, we check the coverage of the proposed variance estimator. For each α and each budget sample, we run 100 simulations in the following way: we sample items (or their values for each buyer) considering two submarkets and bad items. We then run the finite FPPE with bad items and obtain a baseline estimate for the pacing multipliers and revenue without applying the debiasing procedure. Then, we apply the debiasing procedure to compute the debiased estimates.
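The Research Type and Experiment Setup rows refer to computing the pacing multipliers β and revenue REV of a first-price pacing equilibrium (FPPE). Because no code is released, the following is only an illustrative sketch, not the authors' implementation. It assumes the convex-program characterization of FPPE from the pacing-equilibrium literature, in which β minimizes F(β) = Σ_j max_i β_i v_ij − Σ_i B_i log β_i over β in (0, 1]^n, and approximates that minimizer for a small finite market by projected subgradient descent. The function names `fppe_pacing` and `revenue`, the step-size schedule, and the iteration count are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence

# Illustrative sketch, not the authors' code; function names are hypothetical.


def fppe_pacing(values: Sequence[Sequence[float]], budgets: Sequence[float],
                iters: int = 20000, step: float = 0.1,
                floor: float = 1e-9) -> List[float]:
    """Approximate FPPE pacing multipliers beta in (0, 1]^n by projected
    subgradient descent on F(beta) = sum_j max_i beta_i*v_ij - sum_i B_i*log(beta_i).
    values[i][j] is buyer i's value for item j; budgets[i] is B_i."""
    n, m = len(values), len(values[0])
    beta = [1.0] * n
    for t in range(1, iters + 1):
        # Gradient of the log-barrier term: -B_i / beta_i.
        grad = [-budgets[i] / beta[i] for i in range(n)]
        # Subgradient of the max term: item j adds v_ij to the buyer with the
        # highest paced bid beta_i * v_ij (ties go to the lowest index).
        for j in range(m):
            winner = max(range(n), key=lambda i: beta[i] * values[i][j])
            grad[winner] += values[winner][j]
        eta = step / math.sqrt(t)  # diminishing step size
        # Take the step, then project back onto [floor, 1].
        beta = [min(1.0, max(floor, beta[i] - eta * grad[i])) for i in range(n)]
    return beta


def revenue(values: Sequence[Sequence[float]], beta: Sequence[float]) -> float:
    """First-price revenue: each item sells at its highest paced bid."""
    return sum(max(beta[i] * values[i][j] for i in range(len(beta)))
               for j in range(len(values[0])))


if __name__ == "__main__":
    # Toy market: buyer 0 has a small budget, buyer 1 a large one.
    vals = [[2.0, 1.0], [0.5, 2.0]]
    b = fppe_pacing(vals, [1.0, 10.0])
    print(b, revenue(vals, b))  # approximately [0.5, 1.0] and 3.0
```

In this toy market buyer 0 is paced to β = 0.5 so that it spends exactly its budget of 1 on item 0, buyer 1 stays unpaced, and revenue is 1 + 2 = 3. Subgradient descent keeps the sketch dependency-free but is slow; at the scale quoted above (40 buyers, 10000 items plus bad items) a dedicated convex solver would likely be the more practical choice.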
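The Open Datasets and Experiment Setup rows describe setting each budget to a common scalar times the advertiser's aggregated value over sampled auctions, with the scalar calibrated so that the share of budget-constrained buyers matches a real-world target. A minimal sketch of that calibration step follows, under a loudly stated simplification: rather than solving an equilibrium for each candidate scalar, it treats a buyer as budget-constrained when its budget is below what it would pay by winning at its unpaced first-price bids. The auction format (a dict from advertiser to per-impression bid), this proxy, and the names `aggregate`, `constrained_fraction` and `calibrate_scale` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sketch of budget calibration; uses an unpaced-spend proxy
# instead of the equilibrium notion of "budget-constrained".


def aggregate(auctions: List[Dict[str, float]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Total value and unpaced first-price spend per advertiser."""
    value: Dict[str, float] = defaultdict(float)
    spend: Dict[str, float] = defaultdict(float)
    for bids in auctions:
        for adv, bid in bids.items():
            value[adv] += bid
        winner = max(bids, key=bids.get)  # highest bid wins; ties go to the first listed
        spend[winner] += bids[winner]     # first price: the winner pays its own bid
    return dict(value), dict(spend)


def constrained_fraction(scale: float, value: Dict[str, float],
                         spend: Dict[str, float]) -> float:
    """Share of buyers whose budget (scale * value) is below their unpaced spend."""
    return sum(scale * value[a] < spend.get(a, 0.0) for a in value) / len(value)


def calibrate_scale(auctions: List[Dict[str, float]], target: float,
                    iters: int = 60) -> Tuple[float, Dict[str, float]]:
    """Bisect for the smallest scale whose constrained fraction is <= target."""
    value, spend = aggregate(auctions)
    # Unpaced spend never exceeds total value (bids >= 0), so scale = 1 gives
    # fraction 0; the fraction is non-increasing in the scale.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if constrained_fraction(mid, value, spend) <= target:
            hi = mid
        else:
            lo = mid
    return hi, {a: hi * v for a, v in value.items()}


if __name__ == "__main__":
    toy = [{"a": 3.0, "b": 1.0}, {"a": 1.0, "b": 2.0},
           {"b": 1.0, "c": 4.0}, {"a": 2.0, "c": 1.0}]
    scale, budgets = calibrate_scale(toy, target=1 / 3)
    print(scale, budgets)
```

For these four toy auctions the unpaced spend-to-value ratios are 5/6, 1/2 and 4/5 for a, b and c, so a target of one constrained buyer in three yields a scale of about 0.8, leaving only advertiser a constrained. Replacing `constrained_fraction` with an equilibrium-based test would keep the same bisection structure, provided the constrained share stays monotone in the scale.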
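The coverage check in the last row (100 simulations per α and budget sample, comparing baseline and debiased estimates) follows a standard pattern: build a nominal 95% confidence interval in each simulation and record how often it contains the true value. The sketch below shows only that bookkeeping. Its estimator is a hypothetical toy stand-in, not the paper's FPPE estimator: a sample mean in which a fraction α of draws are "bad" and shifted, so contamination plays the role of bias. The names `toy_estimate` and `empirical_coverage` and all default parameters are assumptions of this sketch.

```python
import random
import statistics
from statistics import NormalDist
from typing import Tuple

# Hypothetical sketch of an empirical-coverage check with a toy estimator.


def toy_estimate(rng: random.Random, n_items: int, alpha: float,
                 shift: float) -> Tuple[float, float]:
    """Toy stand-in estimator: mean of n_items N(0, 1) draws, where each draw is
    'bad' (shifted by `shift`) with probability alpha. Returns (estimate, std. error)."""
    draws = [rng.gauss(0.0, 1.0) + (shift if rng.random() < alpha else 0.0)
             for _ in range(n_items)]
    est = statistics.fmean(draws)
    se = statistics.stdev(draws) / len(draws) ** 0.5
    return est, se


def empirical_coverage(alpha: float, n_sims: int = 100, n_items: int = 200,
                       shift: float = 2.0, level: float = 0.95, seed: int = 0) -> float:
    """Fraction of n_sims nominal-`level` intervals that contain the true value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    true_value = 0.0  # mean of the clean draws (stand-in for the pure-market quantity)
    hits = 0
    for _ in range(n_sims):
        est, se = toy_estimate(rng, n_items, alpha, shift)
        hits += abs(est - true_value) <= z * se  # True counts as 1
    return hits / n_sims


if __name__ == "__main__":
    print("clean:", empirical_coverage(alpha=0.0))
    print("contaminated:", empirical_coverage(alpha=0.1))
```

With α = 0 the empirical coverage should sit near 0.95; with α = 0.1 the shifted draws bias the mean and coverage falls well below nominal, which is the kind of failure a debiasing step is meant to repair.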