Competing Bandits in Matching Markets via Super Stability

Authors: Soumya Basu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically study the behavior of the proposed centralized algorithm. Due to lack of space, we defer the details of our experiment to Appendix B. As the decentralized algorithm is guaranteed to have regret only O(N^2) away from the centralized one, we omit the decentralized algorithm. We compare against the centralized Explore-then-Gale-Shapley (ETGS) algorithm for a fair comparison. Extended-GS significantly outperforms ETGS by quickly identifying a viable partial ranking during the exploration phase. This leads to lower regret than ETGS, which needs a full ranking of the top N items before achieving comparable performance. Our experiments confirm the superior performance of our proposed algorithm compared to existing Gale-Shapley-based approaches.
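The partial-rank recovery described above can be illustrated with a standard confidence-interval dominance test: one item's position relative to another is resolved once its lower confidence bound exceeds the other's upper confidence bound, so a usable partial order can emerge well before a full ranking does. The function names and the sub-Gaussian bonus form below are illustrative assumptions, not the paper's exact rule.

```python
import math

def ucb(mean, n, t, sigma):
    # Upper confidence bound with a standard sub-Gaussian bonus (illustrative).
    return mean + sigma * math.sqrt(2.0 * math.log(t) / n)

def lcb(mean, n, t, sigma):
    # Matching lower confidence bound.
    return mean - sigma * math.sqrt(2.0 * math.log(t) / n)

def confident_partial_order(means, counts, t, sigma):
    """Return the set of ordered pairs (i, j) whose relative rank is
    resolved: the LCB of item i exceeds the UCB of item j."""
    resolved = set()
    for i in range(len(means)):
        for j in range(len(means)):
            if i != j and lcb(means[i], counts[i], t, sigma) > ucb(means[j], counts[j], t, sigma):
                resolved.add((i, j))
    return resolved
```

With well-separated empirical means and enough samples, the pair is resolved long before every pair in the instance is, which is the mechanism behind the "viable partial ranking" advantage described above.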
Researcher Affiliation | Industry | Google, New York. Correspondence to: Soumya Basu <EMAIL>.
Pseudocode | Yes | Algorithm 1: Centralized Two-Sided Matching Bandit; Algorithm 2: Extended Gale Shapley (EXTENDED-GS); Algorithm 3: Decentralized Two-Sided Matching Bandit, User i; Algorithm 4: Decentralized Two-Sided Matching Bandit, Arm j
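For reference, the classical user-proposing Gale-Shapley deferred-acceptance procedure that these algorithms build on can be sketched as follows. This is the textbook baseline, not the paper's EXTENDED-GS, which extends it to operate on partial preference rankings.

```python
def gale_shapley(user_prefs, arm_prefs):
    """Classical user-proposing deferred acceptance.

    user_prefs[i]: arms in user i's preference order (best first).
    arm_prefs[j]:  users in arm j's preference order (best first).
    Returns a stable matching as a dict user -> arm.
    """
    n = len(user_prefs)
    # Precompute each arm's rank of each user for O(1) comparisons.
    arm_rank = [{u: r for r, u in enumerate(prefs)} for prefs in arm_prefs]
    next_proposal = [0] * n      # index of the next arm user i will propose to
    engaged_to = {}              # arm -> user currently holding it
    free = list(range(n))
    while free:
        i = free.pop()
        j = user_prefs[i][next_proposal[i]]
        next_proposal[i] += 1
        if j not in engaged_to:
            engaged_to[j] = i
        elif arm_rank[j][i] < arm_rank[j][engaged_to[j]]:
            free.append(engaged_to[j])   # arm j trades up to user i
            engaged_to[j] = i
        else:
            free.append(i)               # i is rejected and proposes again later
    return {u: a for a, u in engaged_to.items()}
```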
Open Source Code | No | The paper discusses experimental results and comparisons but does not explicitly state that source code is provided, nor does it link to a repository for the described methodology.
Open Datasets | No | We generate a bandit instance by selecting a random preference order for each user and each arm. We generate the mean rewards µ_{i,j} and γ_{j,i}, for i ∈ [N] and j ∈ [K], such that the permutation is randomized, and the gaps are chosen uniformly at random from [Δ_min, Δ_max]. For each reward observation we apply independent Gaussian noise with standard deviation σ (the UCB and LCB bonuses are also scaled with σ).
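The synthetic instance generation described in that row can be sketched as follows. The function names and the exact way gaps are turned into mean rewards are illustrative assumptions; only the stated recipe (random preference orders, consecutive gaps uniform in [Δ_min, Δ_max], independent Gaussian observation noise) comes from the source.

```python
import numpy as np

def make_instance(N, K, gap_min, gap_max, rng):
    """Generate a synthetic two-sided matching bandit instance:
    each user gets a random preference order over arms (and each arm
    over users), with consecutive mean-reward gaps drawn uniformly
    from [gap_min, gap_max]."""
    mu = np.zeros((N, K))        # user-side mean rewards mu[i, j]
    gamma = np.zeros((K, N))     # arm-side mean rewards gamma[j, i]
    for i in range(N):
        order = rng.permutation(K)                 # random preference order
        gaps = rng.uniform(gap_min, gap_max, size=K)
        mu[i, order] = np.cumsum(gaps)[::-1]       # strictly decreasing means
    for j in range(K):
        order = rng.permutation(N)
        gaps = rng.uniform(gap_min, gap_max, size=N)
        gamma[j, order] = np.cumsum(gaps)[::-1]
    return mu, gamma

def observe(mu, i, j, sigma, rng):
    # Each reward observation is the mean plus independent Gaussian noise.
    return mu[i, j] + rng.normal(0.0, sigma)
```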
Dataset Splits | No | The paper describes generating synthetic bandit instances with randomized permutations and randomly chosen gaps, rather than using a predefined dataset with explicit train/test/validation splits. Therefore, specific dataset split information is not applicable in the traditional sense.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud computing instances) used for running the experiments or numerical studies.
Software Dependencies | No | The paper describes algorithms and methods (e.g., UCB-LCB based rank recovery, the Extended-GS algorithm) but does not provide specific software names with version numbers or list any programming languages or libraries used for implementation.
Experiment Setup | Yes | In Figure 1, we consider one such instance with N = 8 users, M = 8 arms, Δ_min = 0.2, Δ_max = 0.5, and σ = 0.4. We observe logarithmic pessimal stable regret for both algorithms. However, the proposed Extended-GS based algorithm clearly outperforms the Centralized ETGS adapted from Kong et al. (Kong & Li, 2023). We plot the mean regret, as well as the 25% and 75% quantile curves, over 20 sample paths. Next, in Figure 2 we study the dependence on σ/Δ_avg by varying σ for N = K = 5, Δ_min = 0.2, and Δ_max = 0.5. We observe a clear separation between the Centralized ETGS and the proposed Extended-GS algorithm, showing the superiority of the latter. Both algorithms initially explore, but the Extended-GS algorithm finds an admissible partial rank much earlier than the top-N rank resolution, leading to lower regret than ETGS. Figure 3 exhibits similar behavior with varying gaps Δ_min and Δ_max.
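A minimal harness for the reported statistics (mean and 25%/75% quantile regret curves over 20 sample paths) might look like the sketch below; the helper name and array layout are assumptions, and the parameter values are the ones reported for the Figure 1 instance.

```python
import numpy as np

def summarize_paths(regret_paths):
    """Aggregate cumulative-regret curves over independent sample paths,
    returning the mean and the 25%/75% quantile bands used in the plots."""
    R = np.asarray(regret_paths, dtype=float)   # shape: (num_paths, horizon)
    return R.mean(axis=0), np.quantile(R, 0.25, axis=0), np.quantile(R, 0.75, axis=0)

# Reported parameters of the Figure 1 instance: N = 8 users, M = 8 arms,
# gaps in [0.2, 0.5], noise sigma = 0.4, statistics over 20 sample paths.
N, M, GAP_MIN, GAP_MAX, SIGMA, NUM_PATHS = 8, 8, 0.2, 0.5, 0.4, 20
```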