Competing Bandits in Matching Markets via Super Stability

Authors: Soumya Basu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically study the behavior of the proposed centralized algorithm. Due to lack of space, we defer the details of our experiment to Appendix B. As the decentralized algorithm is guaranteed to have regret only O(N^2) away from the centralized one, we omit the decentralized algorithm. We compare against the centralized Explore-then-Gale-Shapley (ETGS) algorithm for a fair comparison. Extended-GS significantly outperforms ETGS by quickly identifying a viable partial ranking during the exploration phase. This leads to lower regret than ETGS, which needs a full ranking of the top N items before achieving comparable performance. Our experiments confirm the superior performance of our proposed algorithm compared to existing Gale-Shapley-based approaches.
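The partial-rank recovery described above can be illustrated with a standard confidence-interval dominance test: one item's position relative to another is resolved once its lower confidence bound exceeds the other's upper confidence bound, so a usable partial order can emerge well before a full ranking does. The function names and the sub-Gaussian bonus form below are illustrative assumptions, not the paper's exact rule.

```python
import math

def ucb(mean, n, t, sigma):
    # Upper confidence bound with a standard sub-Gaussian bonus (illustrative).
    return mean + sigma * math.sqrt(2.0 * math.log(t) / n)

def lcb(mean, n, t, sigma):
    # Matching lower confidence bound.
    return mean - sigma * math.sqrt(2.0 * math.log(t) / n)

def confident_partial_order(means, counts, t, sigma):
    """Return the set of ordered pairs (i, j) whose relative rank is
    resolved: the LCB of item i exceeds the UCB of item j."""
    resolved = set()
    for i in range(len(means)):
        for j in range(len(means)):
            if i != j and lcb(means[i], counts[i], t, sigma) > ucb(means[j], counts[j], t, sigma):
                resolved.add((i, j))
    return resolved
```

With well-separated empirical means and enough samples, the pair is resolved long before every pair in the instance is, which is the mechanism behind the "viable partial ranking" advantage described above.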
Researcher Affiliation | Industry | Google, New York. Correspondence to: Soumya Basu <EMAIL>.
Pseudocode | Yes | Algorithm 1: Centralized Two-Sided Matching Bandit; Algorithm 2: Extended Gale Shapley (EXTENDED-GS); Algorithm 3: Decentralized Two-Sided Matching Bandit, User i; Algorithm 4: Decentralized Two-Sided Matching Bandit, Arm j
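For reference, the classical user-proposing Gale-Shapley deferred-acceptance procedure that these algorithms build on can be sketched as follows. This is the textbook baseline, not the paper's EXTENDED-GS, which extends it to operate on partial preference rankings.

```python
def gale_shapley(user_prefs, arm_prefs):
    """Classical user-proposing deferred acceptance.

    user_prefs[i]: arms in user i's preference order (best first).
    arm_prefs[j]:  users in arm j's preference order (best first).
    Returns a stable matching as a dict user -> arm.
    """
    n = len(user_prefs)
    # Precompute each arm's rank of each user for O(1) comparisons.
    arm_rank = [{u: r for r, u in enumerate(prefs)} for prefs in arm_prefs]
    next_proposal = [0] * n      # index of the next arm user i will propose to
    engaged_to = {}              # arm -> user currently holding it
    free = list(range(n))
    while free:
        i = free.pop()
        j = user_prefs[i][next_proposal[i]]
        next_proposal[i] += 1
        if j not in engaged_to:
            engaged_to[j] = i
        elif arm_rank[j][i] < arm_rank[j][engaged_to[j]]:
            free.append(engaged_to[j])   # arm j trades up to user i
            engaged_to[j] = i
        else:
            free.append(i)               # i is rejected and proposes again later
    return {u: a for a, u in engaged_to.items()}
```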
Open Source Code | No | The paper discusses experimental results and comparisons but does not explicitly state that source code is provided, nor does it link to a repository for the described methodology.
Open Datasets | No | We generate a bandit instance by selecting a random preference order for each user and each arm. We generate the mean rewards µ_{i,j} and γ_{j,i}, for i ∈ [N] and j ∈ [K], such that the permutation is randomized, and the gaps are chosen uniformly at random from [Δ_min, Δ_max]. For each reward observation we apply independent Gaussian noise with standard deviation σ (the UCB and LCB bonuses are also scaled with σ).
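The synthetic instance generation described in that row can be sketched as follows. The function names and the exact way gaps are turned into mean rewards are illustrative assumptions; only the stated recipe (random preference orders, consecutive gaps uniform in [Δ_min, Δ_max], independent Gaussian observation noise) comes from the source.

```python
import numpy as np

def make_instance(N, K, gap_min, gap_max, rng):
    """Generate a synthetic two-sided matching bandit instance:
    each user gets a random preference order over arms (and each arm
    over users), with consecutive mean-reward gaps drawn uniformly
    from [gap_min, gap_max]."""
    mu = np.zeros((N, K))        # user-side mean rewards mu[i, j]
    gamma = np.zeros((K, N))     # arm-side mean rewards gamma[j, i]
    for i in range(N):
        order = rng.permutation(K)                 # random preference order
        gaps = rng.uniform(gap_min, gap_max, size=K)
        mu[i, order] = np.cumsum(gaps)[::-1]       # strictly decreasing means
    for j in range(K):
        order = rng.permutation(N)
        gaps = rng.uniform(gap_min, gap_max, size=N)
        gamma[j, order] = np.cumsum(gaps)[::-1]
    return mu, gamma

def observe(mu, i, j, sigma, rng):
    # Each reward observation is the mean plus independent Gaussian noise.
    return mu[i, j] + rng.normal(0.0, sigma)
```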
Dataset Splits | No | The paper describes generating synthetic bandit instances with randomized permutations and randomly chosen gaps, rather than using a predefined dataset with explicit train/test/validation splits. Therefore, specific dataset split information is not applicable in the traditional sense.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud computing instances) used for running the experiments or numerical studies.
Software Dependencies | No | The paper describes algorithms and methods (e.g., UCB-LCB based rank recovery, the Extended-GS algorithm) but does not provide specific software names with version numbers or list any programming languages or libraries used for implementation.
Experiment Setup | Yes | In Figure 1, we consider one such instance with N = 8 users, M = 8 arms, Δ_min = 0.2, Δ_max = 0.5, and σ = 0.4. We observe logarithmic pessimal stable regret for both algorithms. However, the proposed Extended-GS based algorithm clearly outperforms the Centralized ETGS adapted from Kong et al. (Kong & Li, 2023). We plot the mean regret, as well as the 25% and 75% quantile curves, over 20 sample paths. Next, in Figure 2 we study the dependence on σ/Δ_avg by varying σ for N = K = 5, Δ_min = 0.2, and Δ_max = 0.5. We observe a clear separation between the Centralized ETGS and the proposed Extended-GS algorithm, showing the superiority of the latter. Both algorithms initially explore, but the Extended-GS algorithm finds an admissible partial rank much earlier than the top-N rank resolution, leading to lower regret than ETGS. Figure 3 exhibits similar behavior with varying gaps Δ_min and Δ_max.
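A minimal harness for the reported statistics (mean and 25%/75% quantile regret curves over 20 sample paths) might look like the sketch below; the helper name and array layout are assumptions, and the parameter values are the ones reported for the Figure 1 instance.

```python
import numpy as np

def summarize_paths(regret_paths):
    """Aggregate cumulative-regret curves over independent sample paths,
    returning the mean and the 25%/75% quantile bands used in the plots."""
    R = np.asarray(regret_paths, dtype=float)   # shape: (num_paths, horizon)
    return R.mean(axis=0), np.quantile(R, 0.25, axis=0), np.quantile(R, 0.75, axis=0)

# Reported parameters of the Figure 1 instance: N = 8 users, M = 8 arms,
# gaps in [0.2, 0.5], noise sigma = 0.4, statistics over 20 sample paths.
N, M, GAP_MIN, GAP_MAX, SIGMA, NUM_PATHS = 8, 8, 0.2, 0.5, 0.4, 20
```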