Bandit Learning in Matching Markets with Indifference

Authors: Fang Kong, Jingqi Tang, Mingzhu Li, Pinyan Lu, John C.S. Lui, Shuai Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the algorithm's effectiveness in handling such complex situations and its consistent superiority over baselines. Extensive experiments are conducted to show our algorithm's effectiveness and consistent advantage compared with available baselines. We report the stable regret of each player in Figure 1 (a)(b)(c)(d)(e) and the cumulative market unstability (the cumulative number of unstable matchings) in Figure 1 (f).
Researcher Affiliation | Academia | EMAIL, Southern University of Science and Technology; EMAIL, Shanghai Jiao Tong University; EMAIL, Shanghai Jiao Tong University; EMAIL, Shanghai University of Finance and Economics, Key Laboratory of Interdisciplinary Research of Computation and Economics (SUFE), Ministry of Education; EMAIL, Chinese University of Hong Kong; EMAIL, Shanghai Jiao Tong University
Pseudocode | Yes | Algorithm 1: adaptive exploration with arm-guided GS (AE-AGS, centralized version, from the view of the central platform) ... Algorithm 2: Subroutine-of-AE-AGS ... Algorithm 3: AE-AGS (centralized version, from the view of player pi) ... Algorithm 4: AE-AGS (decentralized version, from the view of player pi) ... Algorithm 5: Communication
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology, nor does it include a link to a code repository or mention code in supplementary materials.
Open Datasets | No | To present the stable regret of each player, we first test the algorithms' performances in a small market with 5 players and 5 arms. The position of each arm in a player's preference ranking is a random number in {1, 2, . . . , K}, similar to how the arms rank the players. Arms sharing the same position in a ranking have the same preference values, and the preference gap between two arms ranked in adjacent positions is set to Δ = 0.1. The feedback Xi,j(t) for player pi on arm aj at time t is drawn independently from the Gaussian distribution with mean µi,j and variance 1.
Dataset Splits | No | The paper describes a simulation setup where data is generated for each run. It specifies parameters for this generation (e.g., 'random number in {1, 2, ..., K}', 'Gaussian distribution') but does not refer to traditional dataset splits like training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only generally states that 'Extensive experiments are conducted'.
Software Dependencies | No | The paper describes the algorithms and experimental methodology but does not mention any specific software, libraries, or their version numbers used for implementation or simulation.
Experiment Setup | Yes | In each experiment, we run all algorithms for T = 100k rounds and report the averaged results over 20 independent runs. The position of each arm in a player's preference ranking is a random number in {1, 2, . . . , K}, similar to how the arms rank the players. The preference gap between two arms ranked in adjacent positions is set to Δ = 0.1. The feedback Xi,j(t) for player pi on arm aj at time t is drawn independently from the Gaussian distribution with mean µi,j and variance 1. We also vary the value of Δ ∈ {0.1, 0.15, 0.2, 0.25} and the market size N = K ∈ {3, 6, 9, 12} to show the performances of the algorithms.
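The experiment setup above can be sketched in a few lines. This is not the authors' code: it is a minimal reconstruction of the described data generation, assuming Δ denotes the adjacent-rank gap, that a rank-position of p maps to a preference value of -p·Δ (so equal positions give equal values, i.e., indifference, and adjacent positions differ by Δ), and that `feedback` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
N = K = 5      # players and arms (the small-market setting from the paper)
DELTA = 0.1    # preference gap between adjacently ranked arms (assumed symbol Δ)

# Each player's ranking of the arms: the position of each arm is a uniform
# draw from {1, ..., K}, so ties (indifference) can occur; arms rank the
# players the same way.
player_pos = rng.integers(1, K + 1, size=(N, K))  # player i's rank of arm j
arm_pos = rng.integers(1, N + 1, size=(K, N))     # arm j's rank of player i

# Assumed mapping from rank position to mean utility: equal positions share
# a value, adjacent positions differ by exactly DELTA.
mu = -player_pos * DELTA

def feedback(i, j):
    """Noisy reward for player i pulling arm j: Gaussian(mu[i, j], variance 1)."""
    return rng.normal(mu[i, j], 1.0)

# One sample round: every player pulls some arm and observes noisy feedback.
pulls = rng.integers(0, K, size=N)
rewards = np.array([feedback(i, pulls[i]) for i in range(N)])
```

A full reproduction would wrap this in T = 100k rounds, average over 20 independent seeds, and sweep Δ over {0.1, 0.15, 0.2, 0.25} and N = K over {3, 6, 9, 12} as the paper describes.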