Multiplayer Information Asymmetric Contextual Bandits
Authors: William Chang, Yuanhao Lu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we execute simulations to corroborate the empirical efficacy of the proposed algorithms in this paper. In Figure 1, we plot the regret versus time for both algorithms LinUCB-A and LinUCB-B. |
| Researcher Affiliation | Academia | William Chang, Department of Mathematics, UCLA, Los Angeles, CA, USA; Yuanhao Lu, Princeton University, Princeton, NJ, USA |
| Pseudocode | Yes | Algorithm 1: LinUCB-A for asymmetry in actions... Algorithm 2: LinUCB-B for asymmetry in rewards... Algorithm 3: ETC for asymmetry in rewards and actions |
| Open Source Code | Yes | All the source code that has been used to generate the results presented in this paper can be found via http://tinyurl.com/yty68wcp. |
| Open Datasets | No | We conduct the simulations using θ and context vectors x uniformly sampled from the unit cube [0, 1]^d. This parametrization ensures that ‖θ‖₂ and ‖x‖₂, measured using the ℓ2 norm, do not exceed L = d, in line with the constraints of our problem setting. Furthermore, it is clear that this uniform distribution is bounded over our space for x. Each reward is Gaussian, with its standard deviation pre-selected uniformly at random from the range [0, 1]. |
| Dataset Splits | No | We conduct the simulations using θ and context vectors x uniformly sampled from the unit cube [0, 1]^d. This implies synthetic data generation, not traditional dataset splits. |
| Hardware Specification | No | The paper describes the simulation environment and parameters, but does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper describes the algorithms and experimental setup but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries). |
| Experiment Setup | Yes | For each environment, the simulations were executed over T = 10,000 rounds. We repeat these simulations 5 times to compute the median regret and report the 95% confidence interval. The hyperparameter β_T is set to T for all algorithms analyzed. In the proceeding section, we perform the experiment on environments with m and K equal to 2, 3, 4 respectively, with d = 5, 10. Moreover, we use LinUCB-B_ETC to denote the ETC algorithm run on problem B. Similarly, LinUCB-C_ETC is used to denote the ETC algorithm run on problem C. |
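
The synthetic setup quoted in the table (θ and contexts drawn uniformly from the unit cube, Gaussian rewards with a pre-selected standard deviation in [0, 1], and a median over 5 repetitions) could be sketched roughly as follows. This is not the authors' code. The function names, the flat list of arms, the per-arm standard deviation, the fixed contexts, and the uniformly random policy used as a stand-in for LinUCB-A/B are all assumptions made for illustration.

```python
import math
import random
import statistics


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_environment(d, num_arms, rng):
    """Draw theta and one context per arm uniformly from [0, 1]^d.

    Assumption: each arm gets its own noise standard deviation,
    drawn once uniformly from [0, 1] before the run starts.
    """
    theta = [rng.random() for _ in range(d)]
    contexts = [[rng.random() for _ in range(d)] for _ in range(num_arms)]
    sigmas = [rng.random() for _ in range(num_arms)]
    return theta, contexts, sigmas


def run_uniform_baseline(d, num_arms, horizon, seed):
    """Cumulative pseudo-regret of a uniformly random policy.

    The policy is a hypothetical placeholder, not LinUCB-A or LinUCB-B.
    """
    rng = random.Random(seed)
    theta, contexts, sigmas = sample_environment(d, num_arms, rng)
    means = [dot(theta, x) for x in contexts]
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        arm = rng.randrange(num_arms)
        # Noisy Gaussian reward; a learning algorithm would use it,
        # the random placeholder policy ignores it.
        _reward = rng.gauss(means[arm], sigmas[arm])
        regret += best - means[arm]
    return regret


def median_regret(d, num_arms, horizon, runs=5):
    """Median final regret over `runs` independently seeded repetitions."""
    finals = [run_uniform_baseline(d, num_arms, horizon, seed)
              for seed in range(runs)]
    return statistics.median(finals)


if __name__ == "__main__":
    # d = 5 and 10,000 rounds follow the quoted setup; 4 arms is arbitrary.
    print(median_regret(d=5, num_arms=4, horizon=10_000))
```

Seeding each repetition separately keeps the runs reproducible. Swapping the random policy for a learning algorithm would only change the arm-selection line and the use of the observed reward.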