Learning to Steer Learners in Games

Authors: Yizhou Zhang, Yian Ma, Eric Mazumdar

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide numerical experiments to illustrate the effectiveness of the algorithms in Appendix F. (Appendix F, Numerical Experiments: F.1 Empirical Simulations for Section 6.1; F.2 Empirical Simulations for Section 6.2)
Researcher Affiliation | Academia | 1 Department of Computing & Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA; 2 Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093, USA. Correspondence to: Yizhou Zhang <EMAIL>.
Pseudocode | Yes | Algorithm 1 (Playing Against Ascending Learner); Algorithm 2 (Test); Algorithm 3 (Binary Search); Algorithm 4 (Playing Against Mirror Descent); Algorithm 5 (Explore Row)
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor any link to a code repository.
Open Datasets | No | The numerical experiments use constructed game instances, such as 'Matching pennies' and 'Constructed game instance 1/2', rather than publicly available datasets; no links or citations to open datasets are provided.
Dataset Splits | No | The paper conducts numerical simulations on constructed game instances rather than on traditional datasets that would require explicit training/validation/test splits.
Hardware Specification | No | The paper describes 'Numerical Experiments' and 'Empirical Simulations' but gives no details about the hardware (e.g., GPU/CPU models, memory) used to run them.
Software Dependencies | No | The paper mentions methods such as 'Online Gradient descent (OGD)' and 'Stochastic Mirror descent with KL regularizer', and refers to a step size ηt = η0/√t, but does not specify any software libraries, frameworks, or version numbers used for the implementation.
Experiment Setup | Yes | For Binary Search, we set the accuracy margin d = 0.01. For each pure strategy of the optimizer, we set the number of steps for exploration to be k = 50. For all experiments in this section, we assume the learner is using Online Gradient descent (OGD) with step size ηt = η0/√t. For the purpose of properly displaying the interaction and learning process, we choose different η0 for different game instances.
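The quoted setup has the learner run online gradient descent on the probability simplex with step size ηt = η0/√t. Since the paper releases no code, the following is only a hedged NumPy sketch of such a learner; the loss sequence in the usage line is a toy stand-in, not one of the paper's constructed game instances.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u > (css - 1.0) / idx)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def ogd_learner(loss_vectors, eta0=1.0):
    """Projected online gradient descent with step size eta_t = eta0 / sqrt(t),
    matching the step-size schedule quoted in the Experiment Setup row."""
    n = len(loss_vectors[0])
    x = np.full(n, 1.0 / n)                   # start at the uniform strategy
    for t, g in enumerate(loss_vectors, start=1):
        eta_t = eta0 / np.sqrt(t)
        x = project_simplex(x - eta_t * np.asarray(g))
    return x

# Toy usage: a loss sequence that always penalizes action 1,
# so the learner concentrates on action 0.
final = ogd_learner([[0.0, 1.0]] * 50)
```

The 'Stochastic Mirror descent with KL regularizer' also mentioned in the paper would replace the projected gradient step with a multiplicative-weights update (x ∝ x·exp(−ηt g), renormalized).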
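The setup also fixes an accuracy margin d = 0.01 for Binary Search (Algorithm 3). The paper's algorithm is not reproduced here; this is only a generic sketch of halving an interval until its width reaches that margin, with a hypothetical monotone predicate standing in for whatever test the paper's algorithm performs.

```python
def binary_search_threshold(predicate, lo=0.0, hi=1.0, d=0.01):
    """Generic binary search on [lo, hi]: assumes `predicate` is monotone
    (False below some threshold, True at and above it) and halves the
    interval until its width is at most the accuracy margin d."""
    while hi - lo > d:
        mid = 0.5 * (lo + hi)
        if predicate(mid):
            hi = mid                 # threshold lies in [lo, mid]
        else:
            lo = mid                 # threshold lies in [mid, hi]
    return 0.5 * (lo + hi)

# Toy usage: locate the threshold 1/3 to within d = 0.01.
approx = binary_search_threshold(lambda x: x >= 1 / 3)
```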