Solving hidden monotone variational inequalities with surrogate losses

Authors: Ryan D'Orazio, Danilo Vucetic, Zichu Liu, Junhyung Lyle Kim, Ioannis Mitliagkas, Gauthier Gidel

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimentally, we demonstrate our surrogate-based approach is effective in min-max optimization and minimizing projected Bellman error. Furthermore, in the deep reinforcement learning case, we propose a novel variant of TD(0) which is more compute and sample efficient." From Section 5 (Experiments): "We consider min-max optimization and policy evaluation to demonstrate the behaviour and convergence of surrogate based methods with hidden monotone structure. For min-max optimization we compare different approaches from Section 4 on two domains from Sakos et al. (2024): hidden matching pennies and hidden rock-paper-scissors. For RL policy evaluation we consider the problem of minimizing projected Bellman error (PBE) with linear and non-linear approximation."
Figure 1: Convergence of various algorithms from Section 4 on the hidden matching pennies game.
Figure 2: Convergence in the hidden rock-paper-scissors game.
Figure 3: The average approximation error between GD on the surrogate (14, Surr-GD) and update (13) over 10,000 runs in a slow-mixing 100-state Markov chain from Bertsekas (2009) and Yu & Bertsekas (2009, Example 3).
Figure 4: Comparison of the average performance of TD(0) and surrogate methods in minimizing the value prediction error for RL tasks with nonlinear function approximation in the Ant (top) and Half Cheetah (bottom) environments, measured by outer-loop iterations (left) and wallclock time (right).
Researcher Affiliation | Academia | Ryan D'Orazio, Danilo Vucetic & Zichu Liu, Mila Québec AI Institute, Université de Montréal, Montréal, QC, Canada, EMAIL; Junhyung Lyle Kim, Department of Computer Science, Rice University, Houston, TX, USA, EMAIL; Ioannis Mitliagkas & Gauthier Gidel, Mila Québec AI Institute, Université de Montréal, CIFAR AI Chair, Montréal, QC, Canada, EMAIL
Pseudocode | Yes |
Algorithm 1: α-descent on surrogate
  Input: outer-loop iterations T, initial parameters θ_1 ∈ R^d, step size η for the surrogate loss, α ∈ (0, 1), optimizer update A : L × R^d → R^d.
  for t = 1 to T do
    Compute VI operator: F(g(θ_t))
    Set surrogate loss: ℓ_t(θ) = (1/2) ‖g(θ) - (g(θ_t) - ηF(g(θ_t)))‖²
    θ_s ← θ_t
    while ℓ_t(θ_s) - ℓ_t* > α²(ℓ_t(θ_t) - ℓ_t*) do
      Update parameters with optimizer: θ_s ← A(ℓ_t, θ_s)
    θ_{t+1} ← θ_s
  return θ_{T+1}
The paper also lists Algorithm 2 (Surrogate Method with Inner Loop), Algorithm 3 (Surrogate Method with Double Sampling), and Algorithm 4 (Thresholded Surrogate Method).
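The quoted pseudocode can be sketched in runnable form. The instance below is a hypothetical toy, not the paper's setup: g is the identity, F(z) = Mz for a strongly monotone linear M, the inner optimizer A is plain gradient descent, and the surrogate minimum ℓ_t* is zero because g is surjective.

```python
import numpy as np

# Hypothetical toy instance: g = identity, F(z) = Mz with a matrix whose
# symmetric part is the identity, so F is strongly monotone.
M = np.array([[1.0, 1.0], [-1.0, 1.0]])

def g(theta):          # hidden parameterization (identity in this toy)
    return theta

def F(z):              # monotone VI operator
    return M @ z

def surrogate_descent(theta, T=100, eta=0.1, alpha=0.5, inner_lr=0.2):
    """alpha-descent on the surrogate loss (sketch of Algorithm 1)."""
    for _ in range(T):
        target = g(theta) - eta * F(g(theta))            # g(theta_t) - eta F(g(theta_t))
        loss = lambda th: 0.5 * np.sum((g(th) - target) ** 2)
        grad = lambda th: g(th) - target                 # exact gradient for g = identity
        theta_s, l0 = theta.copy(), loss(theta)
        # Inner loop: run the optimizer A (here GD) until the surrogate loss
        # has shrunk by a factor alpha^2 (here ell_t* = 0 since g is surjective).
        while l0 > 0 and loss(theta_s) > alpha ** 2 * l0:
            theta_s = theta_s - inner_lr * grad(theta_s)
        theta = theta_s
    return theta

theta = surrogate_descent(np.array([1.0, 1.0]))
print(np.linalg.norm(theta))  # distance to the equilibrium at the origin shrinks
```

Because the surrogate here is a well-conditioned quadratic, each inner loop terminates after a handful of gradient steps; with a neural g, as in the paper, the inner problem is only approximately solved.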
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "For our RL experiments we consider the policy evaluation problem of approximating a value vector v_π ∈ R^n, representing the expected discounted return over n states for a given policy π and Markov decision process (MDP). ... For our linear RL experiments, we consider a slow-mixing 100-state Markov chain from Bertsekas (2009) and Yu & Bertsekas (2009). For the non-linear setting, we approximate the value function of a fixed reference policy with a two-layer neural network in two Mujoco environments, Ant and Half Cheetah (Todorov et al., 2012)."
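As a concrete reference point for the policy-evaluation setup quoted above, here is a minimal TD(0) sketch on a hypothetical 5-state chain (a small stand-in for the 100-state slow-mixing chain of Bertsekas (2009)), using one-hot tabular features, the simplest linear case; the paper's experiments use genuine linear features and neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state Markov chain: random transition matrix and rewards.
n, gamma = 5, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)       # row-stochastic transitions
r = rng.random(n)                       # per-state rewards

# Closed-form value vector v_pi = (I - gamma P)^{-1} r for comparison.
v_true = np.linalg.solve(np.eye(n) - gamma * P, r)

# TD(0) with one-hot features (tabular case) and a constant step size.
v = np.zeros(n)
s = 0
for _ in range(50_000):
    s_next = rng.choice(n, p=P[s])
    td_error = r[s] + gamma * v[s_next] - v[s]
    v[s] += 0.01 * td_error
    s = s_next

print(np.abs(v - v_true).max())  # small residual left by the constant step size
```

In the tabular case TD(0) converges to v_π up to a noise ball set by the step size; with non-trivial linear features it instead converges to the projected Bellman fixed point, which is what the PBE objective in the paper targets.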
Dataset Splits | No | "All methods are evaluated with value prediction error, the squared error for a test sample of 10,000 states visited by a fixed reference policy. ... In this setting, n tuples of state, reward, and next state {(s_t, r_t, s_{t+1})}_{t=1}^{n} are collected from a fixed reference policy π." For the Mujoco environments, the paper mentions "a fixed test set" for evaluation, but it gives no explicit train/validation splits and no percentages or counts.
Hardware Specification | No | The paper does not contain specific hardware details such as the CPU or GPU models used for the experiments.
Software Dependencies | No | The paper mentions "ADAM (Kingma, 2014)" as an optimizer but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers; no versions of other solvers or tools are given either.
Experiment Setup | Yes | "Input: outer-loop iterations T, initial parameters θ_1 ∈ R^d, step size η for the surrogate loss, α ∈ (0, 1), optimizer update A : L × R^d → R^d. ... In Figure 1, we observe that PHGD converges linearly in the squared distance to the equilibrium, but performs poorly if multiple inner steps are taken (GN with 5 inner steps). If η is increased by an order of magnitude, PHGD is observed to diverge; however, convergence is possible via DGN with the same η, with multiple iterations and an appropriate choice of η_DGN. In contrast to GN, LM is more stable with a larger η and converges faster than PHGD/GN. Finally, we tested GD for different numbers of inner steps with η = 0.1. In practice we approximate the infimum in d_BE by 500 steps of AdamW, with a learning rate of 1×10⁻⁴ and exponential annealing at a decay rate of γ = 0.995." Algorithms 2, 3, and 4 list parameters such as the learning rate η, the number of inner steps M, and the discount factor γ.
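The quoted inner-loop recipe (500 AdamW steps, learning rate 1e-4, exponential annealing with decay 0.995) can be sketched as follows. The quadratic objective is a hypothetical stand-in for the paper's surrogate Bellman-error loss, and all AdamW hyperparameters other than the learning rate and decay rate are assumed defaults, not taken from the paper.

```python
import numpy as np

def adamw_inner_loop(theta, grad_fn, steps=500, lr0=1e-4, gamma=0.995,
                     betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """500 AdamW steps with exponentially annealed learning rate.

    betas, eps, and weight_decay are assumed defaults; the paper only
    specifies the step count, initial learning rate, and decay rate.
    """
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        lr = lr0 * gamma ** (t - 1)          # exponential annealing
        grad = grad_fn(theta)
        m = betas[0] * m + (1 - betas[0]) * grad
        v = betas[1] * v + (1 - betas[1]) * grad * grad
        m_hat = m / (1 - betas[0] ** t)      # bias correction
        v_hat = v / (1 - betas[1] ** t)
        # Decoupled weight decay, as in AdamW.
        theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta

# Stand-in objective: 0.5 * ||theta - target||^2.
target = np.array([1.0, -1.0])
loss = lambda th: 0.5 * np.sum((th - target) ** 2)
theta0 = np.zeros(2)
theta1 = adamw_inner_loop(theta0, lambda th: th - target)
print(loss(theta0), loss(theta1))  # loss decreases over the annealed inner loop
```

With this schedule the effective learning rate falls to lr0 × 0.995⁴⁹⁹ ≈ 8×10⁻⁶ by the last step, so the inner loop takes small, progressively more conservative steps toward the infimum rather than solving it exactly.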