Learning Bayesian Nash Equilibrium in Auction Games via Approximate Best Response
Authors: Kexin Huang, Ziqian Chen, Xue Wang, Chongming Gao, Jinyang Gao, Bolin Ding, Xiang Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various auction formats demonstrate that our approach accelerates convergence and enhances learning efficiency in complex auction settings. |
| Researcher Affiliation | Academia | 1University of Science and Technology of China, Hefei, China 2Independent Researcher. Correspondence to: Chongming Gao <EMAIL>, Xiang Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Stochastic Approximate BR Gradient Method |
| Open Source Code | Yes | Our code is open-sourced at: https://github.com/Hesse73/Approx-BR-Grad. |
| Open Datasets | No | We benchmark the L2 distance in the symmetric auction games with a uniform prior under first-price and second-price mechanisms... We also explore the asymmetric auction setting, where bidders have different value priors. Following previous work on learning asymmetric auction games (Bichler et al., 2023a), we consider the FP auction with 2 bidders whose value priors are U[0, 0.5] (the weak bidder) and U[0, 1] (the strong bidder). |
| Dataset Splits | No | For all experiments, we sample 256 values for v_i and 10,000 value samples for v_{-i}. |
| Hardware Specification | No | This research was also supported by the advanced computing resources provided by the Supercomputing Center of the USTC. |
| Software Dependencies | Yes | In the original SM paper, the ex-post utility is modified as: u^SM_i(v_i, b_i, b_{-i}) = (v_i − p^SM_i) · exp(b_i/λ) / Σ_{j=1}^{n} exp(b_j/λ), where p^SM_i = Σ_j^n p_j. In our implementation, we use p^SM_i = b_i for FP and p^SM_i = max_{j≠i} b_j for simplicity. Note that this modification doesn't change the smoothness of u^SM_i or its ability to approximate the original utility function. Our results align closely with those reported in the SM paper: the L2 error of our results is around 3e-3, while the SM paper's L2 ranges from 4e-3 to 5e-3 when evaluated in the FP/SP symmetric uniform setting with 2 players. While improving the L2 result, the training time per iteration when n = 2 in our implementation is 0.012 to 0.013 seconds, comparable to the 0.009 to 0.011 seconds reported in the SM paper. |
| Experiment Setup | Yes | We conduct the experiments in a more general setting where the bidding function is a 3-layer MLP instead of the linear model in Assumption 3.1 and evaluate performance across different mechanisms, asymmetric auctions, risk-averse utilities, and alternative gradient estimation approaches. ... we use a shared MLP network for bidders with the same value prior and run each method for 2,000 steps. ... We set the learning rate to 0.05 for all experiments except for the asymmetric SP auction, where we increased the learning rate to 0.2 for better results. ... For all experiments, we sample 256 values for v_i and 10,000 value samples for v_{-i}. ... For the estimation of the pdf in the ex-interim utility's gradient (Eq. (11)), we utilize the kernel density estimation (KDE) approach with a Gaussian kernel... The bandwidth h in KDE is a hyperparameter... The parameter γ in the first-order approximate best response is set to 1. ... 1. the 2nd-order approximation is not applied during the first 200 steps of training; 2. it is only enabled if ∂²u_i/∂b_i² ≤ −1e-8 and |∂u_i/∂b_i| ≤ 0.01 by default. |
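The softmax-smoothed ex-post utility quoted in the Software Dependencies row can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `smoothed_utility`, the test bids, and the temperature value are our own choices; the payment substitution (p_i = b_i for FP, the highest opposing bid for SP) follows the simplification quoted in the report.

```python
import numpy as np

def smoothed_utility(values, bids, lam=0.01, mechanism="FP"):
    """Softmax-smoothed ex-post utility for one auction with n bidders.

    values, bids: arrays of shape (n,).
    lam: smoothing temperature λ; as λ → 0 the softmax allocation
    approaches the hard argmax of the original auction.
    """
    # Softmax allocation: bidder i wins with probability ∝ exp(b_i/λ).
    logits = bids / lam
    probs = np.exp(logits - logits.max())  # shift for numerical stability
    probs /= probs.sum()
    if mechanism == "FP":
        payments = bids  # first-price: each bidder pays their own bid
    else:
        # second-price: pay the highest opposing bid
        n = len(bids)
        payments = np.array([np.max(np.delete(bids, i)) for i in range(n)])
    return (values - payments) * probs

# Two bidders; bidder 0 bids higher, so the softmax mass concentrates there.
u = smoothed_utility(np.array([1.0, 0.5]), np.array([0.5, 0.25]), lam=0.01)
```

With a small λ the smoothed utility is close to the original discontinuous one (here u ≈ [0.5, 0]), while remaining differentiable in the bids, which is what makes gradient-based equilibrium learning possible.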
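The KDE step quoted in the Experiment Setup row (a Gaussian kernel with bandwidth h, used to estimate the pdf appearing in the ex-interim utility's gradient) can be sketched as follows. This is an assumed illustration, not the paper's code: the function name, the uniform opposing-bid samples, and the bandwidth 0.05 are placeholders we chose.

```python
import numpy as np

def gaussian_kde_pdf(x, samples, bandwidth):
    """Estimate a pdf at query points x from 1-D samples via a
    Gaussian-kernel KDE with bandwidth h:
        f̂(x) = (1 / (N h)) Σ_k φ((x − s_k) / h),
    where φ is the standard normal density."""
    z = (x[:, None] - samples[None, :]) / bandwidth  # shape (|x|, N)
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
# Hypothetical opposing-bid samples from U[0, 1]; the true pdf is 1 on [0, 1].
opp_bids = rng.uniform(0.0, 1.0, size=10_000)
pdf = gaussian_kde_pdf(np.array([0.5]), opp_bids, bandwidth=0.05)
```

At an interior point of U[0, 1] the estimate should be close to the true density of 1; in practice the bandwidth h trades bias against variance, which is why the report notes it as a tuned hyperparameter.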