DiMa: Understanding the Hardness of Online Matching Problems via Diffusion Models

Authors: Boyu Zhang, Aocheng Shen, Bing Liu, Qiankun Zhang, Bin Yuan, Jing Wang, Shenghao Liu, Xianjun Deng

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We first examine DiMa on the classic OBM problem by reproducing its known hardest input instance in the literature. Further, we apply DiMa to two well-known variants of OBM, for which the exact hardness remains an open problem, and we successfully improve their theoretical state-of-the-art upper bounds. Section 5. Experiments: In this section, we first show how to apply DiMa to the classic (fractional) OBM and reproduce the known best 1 - 1/e upper bound by generating the known hardest upper-triangular graph instance (Section 5.1). Further, we study two important variants of OBM, namely OBM with random arrivals (Section 5.2) and OBM with stochastic rewards (Section 5.3). Both problems remain open, meaning that their best upper bounds are unknown. We improve their state of the art by obtaining harder instances via DiMa.
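The upper-triangular graph mentioned above is the classic hard instance for OBM that forces the 1 - 1/e competitive ratio. A minimal sketch of its adjacency matrix (the function name and the NumPy encoding are illustrative, not from the paper):

```python
import numpy as np

def upper_triangular_instance(n: int) -> np.ndarray:
    """Adjacency matrix of the upper-triangular bipartite instance:
    offline vertex i is adjacent to online vertices j = i, i+1, ..., n-1,
    so later-arriving online vertices have fewer available neighbors."""
    return np.triu(np.ones((n, n), dtype=int))

A = upper_triangular_instance(4)
```

Against this instance, any online algorithm's matched fraction approaches 1 - 1/e as n grows, which is why reproducing it is a natural sanity check for DiMa.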
Researcher Affiliation Academia 1School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 2Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou, China 3Hubei Key Laboratory of Distributed System Security, Wuhan, China 4Songshan Laboratory, Zhengzhou, China 5Visiting researcher with the Lion Rock Labs of Cyberspace Security, CTl HE, Hong Kong, China 6School of Software Engineering, Huazhong University of Science and Technology, Wuhan, China. Correspondence to: Qiankun Zhang <EMAIL>.
Pseudocode Yes Algorithm 1 Shortcut Policy Gradient Algorithm
Open Source Code Yes Our code is provided in the Supplementary Material.
Open Datasets No Choice of q in DDPM. In this problem, q is initialized by a thick-z distribution constructed as follows. We construct a set for the thick-z distribution comprising 200 graphs by adding randomness to the thick-z graph instance as presented in Figure 2, which is known to be difficult (but not the worst) for OBM problems. Instances are obtained by independently flipping each edge in thick-z with a probability of γ = 0.25.
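The perturbation procedure described above (independently flipping each edge with probability γ = 0.25) can be sketched as follows; the exact thick-z adjacency structure is not reproduced here, so an upper-triangular matrix stands in as a placeholder base instance:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(adj: np.ndarray, gamma: float = 0.25) -> np.ndarray:
    """Flip each entry of the 0/1 adjacency matrix independently
    with probability gamma (edge <-> non-edge)."""
    flips = rng.random(adj.shape) < gamma
    return np.where(flips, 1 - adj, adj)

base = np.triu(np.ones((8, 8), dtype=int))  # placeholder for thick-z
dataset = [perturb(base) for _ in range(200)]  # 200 noisy instances
```

Each sample stays a valid 0/1 bipartite adjacency matrix, so the resulting set can seed the initial distribution q for the DDPM.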
Dataset Splits No The paper does not utilize fixed datasets with explicit train/test/validation splits. Instead, it describes how instances are generated for pre-training and fine-tuning. For example, it mentions: 'In the pre-training process, the total number of time steps is set to T = 100, the training batch size is set to 4, and the training epoch is set to 1000.' These are model training parameters, not dataset split information.
Hardware Specification No Table 1. The memory size and running time using SPG during the fine-tuning process when the size of the instance is set to |L| = |R| = 12, 20, 40, 80.
Method | Graph size | Memory size | Running time
SPG | 12x12 | 0.62GB | 20s/epoch
SPG | 20x20 | 1.14GB | 25s/epoch
SPG | 40x40 | 5.05GB | 35s/epoch
SPG | 80x80 | 22.59GB | 45s/epoch
Traditional Computation | 20x20 | >24GB | —
This table provides memory usage and running times but does not specify particular CPU or GPU models used for the computations.
Software Dependencies No DDPM structure. The denoising model is structured with the diffusion transformer (DiT) (Peebles & Xie, 2023). DiT is based on the vision transformer (ViT), retaining most of its configurations. The paper describes the model architecture and underlying concepts (DiT, ViT) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup Yes Other hyper-parameters. The size of the problem instance I is set to |L| = |R| = 20, 40. In the pre-training process, the total number of time steps is set to T = 100, the training batch size is set to 4, and the training epoch is set to 1000. In the fine-tuning process, the total number of training epochs is set to E = 100 with N = 100 trajectories sampled per epoch, and the step size k is set to k = 100. Reward function. In the fine-tuning process using SPG, the reward function r(I) is defined as λ(1 − CR(I)), where CR(I) = ALG(I)/OPT(I) and λ = 5 is a hyperparameter that controls the magnitude of the reward.
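The reward function quoted above is simple enough to state directly in code; a minimal sketch, assuming ALG(I) and OPT(I) are supplied as precomputed scalars (the function name is illustrative):

```python
def reward(alg_value: float, opt_value: float, lam: float = 5.0) -> float:
    """r(I) = lambda * (1 - CR(I)), where CR(I) = ALG(I) / OPT(I).

    Instances on which the algorithm does worse relative to the offline
    optimum (lower competitive ratio) receive a larger reward, steering
    fine-tuning toward harder instances.
    """
    cr = alg_value / opt_value
    return lam * (1.0 - cr)
```

For example, an instance where the algorithm achieves half the offline optimum (CR = 0.5) yields a reward of 2.5 with λ = 5, while a perfectly matched instance (CR = 1) yields 0.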