Sampling from Binary Quadratic Distributions via Stochastic Localization

Authors: Chenguang Wang, Kaiyuan Cui, Weichen Zhao, Tianshu Yu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Experiments on instances with quadratic unconstrained binary objectives, including maximum independent set, maximum cut, and maximum clique problems, demonstrate consistent improvements in sampling efficiency across different discrete MCMC samplers. [...] To demonstrate the practical value of our theoretical insights, we conduct extensive experiments on three types of QUBO problems: maximum independent set, maximum cut, and maximum clique problems, each evaluated across multiple datasets."
Researcher Affiliation: Academia. "1 School of Data Science, The Chinese University of Hong Kong, Shenzhen; 2 Shenzhen Research Institute of Big Data; 3 The Academy of Mathematics and Systems Science, Chinese Academy of Sciences; 4 The School of Statistics and Data Science, LPMC & KLMDASR, Nankai University. Correspondence to: Weichen Zhao <EMAIL>, Tianshu Yu <EMAIL>."
Pseudocode: Yes. "Algorithm 1: SL with Discrete MCMC Samplers"
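Algorithm 1 itself is not reproduced on this page. As a rough illustration of the general stochastic-localization-with-MCMC pattern the title describes, the sketch below pairs an SL outer loop with a plain Gibbs inner sampler on a binary quadratic model pi(x) ∝ exp(x^T J x + h^T x). All parameter names, the Gibbs choice, and the one-sample posterior-mean estimate are assumptions for illustration, not the authors' Algorithm 1.

```python
import numpy as np

def gibbs_sweep(J, h, x, rng):
    """One systematic-scan Gibbs sweep for pi(x) ∝ exp(x^T J x + h^T x)
    over x in {0,1}^n, assuming symmetric J with zero diagonal."""
    for i in range(x.shape[0]):
        logit = 2.0 * J[i] @ x + h[i]  # log-odds of x_i = 1 given the rest
        x[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-logit)) else 0
    return x

def sl_sample(J, h, K=64, t_end=8.0, mcmc_steps=5, seed=0):
    """Hypothetical sketch of stochastic localization (SL) driving a
    discrete MCMC sampler -- NOT the paper's Algorithm 1."""
    rng = np.random.default_rng(seed)
    n = h.shape[0]
    y, t, dt = np.zeros(n), 0.0, t_end / K
    x = rng.integers(0, 2, n)
    for _ in range(K):
        # Tilted posterior pi_t(x) ∝ pi(x) exp(y.x - t|x|^2 / 2); on {0,1}^n
        # x_i^2 = x_i, so the Gaussian tilt reduces to a shifted linear field.
        h_t = h + y - 0.5 * t
        for _ in range(mcmc_steps):
            x = gibbs_sweep(J, h_t, x, rng)
        m = x.astype(float)  # crude posterior-mean estimate from the last sample
        # advance the SL observation process: dy = m dt + dB
        y += m * dt + np.sqrt(dt) * rng.standard_normal(n)
        t += dt
    return x
```

As t grows, the tilted measure concentrates, so the final MCMC state serves as the returned sample.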
Open Source Code: Yes. "The code is available at https://github.com/LOGO-CUHKSZ/SLDMCMC."
Open Datasets: Yes. "Following the DISCS benchmark (Goshvadi et al., 2024), we evaluate on three types of combinatorial optimization problems with the form of binary quadratic distributions: maximum independent set, maximum cut, and maximum clique problems, each containing multiple diverse datasets (14 datasets in total). Detailed information about these problems and datasets is provided in Appendix D.1. ... The evaluation is conducted on five datasets: four from Erdős-Rényi random graphs with different densities {0.05, 0.10, 0.15, 0.25} and one from SATLIB. ... we evaluate on seven datasets: three from different sizes of Erdős-Rényi random graphs ... three from different sizes of Barabási-Albert random graphs ... and one from Optsicom. ... We evaluate on two datasets: Twitter and RB."
Dataset Splits: No. The paper evaluates on a collection of 14 diverse datasets of combinatorial optimization problem instances, but it does not describe splitting any dataset into distinct training, validation, and test sets in the traditional machine-learning sense.
Hardware Specification: Yes. "All experiments are conducted on a single NVIDIA A40 GPU and an AMD EPYC 7513 CPU."
Software Dependencies: No. The paper names the specific discrete MCMC samplers used: Gibbs with Gradients (GWG) (Grathwohl et al., 2021), the Path Auxiliary Sampler (PAS) (Sun et al., 2021), and the Discrete Metropolis-Adjusted Langevin Algorithm (DMALA) (Zhang et al., 2022), and refers to a balancing function g(t) = √t. However, it provides no version numbers for these tools, the programming language (e.g., Python), or underlying libraries (e.g., PyTorch, TensorFlow).
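The balancing function g(t) = √t mentioned in the row above is the standard choice in locally balanced discrete samplers such as GWG: the proposal weight for flipping bit i is g(pi(x with bit i flipped) / pi(x)) = exp(delta_i / 2). A minimal sketch, assuming a binary quadratic model with symmetric, zero-diagonal J; the function and variable names are illustrative, not taken from the paper's code:

```python
import numpy as np

def flip_proposal_probs(J, h, x):
    """Locally balanced single-flip proposal with g(t) = sqrt(t):
    weight for flipping x_i is exp(delta_i / 2), where delta_i is the
    log-probability change of the flip under pi(x) ∝ exp(x^T J x + h^T x)."""
    s = 1 - 2 * x                    # +1 when flipping 0 -> 1, -1 when 1 -> 0
    delta = s * (2.0 * J @ x + h)    # log pi(x^i) - log pi(x), zero-diagonal J
    w = np.exp(0.5 * delta)          # g(t) = sqrt(t) applied to the ratio
    return w / w.sum()               # normalized proposal distribution over flips
```

Flips that increase probability get exponentially larger weight, which is what makes gradient-informed samplers like GWG efficient on binary quadratic targets.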
Experiment Setup: Yes. "Hyperparameters are tuned through comprehensive search. Specifically, we tune the following parameters: the number of discretization steps K ∈ {256, 512, 1024}, initial and final noise scales ϵ, ϵ_end ∈ [10^-4, 10^-1], MCMC sample ratio (proportion of last samples used for posterior estimation) in [0.1, 1] with step 0.1, MCMC step exponential decay rate r ∈ [10^-4, 10^-1], minimum MCMC steps N_min ∈ {2, 4, 6, 8, 16, 32}, and noise parameter σ ∈ {1, 2, 4, 6, 8, 10, 15, 20}. The total number of MCMC steps is fixed at 10,000. ... we use adaptive step sizes with a target acceptance rate of 0.574 and balancing function of g(t) = √t."
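The quoted search space can be written down as a plain config; the key names below are illustrative (the paper does not give identifiers), but every value comes from the quoted setup:

```python
# Hyperparameter search space as reported in the paper's experiment setup.
# Key names are hypothetical; values are copied from the quoted text.
search_space = {
    "K": [256, 512, 1024],                 # number of discretization steps
    "eps_init": (1e-4, 1e-1),              # initial noise scale, continuous range
    "eps_end": (1e-4, 1e-1),               # final noise scale, continuous range
    "mcmc_sample_ratio": [round(0.1 * k, 1) for k in range(1, 11)],  # [0.1, 1] step 0.1
    "mcmc_decay_rate": (1e-4, 1e-1),       # MCMC step exponential decay rate r
    "N_min": [2, 4, 6, 8, 16, 32],         # minimum MCMC steps
    "sigma": [1, 2, 4, 6, 8, 10, 15, 20],  # noise parameter
    "total_mcmc_steps": 10_000,            # fixed, not tuned
    "target_accept_rate": 0.574,           # for adaptive step sizes
}
```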