Any-scale Balanced Samplers for Discrete Space
Authors: Haoran Sun, Bo Dai, Charles Sutton, Dale Schuurmans, Hanjun Dai
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On various synthetic and real distributions, the proposed sampler substantially outperforms existing approaches. We conducted an experimental evaluation on three types of target distributions: 1) quadratic synthetic distributions, 2) non-quadratic synthetic distributions, and 3) real distributions. |
| Researcher Affiliation | Collaboration | Haoran Sun, Bo Dai, Charles Sutton, Dale Schuurmans, Hanjun Dai (work done during an internship at Google). Affiliations: Georgia Tech; Google Research, Brain Team; University of Alberta. |
| Pseudocode | Yes | Algorithm 1: AB sampling algorithm; Algorithm 2: AB M-H step; Algorithm 3: Adapting Algorithm; Algorithm 4: Adapting Algorithm Block |
| Open Source Code | No | No explicit statement or link to open-source code for the methodology is provided. |
| Open Datasets | Yes | For real distributions, we compare against baseline samplers on challenging inference problems in deep energy based models trained on MNIST, Omniglot, and Caltech datasets. |
| Dataset Splits | No | The paper mentions 'T=100,000 steps, with T1=20,000 burn-in steps to make sure the chain mixes.' which refers to MCMC chain length and burn-in, not explicit dataset splits (train/validation/test) with percentages or counts. For EBMs, it mentions a training framework and number of steps to obtain samples, but not explicit dataset splits. |
| Hardware Specification | Yes | All experiments are run on a virtual machine with CPU: Intel Haswell, GPU: 4× Nvidia V100, System: Debian 10. |
| Software Dependencies | Yes | In this work, we use the academic version of Mosek (ApS, 2019). |
| Experiment Setup | Yes | Input: initial σ = 0.1, α = 0.5, W = 0, D = 0; initial x0... For each setting and sampler, we run 100 chains for T = 100,000 steps, with T1 = 20,000 burn-in steps to ensure the chain mixes. Algorithm 3 (Adapting Algorithm) inputs: initial σ = 0.1, α = 0.5, update rate γ = 0.2, decay rate β = 0.9, initial state x0, buffer size N = 100. |
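The evaluation protocol quoted above (chains of T = 100,000 steps with T1 = 20,000 burn-in steps discarded) can be sketched generically. The snippet below is a minimal, illustrative single-site Metropolis-Hastings loop on a small discrete state space, not the paper's AB sampler; the function names, the uniform proposal, and the toy target distribution are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def run_chain(log_prob, x0, n_states, T, burn_in, seed=0):
    """Run a simple Metropolis-Hastings chain on {0, ..., n_states-1}
    and return the post-burn-in samples (illustrative sketch only)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for t in range(T):
        y = rng.randrange(n_states)  # uniform proposal over all states
        log_ratio = log_prob(y) - log_prob(x)
        # Accept with probability min(1, p(y)/p(x))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
        if t >= burn_in:  # discard the first `burn_in` draws
            samples.append(x)
    return samples

# Toy unnormalized target: p(s) ∝ exp(-|s - 2|) on 5 states
samples = run_chain(lambda s: -abs(s - 2), x0=0, n_states=5,
                    T=5000, burn_in=1000)
```

With the burn-in discarded, the chain's empirical mode matches the target's mode (state 2 here); the paper's setup applies the same discard-then-estimate pattern at a much larger scale (100 chains, T = 100,000, T1 = 20,000).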