Ancestral Gumbel-Top-k Sampling for Sampling Without Replacement
Authors: Wouter Kool, Herke van Hoof, Max Welling
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the experiments and results. In our first experiment, we analyze different methods for sampling without replacement: (1) Ancestral Gumbel-Top-k sampling (Section 3), where we experiment with different values of m to control the parallelizability of the algorithm; (2) rejection sampling (Section 4.4), which generates samples with replacement (using standard ancestral sampling) sequentially and rejects duplicates; we also implement a parallel version of this, which generates m samples with replacement in parallel, rejects the duplicates, and repeats this procedure until k unique samples are found; (3) naïve ancestral sampling without replacement (Section 4.5), which is inherently sequential, but for which we also implement a naïve parallelizable version similar to rejection sampling. |
| Researcher Affiliation | Collaboration | Wouter Kool EMAIL, University of Amsterdam, P.O. Box 19268, 1000GG, Amsterdam, The Netherlands; ORTEC, Houtsingel 5, 2719EA, Zoetermeer, The Netherlands. Herke van Hoof EMAIL, University of Amsterdam. Max Welling EMAIL, University of Amsterdam, CIFAR. |
| Pseudocode | Yes | Algorithm 1 Ancestral Gumbel Topk Sampling(pθ, k, m) |
| Open Source Code | Yes | 4. Our code is available at https://github.com/wouterkool/stochastic-beam-search. |
| Open Datasets | Yes | We use the pretrained model from Gehring et al. (2017) and use the wmt14.v2.en-fr.newstest2014 test set consisting of 3003 sentences. Footnote 5: Available at https://s3.amazonaws.com/fairseq-py/data/wmt14.v2.en-fr.newstest2014.tar.bz2. |
| Dataset Splits | Yes | We use the pretrained model from Gehring et al. (2017) and use the wmt14.v2.en-fr.newstest2014 test set consisting of 3003 sentences. |
| Hardware Specification | No | No specific hardware details are provided in the paper. The text mentions "the number of GPUs (or parallel processors on a single GPU)" but does not specify the GPU models, quantity, or other relevant specifications used in the experiments. |
| Software Dependencies | No | The paper mentions using "fairseq (Ott et al., 2019)" but does not provide a specific version number for this software dependency, which is necessary for reproducibility. |
| Experiment Setup | Yes | For Sampling and Stochastic Beam Search, we control the diversity of samples generated using the softmax temperature τ (see Equation 2) used to compute the model probabilities. We use τ = 0.1, 0.2, ..., 0.8, where a higher τ results in higher diversity. Heuristically, we also vary τ for computing the scores with (deterministic) Beam Search. The diversity of Diverse Beam Search is controlled by the diversity strength parameter, which we vary between 0.1, 0.2, ..., 0.8. We set the number of groups G equal to the sample size k, which Vijayakumar et al. (2018) reported as the best choice. ... We use lower temperatures and experiment with τ = 0.05, 0.1, 0.2, 0.5. We then use different methods to estimate the BLEU score: Monte Carlo (MC), using Equation (20); Stochastic Beam Search (SBS), where we compute estimates using the estimator in Equation (21) and the normalized variant in Equation (23); Beam Search (BS), where we compute a deterministic beam S (the temperature τ affects the scoring) and compute Σ_{y∈S} pθ(y|x)f(y). ... for temperatures τ = 0.05, 0.1, 0.2, 0.5 and sample sizes k = 1 to 250. |
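The core idea the paper builds on is the Gumbel-Top-k trick: perturbing each category's log-probability with independent Gumbel(0, 1) noise and taking the k largest perturbed values yields an exact sample of k items without replacement. Below is a minimal non-ancestral sketch of that trick for a flat categorical distribution (the function name and structure are illustrative, not the authors' implementation, which applies the trick incrementally over sequence models):

```python
import numpy as np

def gumbel_top_k(log_probs, k, rng=None):
    """Draw k distinct indices from a categorical distribution given by
    (possibly unnormalized) log-probabilities, via the Gumbel-Top-k trick:
    add independent Gumbel(0, 1) noise and keep the k largest perturbed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    perturbed = np.asarray(log_probs, dtype=float) + rng.gumbel(size=len(log_probs))
    # argpartition selects the top-k in O(n); then sort those k descending
    top_k = np.argpartition(-perturbed, k - 1)[:k]
    return top_k[np.argsort(-perturbed[top_k])]

# Example: sample 3 of 5 categories without replacement
probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
sample = gumbel_top_k(np.log(probs), k=3, rng=np.random.default_rng(0))
```

The returned indices are always distinct, which is what distinguishes this from repeated independent draws and makes the rejection-sampling and naïve-ancestral baselines in the table comparable to it.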