Herded Gibbs Sampling

Authors: Yutian Chen, Luke Bornn, Nando de Freitas, Mareija Eskelin, Jing Fang, Max Welling

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Herded Gibbs is shown to outperform Gibbs on image denoising with MRFs and named entity recognition with CRFs; however, convergence of herded Gibbs for sparsely connected probabilistic graphical models remains an open problem. The paper illustrates the performance of herded Gibbs on two synthetic examples and two real experiments, image denoising and natural language processing.
Researcher Affiliation | Collaboration | Yutian Chen (EMAIL): 7 Pancras Square, Kings Cross, London, N1C 4AG, United Kingdom; Luke Bornn (EMAIL): Statistics & Actuarial Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada; Nando de Freitas (EMAIL): 7 Pancras Square, Kings Cross, London, N1C 4AG, United Kingdom; Mareija Eskelin (EMAIL): Dept. of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada; Jing Fang (EMAIL): 1 Facebook Way, Menlo Park, CA, 94025, United States; Max Welling (EMAIL): Informatics Institute, Science Park 904, Postbus 94323, 1090 GH, Amsterdam, Netherlands
Pseudocode | Yes | Algorithm 1: Herded Gibbs Sampling
Open Source Code | Yes | Code is available at http://www.mareija.ca/research/code/
Open Datasets | Yes | The paper uses the pre-trained 3-class CRF model in the Stanford NER package (J. R. Finkel and Manning, 2005); for the test set, it uses the corpus from the NIST 1999 IE-ER Evaluation.
Dataset Splits | No | The paper mentions a 'test set' for NER (the NIST 1999 IE-ER Evaluation) and '10 corrupted images' for image denoising, but does not provide specific training/validation splits or other detailed partitioning methodology needed for reproducibility.
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions software such as the Stanford NER package but does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | The L2 reconstruction errors as a function of the number of iterations are shown in Figure 9, which compares herded Gibbs against Gibbs and two versions of mean field with different damping factors (Murphy, 2012). Table 2 reports the errors of the image denoising example after 30 iterations (all measurements scaled by 10^-3). Table 3 reports F1 scores for Gibbs, herded Gibbs, and Viterbi on the NER task, along with the average computational time each approach took to run inference over the entire test set (in square brackets). After only 400 iterations (90.48 seconds), herded Gibbs already achieves an F1 score of 84.75, while Gibbs, even after 800 iterations (115.92 seconds), only achieves an F1 score of 84.61.
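The Pseudocode row above refers to the paper's Algorithm 1 (herded Gibbs sampling). As a hedged illustration of the herding idea behind it, the sketch below runs deterministic Gibbs-style sweeps over a toy two-variable binary MRF: one herding weight is kept per (variable, conditioning state), each visit adds the conditional probability to that weight, and a 1 is emitted whenever the weight crosses 1, with the remainder carried forward. The model, parameter values (`coupling`, `bias`), and function names are assumptions made for this demo, not the authors' released code.

```python
import itertools
import math


def conditional_p1(i, x, coupling=0.8, bias=0.1):
    """p(x_i = 1 | x_other) for a toy two-node binary MRF.

    Assumed model for the demo: unnormalized potentials
    exp(bias*x0 + bias*x1 + coupling*x0*x1).
    """
    other = x[1 - i]
    e1 = math.exp(bias + coupling * other)
    return e1 / (1.0 + e1)


def herded_gibbs(n_sweeps=5000):
    """Deterministic herded-Gibbs-style sweeps over two binary variables.

    Returns the empirical joint distribution over the four states.
    """
    x = [0, 0]
    w = {}  # (variable index, state of the other variable) -> herding weight
    counts = {s: 0 for s in itertools.product([0, 1], repeat=2)}
    for _ in range(n_sweeps):
        for i in (0, 1):
            key = (i, x[1 - i])
            w[key] = w.get(key, 0.0) + conditional_p1(i, x)
            if w[key] >= 1.0:   # deterministic "sample": emit 1, keep the carry
                x[i] = 1
                w[key] -= 1.0
            else:
                x[i] = 0
        counts[tuple(x)] += 1
    return {s: c / n_sweeps for s, c in counts.items()}
```

On this tiny fully connected model, the empirical state frequencies should approach the true joint distribution as the number of sweeps grows; the deterministic weight update replaces the random draw of ordinary Gibbs sampling.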