reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hamiltonian Monte Carlo with Energy Conserving Subsampling

Authors: Khue-Dung Dang, Matias Quiroz, Robert Kohn, Minh-Ngoc Tran, Mattias Villani

JMLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 6, titled "Applications", describes the practical application of the proposed methods on two large datasets (HIGGS and Bankruptcy data), comparing their performance with alternative approaches. This involves reporting metrics, figures (e.g., Figures 1-6), and tables (e.g., Tables 1-4) of experimental results.
Researcher Affiliation	Academia	All authors are affiliated with universities: University of New South Wales, University of Technology Sydney, University of Sydney, Linköping University, and Stockholm University. Their email addresses also correspond to these academic institutions (e.g., @unsw.edu.au, @sydney.edu.au, @liu.se).
Pseudocode	Yes	"Algorithm 1: One iteration of HMC-ECS" is presented on page 5 and lays out the algorithm as structured steps.
Open Source Code	No	The paper does not provide an explicit statement about releasing code or a link to a code repository. The license information provided (CC-BY 4.0) pertains to the paper itself, not the source code for the methodology.
Open Datasets	Yes	The paper uses the HIGGS data set, citing Baldi et al. (2014), and the Bankruptcy data, citing Giordani et al. (2014). These are established datasets in their respective fields, and the citations provide concrete access information to the data's origin.
Dataset Splits	Yes	For the HIGGS data, the paper states: "We use 10.5 millions observations for training and 500,000 for testing." This provides specific sample counts for training and testing splits.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing specifications. General statements like "high-dimensional" do not constitute hardware specifications.
Software Dependencies	No	The paper mentions using the "CODA package in R (Plummer et al., 2006)" but does not specify the version numbers for R or the CODA package. Other software used is implied but not explicitly versioned.
Experiment Setup	Yes	Sections 6.3 and 6.4 provide extensive details on the experimental setup, including "ϵL = 1.2" for trajectory length, desired acceptance rate "δ = 0.8", and the resulting step size and number of leapfrog steps "ϵ = 0.2 and L = 6". The paper also specifies "ρ = 0.99" for subsamples, mini-batch size "m = 30", and "λ = 100" or "λ = 200" for signed HMC-ECS, as well as specific ϵ and L values for SGLD and SG-HMC in Sections 6.5 and 6.6 (e.g., "ϵ = 0.06 and ϵ = 0.000001 are used, respectively. The resulting L is 20 for SG-HMC.").
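
To make the tuning values in the Experiment Setup row concrete, the following is a minimal sketch of a plain leapfrog trajectory using the reported ϵ = 0.2 and L = 6, which gives the trajectory length ϵL = 1.2. It is not the paper's HMC-ECS implementation: there is no subsampling, the target is an assumed standard normal toy density, and the names `leapfrog`, `hamiltonian`, and `grad_log_post` are hypothetical and chosen only for illustration.

```python
import numpy as np


def leapfrog(theta, p, grad_log_post, eps, n_steps):
    """Plain leapfrog integrator with an identity mass matrix (illustrative only)."""
    theta = np.array(theta, dtype=float)
    p = np.array(p, dtype=float)
    # Initial half step for the momentum.
    p = p + 0.5 * eps * grad_log_post(theta)
    for step in range(n_steps):
        # Full step for the position.
        theta = theta + eps * p
        # Full momentum step, except after the last position update.
        if step < n_steps - 1:
            p = p + eps * grad_log_post(theta)
    # Final half step for the momentum.
    p = p + 0.5 * eps * grad_log_post(theta)
    return theta, p


def hamiltonian(theta, p):
    """Energy for the assumed standard normal target: potential plus kinetic."""
    return 0.5 * theta @ theta + 0.5 * p @ p


def grad_log_post(theta):
    """Gradient of the log density of the assumed standard normal toy target."""
    return -theta


EPS = 0.2  # step size quoted in the table
L = 6      # number of leapfrog steps quoted in the table

theta0 = np.array([1.0, -0.5])  # arbitrary starting point (assumption)
p0 = np.array([0.3, 0.8])       # arbitrary initial momentum (assumption)

theta1, p1 = leapfrog(theta0, p0, grad_log_post, EPS, L)
trajectory_length = EPS * L
energy_error = abs(hamiltonian(theta1, p1) - hamiltonian(theta0, p0))

print(f"trajectory length: {trajectory_length:.1f}")
print(f"absolute energy error: {energy_error:.2e}")
```

On a toy target like this one the energy error stays small, which is what a high target acceptance rate such as δ = 0.8 relies on; the values a real model needs would depend on its posterior geometry.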