Hierarchical Neural Simulation-Based Inference Over Event Ensembles
Authors: Lukas Heinrich, Siddharth Mishra-Sharma, Chris Pollard, Philipp Windischhofer
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 ("Experiments"): "We describe several case studies of dataset-wide learning, ranging from illustrative toy experiments to more prototypical examples representing problems in particle physics and astrophysics. We refer to App. A for additional details on the experiments, including details on training." |
| Researcher Affiliation | Academia | Lukas Heinrich (Technical University of Munich); Siddharth Mishra-Sharma (MIT, Harvard University, IAIFI); Chris Pollard (University of Warwick); Philipp Windischhofer (University of Chicago) |
| Pseudocode | No | The paper describes methods and architectures in Section 4 and illustrates a deep set-based architecture in Figure 1. However, it does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper lists several software packages used (Einops, Jax, Jupyter, Matplotlib, nflows, Numpy, PyMC5, PyTorch, PyTorch Lightning, Scipy) but does not provide an explicit statement or link for the open-source release of the code developed for this work. |
| Open Datasets | No | The experiments described use data generated from various forward models (e.g., simple multivariate normal likelihood, mixture models in particle physics, strong gravitational lensing model) rather than utilizing pre-existing public datasets with explicit access information. For instance, in Section 5.1, it states: '50,000 samples drawn from this likelihood'. |
| Dataset Splits | Yes | The deep set as well as transformer multivariate normal posterior estimators are trained on 50,000 sequences {x} with a batch size of 128, withholding 10% of the samples for validation. ... Evaluation is performed on 500 new test samples... |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The Acknowledgments section lists several software packages including "PyMC5 (Salvatier et al., 2016), PyTorch (Paszke et al., 2019), Jax (Bradbury et al., 2018), and others". While "PyMC5" implies a version number, most other listed packages only provide a citation to their respective papers, not specific software version numbers (e.g., "PyTorch 1.9") that would be required for full reproducibility. |
| Experiment Setup | Yes | The deep set as well as transformer multivariate normal posterior estimators are trained on 50,000 sequences {x} with a batch size of 128... The estimators are trained using the AdamW ... optimizer with initial learning rate 3×10⁻⁴ and cosine annealing over 100 epochs. ... The experiment is trained with a batch size of 16 ..., using the AdamW optimizer with cosine-annealed learning rate starting at 3×10⁻⁴ for up to 100 epochs with early stopping... |
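Since the paper does not release code, the quoted training setup (AdamW, initial learning rate 3×10⁻⁴, cosine annealing over 100 epochs, 10% validation withhold) can be sketched as follows. This is a minimal illustration only: the `nn.Linear` model and the toy random dataset are hypothetical stand-ins, not the deep-set/transformer posterior estimators described in the paper's Section 4, and the dataset is shrunk from 50,000 to 2,000 samples for brevity.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for the paper's posterior estimator.
model = nn.Linear(10, 2)

# Toy dataset (paper: 50,000 sequences); withhold 10% for validation.
data = TensorDataset(torch.randn(2_000, 10), torch.randn(2_000, 2))
n_val = int(0.1 * len(data))
train_set, val_set = random_split(data, [len(data) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# AdamW with initial learning rate 3e-4, cosine-annealed over 100 epochs,
# matching the configuration quoted in the Experiment Setup row.
epochs = 100
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch
```

The second experiment quoted above differs only in batch size (16) and the addition of early stopping, which would wrap the epoch loop with a validation-loss patience check.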