reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Estimating Density Models with Truncation Boundaries using Score Matching

Authors: Song Liu, Takafumi Kanamori, Daniel J. Williams

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the usefulness of our method via numerical experiments and a study on the Chicago crime data set. We also show that the proposed density estimation can correct the outlier-trimming bias caused by aggressive outlier detection methods. Section 8 is titled "Numerical and Real-world Data Analysis" and includes various experiments with datasets and performance comparisons.
Researcher Affiliation	Academia	Song Liu (University of Bristol); Takafumi Kanamori (Tokyo Institute of Technology, RIKEN AIP); Daniel J. Williams (University of Bristol). All listed affiliations are academic institutions.
Pseudocode	No	The paper describes its methodology using mathematical formulations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code and data sets to reproduce our experiments are available at https://github.com/anewgithubname/Truncated-Score-Matching.
Open Datasets	Yes	We demonstrate the usefulness of our method via numerical experiments and a study on the Chicago crime data set. The code and data sets to reproduce our experiments are available at https://github.com/anewgithubname/Truncated-Score-Matching. We also experiment on a real-world data set, CIFAR-10, which contains ten different classes of 32 × 32 images.
Dataset Splits	No	The paper mentions generating samples (e.g., "We generate 10,000 samples, only 1417 of which can be used for parameter estimation"), and for CIFAR-10, it states using a "hold-out likelihood". However, specific percentages, absolute counts, or explicit methodologies for splitting data into training, validation, and test sets for reproducibility are not provided in the main text.
Hardware Specification	Yes	Our experiments are run on a workstation with an AMD Ryzen 1700 CPU with 32GB memory.
Software Dependencies	No	We optimize both objective functions using MATLAB's fminunc function with default settings. While MATLAB is mentioned, no version number for MATLAB itself is provided.
Experiment Setup	Yes	Our unnormalized density model is a Gaussian mixture model with four components (parametrized by $\theta_1, \ldots, \theta_4$) and the unit variance-covariance matrix: $p_{\theta_1,\ldots,\theta_4}(x) = \sum_{i=1}^{4} N_x(\theta_i, I)$. In this experiment, 500,000 particles are used to approximate $Z_V(\theta)$. We fit a Gaussian mixture model with two components on this data set. The standard deviations of the two components are fixed to the same value, roughly half of the width of the city. The outlier percentage ($\nu$) in OSVM is set to 20%.
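The Experiment Setup row mentions an unnormalized four-component Gaussian mixture with identity covariance and a particle approximation of the normalizing constant $Z_V(\theta)$ over the truncation domain $V$. The sketch below, written in Python with NumPy (the original experiments use MATLAB), shows one way such a particle estimate could be formed. The per-component sampling scheme, the function names `unnormalized_mixture` and `estimate_Z_V`, and the example domain $V$ (a ball of radius 3) are assumptions for illustration only, not the authors' code, which lives in the linked repository.

```python
import numpy as np


def unnormalized_mixture(x, thetas):
    """Unnormalized density p_theta(x) = sum_i N_x(theta_i, I).

    x: (n, d) array of points; thetas: (k, d) array of component means.
    Each term is a Gaussian density with identity covariance, so the sum
    integrates to k over R^d rather than to 1.
    """
    x = np.atleast_2d(x)
    d = x.shape[1]
    # Squared distance from every point to every mean: shape (n, k).
    sq = ((x[:, None, :] - thetas[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq).sum(axis=1) / (2 * np.pi) ** (d / 2)


def estimate_Z_V(thetas, in_V, n_particles=500_000, seed=0):
    """Monte Carlo estimate of Z_V(theta), the integral of p_theta over V.

    Assumed (hypothetical) scheme: split the particles evenly across the
    k components, draw each batch from N(theta_i, I), and use the fraction
    of that batch landing inside V as the mass of component i inside V.
    in_V maps an (n, d) array to a boolean array of shape (n,).
    """
    rng = np.random.default_rng(seed)
    k, d = thetas.shape
    per_comp = n_particles // k
    total = 0.0
    for theta in thetas:
        samples = theta + rng.standard_normal((per_comp, d))
        total += in_V(samples).mean()
    return total


if __name__ == "__main__":
    # Hypothetical truncation domain V: the ball of radius 3 at the origin.
    in_ball = lambda x: np.linalg.norm(x, axis=1) <= 3.0
    thetas = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
    Z = estimate_Z_V(thetas, in_ball)
    # Normalized truncated density at the origin (valid only for x in V).
    print(Z, unnormalized_mixture(np.zeros((1, 2)), thetas) / Z)
```

Because each mixture term integrates to one over the whole space, $Z_V(\theta)$ lies between 0 and 4 and reaches 4 only when $V$ covers essentially all of the mixture's mass; counting how many particles from each component fall inside $V$ exploits this directly and needs only a membership test for $V$, not its volume.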