reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Information criteria for non-normalized models

Authors: Takeru Matsuda, Masatoshi Uehara, Aapo Hyvärinen

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation results and applications to real data demonstrate that the proposed criteria enable selection of the appropriate non-normalized model in a data-driven manner. Keywords: energy-based model, model selection, noise contrastive estimation, score matching
Researcher Affiliation	Academia	Takeru Matsuda, RIKEN Center for Brain Science; Masatoshi Uehara, Department of Computer Science, Cornell University; Aapo Hyvärinen, Department of Computer Science, University of Helsinki
Pseudocode	No	The paper describes algorithms such as NCE and score matching in prose, but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper mentions using third-party code, for example "We used the MATLAB program from https://github.com/bodono/apg." and "We used R package glasso from http://statweb.stanford.edu/~tibs/glasso/.", but it does not provide any explicit statement about, or link to, the authors' own implementation.
Open Datasets	Yes	We used N = 5 × 10^4 image patches of 8 × 8 pixels taken from natural images. This data is provided in Hoyer's imageica package (http://www.cs.helsinki.fi/patrik.hoyer/).; We apply SMIC to comparison of graphical models for the RNAseq data used in Lin et al. (2016).; Figure 6 shows a 2-d histogram of wind direction at Tokyo at 00:00 (x1) and 12:00 (x2) for N = 365 days in 2018, which was obtained from the website of Japan Meteorological Agency.
Dataset Splits	No	The paper describes generating 'N' samples for simulations and using entire datasets for real-world applications. However, it does not explicitly provide information on training, validation, or test dataset splits in terms of percentages, sample counts, or specific methodologies for partitioning data to reproduce experiments.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only discusses software and methodologies.
Software Dependencies	Yes	For MLE, we used CVX, a MATLAB package for convex programming (Grant and Boyd, 2018).; For optimization, we employed the accelerated proximal gradient algorithm. We used the MATLAB program from https://github.com/bodono/apg.; We used R package glasso from http://statweb.stanford.edu/~tibs/glasso/.
Experiment Setup	Yes	For numerical optimization in NCE and score matching, we use the nonlinear conjugate gradient method (Rasmussen, 2006).; For NCE, we generated M = N noise samples y(1), ..., y(M) from the normal distribution with the same mean and covariance as x(1), ..., x(N).; For NCE, we generated M = N noise samples y(1), ..., y(M) from the product of the coordinate-wise exponential distributions with the same mean as x(1), ..., x(N).; For edge selection, we employed ℓ1-regularized score matching (Lin et al., 2016) for truncated Gaussian graphical models (32) and graphical LASSO for log-Gaussian graphical models (34), respectively. Namely, we computed the whole regularization paths. After edge selection, we fitted the graphical models again by score matching without regularization to calculate SMIC.
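The NCE noise setup quoted in the Experiment Setup row can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation, since the paper releases no code of its own. The helper names `gaussian_noise`, `exponential_noise`, `gaussian_logpdf`, and `nce_loss` are hypothetical. The loss uses the standard logistic-regression form of NCE with ν = M/N. The paper's optimizer (nonlinear conjugate gradient) and its specific model families are not shown.

```python
import numpy as np


def gaussian_noise(x, m, rng):
    """Sample m points from N(mean(x), cov(x)), matching the first two moments of x."""
    mu = x.mean(axis=0)
    sigma = np.cov(x, rowvar=False)
    return rng.multivariate_normal(mu, sigma, size=m)


def gaussian_logpdf(u, mu, sigma):
    """Log density of N(mu, sigma) at each row of u (needed as the noise log density)."""
    d = mu.shape[0]
    diff = u - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    return -0.5 * (quad + logdet + d * np.log(2.0 * np.pi))


def exponential_noise(x, m, rng):
    """Sample m points from independent exponentials whose coordinate-wise
    means equal those of x (meaningful only when x is positive)."""
    mu = x.mean(axis=0)
    return rng.exponential(scale=mu, size=(m, x.shape[1]))


def nce_loss(log_model_x, log_model_y, log_noise_x, log_noise_y):
    """Negative NCE objective, averaged over the N data points.

    log_model_*: unnormalized model log density plus a learned log-normalizer
    term, evaluated at data x and noise y. log_noise_*: noise log density.
    """
    n, m = len(log_model_x), len(log_model_y)
    log_nu = np.log(m / n)
    g_x = log_model_x - log_noise_x - log_nu
    g_y = log_model_y - log_noise_y - log_nu
    # log h(u) = -log(1 + exp(-g)) and log(1 - h(u)) = -log(1 + exp(g)),
    # computed stably with logaddexp.
    return (np.logaddexp(0.0, -g_x).sum() + np.logaddexp(0.0, g_y).sum()) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy positive data (an assumption, not the paper's data).
    x = rng.gamma(shape=2.0, scale=1.5, size=(1000, 3))
    y = exponential_noise(x, m=len(x), rng=rng)  # M = N, as in the quoted setup
    print("data means: ", x.mean(axis=0))
    print("noise means:", y.mean(axis=0))
```

The exponential noise matches only the coordinate-wise means, but it is always positive. That presumably makes it the better fit for data supported on the positive orthant, whereas Gaussian noise would place mass where the model density is zero.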