Score Matching with Missing Data
Authors: Josh Givens, Song Liu, Henry Reeve
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we go through simulated results comparing our IW approach (Marg-IW) in Algorithm 1 and our variational approach (Marg-Var) in Algorithm 2 to the EM approach of Uehara et al. (2020). We also compare to a naive marginalisation approach involving zeroing out the missing dimensions and only taking the observed output dimensions of the score, which we call Zeroed Score Matching. ... In our experiments, we highlight a unique strength of our methods by applying them to explicitly parameterised score models. ... More implementation details can be found in Appendix E.3. ... 5.1. Parameter Estimation 5.1.1. TRUNCATED GAUSSIAN MODEL In this experiment a 10-dim normal distribution is set up with fixed mean and random covariance before being truncated on the first 3 dimensions. 1000 samples are taken and corrupted independently on each coordinate with probability 0.2. ... 5.2. Gaussian Graphical Model Estimation ... We apply our methods to learn GGMs and truncated GGMs with missing data as well. ... The AUC was then calculated for each method taking the GGM from fully observed score matching as the ground truth. |
| Researcher Affiliation | Academia | 1School of Mathematics, University of Bristol, Bristol, UK 2School of Artificial Intelligence, Nanjing University, China. Correspondence to: Josh Givens <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Marginal IW Score Matching ... Algorithm 2 Marginal Variational Score Matching |
| Open Source Code | Yes | All code and data for the experiments presented can also be found at https://github.com/joshgivens/ScoreMatchingwithMissingData |
| Open Datasets | Yes | The S&P 100 was taken from the S&P 500 data between 2013 and 2018 given in https://www.kaggle.com/datasets/camnugent/sandp500 with the 100 stocks that made up the S&P 100 taken from roughly the mid-point of the time period which we obtained from https://en.wikipedia.org/w/index.php?title=S%26P_100&oldid=666413597. The yeast data was obtained from https://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1990/matrix/ which was accessed via https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1990 and other subsets have previously been studied in the context of GGM estimation in Yang & Lozano (2015). |
| Dataset Splits | No | The paper describes generating and corrupting data, along with specific parameters for these processes (e.g., '1000 samples are taken and corrupted independently on each coordinate with probability 0.2'). For Gaussian Graphical Models, it describes the range of L1 regularization applied to implicitly define graph structures and how AUC is calculated by comparing against a 'ground truth' GGM. For real-world data, it mentions producing '25 random samples of the corruption'. However, it does not provide explicit training/test/validation split percentages or sample counts for model evaluation in a conventional machine learning sense, nor does it refer to standard, predefined splits for commonly used datasets. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware components such as GPU or CPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam used as the optimisation algorithm' but does not specify its version. It also discusses implementation in terms of 'proximal stochastic gradient descent' and parameterization via 'Cholesky decomposition' without listing specific software libraries or their version numbers that would be necessary for reproduction. |
| Experiment Setup | Yes | In each case batches of 100 samples were taken and a learning rate of 0.01 was used with Adam used as the optimisation algorithm. Our score model was parameterised in terms of the Cholesky decomposition of the precision matrix in order to ensure the precision estimate stayed positive definite. For our Importance weighting and the EM approach of Uehara et al. (2020), an isotropic Gaussian with mean 0 and coordinatewise variance of 16 was used. ... We apply our methods to learn GGMs and truncated GGMs with missing data as well. We use varying levels of L1 regularisation on our objective via proximal stochastic gradient descent in our optimisation (Beck, 2017). ... We took L1 regularisation to ensure that at the highest level the graph had no edges and at the lowest level the graph had all possible edges. For this experiment, this was achieved with γ ∈ (10^−1.7, 10^−4). Throughout we took the threshold for edge presence to be 0.002. ... In practice we find taking 10 gradient steps of ϕ for each gradient step of θ to work well. |
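The corruption scheme quoted above (1000 samples, each coordinate missing independently with probability 0.2) can be sketched as follows. This is a minimal illustration, not the paper's code: the mean, covariance, and truncation step are placeholders, since the paper uses a fixed mean with a random covariance truncated on the first 3 dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 1000 samples from a 10-dimensional Gaussian. Mean and covariance
# are illustrative placeholders; the paper uses a fixed mean and a random
# covariance, truncated on the first 3 dimensions.
n, d = 1000, 10
X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)

# Corrupt each coordinate independently with probability 0.2,
# as described in the truncated-Gaussian experiment (Section 5.1.1).
mask = rng.random((n, d)) < 0.2   # True = missing
X_obs = np.where(mask, np.nan, X)
```

Masking coordinates independently (rather than whole rows) is what makes this a coordinatewise missing-completely-at-random setting.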
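The quoted setup parameterises the score model "in terms of the Cholesky decomposition of the precision matrix" so the estimate stays positive definite. A hedged sketch of one such parameterisation (the `precision_from_chol` helper and the exp-diagonal trick are our illustrative choices, not necessarily the paper's):

```python
import numpy as np

def precision_from_chol(params, d):
    """Build a positive-definite precision matrix from unconstrained
    lower-triangular Cholesky parameters. The diagonal is passed through
    exp so L is invertible, making K = L @ L.T positive definite.
    Hypothetical parameterisation illustrating the idea in the paper."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = params
    L[np.diag_indices(d)] = np.exp(np.diag(L))  # strictly positive diagonal
    return L @ L.T

d = 3
theta = np.random.default_rng(1).normal(size=d * (d + 1) // 2)
K = precision_from_chol(theta, d)
# K is symmetric positive definite by construction, so an optimiser such
# as Adam can update theta freely without leaving the PD cone.
```

The point of the construction is that gradient steps on the unconstrained `theta` can never produce an indefinite precision estimate.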
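The GGM experiments use "proximal stochastic gradient descent" for the L1-regularised objective (Beck, 2017), with edge presence thresholded at 0.002. The proximal operator of the L1 penalty is soft-thresholding; a minimal sketch of one step (the gradient and the value of γ here are placeholders):

```python
import numpy as np

def soft_threshold(theta, step):
    """Proximal operator of step * ||.||_1 (soft-thresholding),
    the prox used in proximal SGD for L1-regularised objectives."""
    return np.sign(theta) * np.maximum(np.abs(theta) - step, 0.0)

lr, gamma = 0.01, 1e-2                # lr from the paper; gamma illustrative
theta = np.array([0.5, -0.003, 0.0015, -0.8])
grad = np.zeros_like(theta)           # placeholder for the score-matching gradient

# One proximal gradient step: gradient step on the smooth loss,
# then soft-threshold with step size lr * gamma.
theta = soft_threshold(theta - lr * grad, lr * gamma)

# Edge presence determined by thresholding at 0.002, as in the paper.
edges = np.abs(theta) > 0.002
```

Sweeping γ over its range traces out graphs from fully connected to empty, which is what the reported AUC is computed over.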