A Particle-Based Variational Approach to Bayesian Non-negative Matrix Factorization
Authors: Muhammad A Masood, Finale Doshi-Velez
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On several real datasets, we obtain better particle approximations to the BNMF posterior in less time than baselines and demonstrate the significant role that multimodality plays in NMF-related tasks. |
| Researcher Affiliation | Academia | Muhammad A Masood EMAIL Harvard John A. Paulson School of Engineering and Applied Science Cambridge, MA 02138, USA Finale Doshi-Velez EMAIL Harvard John A. Paulson School of Engineering and Applied Science Cambridge, MA 02138, USA |
| Pseudocode | Yes | Algorithm 1: Particle-based Variational Inference for BNMF using Q-Transform. Input: data X, rank R_NMF, number of factorizations M. Step 1: Perform M repetitions of Algorithm 2 to get matrices {Q^m_A, Q^m_W}_{m=1}^{M}, or re-use them if previously constructed. Step 2: Apply the Q-Transform (Algorithm 3) to get initializations {A^m_0, W^m_0}_{m=1}^{M}. Step 3: Apply an NMF algorithm to get factorizations {A^m, W^m}_{m=1}^{M}. Step 4: Apply Algorithm 5 using a given BNMF model to get weights {w^m}_{m=1}^{M} for the approximate posterior. Output: discrete NMF posterior {w^m, A^m, W^m}_{m=1}^{M}. |
| Open Source Code | Yes | Code and demonstrations at https://github.com/dtak/Q-Transfer-Demo-public-/ |
| Open Datasets | Yes | Our datasets cover a range of different types and can be divided into three main categories (count data, grayscale face images, and hyperspectral images). Table 1 provides a description of each dataset as well as the rank used and a citation. The Autism dataset is of interest to the medical community for understanding disease subtypes in the Autism spectrum and is not publicly available; the remaining datasets are public and are considered standard benchmark datasets for NMF. |
| Dataset Splits | Yes | In our experiments, we hold out ten percent of the observations and report performance on both provided and held-out observations. |
| Hardware Specification | No | No specific hardware details like GPU/CPU models or cloud resources are mentioned in the paper, only general statements about memory requirements for certain algorithms. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn', 'CVXPY' (with SCS), and 'autograd' but does not provide specific version numbers for these software dependencies as used in the experimental setup. It refers to 'default settings of scikit-learn (Pedregosa et al., 2011)' and 'Splitting Conic Solver (SCS) in the convex optimization package CVXPY (Diamond and Boyd, 2016)'. |
| Experiment Setup | Yes | Model: exponential-Gaussian model parameters: We set the standard deviation σ_X to be equal to the empirical standard deviation of a reference NMF. The exponential parameter was set to one for each entry in the basis and weights matrices (λ_{d,r} = λ_{r,n} = 1). Model: SILF model parameters: ...To set the threshold parameter ε for each dataset, we use an empirical approach where we find a collection of 50 high-quality factorizations under default settings of scikit-learn (Pedregosa et al., 2011). The objective function is evaluated for each of them, {f_i}_{i=1}^{50}, and ε = 1.2 max_i f_i. We set the remaining SILF likelihood sensitivity parameters β = 0.1, C = 2. For the prior, we identically set the exponential parameter for each entry: λ_{r,n} = 1. Inference: Generating Q-transform matrices for transfer: ...transfer rank and SVD rank R_T = R_SVD = 3. We generated twenty sets of synthetic data X_s ∈ R^{12×12}_+ using non-negative matrices of rank R_T with truncated Gaussian noise. For each synthetic dataset, we find five pairs of transformation matrices through random restarts. In all our experiments, the same set of M_max = 100 pairs of transformation matrices {Q^m_A, Q^m_W}_{m=1}^{100} are applied to each of the real datasets. |
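The held-out evaluation described in the Dataset Splits row can be sketched as an entry-wise mask over the data matrix. This is a minimal sketch under the assumption that "observations" means individual matrix entries; the matrix `X`, its shape, and the RNG seed are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((30, 20))  # placeholder non-negative data matrix

# True = provided entry, False = held out (~10% of entries)
mask = rng.random(X.shape) < 0.9

# Held-out entries are hidden (NaN) during fitting; performance is then
# reported separately on the provided and held-out entries.
X_train = np.where(mask, X, np.nan)
held_out_frac = 1.0 - mask.mean()
```

Any factorization fit on `X_train` can then be scored by comparing its reconstruction against `X` restricted to the masked entries.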
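The empirical SILF threshold calibration quoted in the Experiment Setup row can be sketched with scikit-learn: fit 50 factorizations under default-style settings, evaluate the objective for each, and set ε = 1.2 × the largest value. The data matrix, rank, and `max_iter` below are placeholders (assumptions), and the Frobenius reconstruction error is assumed as the objective.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((30, 20))  # placeholder non-negative data
rank = 5                  # placeholder NMF rank

objectives = []
for seed in range(50):
    model = NMF(n_components=rank, init="random",
                random_state=seed, max_iter=500)
    W = model.fit_transform(X)     # basis weights (n_samples x rank)
    A = model.components_          # factors (rank x n_features)
    # Frobenius reconstruction error as the assumed objective f_i
    objectives.append(np.linalg.norm(X - W @ A, "fro"))

# Threshold from the worst of the 50 high-quality factorizations
epsilon = 1.2 * max(objectives)
```

The factor 1.2 and the count of 50 restarts come directly from the quoted setup; everything else is illustrative.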