Prior Specification for Exposure-based Bayesian Matrix Factorization

Authors: Zicong Zhu, Issei Sato

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In this study, we present an enhanced method for specifying priors in Bayesian matrix factorization models. We improve the estimators by implementing an exposure-based model to better simulate data scarcity. Our method demonstrates significant accuracy improvements in hyperparameter estimation during synthetic experiments. We also explore the feasibility of applying this method to real-world datasets and provide insights into how the model's behavior adapts to varying levels of data sparsity. [...] We conducted experiments on synthetic datasets, demonstrating that our new estimators outperform existing methods, especially as the dataset becomes sparser."
Researcher Affiliation: Academia. Zicong Zhu (EMAIL), Department of Computer Science, The University of Tokyo; Issei Sato (EMAIL), Department of Computer Science, The University of Tokyo.
Pseudocode: No. The paper describes the model definitions and derivations mathematically and textually, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper neither contains an explicit statement about releasing source code for the described methodology nor provides a link to a code repository.
Open Datasets: Yes. "We conducted additional experiments on the real-world MovieLens datasets (Harper & Konstan, 2015), which have been widely studied for recommender systems."
Dataset Splits: No. "We first generate the synthetic data with the following 3 steps repeatedly: (1) we sample the matrices P and Q with the prior hyperparameters for particular specifications; (2) we recover the fully dense matrix R as the product of P and Q; (3) we sample the Bernoulli variables Oij at different sparsity levels and multiply them with each entry of the dense matrix R to obtain the sparse observation matrix Y. [...] We selected three MovieLens datasets of different sizes, from 100k to 10m records. The datasets contain users' ratings of different movies on a 5-star scale with half-star increments (0.5 to 5.0 stars)." While the paper describes the generation of the synthetic data and the characteristics of the MovieLens datasets, it does not specify explicit training/validation/test splits for its experiments or how the MovieLens data was partitioned when evaluating the estimators.
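The three-step synthetic-data recipe quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Gamma priors in shape/rate form, (a, b) for P and (c, d) for Q, consistent with the specification table's relation µp = a/b and σp = √a/b; the function name, argument names, and matrix sizes are hypothetical, not from the paper.

```python
import numpy as np

def generate_synthetic(n_users, n_items, k, a, b, c, d, p_obs, seed=0):
    """Sketch of the paper's 3-step synthetic-data generation (names are mine).

    (1) Sample latent matrices P, Q from Gamma priors (shape/rate form);
    (2) recover the fully dense matrix R as the product of P and Q;
    (3) mask each entry with a Bernoulli(p_obs) exposure variable O_ij.
    """
    rng = np.random.default_rng(seed)
    # NumPy's gamma() takes a scale parameter, so rate b becomes scale 1/b.
    P = rng.gamma(shape=a, scale=1.0 / b, size=(n_users, k))
    Q = rng.gamma(shape=c, scale=1.0 / d, size=(n_items, k))
    R = P @ Q.T                                   # fully dense matrix
    O = rng.binomial(1, p_obs, size=R.shape)      # exposure indicators
    Y = O * R                                     # sparse observation matrix
    return P, Q, R, O, Y
```

With specification A (a = b' = 10, rates 1) and K = 25, the entry-wise mean of R lands near the tabulated E[R] = 2500, which is a quick sanity check on the shape/rate reading of the priors.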
Hardware Specification: No. The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies: No. The paper does not specify any software dependencies with version numbers.
Experiment Setup: Yes. "We conduct the experiments with specifications A, D, and F because they are distinct from each other. The full specification setup defined by da Silva et al. (2023) is described in Table 4. In specification A, matrices P and Q share the same prior parameters, but their shape parameters are 10 times larger than their rate parameters."

Table 1: Hyperparameter Initialization for Different Specifications

  Spec.   a     b    c     d     µp    σp    µq    σq    E[R]      V[R]
  A       10    1    10    1     10.0  3.16  10.0  3.16  2500.00   55000.00
  D       0.1   1    0.1   1     0.1   0.32  0.1   0.32  0.25      0.55
  F       1     1    0.1   0.1   1.0   1.0   1.0   3.16  25.00     550.00

Table 2: Variables of Experiment Setups

  Prior Spec.:                 [A, D, F]
  K (Num. of Latent Factors):  [25, 50, 75, 100, 125, 150]
  Pobs (Bernoulli parameter):
    Group 1: [1.0, 0.98, 0.96, 0.94, 0.92, 0.90]
    Group 2: [0.5, 0.4, 0.3, 0.2, 0.1]
    Group 3: [0.05, 0.04, 0.03, 0.02, 0.01]
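The experiment variables in Table 2 can be enumerated programmatically. The sketch below assumes a full factorial crossing of prior specification, latent dimension K, and exposure parameter Pobs (with the three sparsity groups pooled); the paper's excerpt does not state explicitly that every combination is run, so the grid and the variable names here are assumptions.

```python
from itertools import product

# Levels taken directly from Table 2 of the excerpt.
SPECS = ["A", "D", "F"]
KS = [25, 50, 75, 100, 125, 150]
P_OBS = ([1.0, 0.98, 0.96, 0.94, 0.92, 0.90]   # Group 1: near-dense
         + [0.5, 0.4, 0.3, 0.2, 0.1]           # Group 2: moderately sparse
         + [0.05, 0.04, 0.03, 0.02, 0.01])     # Group 3: very sparse

# Assumed full factorial grid: 3 specs x 6 K values x 16 Pobs values.
grid = list(product(SPECS, KS, P_OBS))
print(len(grid))  # 288 configurations under the full-factorial assumption
```

Enumerating the grid this way makes the scale of the study explicit: even modest per-variable lists multiply into hundreds of estimator runs.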