Prior Specification for Exposure-based Bayesian Matrix Factorization

Authors: Zicong Zhu, Issei Sato

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In this study, we present an enhanced method for specifying priors in Bayesian matrix factorization models. We improve the estimators by implementing an exposure-based model to better simulate data scarcity. Our method demonstrates significant accuracy improvements in hyperparameter estimation during synthetic experiments. We also explore the feasibility of applying this method to real-world datasets and provide insights into how the model's behavior adapts to varying levels of data sparsity. [...] We conducted experiments on synthetic datasets, demonstrating that our new estimators outperform existing methods, especially as the dataset becomes sparser."
Researcher Affiliation: Academia. Zicong Zhu (EMAIL), Department of Computer Science, The University of Tokyo; Issei Sato (EMAIL), Department of Computer Science, The University of Tokyo.
Pseudocode: No. The paper describes the model definitions and derivations mathematically and textually, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper neither contains an explicit statement about releasing source code for the described methodology nor provides a link to a code repository.
Open Datasets: Yes. "We conducted additional experiments on the real-world MovieLens datasets (Harper & Konstan, 2015), which have been widely studied for recommender systems."
Dataset Splits: No. "We first generate the synthetic data with the following 3 steps repeatedly: (1) we sample the matrices P and Q with the prior hyperparameters for particular specifications; (2) we recover the fully dense matrix R as the product of P and Q; (3) we sample the Bernoulli variables Oij at different sparsity levels and multiply them with each entry of the dense matrix R to obtain the sparse observation matrix Y. [...] We selected three MovieLens datasets of different sizes, from 100k to 10m records. The datasets contain users' ratings of different movies on a 5-star scale with half-star increments (0.5 to 5.0 stars)." While the paper describes the generation of the synthetic data and the characteristics of the MovieLens datasets, it does not specify explicit training/validation/test splits for its experiments or how the MovieLens data was partitioned when evaluating the estimators.
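The three-step synthetic-data recipe quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Gamma priors in shape/rate form, (a, b) for P and (c, d) for Q, consistent with the specification table's relation µp = a/b and σp = √a/b; the function name, argument names, and matrix sizes are hypothetical, not from the paper.

```python
import numpy as np

def generate_synthetic(n_users, n_items, k, a, b, c, d, p_obs, seed=0):
    """Sketch of the paper's 3-step synthetic-data generation (names are mine).

    (1) Sample latent matrices P, Q from Gamma priors (shape/rate form);
    (2) recover the fully dense matrix R as the product of P and Q;
    (3) mask each entry with a Bernoulli(p_obs) exposure variable O_ij.
    """
    rng = np.random.default_rng(seed)
    # NumPy's gamma() takes a scale parameter, so rate b becomes scale 1/b.
    P = rng.gamma(shape=a, scale=1.0 / b, size=(n_users, k))
    Q = rng.gamma(shape=c, scale=1.0 / d, size=(n_items, k))
    R = P @ Q.T                                   # fully dense matrix
    O = rng.binomial(1, p_obs, size=R.shape)      # exposure indicators
    Y = O * R                                     # sparse observation matrix
    return P, Q, R, O, Y
```

With specification A (a = b' = 10, rates 1) and K = 25, the entry-wise mean of R lands near the tabulated E[R] = 2500, which is a quick sanity check on the shape/rate reading of the priors.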
Hardware Specification: No. The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies: No. The paper does not specify any software dependencies with version numbers.
Experiment Setup: Yes. "We conduct the experiments with specifications A, D, and F because they are distinct from each other. The full specification setup defined by da Silva et al. (2023) is described in Table 4. In specification A, matrices P and Q share the same prior parameters, but their shape parameters are 10 times larger than their rate parameters."

Table 1: Hyperparameter Initialization for Different Specifications

  Spec.   a     b    c     d     µp    σp    µq    σq    E[R]      V[R]
  A       10    1    10    1     10.0  3.16  10.0  3.16  2500.00   55000.00
  D       0.1   1    0.1   1     0.1   0.32  0.1   0.32  0.25      0.55
  F       1     1    0.1   0.1   1.0   1.0   1.0   3.16  25.00     550.00

Table 2: Variables of Experiment Setups

  Prior Spec.:                 [A, D, F]
  K (Num. of Latent Factors):  [25, 50, 75, 100, 125, 150]
  Pobs (Bernoulli parameter):
    Group 1: [1.0, 0.98, 0.96, 0.94, 0.92, 0.90]
    Group 2: [0.5, 0.4, 0.3, 0.2, 0.1]
    Group 3: [0.05, 0.04, 0.03, 0.02, 0.01]
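The experiment variables in Table 2 can be enumerated programmatically. The sketch below assumes a full factorial crossing of prior specification, latent dimension K, and exposure parameter Pobs (with the three sparsity groups pooled); the paper's excerpt does not state explicitly that every combination is run, so the grid and the variable names here are assumptions.

```python
from itertools import product

# Levels taken directly from Table 2 of the excerpt.
SPECS = ["A", "D", "F"]
KS = [25, 50, 75, 100, 125, 150]
P_OBS = ([1.0, 0.98, 0.96, 0.94, 0.92, 0.90]   # Group 1: near-dense
         + [0.5, 0.4, 0.3, 0.2, 0.1]           # Group 2: moderately sparse
         + [0.05, 0.04, 0.03, 0.02, 0.01])     # Group 3: very sparse

# Assumed full factorial grid: 3 specs x 6 K values x 16 Pobs values.
grid = list(product(SPECS, KS, P_OBS))
print(len(grid))  # 288 configurations under the full-factorial assumption
```

Enumerating the grid this way makes the scale of the study explicit: even modest per-variable lists multiply into hundreds of estimator runs.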