Interpretable Deep Generative Recommendation Models
Authors: Huafeng Liu, Liping Jing, Jingxuan Wen, Pengyu Xu, Jiaqi Wang, Jian Yu, Michael K. Ng
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of experimental results on four widely-used benchmark datasets demonstrates the superiority of InDGRM in recommendation performance and interpretability. In this section, we evaluate the proposed deep generative model on four datasets by comparing it with state-of-the-art recommendation methods. |
| Researcher Affiliation | Academia | Huafeng Liu (EMAIL), School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China, and Department of Mathematics, The University of Hong Kong, Hong Kong SAR, China. Liping Jing (EMAIL), Jingxuan Wen (EMAIL), Pengyu Xu (EMAIL), Jiaqi Wang (EMAIL), Jian Yu (EMAIL), School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China. Michael K. Ng (EMAIL), Department of Mathematics, The University of Hong Kong, Hong Kong SAR, China. |
| Pseudocode | Yes | Algorithm 1: InDGRM generative process. Algorithm 2: Learning with local variational optimization for InDGRM. Algorithm 3: The training procedure for InDGRM with the local variational optimization strategy. |
| Open Source Code | No | No explicit statement or link for the source code of the described methodology is provided in the paper. |
| Open Datasets | Yes | In experiments, four widely-used recommendation datasets (MovieLens 20M, Netflix, AliShop-7C, and Yelp) are used to validate the recommendation performance. Footnotes provide the links: https://grouplens.org/datasets/movielens/, https://www.netflixprize.com, https://jianxinma.github.io/disentangle-recsys.html, https://www.yelp.com/dataset/challenge. |
| Dataset Splits | Yes | The held-out-users strategy and five-fold cross-validation are used to evaluate recommendation performance. 20% of the users are taken as held-out users and evenly split between validation and test. For each held-out user, his/her feedback data is randomly split into five equal-sized subsets; in each round, four subsets are used to obtain the latent representation and the remaining subset is used for evaluation. The averaged results over the five rounds are reported. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adam for training but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Parameter setting: For a fair comparison, the number of learnable parameters is set to around 2M·dmax for each method, which is equivalent to using dmax-dimensional representations for the M items. The initial dimensionality dmax is set to 150. Dropout (Srivastava et al., 2014) is applied at the input layer with probability 0.5. The model is trained using Adam (Kingma and Ba, 2014) with a batch size of 128 users for 200 epochs on all datasets. The regularization coefficient λ is set to 1.2 for ML 20M and Netflix, and to 1.5 for AliShop-7C and Yelp; λo is set to 1 for better disentanglement. For autoencoder-based deep methods (Mult-VAE, MacridVAE, DGLGM, and our method), the hyperparameters are tuned automatically via TPE (Bergstra et al., 2011), which searches for the optimal hyperparameter configuration over 200 trials on the validation set. |
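The held-out-users evaluation protocol described above (20% of users held out, split evenly between validation and test; each held-out user's feedback partitioned into five folds, four for inferring the latent representation and one for evaluation) can be sketched as follows. This is a minimal illustration of the split logic only, not code from the paper; the function names `split_users` and `five_fold_feedback` are hypothetical.

```python
import random


def split_users(user_ids, held_out_frac=0.2, seed=0):
    """Hold out a fraction of users and divide them evenly into
    validation and test sets (hypothetical helper mirroring the
    protocol described in the paper)."""
    rng = random.Random(seed)
    users = list(user_ids)
    rng.shuffle(users)
    n_held = int(len(users) * held_out_frac)
    held_out = users[:n_held]
    train = users[n_held:]
    val = held_out[: n_held // 2]
    test = held_out[n_held // 2:]
    return train, val, test


def five_fold_feedback(feedback_items, seed=0):
    """Partition one held-out user's feedback into five nearly
    equal-sized folds; each evaluation round uses four folds to
    obtain the latent representation and the remaining fold for
    evaluation, with results averaged over the five rounds."""
    rng = random.Random(seed)
    items = list(feedback_items)
    rng.shuffle(items)
    return [items[i::5] for i in range(5)]
```

With 1,000 users, `split_users` yields 800 training users and 100 users each for validation and test; `five_fold_feedback` on a user with 23 interactions yields folds of sizes 5, 5, 5, 4, 4.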