Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast Slate Policy Optimization: Going Beyond Plackett-Luce
Authors: Otmane Sakhi, David Rohde, Nicolas Chopin
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For our experiments, we focus on learning slate decision functions for the particular case of recommendation as collaborative filtering datasets are easily accessible, facilitating the reproducibility of our results. We choose three collaborative filtering datasets with varying action space size, MovieLens-25M (Harper & Konstan, 2015), Twitch (Rappaz et al., 2021) and GoodReads (Wan & McAuley, 2018; Wan et al., 2019). |
| Researcher Affiliation | Collaboration | Otmane Sakhi (CREST-ENSAE, Criteo AI Lab), David Rohde (Criteo AI Lab), Nicolas Chopin (CREST-ENSAE) |
| Pseudocode | Yes | Algorithm 1: Learning with Latent Gaussian Perturbation |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code for the methodology, nor does it provide links to a code repository. |
| Open Datasets | Yes | We choose three collaborative filtering datasets with varying action space size, MovieLens-25M (Harper & Konstan, 2015), Twitch (Rappaz et al., 2021) and GoodReads (Wan & McAuley, 2018; Wan et al., 2019). |
| Dataset Splits | Yes | Given a dataset, we split randomly the user-item interaction session [X, Y] into two parts; the observed interactions X and the hidden interactions Y. ... We split each dataset by users and keep 10% to create a validation set, on which the reward of the decision function is reported. |
| Hardware Specification | No | The training is conducted on a CPU machine, using the Adam optimizer (Kingma & Ba, 2014), with a batch size of 32. This description of 'a CPU machine' is too general and lacks specific details such as model, clock speed, or memory. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'FAISS library' but does not provide specific version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | The training is conducted on a CPU machine, using the Adam optimizer (Kingma & Ba, 2014), with a batch size of 32. We tune the learning rate on a validation set for all algorithms. ... We fix the latent space dimension L = 100 and use a slate size of K = 5 for these experiments. ... For all datasets and training routines, we allow a runtime of 60 minutes, and evaluate our policies on the validation set for 10 equally spaced intervals. ... Even if σ can be treated as an additional parameter, we fix it to σ = 1/L in all our experiments for a fair comparison. |
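To make the quoted setup concrete, here is a minimal sketch of slate selection via latent Gaussian perturbation, using only hyperparameters stated in the table (latent dimension L = 100, slate size K = 5, noise scale σ = 1/L). The function name `sample_slate` and the score model (inner product between a perturbed user vector and item embeddings) are assumptions for illustration, not the paper's exact Algorithm 1:

```python
import numpy as np

def sample_slate(user_emb, item_embs, K=5, sigma=None, rng=None):
    """Hypothetical sketch: perturb the user's latent vector with
    Gaussian noise of scale sigma (defaulting to 1/L as in the paper's
    experiments), score items by inner product, and return the
    indices of the top-K items as the slate."""
    rng = np.random.default_rng() if rng is None else rng
    L = user_emb.shape[0]
    sigma = 1.0 / L if sigma is None else sigma  # sigma = 1/L (fixed in the paper)
    perturbed = user_emb + sigma * rng.standard_normal(L)
    scores = item_embs @ perturbed          # one score per item
    return np.argsort(-scores)[:K]          # top-K item indices

# Toy usage: L = 100 latent dimensions, 1000 candidate items, K = 5.
rng = np.random.default_rng(0)
user = rng.standard_normal(100)
items = rng.standard_normal((1000, 100))
slate = sample_slate(user, items, K=5, rng=rng)
print(slate.shape)  # (5,)
```

Because the perturbation is in the latent space rather than over slate permutations (as in Plackett-Luce sampling), drawing a slate costs one noise sample plus a top-K lookup, which is the efficiency argument the paper's title refers to.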