reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling

Authors: Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, Théo Bontempelli, Thomas Bouabça, Tristan Cazenave

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on simulated data, real-world public data, and the successful large-scale deployment of vMF-exp on the recommender system of a global music streaming service empirically validate the key properties of the proposed method.
Researcher Affiliation	Collaboration	Deezer Research, Paris, France; LAMSADE, Université Paris Dauphine, PSL, Paris, France; SPEIT, Shanghai Jiao Tong University, Shanghai, China.
Pseudocode	Yes	Algorithm 1 (Sample $V_O$): (1) sample a vector $U$ uniformly from $S^{d-1}$; (2) compute the projection of $U$ on $V$: $W = \langle U, V \rangle V$; (3) subtract the projection and normalize: $V_O = (U - W) / \lVert U - W \rVert$; (4) return $V_O$.
Open Source Code	Yes	We publicly release a Python implementation of vMF-exp on GitHub to enable reproducibility of our experiments and to encourage future use of the method: https://github.com/deezer/vMF-exploration.
Open Datasets	Yes	Therefore, in Appendix H, we empirically validate the main properties of vMF-exp using a large-scale, publicly available dataset of one million GloVe word embedding vectors (Pennington et al., 2014). The GloVe-25 dataset is available for download at: https://nlp.stanford.edu/projects/glove/.
Dataset Splits	No	The paper describes using simulated data and real-world datasets (GloVe-25, Deezer's music catalog) for empirical validation and Monte Carlo simulations. It specifies parameters for these simulations and experiments (e.g., number of actions, inner product values, kappa), but does not report conventional train/validation/test splits (e.g., percentages or sample counts) for model training or evaluation.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, memory, or cluster configurations used for running the experiments or simulations. It mentions an industrial deployment, which implies hardware usage, but gives no specifics.
Software Dependencies	No	The paper mentions using a 'Python implementation' and specific libraries like the 'Python vMF sampler from Pinzón & Jung (2023)' and the 'Faiss library (Johnson et al., 2019)' but does not provide version numbers for Python itself or these libraries.
Experiment Setup	Yes	Figure 2 reports, for $\kappa = 1.0$, $\langle V, A \rangle = 0.5$ and growing values of $d$, the $P_{\text{vMF-exp}}(a)$ sampling probability depending on the number of actions $n$, as well as $P_{\text{B-exp}}(a)$ with similar parameters and our approximations $P_0(a)$ and $P_1(a)$. In this section, we compare the behaviors of B-exp and vMF-exp on the GloVe-25 dataset of 1 million GloVe word embedding vectors with dimension $d = 25$... for varying action numbers $n$ and inner products $\langle V, A \rangle$... Sampling is repeated 30 million times and averaged to obtain precise estimates. To generate playlists, Deezer leverages a collaborative filtering model... This model learns unit norm song embedding representations of dimension $d = 128$... tuning $\kappa$ (see Equation (4) of Sra (2012))... We first retrieve the $m = 500$ nearest neighbors of the initial song in the embedding space...
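The sampling procedures summarized in the table can be illustrated with the minimal, pure-Python sketch below. It is not taken from the authors' repository and all function names are hypothetical. It implements Algorithm 1 as quoted, then combines its orthogonal direction with a sampled cosine using Wood's (1994) rejection sampler for the von Mises-Fisher distribution; this sampler is an assumption and is not the Pinzón & Jung (2023) implementation the paper cites. It further assumes that vMF-exp selects the action whose unit-norm embedding has the largest inner product with the sampled vector, that B-exp samples actions with probability proportional to exp(κ⟨V, A⟩), and that a brute-force top_m search can stand in for a Faiss index.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [a / n for a in x]


def sample_orthogonal(v, rng):
    """Algorithm 1: a unit vector V_O orthogonal to the unit vector v."""
    # (1) U uniform on S^{d-1}: a normalized standard Gaussian vector.
    u = normalize([rng.gauss(0.0, 1.0) for _ in v])
    # (2) Projection of U on V: W = <U, V> V.
    uv = dot(u, v)
    w = [uv * b for b in v]
    # (3) Subtract the projection and normalize.
    return normalize([a - b for a, b in zip(u, w)])


def sample_vmf(mu, kappa, rng):
    """Wood (1994) rejection sampler for vMF(mu, kappa); assumes d >= 2, kappa > 0."""
    d = len(mu)
    b = (d - 1) / (2 * kappa + math.sqrt(4 * kappa**2 + (d - 1) ** 2))
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * math.log(1 - x0 * x0)
    while True:
        z = rng.betavariate((d - 1) / 2, (d - 1) / 2)
        w = (1 - (1 + b) * z) / (1 - (1 - b) * z)  # cosine between sample and mu
        # 1 - rng.random() lies in (0, 1], so the log is always defined.
        if kappa * w + (d - 1) * math.log(1 - x0 * w) - c >= math.log(1 - rng.random()):
            break
    v_o = sample_orthogonal(mu, rng)
    s = math.sqrt(max(0.0, 1 - w * w))
    return [w * m + s * o for m, o in zip(mu, v_o)]


def vmf_exp(v, actions, kappa, rng):
    """Hypothetical vMF-exp sketch: perturb v with vMF noise, pick the closest action."""
    x = sample_vmf(v, kappa, rng)
    # For unit-norm vectors, the largest inner product is the nearest neighbor.
    return max(range(len(actions)), key=lambda i: dot(x, actions[i]))


def b_exp(v, actions, kappa, rng):
    """Boltzmann exploration, assuming P(a) proportional to exp(kappa * <v, a>)."""
    weights = [math.exp(kappa * dot(v, a)) for a in actions]
    return rng.choices(range(len(actions)), weights=weights)[0]


def top_m(v, actions, m):
    """Brute-force stand-in for a Faiss inner-product index: the m closest actions."""
    return sorted(range(len(actions)), key=lambda i: -dot(v, actions[i]))[:m]


def monte_carlo(policy, v, actions, kappa, trials, seed=0):
    """Estimate each action's sampling probability by repeated sampling."""
    rng = random.Random(seed)
    counts = [0] * len(actions)
    for _ in range(trials):
        counts[policy(v, actions, kappa, rng)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    rng = random.Random(42)
    d, n, m = 25, 1000, 50  # illustrative sizes, much smaller than in the paper
    actions = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    v = actions[0]
    candidates = [actions[i] for i in top_m(v, actions, m)]
    p_vmf = monte_carlo(vmf_exp, v, candidates, 1.0, 2000)
    p_b = monte_carlo(b_exp, v, candidates, 1.0, 2000)
    print(max(p_vmf), max(p_b))
```

The retrieve-then-explore structure mirrors the deployment described above (nearest neighbors first, then exploration among them), but the sizes and trial counts here are arbitrary. Monte Carlo estimates converge slowly; the paper repeats sampling 30 million times, whereas this sketch uses a few thousand trials to stay fast in pure Python.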