reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DPPy: DPP Sampling with Python

Authors: Guillaume Gautier, Guillermo Polito, Rémi Bardenet, Michal Valko

JMLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	DPPy, a turnkey Python implementation of known general algorithms to sample finite DPPs. We also include algorithms for non-stationary continuous DPPs, e.g., related to random covariance matrices or Monte Carlo methods, which are also of interest for MLers. The DPPy project, hosted on GitHub, is already being used by the cross-disciplinary DPP community (Burt et al., 2019; Kammoun, 2018; Poulson, 2019; Dereziński et al., 2019; Gautier et al., 2019). We use Travis for continuous integration and Coveralls for test coverage. Through Read the Docs we provide an extensive documentation, which covers the essential mathematical background and showcases the key properties of DPPs through DPPy objects and associated methods. DPPy thus also serves as a tutorial on DPPs. ... Samples can be displayed via .plot() or .hist() to construct the empirical distribution that converges to the Marčenko-Pastur distribution, see Figure 1(b).
Researcher Affiliation	Collaboration	Guillaume Gautier EMAIL Guillermo Polito EMAIL Rémi Bardenet EMAIL Michal Valko EMAIL Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, 59651 Villeneuve d'Ascq, France Inria Lille-Nord Europe, 40 avenue Halley 59650 Villeneuve d'Ascq, France DeepMind Paris, 14 Rue de Londres, 75009 Paris, France
Pseudocode	No	The paper describes algorithms verbally (e.g., "Hough et al. (2006, Algorithm 18) provide a generic projection DPP sampler that we briefly describe") but does not include any formatted pseudocode or algorithm blocks.
Open Source Code	Yes	The project is hosted on GitHub and equipped with an extensive documentation. ... The DPPy project, hosted on GitHub, is already being used by the cross-disciplinary DPP community... github.com/guilgautier/DPPy
Open Datasets	No	The paper focuses on presenting a software toolbox for sampling determinantal point processes. It discusses generating samples from various ensembles (e.g., β-ensemble, Laguerre ensemble) and visualizing their empirical distributions. It does not use or provide access to external, pre-existing datasets for evaluation or training. Therefore, no information about publicly available datasets is provided in the context of typical machine learning benchmarks.
Dataset Splits	No	The paper introduces a software toolbox for sampling and visualizing statistical distributions. It does not involve the use of external datasets with defined training, validation, or test splits. The context of dataset splits is not applicable to the content of this paper.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running the described software or experiments. There is no mention of GPU models, CPU types, or other computing resources.
Software Dependencies	No	The paper mentions "DPPy, a Python toolbox" and that it uses "Travis for continuous integration and Coveralls for test coverage". However, it does not specify any version numbers for Python or any other software libraries or dependencies. Only general software names are provided without version numbers.
Experiment Setup	No	The paper describes a software toolbox and its functionalities, including how to instantiate objects and call sampling methods. It mentions parameters like `beta=2` for specific ensembles, which are part of the mathematical definition of the processes being sampled, not hyperparameters in a typical machine learning experimental setup (e.g., learning rates, batch sizes, epochs). There are no details provided about model initialization, training schedules, or other specific experimental configurations usually found in research involving model training or complex evaluations.
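
The Research Type row quotes the paper's remark that the empirical distribution of samples converges to the Marčenko-Pastur distribution. As a rough illustration of that limiting law, the sketch below evaluates its standard density with unit variance and checks numerically that it integrates to 1 and has mean 1. It uses only the Python standard library and does not call DPPy; the names `marchenko_pastur_pdf`, `midpoint_integral`, and the parameter `ratio` (dimension divided by number of samples, assumed to lie in (0, 1]) are illustrative choices, not part of the paper or the DPPy API.

```python
import math


def marchenko_pastur_pdf(x, ratio):
    """Marchenko-Pastur density with unit variance, assuming 0 < ratio <= 1.

    `ratio` is an illustrative name for dimension / number of samples.
    """
    lower = (1.0 - math.sqrt(ratio)) ** 2  # left edge of the support
    upper = (1.0 + math.sqrt(ratio)) ** 2  # right edge of the support
    if x <= lower or x >= upper:
        return 0.0
    return math.sqrt((upper - x) * (x - lower)) / (2.0 * math.pi * ratio * x)


def midpoint_integral(func, a, b, steps=20000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


ratio = 0.5  # hypothetical aspect ratio chosen for the example
lower = (1.0 - math.sqrt(ratio)) ** 2
upper = (1.0 + math.sqrt(ratio)) ** 2

# Total probability mass and first moment over the support.
total_mass = midpoint_integral(lambda x: marchenko_pastur_pdf(x, ratio), lower, upper)
mean = midpoint_integral(lambda x: x * marchenko_pastur_pdf(x, ratio), lower, upper)

print(f"mass ~ {total_mass:.4f}, mean ~ {mean:.4f}")
```

A histogram of eigenvalues from a suitably scaled random covariance matrix, such as the one the paper describes building with `.hist()`, would be compared against this curve.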