Task-Relevant Feature Selection with Prediction Focused Mixture Models

Authors: Abhishek Sharma, Catherine Zeng, Sanjana Narayanan, Sonali Parbhoo, Roy H. Perlis, Finale Doshi-Velez

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analytically characterize representative scenarios in which our auto-curation model achieves the desired relevant structure, and support this analysis with empirical evidence from simulations and real datasets. We demonstrate that pf-GMM achieves predictive clusters even in misspecified-cluster settings on several synthetic and real-world datasets.
Researcher Affiliation | Academia | Abhishek Sharma (EMAIL), School of Engineering and Applied Sciences, Harvard University; Catherine Zeng (EMAIL), School of Engineering and Applied Sciences, Harvard University; Sanjana Narayanan (EMAIL), School of Engineering and Applied Sciences, Harvard University; Sonali Parbhoo (EMAIL), Imperial College London; Roy Perlis (EMAIL), Massachusetts General Hospital; Finale Doshi-Velez (finale@seas.harvard.edu), School of Engineering and Applied Sciences, Harvard University
Pseudocode | No | The paper describes its inference procedures, Variational Inference (VI) and the variational EM algorithm (Section 5) and Gibbs sampling (Supplement Section B.3), but it does not present these procedures in a structured pseudocode or algorithm block format.
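Although the paper provides no pseudocode, the variational EM it describes builds on standard EM for Gaussian mixtures. As a hedged point of reference only — this is a plain GMM fit with scikit-learn (which the paper cites), not the paper's pf-GMM, whose feature-relevance switches are not reproduced here:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy two-cluster data; stands in for any dataset, not the paper's.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])

# GaussianMixture.fit runs EM: E-step computes responsibilities,
# M-step re-estimates weights, means, and covariances until convergence.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
print(gmm.converged_)  # True once EM reaches its tolerance
```

pf-GMM would additionally learn per-feature relevance indicators; the sketch above covers only the vanilla EM backbone.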
Open Source Code | No | The paper mentions using external libraries such as scikit-learn (Pedregosa et al., 2011), PyTorch (Paszke et al., 2019), and pysparcl (tsurumeso, 2024), but it neither states that the source code for its methodology will be released nor includes a link to a code repository.
Open Datasets | Yes | Swiss Bank Notes: the Swiss bank notes dataset is available in the mclust library in R (Scrucca et al., 2016).
Dataset Splits | Yes | For the Swiss Bank Notes data, we use 3-fold stratified cross-validation because the observations are too few to have a separate validation set.
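The 3-fold stratified split described above can be sketched with scikit-learn's StratifiedKFold; the data here is a toy stand-in, not the actual bank notes dataset:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy stand-in: 30 samples, 4 features, balanced binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.array([0, 1] * 15)

# Stratification keeps the class ratio in every fold, which matters
# when there are too few observations for a separate validation set.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    # With 15 samples per class and 3 folds, each validation fold
    # holds exactly 5 samples of each class.
    assert abs(y[val_idx].mean() - 0.5) < 1e-9
    print(len(train_idx), len(val_idx))  # 20 10 on each of the 3 folds
```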
Hardware Specification | No | The paper does not provide any details about the hardware used for the experiments, such as GPU models, CPU types, or memory configurations; it describes only software and datasets.
Software Dependencies | No | The paper mentions several software tools and libraries, such as scikit-learn (Pedregosa et al., 2011), PyTorch (Paszke et al., 2019), the mclust library in R (Scrucca et al., 2016), and pysparcl (tsurumeso, 2024), but it does not specify version numbers for these components, which a reproducible description requires.
Experiment Setup | No | The paper states that "We search for the best value of λ and learning rate parameters using a grid search" for pc-GMM (Section E.2), but it provides neither the specific values or ranges of these hyperparameters nor other concrete setup details for its models (e.g., batch size, number of epochs, optimizer settings).
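Since the paper does not report the grid, any concrete values below are placeholders. A minimal sketch of such a grid search, with a toy scoring function standing in for training pc-GMM and evaluating on a validation fold:

```python
import itertools

# Hypothetical grid: the paper reports a grid search over lambda and the
# learning rate but not the values searched, so these are placeholders.
lambdas = [0.01, 0.1, 1.0]
learning_rates = [1e-3, 1e-2]

def validation_score(lam, lr):
    # Stand-in for "train pc-GMM with (lam, lr), score on validation data";
    # a deterministic toy function keeps the sketch runnable.
    return -(lam - 0.1) ** 2 - (lr - 1e-2) ** 2

# Exhaustive search over the Cartesian product of the two grids.
best = max(itertools.product(lambdas, learning_rates),
           key=lambda p: validation_score(*p))
print(best)  # (0.1, 0.01)
```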