Task-Relevant Feature Selection with Prediction Focused Mixture Models

Authors: Abhishek Sharma, Catherine Zeng, Sanjana Narayanan, Sonali Parbhoo, Roy H. Perlis, Finale Doshi-Velez

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analytically characterize representative scenarios in which our auto-curation model achieves the desired relevant structure, and support this analysis with empirical evidence from simulations and real datasets. We demonstrate that pf-GMM achieves predictive clusters even in misspecified-cluster settings on several synthetic and real-world datasets.
Researcher Affiliation | Academia | Abhishek Sharma (EMAIL), School of Engineering and Applied Sciences, Harvard University; Catherine Zeng (EMAIL), School of Engineering and Applied Sciences, Harvard University; Sanjana Narayanan (EMAIL), School of Engineering and Applied Sciences, Harvard University; Sonali Parbhoo (EMAIL), Imperial College London; Roy Perlis (EMAIL), Massachusetts General Hospital; Finale Doshi-Velez (finale@seas.harvard.edu), School of Engineering and Applied Sciences, Harvard University
Pseudocode | No | The paper describes its inference procedures, Variational Inference (VI) and the variational EM algorithm (Section 5) and Gibbs sampling (Supplement Section B.3), but it does not present these procedures in a structured pseudocode or algorithm block format.
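Although the paper provides no pseudocode, the variational EM it describes builds on standard EM for Gaussian mixtures. As a hedged point of reference only — this is a plain GMM fit with scikit-learn (which the paper cites), not the paper's pf-GMM, whose feature-relevance switches are not reproduced here:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy two-cluster data; stands in for any dataset, not the paper's.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])

# GaussianMixture.fit runs EM: E-step computes responsibilities,
# M-step re-estimates weights, means, and covariances until convergence.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
print(gmm.converged_)  # True once EM reaches its tolerance
```

pf-GMM would additionally learn per-feature relevance indicators; the sketch above covers only the vanilla EM backbone.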
Open Source Code | No | The paper mentions using external libraries such as scikit-learn (Pedregosa et al., 2011), PyTorch (Paszke et al., 2019), and pysparcl (tsurumeso, 2024), but it neither states that the source code for its methodology will be released nor includes a link to a code repository.
Open Datasets | Yes | Swiss Bank Notes: the Swiss bank notes dataset is available in the mclust library in R (Scrucca et al., 2016).
Dataset Splits | Yes | For the Swiss Bank Notes data, we use 3-fold stratified cross-validation because the observations are too few to have a separate validation set.
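The 3-fold stratified split described above can be sketched with scikit-learn's StratifiedKFold; the data here is a toy stand-in, not the actual bank notes dataset:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy stand-in: 30 samples, 4 features, balanced binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.array([0, 1] * 15)

# Stratification keeps the class ratio in every fold, which matters
# when there are too few observations for a separate validation set.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    # With 15 samples per class and 3 folds, each validation fold
    # holds exactly 5 samples of each class.
    assert abs(y[val_idx].mean() - 0.5) < 1e-9
    print(len(train_idx), len(val_idx))  # 20 10 on each of the 3 folds
```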
Hardware Specification | No | The paper does not provide any details about the hardware used for the experiments, such as GPU models, CPU types, or memory configurations; it describes only software and datasets.
Software Dependencies | No | The paper mentions several software tools and libraries, such as scikit-learn (Pedregosa et al., 2011), PyTorch (Paszke et al., 2019), the mclust library in R (Scrucca et al., 2016), and pysparcl (tsurumeso, 2024), but it does not specify version numbers for these components, which a reproducible description requires.
Experiment Setup | No | The paper states that "We search for the best value of λ and learning rate parameters using a grid search" for pc-GMM (Section E.2), but it provides neither the specific values or ranges of these hyperparameters nor other concrete setup details for its models (e.g., batch size, number of epochs, optimizer settings).
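Since the paper does not report the grid, any concrete values below are placeholders. A minimal sketch of such a grid search, with a toy scoring function standing in for training pc-GMM and evaluating on a validation fold:

```python
import itertools

# Hypothetical grid: the paper reports a grid search over lambda and the
# learning rate but not the values searched, so these are placeholders.
lambdas = [0.01, 0.1, 1.0]
learning_rates = [1e-3, 1e-2]

def validation_score(lam, lr):
    # Stand-in for "train pc-GMM with (lam, lr), score on validation data";
    # a deterministic toy function keeps the sketch runnable.
    return -(lam - 0.1) ** 2 - (lr - 1e-2) ** 2

# Exhaustive search over the Cartesian product of the two grids.
best = max(itertools.product(lambdas, learning_rates),
           key=lambda p: validation_score(*p))
print(best)  # (0.1, 0.01)
```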