Task-Relevant Feature Selection with Prediction Focused Mixture Models
Authors: Abhishek Sharma, Catherine Zeng, Sanjana Narayanan, Sonali Parbhoo, Roy H. Perlis, Finale Doshi-Velez
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analytically characterize representative scenarios where our auto-curation model achieves the desired relevant structure, and support this analysis with empirical evidence using simulations and real datasets. We demonstrate that pf-GMM achieves predictive clusters even in misspecified-cluster settings on several synthetic and real-world datasets. |
| Researcher Affiliation | Academia | Abhishek Sharma (School of Engineering and Applied Sciences, Harvard University); Catherine Zeng (School of Engineering and Applied Sciences, Harvard University); Sanjana Narayanan (School of Engineering and Applied Sciences, Harvard University); Sonali Parbhoo (Imperial College London); Roy Perlis (Massachusetts General Hospital); Finale Doshi-Velez, finale@seas.harvard.edu (School of Engineering and Applied Sciences, Harvard University) |
| Pseudocode | No | The paper describes the inference process using Variational Inference (VI) and the variational EM algorithm (Section 5) and Gibbs sampling (Supplement Section B.3), but it does not present these procedures in a structured pseudocode or algorithm block format. |
| Open Source Code | No | The paper mentions using external libraries like 'scikit-learn (Pedregosa et al., 2011)', 'PyTorch (Paszke et al., 2019)', and 'pysparcl (tsurumeso, 2024)', but it does not provide any statement about releasing the source code for the methodology described in this paper, nor does it include a link to a code repository. |
| Open Datasets | Yes | Swiss Bank Notes: The Swiss bank notes dataset (available in the mclust library in R (Scrucca et al., 2016)) |
| Dataset Splits | Yes | For Swiss Bank Notes Data, we use 3-fold Stratified Cross-validation because the observations are too few to have a separate validation set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. It only describes software and datasets. |
| Software Dependencies | No | The paper mentions several software tools and libraries used, such as 'scikit-learn (Pedregosa et al., 2011)', 'PyTorch (Paszke et al., 2019)', 'mclust library in R (Scrucca et al., 2016)', and 'pysparcl (tsurumeso, 2024)'. However, it does not specify the version numbers for these software components, which is required for a reproducible description. |
| Experiment Setup | No | The paper mentions that 'We search for the best value of λ and learning rate parameters using a grid search' for pc-GMM (Section E.2), but it does not provide the specific values or ranges of these hyperparameters, nor does it detail other concrete experimental settings for its models (e.g., batch size, number of epochs, optimizer specifics). |
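
The Dataset Splits row refers to 3-fold stratified cross-validation. The following is a minimal sketch of what such a split looks like, not the authors' code (which, per the table, is not released). It uses only the Python standard library; the helper name `stratified_k_fold`, the seed, and the balanced 200-label placeholder are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits=3, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep class proportions.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)

    # Group observation indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin so that
    # every fold receives a near-equal share of each class.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the held-out set; the rest form the training set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Placeholder labels only (assumed balanced two-class toy set), NOT real data.
labels = [0] * 100 + [1] * 100
splits = list(stratified_k_fold(labels, n_splits=3, seed=0))

for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(f"train={len(train_idx)} test={len(test_idx)} class1_in_test={positives}")
```

In practice, `sklearn.model_selection.StratifiedKFold` from scikit-learn (a library the paper cites) provides the same kind of split; the hand-written version above only makes the stratification step explicit.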