RIFLE: Imputation and Robust Inference from Low Order Marginals

Authors: Sina Baharlouei, Sze-Chuan Suen, Meisam Razaviyayn

TMLR 2023

Reproducibility assessment — for each variable, the assessed result and the supporting evidence from the paper:
Research Type: Experimental
Evidence: "In numerical experiments, we compare RIFLE to several state-of-the-art approaches (including MICE, Amelia, Miss Forest, KNN-imputer, MIDA, and Mean Imputer) for imputation and inference in the presence of missing values. Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small."
Researcher Affiliation: Academia
Evidence: "Sina Baharlouei (EMAIL), University of Southern California; Kelechi Ogudu (EMAIL), University of Southern California; Sze-Chuan Suen (EMAIL), University of Southern California; Meisam Razaviyayn (EMAIL), University of Southern California"
Pseudocode: Yes
Evidence: "Algorithm 1: RIFLE for Ridge Linear Regression in the Presence of Missing Values"; "Algorithm 2: Finding the optimal λ and p_i's using the bisection idea"; "Algorithm 3: Robust Quadratic Discriminant Analysis in the Presence of Missing Values"
Open Source Code: Yes
Evidence: "RIFLE is publicly available at https://github.com/optimization-for-data-driven-science/RIFLE."
Open Datasets: Yes
Evidence: "We run RIFLE and several state-of-the-art approaches on five datasets from the UCI repository (Dua & Graff, 2017) (Spam, Housing, Clouds, Breast Cancer, and Parkinson datasets) with different proportions of MCAR missing values (the description of the datasets can be found in Appendix I)."
Dataset Splits: Yes
Evidence: "Furthermore, λ, the hyper-parameter for the ridge regression regularizer, is tuned by choosing 20% of the data as the validation set from the set {0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 50}." ... "Both Blog Feedback and Superconductivity datasets contain 30% MNAR missing values in the training phase, generated by Algorithm 9, with 10,000 and 20,000 training samples, respectively."
Hardware Specification: No
The paper does not report hardware details (CPU/GPU models, processors, or memory) for its experiments; the acknowledgments mention only funding grants, not computing resources.
Software Dependencies: No
The paper cites the original publications for the benchmark imputation packages (MICE, Amelia, MissForest, KNN-Imputer, MIDA, GAIN) but gives no version numbers for them as used in the experiments, nor a versioned list of the languages or libraries used to implement RIFLE itself.
Experiment Setup: Yes
Evidence: "The hyper-parameter c in (7) controls the robustness of the model by adjusting the size of the confidence intervals. This parameter is tuned by a cross-validation procedure over the set {0.1, 0.25, 0.5, 1, 2, 5, 10, 20, 50, 100}, and the value with the lowest NRMSE is chosen. The default value in the implementation is c = 1, since it consistently performs well across different experiments. Furthermore, λ, the hyper-parameter for the ridge regression regularizer, is tuned by choosing 20% of the data as the validation set from the set {0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 50}. To tune K, the number of bootstrap samples for estimating the confidence intervals, we tried 10, 20, 50, and 100; no significant difference was observed in test performance for these values. ... We run RIFLE for 1000 iterations with a step size of 0.01 in the above experiments."
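The core idea behind Algorithm 1 above (ridge regression from low-order marginals, without imputation) can be sketched roughly as follows: estimate the second-order moments X^T X and X^T y using only the pairwise-observed entries, then solve the ridge normal equations directly. This is an illustrative sketch under our own variable names, not the paper's full robust min-max formulation (which additionally optimizes over confidence intervals around these moment estimates).

```python
import numpy as np

def ridge_from_marginals(X, y, lam=1.0):
    """Fit ridge regression from pairwise-observed second-order moments.

    X may contain NaNs (missing values). Instead of imputing them, we
    estimate each entry of (1/n) X^T X and (1/n) X^T y from the rows where
    the relevant pair of values is observed, then solve the ridge normal
    equations. Illustrative sketch only; RIFLE's actual algorithm also
    guards against moment-estimation error via a robust min-max objective.
    """
    n, d = X.shape
    C = np.zeros((d, d))   # estimate of (1/n) X^T X from pairwise-complete rows
    b = np.zeros(d)        # estimate of (1/n) X^T y from pairwise-complete rows
    for i in range(d):
        yi_mask = ~np.isnan(X[:, i]) & ~np.isnan(y)
        b[i] = np.mean(X[yi_mask, i] * y[yi_mask])
        for j in range(i, d):
            mask = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            C[i, j] = C[j, i] = np.mean(X[mask, i] * X[mask, j])
    # Ridge solution: (C + lam * I)^{-1} b
    return np.linalg.solve(C + lam * np.eye(d), b)
```

On fully observed data with lam = 0 this reduces to ordinary least squares, which is a useful sanity check for the moment-based formulation.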
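The λ-tuning procedure quoted in the experiment setup (hold out 20% of the data as a validation set, search a fixed grid) can be sketched as below. The `fit` and `score` callables are hypothetical placeholders for RIFLE's training and evaluation routines (e.g. NRMSE); only the grid and the 20% split come from the paper.

```python
import numpy as np

def tune_lambda(X, y, fit, score,
                grid=(0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 50),
                val_frac=0.2, seed=0):
    """Pick lambda from a fixed grid using a random held-out validation set.

    `fit(X_tr, y_tr, lam)` returns a fitted model; `score(model, X_val, y_val)`
    returns a validation error (lower is better). Both are placeholders for
    the actual training/evaluation code. The grid and the 20% split follow
    the setup described in the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(val_frac * len(y))
    val, tr = idx[:n_val], idx[n_val:]
    best_lam, best_err = None, np.inf
    for lam in grid:
        err = score(fit(X[tr], y[tr], lam), X[val], y[val])
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam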