Improving Mutual Information Based Feature Selection by Boosting Unique Relevance
Authors: Shiyu Liu, Mehul Motani
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of MRwMR-BUR-KSG and MRwMR-BUR-CLF is validated via experiments using six public datasets and four popular classifiers. Specifically, compared to MRwMR, the proposed MRwMR-BUR-KSG improves the test accuracy by 2% to 3% with 25% to 30% fewer features selected, without increasing the algorithm complexity. |
| Researcher Affiliation | Academia | Shiyu Liu (EMAIL), Department of Electrical and Computer Engineering, College of Design and Engineering, National University of Singapore, Singapore; Mehul Motani (EMAIL), Department of Electrical and Computer Engineering, College of Design and Engineering, Institute of Data Science, N.1 Institute for Health, Institute for Digital Medicine, National University of Singapore, Singapore |
| Pseudocode | Yes | Algorithm 1: MI-based Feature Selection via J_ABC. Input: scoring function J_ABC(·); Ω, the set of M features; Y, the label. Output: the set S of selected features. 1: Initialization: S ← {}; K ≤ M. 2: repeat 3: Choose the feature F such that 4: F = argmax_{X ∈ Ω\S} J_ABC(X); 5: S ← S ∪ {F}; 6: until |S| = K. |
| Open Source Code | Yes | Source Code. We have released the source code, which can be found at https://github.com/kentridgeai/MRwMR-BUR. |
| Open Datasets | Yes | To examine the performance of the proposed MRwMR-BUR criterion, we conduct experiments using six public datasets (Alon et al., 1999; Golub et al., 1999; Dua & Graff, 2017; Guyon et al., 2003; Alexander et al., 2012) (see descriptions in Table 3) and compare the performance of MRwMR-BUR-KSG (estimate UR via the KSG estimator) to MRwMR via four popular classifiers: Support Vector Machine (SVM) (Cortes et al., 1995), K-Nearest Neighbors (KNN) (Larose & Larose, 2014), Random Forest (RF) (Breiman, 2001) and Multilayer Perceptron (MLP) (Haykin, 1994). |
| Dataset Splits | Yes | For each run, the dataset is randomly split into three subsets: training dataset (60%), validation dataset (20%), testing dataset (20%). |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory specifications) were provided in the paper for running the experiments. |
| Software Dependencies | No | The paper mentions classifiers like SVM, KNN, RF, and MLP, but does not provide specific software library names with version numbers (e.g., scikit-learn version X.Y.Z, PyTorch version A.B.C) for their implementation. |
| Experiment Setup | Yes | Parameter Tuning. To ensure a fair comparison, the parameters of all classifiers are tuned on the validation dataset via grid search, and all algorithms share the same grid search range and step size. Some key parameters are tuned as follows: (i) the number of nearest neighbors K for KNN is tuned from 3 to 50 with a step size of 2; (ii) the regularization coefficient c for SVM is chosen from {0.001, 0.01, 0.1, 1, 10}; (iii) the number of decision trees in the RF is chosen from {10, 15, ..., 100}. |
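The greedy loop in the paper's Algorithm 1 (pick the highest-scoring remaining feature until K features are selected) can be sketched as below. This is a minimal illustration, not the authors' implementation: the `score` callable stands in for the paper's J_ABC criterion, and the toy relevance table is invented for the example.

```python
from typing import Callable, List, Set

def greedy_mi_selection(features: List[str],
                        score: Callable[[str, Set[str]], float],
                        k: int) -> List[str]:
    """Greedily select k features; at each step take the feature F
    maximizing score(F, selected-so-far), mirroring Algorithm 1's
    F = argmax_{X in Omega \\ S} J_ABC(X)."""
    selected: List[str] = []
    while len(selected) < k:
        candidates = [f for f in features if f not in selected]
        best = max(candidates, key=lambda f: score(f, set(selected)))
        selected.append(best)
    return selected

# Toy scoring function: a fixed per-feature relevance that ignores
# the already-selected set (a real J_ABC would depend on it).
relevance = {"f1": 0.9, "f2": 0.1, "f3": 0.5, "f4": 0.7}
picked = greedy_mi_selection(list(relevance), lambda f, s: relevance[f], 2)
print(picked)  # ['f1', 'f4']
```

Note the stopping condition matches the pseudocode's `until |S| = K`; with a set-dependent score (e.g. relevance minus redundancy with S), the same loop implements MRwMR-style criteria.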
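The 60%/20%/20% train/validation/test protocol quoted above can be sketched as a simple index partition. This is an assumed reconstruction of the split logic (seeded shuffle, then slicing), not the authors' code.

```python
import random

def split_indices(n: int, seed: int = 0):
    """Randomly partition n sample indices into 60% train,
    20% validation, and 20% test, per the paper's protocol."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(0.6 * n)
    n_val = int(0.2 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(100)
print(len(train), len(val), len(test))  # 60 20 20
```

Repeating this with a different seed per run reproduces the "for each run, the dataset is randomly split" setup.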
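The grid-search tuning described in the Experiment Setup row can be sketched as follows. The grids mirror the values reported in the paper; the `evaluate` callable is a hypothetical stand-in for training a classifier and scoring it on the validation split.

```python
# Grids as reported in the paper's parameter-tuning description.
knn_grid = list(range(3, 51, 2))       # K for KNN: 3 to 50, step 2
svm_grid = [0.001, 0.01, 0.1, 1, 10]   # regularization c for SVM
rf_grid = list(range(10, 101, 5))      # number of trees for RF: {10, 15, ..., 100}

def grid_search(grid, evaluate):
    """Return the parameter value with the best validation score."""
    return max(grid, key=evaluate)

# Toy evaluator whose validation score peaks at c = 1.
best_c = grid_search(svm_grid, lambda c: -abs(c - 1))
print(best_c)  # 1
```

Because all algorithms share the same grids, the comparison between MRwMR and MRwMR-BUR variants is not confounded by unequal tuning budgets.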