Feature Shift Localization Network

Authors: Mı́riam Barrabés, Daniel Mas Montserrat, Kapal Dev, Alexander G. Ioannidis

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation setup is consistent with (Barrabés et al., 2023), using the same reference and query sets and optimized benchmarking methods. Hyperparameter tuning for FSL-Net is detailed in Section E. We compare FSL-Net against five feature shift localization methods (DataFix, MB-SM, MB-KS, KNN-KS, and Deep-SM) and four feature selection methods (MI, SelectKBest, MRMR, and Fast-CMIM). ... Performance is evaluated using the F1 score for feature shift localization accuracy and wall-clock runtime for computational efficiency. ... Ablation Analysis. We assess the impact of each component of FSL-Net's Statistical Descriptor Network by training models with different combinations of its three components...
Researcher Affiliation | Academia | 1Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; 2Department of Computer Science, Munster Technological University, Cork T12 P928, Ireland; 3Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95060, USA. Correspondence to: Alexander G. Ioannidis <EMAIL>.
Pseudocode | No | The paper describes the FSL-Net architecture and its components (Statistical Descriptor Network, Prediction Network) using prose and diagrams (Figure 1), along with mathematical formulations for loss functions and statistical measures. However, it does not include a dedicated section or figure presenting pseudocode or a formal algorithm block.
Open Source Code | Yes | The code and ready-to-use trained model are available at https://github.com/AI-sandbox/FSL-Net.
Open Datasets | Yes | We source a total of 1,032 diverse tabular datasets from OpenML (Van Rijn et al., 2013)... The continuous datasets are sourced from the UCI repository (Gas (Huerta et al., 2016), Energy (Candanedo et al., 2017), and Musk2 (Blake, 1998)) and OpenML (Scene (Boutell et al., 2004), MNIST (Deng, 2012), and Dilbert (Vanschoren et al., 2014)). Additionally, a Covid-19 dataset (Force, 2022)... The categorical datasets consist of high-dimensional biomedical data, including the Phenotypes dataset (Qian et al., 2020), a subset of categorical traits from the UK Biobank, the Founders dataset containing binary-coded human DNA sequences (Perera et al., 2022), and the Canine dataset comprising binary-coded dog DNA sequences (Barrabés et al., 2023).
Dataset Splits | Yes | In total, 1,350 datasets are used for training, with 50 reserved for validation. ... Each subset is then split equally into reference and query samples. ... The samples are evenly divided into two subsets, forming the reference and query sets.
Hardware Specification | Yes | All evaluations were conducted on an Intel Xeon Gold with 12 CPU cores. ... To expedite the training process, a single NVIDIA GPU with 32GB of memory was used.
Software Dependencies | No | The paper describes the implementation of FSL-Net as a neural network and references common machine learning models like random forests and k-nearest neighbors. However, it does not explicitly state any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Table 7 provides an overview of the search spaces and the optimal values determined for each network-related hyperparameter in FSL-Net. These parameters encompass configurations for the statistical measures, Moment Extraction Network, Neural Embedding Network, and Prediction Network. ... Table 9 outlines the search space and optimal values for optimization hyperparameters in FSL-Net's training strategy. This includes the loss function and Adam optimizer settings.
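The Research Type row notes that localization accuracy is measured with the F1 score. A minimal sketch of what such a metric could look like when localization is framed as predicting the set of shifted feature indices (illustrative only; `localization_f1` and its signature are assumptions, not the authors' implementation):

```python
def localization_f1(predicted, actual):
    """F1 score over sets of feature indices flagged as shifted.

    predicted: indices a method flags as shifted.
    actual: ground-truth shifted indices.
    """
    predicted, actual = set(predicted), set(actual)
    if not predicted or not actual:
        return 0.0
    tp = len(predicted & actual)          # correctly localized features
    precision = tp / len(predicted)
    recall = tp / len(actual)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 features truly shifted, method flags 4, of which 2 correct.
print(localization_f1({1, 5, 9, 12}, {1, 5, 7}))  # → 4/7 ≈ 0.571
```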
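The Dataset Splits row describes dividing each dataset's samples evenly into reference and query sets. A hedged sketch of that split (the function name and shuffling choice are assumptions; the paper's exact splitting code lives in the FSL-Net repository):

```python
import numpy as np

def reference_query_split(X, seed=0):
    """Shuffle samples and divide them evenly into reference and query halves."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    return X[idx[:half]], X[idx[half:]]

# Toy dataset: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
reference, query = reference_query_split(X)
print(reference.shape, query.shape)  # (5, 2) (5, 2)
```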