SAND: One-Shot Feature Selection with Additive Noise Distortion

Authors: Pedram Pad, Hadi Hammoud, Mohamad Dia, Nadim Maamari, Liza Andrea Dunbar

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We conduct an extensive benchmarking study against state-of-the-art feature selection methods using common datasets and a novel real-world dataset, showcasing our method's effective competition against existing approaches." ... "Test metrics on 9 datasets over 10 trials. The metric is accuracy (↑) for all except MAE (↓) for CA Housing, which is a regression problem."
Researcher Affiliation: Collaboration. "¹CSEM, Neuchâtel, Switzerland; ²EPFL, Lausanne, Switzerland. Correspondence to: Pedram Pad <EMAIL>."
Pseudocode: No. "The paper describes the mathematical model of the SAND layer using equations (1), (2), and (3) and explains its mechanism in paragraph text, but it does not contain a clearly labeled pseudocode block or algorithm."
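The paper's equations (1)-(3) are not reproduced in this report, so the following is purely an illustrative sketch of the additive-noise-distortion idea, not the authors' formulation: each feature is scaled by a learnable gain, and Gaussian noise whose amplitude grows as the gain shrinks is added, so de-selected features end up carrying mostly noise. The function name, the gain normalization, and the noise schedule are all assumptions.

```python
import numpy as np

def sand_like_forward(x, w, sigma=1.5, rng=None):
    """Illustrative additive-noise feature gate (NOT the paper's
    equations (1)-(3)): scale each feature by a normalized gain and
    add Gaussian noise whose amplitude is largest where the gain is
    smallest, drowning out un-selected features during training."""
    rng = rng or np.random.default_rng(0)
    g = np.abs(w) / (np.abs(w).max() + 1e-12)   # gains normalized to [0, 1]
    noise = rng.standard_normal(x.shape)
    return g * x + sigma * (1.0 - g) * noise

# After training, the k features with the largest |w| would be kept.
x = np.ones((4, 6))
w = np.array([2.0, 0.1, 1.5, 0.05, 0.0, 1.0])
y = sand_like_forward(x, w)
```

Note that a feature with maximal gain (g = 1) passes through unperturbed, while a zero-gain feature is replaced entirely by noise, which is the qualitative behavior the paper's mechanism description suggests.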
Open Source Code: Yes. "The code to reproduce our experiments is available at https://github.com/csem/SAND"
Open Datasets: Yes. "Specifically, we utilized nine datasets, seven of which were used in previous studies by (Balın et al., 2019; Lemhadri et al., 2021; Yamada et al., 2020; Yasuda et al., 2023). The additional real and synthetic datasets were California Housing (Torgo, 1997) and HAR70 (Logacjov & Ustad, 2023)." ... "The MSI Grain dataset ... is available in the code repository at https://github.com/csem/SAND."
Dataset Splits: Yes. "Across all experiments, we employed the Adam optimizer with a learning rate of 10⁻³, and we partitioned the datasets into 70-10-20 splits for training, validation, and testing, respectively."
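The reported 70-10-20 partitioning can be sketched as a simple shuffled index split. The function name and the use of numpy here are this report's assumptions; the paper does not specify how the split was implemented.

```python
import numpy as np

def split_70_10_20(n_samples, seed=0):
    """Shuffle sample indices and partition them 70/10/20 into
    train / validation / test subsets, as stated in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.7 * n_samples)
    n_val = int(0.1 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_70_10_20(1000)
# 700 / 100 / 200 samples, disjoint and covering the full index range
```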
Hardware Specification: Yes. "Moreover, the experiments were executed on a machine equipped with an NVIDIA GeForce RTX 4090 GPU with 24GB of RAM, paired with an AMD Ryzen 9 5900X 12-Core Processor featuring 24 threads."
Software Dependencies: No. "The paper mentions the 'Adam optimizer' but does not specify version numbers for any software libraries, programming languages, or other dependencies required to replicate the experiment."
Experiment Setup: Yes. "Across all experiments, we employed the Adam optimizer with a learning rate of 10⁻³." ... "For hyperparameters of the SAND layer, we used σ = 1.5 and α = 2 consistently. Unless otherwise specified, we selected k = 60 features for all datasets by default, except for the following: k = 5 for the Madelon dataset, k = 3 for the CA Housing dataset, and k = 6 for the HAR70 dataset." ... "Table 3 containing details about all datasets utilized in the feature selection experiments. Additionally, the table includes the epochs employed during training for each dataset ... and batch size used for training."
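The reported setup values can be collected into a small configuration sketch. The numbers (learning rate 10⁻³, σ = 1.5, α = 2, and the per-dataset k overrides) come directly from the quoted text; the dictionary structure and function name are assumptions for illustration only.

```python
# Reported hyperparameters from the paper's experiment setup.
DEFAULTS = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "sigma": 1.5,   # SAND layer hyperparameter
    "alpha": 2,     # SAND layer hyperparameter
    "k": 60,        # features selected by default
}

# Datasets with a non-default number of selected features.
K_OVERRIDES = {"Madelon": 5, "CA Housing": 3, "HAR70": 6}

def config_for(dataset):
    """Return the experiment configuration for a dataset, applying
    the per-dataset k override where the paper specifies one."""
    cfg = dict(DEFAULTS)
    cfg["k"] = K_OVERRIDES.get(dataset, cfg["k"])
    return cfg
```

Per-dataset epochs and batch sizes are given only in the paper's Table 3, which is not reproduced here, so they are left out of the sketch.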