A Differentiable Rank-Based Objective for Better Feature Learning

Authors: Krunoslav Lehman Pavasovic, Giulio Biroli, Levent Sagun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate difFOCI on increasingly complex problems ranging from basic variable selection in toy examples to saliency map comparisons in convolutional networks. We then show how difFOCI can be incorporated in the context of fairness to facilitate classification without relying on sensitive data. In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method, Feature Ordering by Conditional Independence (FOCI), introduced in Azadkia & Chatterjee (2021). ... In Section 5, we highlight its wide applicability to real-world data, showing it achieves state-of-the-art performance on feature selection and dimensionality reduction, and competitive performance in the domain-shift and fairness literature.
Researcher Affiliation | Collaboration | Krunoslav Lehman Pavasovic (Meta FAIR, Paris); Giulio Biroli (ENS Paris); Levent Sagun (Meta FAIR, Paris)
Pseudocode | Yes | Algorithm 1: FOCI; Algorithm 2: Differentiable FOCI (difFOCI)
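The pseudocode itself is not reproduced here, but the rank statistic that FOCI builds on is simple to state. The following minimal NumPy sketch computes Chatterjee's rank-based dependence coefficient ξ_n (the unconditional, no-ties case), which underlies the conditional estimator that FOCI uses and that the paper relaxes into a differentiable objective. The function name `chatterjee_xi` is our own; this is an illustration of the rank statistic, not the paper's Algorithm 1 or 2.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient xi_n for continuous data (no ties).

    Sort the pairs by x, take the ranks r_i of y in that order, then
    xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1).
    Close to 1 when y is a noiseless function of x, close to 0 under
    independence.
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x)                          # sort pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # 1-based ranks of y in x-order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
```

Note that ξ_n is piecewise constant in the data (it depends only on ranks), which is precisely why a softmax-temperature relaxation is needed to obtain gradients.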
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It mentions a useful benchmarking codebase from other authors, but there is no explicit repository link, code-release statement, or code in the supplementary materials for difFOCI itself.
Open Datasets | Yes | Environments. We evaluate our methods on five UCI datasets (Dua & Graff, 2017): Breast Cancer Wisconsin (Street et al., 1993), involving benign/malignant cancer prediction; Toxicity (Gul et al., 2021), aimed at determining the toxicity of molecules affecting circadian rhythms; Spambase (Hopkins et al., 1999), classifying emails as spam or not; QSAR (Ballabio et al., 2019), a set containing molecular fingerprints used for chemical toxicity classification; and Religious (Sah & Fokoué, 2019), aimed at identifying the source of religious book texts. ... We use the Waterbirds dataset (Sagawa et al., 2019), which combines bird photographs from the Caltech-UCSD Birds-200-2011 dataset (Wah et al., 2011) with image backgrounds from the Places dataset (Zhou et al., 2017). ... Apart from the Waterbirds dataset, we also tested difFOCI on additional datasets: two text datasets, MultiNLI (Williams et al., 2017) and CivilComments (Borkan et al., 2019), and four image datasets: NICO++ (Zhang et al., 2023), CelebA (Liang & Zou, 2022), MetaShift (Liang & Zou, 2022), and CheXpert (Irvin et al., 2019). ... We utilize classification datasets with interpretable features and sensitive attributes: (i) the Student dataset (Cortez & Silva, 2008); (ii) the Bank Marketing dataset (Moro et al., 2014); and two ACS datasets (Ding et al., 2021).
Dataset Splits | Yes | All input data is standardized, and across all benchmarks we perform a (75-15-10)% train-validation-test split. ... We employ standard dataset splits from prior work (Idrissi et al., 2022) and note that the dataset is licensed under the Creative Commons Attribution 4.0 International license. ... We adopt the standard splits provided by the WILDS benchmark (Koh et al., 2021). ... We use standard train/val/test splits from prior work (Idrissi et al., 2022). ... For each attribute-label pair, we allocate 25 samples for validation and 50 samples for testing, while using the remaining data as training examples. ... The dataset is randomly divided into 85-15% train-validation splits for all datasets, which are used to train a classifier to predict whether a patient has no finding.
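The quoted (75-15-10)% protocol, with standardization, can be reproduced as a two-stage split. A minimal scikit-learn sketch on synthetic data follows; the variable names, synthetic data, and random seeds are our own illustration, not the authors' code. Note that the scaler is fit on the training split only, consistent with "all input data is standardized".

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))          # placeholder features
y = rng.integers(0, 2, size=1000)           # placeholder binary labels

# Stage 1: hold out 75% for training, 25% for validation + test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Stage 2: split the remaining 25% into 15% validation / 10% test
# (0.40 of the held-out 25% equals 10% of the full data)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.40, random_state=0)

# Standardize using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = map(scaler.transform, (X_train, X_val, X_test))
```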
Hardware Specification | Yes | While performing the hyperparameter search, each experiment is run on one Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper mentions using PyTorch (Paszke et al., 2017), scikit-learn (Pedregosa et al., 2011), and the Adam optimizer (Kingma & Ba, 2017), but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Alg. 2 therefore requires the following hyperparameters: softmax temperature β, cutoff value υ, and optimization parameters (e.g., learning rate γ, weight decay λ, minibatch size b, etc.). Our experimental analyses show that β = 5 and υ = 0.1 yield consistently good performance, so we set these as fixed. ... For the first two toy examples, we use a one-hidden-layer MLP with a configuration of 10-20-10 neurons. In contrast, the third example employs a two-hidden-layer MLP structured as 10-20-20-10 neurons. For all benchmarks using vec-(dF1) and vec-(dF3), we initialize the parameter θ from θ ~ N(1, σ²·I_p), with σ² = 0.1. ... We adjust the learning rate and weight decay over the set {10^-4, 10^-3, 10^-2, 10^-1, 5·10^-4, 5·10^-3, 5·10^-2}. The number of epochs is optimized within the range {10, 20, 50, 100}, and batch sizes are chosen from {8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096}. ... The value of η for gDRO is set to 0.1.
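As a sanity check on the search space described above, the hyperparameter grid and the θ initialization can be written out explicitly. A short sketch, assuming learning rate and weight decay are searched as independent axes over the same candidate set (the seed and dimension p = 10, matching the toy MLP input width, are our choices):

```python
import itertools
import numpy as np

# Fixed difFOCI hyperparameters reported in the paper
BETA, CUTOFF = 5.0, 0.1  # softmax temperature beta, cutoff value v

# Search axes as described; lr and weight decay share one candidate set
lr_and_wd = [1e-4, 1e-3, 1e-2, 1e-1, 5e-4, 5e-3, 5e-2]
epochs = [10, 20, 50, 100]
batch_sizes = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]

# Cartesian product over (learning rate, weight decay, epochs, batch size)
grid = list(itertools.product(lr_and_wd, lr_and_wd, epochs, batch_sizes))

# theta ~ N(1, sigma^2 * I_p) with sigma^2 = 0.1, here with p = 10
rng = np.random.default_rng(0)  # seed is our choice, not the paper's
theta = rng.normal(loc=1.0, scale=np.sqrt(0.1), size=10)
```

Under these assumptions the grid contains 7 × 7 × 4 × 10 = 1,960 configurations per choice of epochs-agnostic settings, which is consistent with the paper's decision to fix β and υ rather than search over them as well.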