Confounder-Free Continual Learning via Recursive Feature Normalization
Authors: Yash Shah, Camila Gonzalez, Mohammad H. Abbasi, Qingyu Zhao, Kilian M. Pohl, Ehsan Adeli
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that R-MDN promotes equitable predictions across population groups, both within a single cross-sectional study (Section 4.4) and across different stages of continual learning (Sections 4.1, 4.2, 4.3), by reducing catastrophic forgetting caused by confounder effects changing over time. We provide a theoretical foundation for our approach (Section 3) and empirically validate it in different experimental setups (Sections 4.1, 4.2, 4.3). |
| Researcher Affiliation | Academia | 1Stanford University, Stanford, United States 2Weill Cornell Medicine, New York, United States. |
| Pseudocode | No | The paper includes mathematical derivations for parameter updates in Appendix A, but these are not presented as structured pseudocode or an algorithm block. For example, it shows equations for Q(N+1) and R(N+1), but not a step-by-step algorithm. |
| Open Source Code | Yes | The implementation code is available at https://github.com/stanfordtailab/RMDN.git. |
| Open Datasets | Yes | Next, we classify 2D dermatoscopic images of pigmented skin lesions into seven distinct diagnostic categories with the HAM10000 dataset (Tschandl et al., 2018). The Alzheimer's Disease Neuroimaging Initiative is a multi-center observational study that collects neuroimaging data from participants that fall into different diagnostic groups over several years (Mueller et al., 2005; Petersen et al., 2010). We used T1w MRIs from the ABCD (Adolescent Brain Cognitive Development) study (Casey et al., 2018) for the task of binary sex classification. Data is anonymized and curated, and is released annually to the research community through the NIMH Data Archive (see data sharing information at https://abcdstudy.org/scientists/data-sharing/). The ABCD data used in this report came from release 5.0, with DOI 10.15154/8873-zj65. |
| Dataset Splits | Yes | For each stage, we randomly allocate 80% of the images for training and the remaining 20% for testing. To evaluate the models, we perform 5 runs of 5-fold cross validation across different model initialization seeds, with images split by subject and site ID, and having approximately an equal number of boys and girls in each fold. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce RTX 2080 Ti with 11GB memory size and 8 workers on an internal cluster. |
| Software Dependencies | No | The paper mentions optimizers like Adam (Kingma & Ba, 2014) and AdamW (Loshchilov & Hutter, 2017) but does not provide specific version numbers for these or other key software components (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | In one setup, models were trained for 100 epochs with different batch sizes; parameters of the R-MDN model were optimized using Adam (Kingma & Ba, 2014), with a learning rate initialized to 0.0001 that decayed by a factor of 0.8 every 20 epochs, and a regularization parameter of 0.0001. In another setup, models were trained for 50 epochs with a batch size of 128, optimized using Adam with a learning rate initialized to 0.0005 that decayed by a factor of 0.7 every 4 epochs, and a regularization parameter of 0. |
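The Pseudocode row notes that Appendix A gives recursive update equations (e.g., for Q(N+1) and R(N+1)) rather than an algorithm block. Purely as an illustration of what such recursive confounder removal can look like, the sketch below implements a generic recursive least-squares residualization: each incoming feature vector is regressed on a confounder vector, and the estimated confounder effect is subtracted, without storing past data. The function names, state layout, and use of NumPy are assumptions for this sketch, not the paper's actual R-MDN implementation.

```python
import numpy as np

def make_rls_state(n_confounders, delta=1e3):
    """Initialize recursive least-squares state.

    P approximates the inverse of the running confounder covariance;
    beta holds the per-feature regression coefficients. A large delta
    means weakly regularized early estimates.
    """
    return {"P": delta * np.eye(n_confounders), "beta": None}

def rls_residualize(state, c, x):
    """One recursive update: regress feature vector x on confounders c,
    then return x with the estimated confounder effect removed.

    c: (d,) confounder vector (e.g. [1, age, sex], with intercept)
    x: (f,) feature vector for one example
    """
    P = state["P"]
    if state["beta"] is None:
        state["beta"] = np.zeros((c.shape[0], x.shape[0]))
    beta = state["beta"]

    Pc = P @ c
    k = Pc / (1.0 + c @ Pc)           # RLS gain vector, shape (d,)
    err = x - c @ beta                # per-feature prediction error, (f,)
    state["beta"] = beta + np.outer(k, err)
    state["P"] = P - np.outer(k, Pc)  # rank-1 downdate of inverse covariance
    return x - c @ state["beta"]      # residualized features
```

As samples stream in, the recursive estimate converges to the batch least-squares solution, which is the property that lets residualization proceed during continual learning without revisiting earlier stages' data.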