SPONGE: Competing Sparse Language Representations for Effective Knowledge Transfer
Authors: Jens-Michalis Papaioannou, Alexei Figueroa, Conor Fallon, Anna Capilla, Alexandra Bekiaridou, Stavros Zanos, Wolfgang Nejdl, Alexander Löser
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train and evaluate SOTA architectures strictly following clinical data constraints, i.e., using model checkpoints rather than data as the medium of knowledge transfer (see Figure 1 right). Additionally, we focus on assessing sequence-agnostic performance, reflecting the realistic scenario in which clinics are unaware of the data seen by publicly available models. We validate this extensively by analyzing both single and multi-step transfer learning scenarios. |
| Researcher Affiliation | Academia | Jens-Michalis Papaioannou, Berlin University of Applied Sciences and Technology & Leibniz University Hannover; Alexei Figueroa Rosero, Berlin University of Applied Sciences and Technology & Leibniz University Hannover; Conor Fallon, Berlin University of Applied Sciences and Technology; Anna Capilla, Independent Researcher, Berlin, Germany; Alexandra Bekiaridou, Feinstein Institutes for Medical Research, Northwell Health; Stavros Zanos, Feinstein Institutes for Medical Research, Northwell Health; Wolfgang Nejdl, Leibniz University Hannover; Alexander Löser, Berlin University of Applied Sciences and Technology |
| Pseudocode | No | The paper describes the model architecture and steps mathematically and textually in Section 5.1, but does not present a formal pseudocode or algorithm block. |
| Open Source Code | Yes | We make all source code available. Download the repository at https://anonymous.4open.science/r/HSPONGE-6DDD |
| Open Datasets | Yes | MIMIC-IV (M) (Johnson et al., 2021; 2023) is an English clinical dataset... CodiEsp (C) contains clinical case studies in Spanish... (Miranda-Escalada et al., 2020). AHEPA-Cardio (A) are cardiology discharge summaries in Greek (Papaioannou et al., 2022). SemClinBr (B) comprises Portuguese clinical notes... (Oliveira et al., 2022). Stockholm University Gastrointestinal (S) (Lamproudis et al., 2023) consists of EHRs... Zero-Shot Datasets: DisTEMIST. We use the dataset of Miranda-Escalada et al. (2022) for zero-shot evaluation. |
| Dataset Splits | Yes | For all datasets, we use stratified sampling (Sechidis et al., 2011) to create train, validation, and test splits (see Table 5 in Appendix A). Table 5 (train / val / test): (M) MIMIC (Johnson et al., 2023): 102,199 / 9,358 / 7,618; (A) Achepa (Papaioannou et al., 2022): 1,590 / 407 / 394; (S) Stockholm University (Lamproudis et al., 2023): 1,583 / 256 / 232; (C) CodiEsp (Miranda-Escalada et al., 2020): 656 / 158 / 182; (B) SemClinBr (Oliveira et al., 2022): 453 / 107 / 109; (D) DisTEMIST (Miranda-Escalada et al., 2022): 3,073 / (439 7) |
| Hardware Specification | Yes | SPONGE and S-Protoxlm-r use the same hidden dimensions for the prototypical layer, XLM-R as a Transformer, and are trained on four A100 GPUs. |
| Software Dependencies | No | The paper mentions models and frameworks like XLM-R, adapter-based PEFT, and EuroLLM, but it does not specify version numbers for underlying software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use subnetwork exploration for 10 epochs to build distributed knowledge of MIMIC, with a batch size of 8 and learning rate as per Pfeiffer et al. (2020). |
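The splits above rely on the iterative stratification algorithm of Sechidis et al. (2011), which balances label proportions across splits even when labels are rare and multi-label. Below is a minimal, self-contained sketch of the greedy idea, not the authors' actual splitting code; the function name, tie-breaking, and simplifications are illustrative only (the paper's pipeline may use an existing implementation such as scikit-multilearn).

```python
from collections import Counter
import random

def iterative_stratified_split(labels, fractions, seed=0):
    """Greedy iterative stratification in the spirit of Sechidis et al. (2011).

    labels:    list of label-sets, one per example (multi-label).
    fractions: desired split proportions, e.g. (0.8, 0.1, 0.1).
    Returns a list of index lists, one per split.
    """
    rng = random.Random(seed)
    n = len(labels)
    # Remaining "demand" per split: total examples, and per-label counts.
    desired_total = [f * n for f in fractions]
    label_counts = Counter(l for ls in labels for l in ls)
    desired_label = [{l: f * c for l, c in label_counts.items()} for f in fractions]

    splits = [[] for _ in fractions]
    remaining = set(range(n))

    while remaining:
        # Focus on the rarest label still present among unassigned examples,
        # since rare labels are hardest to distribute proportionally.
        counts = Counter(l for i in remaining for l in labels[i])
        if counts:
            rare = min(counts, key=counts.get)
            candidates = [i for i in remaining if rare in labels[i]]
        else:
            rare, candidates = None, list(remaining)  # label-free leftovers
        rng.shuffle(candidates)  # random tie-breaking
        for i in candidates:
            # Assign to the split with the greatest remaining demand
            # for the rare label (falling back to overall demand).
            if rare is not None:
                best = max(range(len(fractions)),
                           key=lambda s: (desired_label[s].get(rare, 0),
                                          desired_total[s]))
            else:
                best = max(range(len(fractions)),
                           key=lambda s: desired_total[s])
            splits[best].append(i)
            remaining.discard(i)
            desired_total[best] -= 1
            for l in labels[i]:
                desired_label[best][l] = desired_label[best].get(l, 0) - 1
    return splits
```

For example, `iterative_stratified_split([{"a"}, {"a"}, {"a", "b"}, {"b"}], (0.75, 0.25))` partitions the four examples into two disjoint splits while steering examples carrying the rarer label "b" toward the split that still needs them.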