Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching
Authors: Nabeel Seedat, Mihaela van der Schaar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches, highlighting its potential to accelerate data integration and interoperability of ML-ready data. We conduct experiments on the MIMIC-OMOP and Synthea-OMOP datasets, which are the standard benchmark datasets used in prior schema matching works (Sheetrit et al., 2024; Zhang et al., 2023b; Narayan et al., 2022; Zhang et al., 2023a; 2021). These datasets are real-world healthcare schema matching datasets and have been widely adopted due to their complexity and their reflection of real-world schema matching challenges. |
| Researcher Affiliation | Collaboration | Nabeel Seedat 1 2 Mihaela van der Schaar 1 1Department of Applied Mathematics and Theoretical Physics, University of Cambridge 2Foundational Machine Learning Research, Thomson Reuters. Correspondence to: Nabeel Seedat <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (Optimize LM program L): Input: set of evaluation queries Deval = {e1, e2, ..., en}; Output: set of top-n demonstrations Ddemo. ... Algorithm 3 (Matchmaker: Schema Matching with Self-Improving Compositional Language Model Programs): Require: source schema Ss, target schema St; Ensure: schema matches M. |
| Open Source Code | Yes | 2https://github.com/seedatnabeel/Matchmaker or https://github.com/vanderschaarlab/Matchmaker |
| Open Datasets | Yes | We conduct experiments on the MIMIC-OMOP and Synthea-OMOP datasets, which are the standard benchmark datasets used in prior schema matching works (Sheetrit et al., 2024; Zhang et al., 2023b; Narayan et al., 2022; Zhang et al., 2023a; 2021). ... Open-source data: https://github.com/meniData1/MIMIC_2_OMOP ... Open-source data: https://github.com/JZCS2018/SMAT/tree/main/datasets/omap/ |
| Dataset Splits | Yes | Note that no specific train-test sets are used as in supervised learning, since we perform the schema matching task in a zero-shot manner. ... In our experiments, we assess two variants, given that labeled training data for schema matching is hard to access: (i) 20-80: 20% train and 80% test, and (ii) 50-50: 50% train and 50% test. |
| Hardware Specification | Yes | All experiments are run on a single NVIDIA A4000 GPU with 20 GB of VRAM. |
| Software Dependencies | Yes | The model version used as the LLM was GPT-4-1106, with the following settings: ... We use ColBERTv2 (Santhanam et al., 2022) as the embedding model ... All LLM baselines use GPT-4 (0613) (OpenAI, 2023) as the backbone for fair comparison to the original works and to isolate the gains of the system not tied to the LLM. |
| Experiment Setup | Yes | GPT-4 Hyper-parameters. The model version used as the LLM was GPT-4-1106, with the following settings: `{"temperature": 0.5, "max_tokens": 1024, "top_p": 1, "frequency_penalty": 0, "presence_penalty": 0, "n": 1}` |
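The experiment-setup row above quotes the exact GPT-4 sampling settings used. As a minimal sketch, the snippet below assembles those settings into a chat-completion request payload; the `build_request` helper and the `gpt-4-1106-preview` model identifier are assumptions for illustration (the table only says "GPT-4-1106"), not the authors' code.

```python
# Settings quoted in the Experiment Setup row above.
GPT4_SETTINGS = {
    "temperature": 0.5,
    "max_tokens": 1024,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n": 1,
}

def build_request(prompt: str, model: str = "gpt-4-1106-preview") -> dict:
    """Assemble a chat-completion request payload using the paper's settings.

    The model string is a hypothetical API identifier; substitute whichever
    GPT-4-1106 alias your API endpoint exposes.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **GPT4_SETTINGS,
    }

req = build_request("Match source column 'dob' to the target schema.")
print(req["temperature"])  # 0.5
```

A payload like this can be passed directly as keyword arguments to an OpenAI-style chat-completions client; keeping the settings in one dict makes it easy to report them verbatim, as the checklist row does.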
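The dataset-splits row notes that Matchmaker itself is zero-shot, while the supervised baselines use 20-80 and 50-50 train-test variants. A minimal sketch of producing such splits is below; the `split` helper and the toy `pairs` list are illustrative assumptions, not the authors' code.

```python
import random

def split(items, train_frac, seed=0):
    """Shuffle items deterministically and cut them into train/test
    at the given training fraction (e.g. 0.20 for the 20-80 variant)."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * train_frac)
    return shuffled[:k], shuffled[k:]

# Toy stand-in for labeled schema-match pairs.
pairs = [f"match_{i}" for i in range(100)]

train_20, test_80 = split(pairs, 0.20)  # 20-80 variant
train_50, test_50 = split(pairs, 0.50)  # 50-50 variant
print(len(train_20), len(test_80))  # 20 80
print(len(train_50), len(test_50))  # 50 50
```

Fixing the shuffle seed keeps the two variants comparable across baseline runs, which matters when labeled schema-matching data is as scarce as the row describes.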