Towards Automated Knowledge Integration From Human-Interpretable Representations
Authors: Katarzyna Kobalczyk, Mihaela van der Schaar
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate our claims, we implement an instantiation of informed meta-learning, the Informed Neural Process, and empirically demonstrate the potential benefits and limitations of informed meta-learning in improving data efficiency and generalisation. Through empirical evaluation on both synthetic and real-world datasets, we demonstrate the feasibility of this approach to knowledge integration, as evidenced by improvements in predictive performance and data efficiency. |
| Researcher Affiliation | Academia | Katarzyna Kobalczyk University of Cambridge EMAIL Mihaela van der Schaar University of Cambridge EMAIL |
| Pseudocode | No | The paper describes the model architecture and training process in textual and mathematical form, particularly in Section 4 "Informed Neural Processes" and Appendix A.4 "INP MODEL ARCHITECTURE" and A.5 "INP TRAINING". However, it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks with structured, code-like steps. |
| Open Source Code | Yes | We also provide access to the anonymous repository containing the source code together with the instructions to reproduce the experiments from sections 5.1 and 5.2.1 (code for the image classification experiments will be made available upon paper acceptance). We also release the synthetic datasets generated with GPT-4 used in experiments 5.2.1 and 5.2.2. Code and data can be found at: https://github.com/kasia-kobalczyk/informed-metalearning and at a wider lab repository: https://github.com/vanderschaarlab/informed-meta-learning. |
| Open Datasets | Yes | We use the sub-hourly temperature dataset from the U.S. Climate Reference Network (USCRN): https://www.ncei.noaa.gov/access/crn/qcdatasets.html. We apply INPs to few-shot classification on the CUB-200-2011 dataset (Wah et al., 2011). |
| Dataset Splits | Yes | For each task, the number of context points n ranges uniformly between 0 and 10; the number of targets, m = 100. Training, validation, and testing collections of tasks are created by randomly selecting 507, 108, and 110 days, respectively, between the years 2021 and 2022 in Aleknagik, Alaska. For each task, the target dataset consists of all 288 measurements in the 24h range. Context observations are sampled by first uniformly sampling 10 data points and then selecting the chronologically first n observations, where n ~ U[0, 10]. We use 100 bird categories for training, 50 for validation, and 50 for testing. For each task, the number of shots k, i.e., the number of example images per class, ranges uniformly between 0 and 10. The target set consists of 20 images per class. |
| Hardware Specification | Yes | All experiments were run on a machine with an AMD Epyc Milan 7713 CPU, 120GB RAM, and using a single NVIDIA A6000 Ada Generation GPU accelerator with 48GB VRAM. |
| Software Dependencies | No | The paper mentions several software components and models used (e.g., "Adam optimiser", "RoBERTa language model", "GPT-4", "CLIP vision and text encoders", "Hugging Face implementation of the CLIP ViT-B/32 model"). However, it does not provide specific version numbers for general programming languages or core machine learning libraries (e.g., Python, PyTorch, TensorFlow) as required. |
| Experiment Setup | Yes | The data encoder, hθe,D, is implemented as a 3-layer MLP. The knowledge encoder, hθe,K, is implemented with the Deep Set architecture Zaheer et al. (2017), made of two 2-layer MLPs. The decoder is a 4-layer MLP. We set the hidden dimension, d = 128 and use the sum & MLP method for the aggregator, a. We use a learning rate of 1e-3 and set the batch size to 64. During training, knowledge is masked at rate 0.3. |
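The context-sampling procedure described under Dataset Splits (for the USCRN temperature tasks) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name `sample_task`, the use of numpy, and the stand-in day series are all assumptions; only the quantities (288 measurements per day, 10 candidate points, n ~ U[0, 10], chronological prefix) come from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(day_series, n_max=10):
    """Build one temperature task (hypothetical sketch): the full 24h series
    is the target set; the context is a chronological prefix of 10 uniformly
    sampled points, with prefix length n ~ U{0, ..., n_max}."""
    assert len(day_series) == 288  # 288 sub-hourly measurements per day
    t = np.arange(288)
    target_x, target_y = t, day_series  # all 288 points are targets
    # Uniformly sample 10 candidate indices without replacement, then keep
    # the chronologically first n of them.
    cand = np.sort(rng.choice(288, size=10, replace=False))
    n = rng.integers(0, n_max + 1)  # inclusive upper bound
    ctx_idx = cand[:n]
    return (ctx_idx, day_series[ctx_idx]), (target_x, target_y)

# Stand-in for one day's temperature curve (assumed, for illustration only).
day = np.sin(np.linspace(0.0, 2.0 * np.pi, 288))
(ctx_x, ctx_y), (tgt_x, tgt_y) = sample_task(day)
```

Sampling 10 candidates first and then truncating to a chronological prefix (rather than sampling n points directly) matches the two-step procedure quoted in the table.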
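The Experiment Setup row describes the INP components: a 3-layer MLP data encoder, a Deep Set knowledge encoder (two 2-layer MLPs), a 4-layer MLP decoder, hidden dimension d = 128, a sum-&-MLP aggregator, and knowledge masking at rate 0.3. A minimal numpy sketch of that wiring is below; it is not the authors' implementation. Input dimensions (2 for (x, y) pairs, 16 for knowledge elements, scalar targets), the initialisation scale, and the aggregator's layer count are assumptions for illustration.

```python
import numpy as np

def mlp(dims, rng):
    """Initialise an MLP as a list of (W, b) pairs for the given layer dims."""
    return [(rng.standard_normal((i, o)) * 0.02, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Apply an MLP with ReLU on hidden layers, linear output layer."""
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

d = 128  # hidden dimension from the paper
rng = np.random.default_rng(0)

# Data encoder: 3-layer MLP over (x, y) pairs (input dim 2 is assumed).
enc_D = mlp([2, d, d, d], rng)
# Knowledge encoder (Deep Set): per-element 2-layer MLP phi, sum-pool,
# then a second 2-layer MLP rho (per-element dim 16 is assumed).
phi = mlp([16, d, d], rng)
rho = mlp([d, d, d], rng)
# Aggregator a: "sum & MLP" over per-point data embeddings.
agg = mlp([d, d, d], rng)
# Decoder: 4-layer MLP from (data rep, knowledge rep, target x) to y.
dec = mlp([2 * d + 1, d, d, d, 1], rng)

def inp_predict(ctx_xy, knowledge, x_target, mask_knowledge=False):
    r = forward(agg, forward(enc_D, ctx_xy).sum(axis=0))  # data path
    if mask_knowledge:  # knowledge is masked at rate 0.3 during training
        k = np.zeros(d)
    else:
        k = forward(rho, forward(phi, knowledge).sum(axis=0))  # Deep Set
    return forward(dec, np.concatenate([r, k, x_target]))

# Toy forward pass: 5 context pairs, 3 knowledge elements, one target x.
y_hat = inp_predict(rng.standard_normal((5, 2)),
                    rng.standard_normal((3, 16)),
                    np.array([0.5]))
```

Summing embeddings before the aggregator and knowledge MLPs makes both paths permutation-invariant in their inputs, which is the defining property of the Deep Set construction cited in the table.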