Query-based Knowledge Transfer for Heterogeneous Learning Environments
Authors: Norah Alballa, Wenxuan Zhang, Ziquan Liu, Ahmed Mohamed Abdelmoniem Sayed, Mohamed Elhoseiny, Marco Canini
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, conducted on both standard and clinical benchmarks, show that QKT significantly outperforms existing collaborative learning methods by an average of 20.91 percentage points in single-class query settings and an average of 14.32 percentage points in multi-class query scenarios. Further analysis and ablation studies reveal that QKT effectively balances the learning of new and existing knowledge, showing strong potential for its application in decentralized learning. |
| Researcher Affiliation | Academia | Norah Alballa, Wenxuan Zhang, Ziquan Liu, Ahmed M. Abdelmoniem, Mohamed Elhoseiny, Marco Canini (KAUST; Queen Mary University of London) |
| Pseudocode | No | The paper describes the proposed method in Section 3.3 with detailed steps and equations, and also provides a flowchart in Figure 3. However, it does not include a formally structured pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our approach on image classification tasks using the following datasets: CIFAR10 (Krizhevsky, 2009), with 60,000 images across 10 classes; CIFAR100 (Krizhevsky, 2009), featuring 100 classes with 600 images per class, to test generalizability across more classes; CINIC10 (Darlow et al., 2018), which combines samples from ImageNet (Russakovsky et al., 2015) and CIFAR10, introducing natural distribution shifts (Luo et al., 2021); PathMNIST (Yang et al., 2023), a medical dataset containing 9 classes of colorectal cancer images; and BloodMNIST (Yang et al., 2023), featuring 8 classes of blood cell microscope images. |
| Dataset Splits | Yes | In our experiments, each client issues a query to learn or improve a single class or multiple classes. We evaluate both scenarios for each client. The specific classes and, in the case of multi-class queries, the number of classes, are both selected randomly from the client's data distribution. Classes are chosen from the set of underrepresented classes in the client's data based on a predefined sample threshold (50 samples by default). The selection process follows a uniform distribution, ensuring that each eligible class has an equal probability of being selected. ... We explore using a variable E for each client, tuned with a validation set consisting of 1% of the training data for CINIC10 and 10% for other datasets. |
| Hardware Specification | No | The paper states that "For computer time, this research used the resources of the Supercomputing Laboratory at KAUST." However, it does not provide specific details such as GPU/CPU models, memory, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and ResNet-18 architecture, and refers to their respective original papers. However, it does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, which are crucial for reproducibility. |
| Experiment Setup | Yes | For all experiments, we use the Adam optimizer (Kingma & Ba, 2017) with a learning rate of 1×10⁻³, a weight decay of 4×10⁻⁴, and a batch size of 32, consistent with prior studies (Meng et al., 2023; Alballa & Canini, 2023). During local training, each client's model is pre-trained on its local dataset for up to 100 epochs, with early stopping applied if validation performance does not improve for 10 consecutive epochs. In FL approaches, the number of local training epochs per communication round (E) is set to 2. For all KD-based approaches (naive KD and QKT), we use a default α parameter and temperature of 1, and use the student's data as the transfer set. Training is conducted over E epochs using the transfer set, where E is set to 25 for CIFAR10 and CINIC10, and 10 for all other datasets. To maintain simplicity, the same E is used across all clients and both phases of full QKT. For QKT Light, E is set to 5 for Phase 2. |
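The KD setup quoted above (an α-weighted blend of hard-label cross-entropy and teacher-matching, softened by a temperature) can be sketched in a few lines. This is a generic illustration of the standard distillation objective under one common α convention, not the paper's released QKT code; the function names and the exact way α weights the two terms are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, alpha=1.0, T=1.0):
    """Distillation objective: alpha weights the KL term that matches the
    teacher's softened predictions; (1 - alpha) weights the cross-entropy
    on the ground-truth labels. With T = 1 (as in the quoted setup), no
    extra softening is applied."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    n = len(labels)
    # Cross-entropy against hard labels (small epsilon guards log(0)).
    ce = -np.mean(np.log(p_s[np.arange(n), labels] + 1e-12))
    # KL divergence from teacher to student, rescaled by T^2 as is
    # conventional so gradients keep a comparable magnitude.
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                        axis=-1))
    return alpha * kl * T * T + (1 - alpha) * ce
```

Under this convention, α = 1 reduces the loss to pure teacher matching and α = 0 to plain supervised cross-entropy on the transfer set; in the experiments above the transfer set is the student's own data.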