Learn How to Query from Unlabeled Data Streams in Federated Learning

Authors: Yuchang Sun, Xinran Li, Tao Lin, Jun Zhang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | "Extensive simulations on image and text tasks show that LeaDQ advances the model performance in various FL scenarios, outperforming the benchmarking algorithms. [...] We conduct simulations to compare the LeaDQ algorithm with several representative data querying strategies. The experimental results on various image and text tasks demonstrate that LeaDQ selects samples that result in more meaningful model updates, leading to improved model accuracy."
Researcher Affiliation | Academia | "1 Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology; 2 Westlake University"
Pseudocode | Yes | "Algorithm 1: The LeaDQ framework"
Open Source Code | Yes | "The code is available at https://github.com/hiyuchang/leadq/."
Open Datasets | Yes | "We evaluate the algorithms on two image classification tasks, i.e., SVHN (Netzer et al. 2011) and CIFAR-100 (Krizhevsky and Hinton 2009), and one text classification task, i.e., 20Newsgroup (Lang 1995)."
Dataset Splits | No | The paper discusses allocating training data to clients and computing model accuracy on test data, but it does not give specific percentages, sample counts, or references to predefined train/validation/test splits for the datasets. For example, it states "The model accuracy is computed on the test data." without defining how the test data is split from the whole.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions models such as "ResNet-18" and "DistilBERT" and algorithms such as "FedAvg" and "QMIX", but it does not list any specific software libraries or tools with version numbers.
Experiment Setup | Yes | "We simulate an FL system with one server and K = 10 clients. [...] In each round, Nu = 10 unlabeled data samples arrive at each client independently and each client selects Nq = 1 data sample for label querying. [...] To simulate the non-IID setting, we allocate the training data to clients according to the Dirichlet distribution with concentration parameter α = 0.5 (Li et al. 2022). [...] The results are illustrated in Tables 3 and 4, respectively, in which we show the model accuracy after R = 500 rounds when applying different data querying strategies."
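The Dirichlet-based non-IID allocation quoted above (α = 0.5, following Li et al. 2022) is a standard partitioning scheme. A minimal sketch of one common variant, drawing per-class proportions over clients from Dir(α); the function name and details are illustrative, not taken from the paper's released code:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices across clients with a per-class Dirichlet draw.

    Smaller alpha -> more skewed (non-IID) class distributions per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c's samples assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn proportions into cut points over this class's index list.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

With α = 0.5 and K = 10 as in the quoted setup, each client ends up holding a skewed subset of classes; every sample is assigned to exactly one client.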