Learn How to Query from Unlabeled Data Streams in Federated Learning
Authors: Yuchang Sun, Xinran Li, Tao Lin, Jun Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulations on image and text tasks show that LeaDQ advances the model performance in various FL scenarios, outperforming the benchmarking algorithms. [...] We conduct simulations to compare the LeaDQ algorithm with several representative data querying strategies. The experimental results on various image and text tasks demonstrate that LeaDQ selects samples that result in more meaningful model updates, leading to improved model accuracy. |
| Researcher Affiliation | Academia | ¹Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology; ²Westlake University |
| Pseudocode | Yes | Algorithm 1: The LeaDQ framework |
| Open Source Code | Yes | 1The code is available at https://github.com/hiyuchang/leadq/. |
| Open Datasets | Yes | We evaluate the algorithms on two image classification tasks, i.e., SVHN (Netzer et al. 2011) and CIFAR-100 (Krizhevsky and Hinton 2009), and one text classification task, i.e., 20 Newsgroups (Lang 1995). |
| Dataset Splits | No | The paper discusses allocating training data to clients and computing model accuracy on test data, but does not provide specific percentages, counts, or references to predefined train/validation/test splits for the overall datasets. For example, it says 'The model accuracy is computed on the test data.' but not how the test data is defined or split from the total. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions models like 'ResNet-18' and 'DistilBERT' and algorithms like 'FedAvg' and 'QMIX', but does not list any specific software libraries or tools with their version numbers. |
| Experiment Setup | Yes | We simulate an FL system with one server and K = 10 clients. [...] In each round, Nu = 10 unlabeled data samples arrive at each client independently and each client selects Nq = 1 data sample for label querying. [...] To simulate the non-IID setting, we allocate the training data to clients according to the Dirichlet distribution with concentration parameter α = 0.5 (Li et al. 2022). [...] The results are illustrated in Tables 3 and 4, respectively, in which we show the model accuracy after R = 500 rounds when applying different data querying strategies. |
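The non-IID setup quoted above (K = 10 clients, Dirichlet concentration α = 0.5) follows a standard label-skew partitioning scheme. The sketch below shows one common way to implement it; the function name and structure are illustrative, not taken from the authors' repository, which should be consulted for the actual partitioning code.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices among clients by drawing, for each class,
    per-client proportions from Dirichlet(alpha). Small alpha yields
    highly skewed (non-IID) label distributions across clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        # Shuffle the indices belonging to class c.
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Fraction of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]

# Example: 1000 samples over 10 classes, split among K = 10 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.5)
```

Every sample is assigned to exactly one client, and with α = 0.5 each client's class histogram is noticeably unbalanced, matching the heterogeneity the paper simulates.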