Efficient Heterogeneity-Aware Federated Active Data Selection
Authors: Ying-Peng Tang, Chao Ren, Xiaoli Tang, Sheng-Jun Huang, Lizhen Cui, Han Yu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 11 benchmark datasets demonstrate significant improvements of FALE over existing state-of-the-art methods. ... We plot the mean learning curves over 10 runs for the compared methods in Fig. 2. The mean and standard deviation of MSE on the test set are reported. |
| Researcher Affiliation | Academia | 1College of Computing and Data Science, Nanyang Technological University, Singapore 2School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Sweden 3College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China 4School of Software, Shandong University, Jinan, China. Correspondence to: Han Yu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 The FALE Algorithm ... Algorithm 2 FALE-local Algorithm |
| Open Source Code | No | The paper states: "We implement the regression model using PyTorch... The implementation of LOGO (Kim et al., 2023) is sourced from the authors. The FL framework is built upon the FedLab (Dun Zeng & Xu, 2021) toolbox." However, it provides no link to, or statement about the availability of, source code for the FALE method itself. |
| Open Datasets | Yes | We employ 9 UCI (Dua & Graff, 2017) and OpenML (Bischl et al., 2021) regression benchmarks in our experiments. ... CelebA is a facial image dataset ... Following Lyu et al. (2025)... IMDB-WIKI (Rothe et al., 2018) is a facial image dataset with age annotations. We adopt the dataset settings from Yang et al. (2021)... |
| Dataset Splits | Yes | For each dataset, we uniformly sample 20% of the instances for testing, while the remaining instances are distributed across k = 10 clients in a non-i.i.d. manner... To simulate the non-i.i.d. setting in regression tasks, we perform binning on the regression target vector, with the number of bins equal to the number of clients. We then adopt the Dirichlet distribution strategy (Yurochkin et al., 2019) with a Dirichlet alpha of 5 to assign instances to clients, using the bins as class labels. ... For each dataset, 5% of each client's data is uniformly sampled to form the initial labeled set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions using PyTorch for implementation and FedLab for the FL framework, but no specific details about CPUs, GPUs, or other computational resources are provided. |
| Software Dependencies | No | The paper states: "We implement the regression model using PyTorch... The FL framework is built upon the FedLab (Dun Zeng & Xu, 2021) toolbox." It mentions PyTorch and FedLab but does not provide specific version numbers for these or any other key software components. |
| Experiment Setup | Yes | The model is trained by optimizing the mean squared error (MSE) with the Adam optimizer, using a learning rate of 0.01 for 25 epochs. For local active regression methods, we utilize the implementation provided by Holzmüller et al. (2023). ... At each iteration, we allocate a query budget of 5 instances per client, resulting in a total of 50 instances queried per round. |
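The non-i.i.d. partitioning described in the Dataset Splits row (bin the regression targets, then assign instances via a Dirichlet distribution with the bins as pseudo-class labels) can be sketched as follows. This is a minimal reconstruction from the report's description, not the authors' code; the function name `dirichlet_split` and the quantile-based binning are assumptions.

```python
import numpy as np

def dirichlet_split(y, n_clients=10, alpha=5.0, seed=0):
    """Partition regression data non-i.i.d. across clients.

    Sketch of the strategy described in the paper: bin the target
    vector into n_clients bins, then distribute each bin's instances
    to clients via Dirichlet-sampled proportions, treating bins as
    pseudo-class labels (Yurochkin et al., 2019).
    """
    rng = np.random.default_rng(seed)
    # Quantile-based binning is an assumption; the paper only says
    # "binning" with bin count equal to client count.
    edges = np.quantile(y, np.linspace(0, 1, n_clients + 1)[1:-1])
    labels = np.digitize(y, edges)  # pseudo-class label in 0..n_clients-1
    client_idx = [[] for _ in range(n_clients)]
    for b in range(n_clients):
        idx = np.flatnonzero(labels == b)
        rng.shuffle(idx)
        # Dirichlet proportions decide how many of this bin's
        # instances each client receives (alpha=5 in the paper).
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for c, part in enumerate(np.split(idx, cuts)):
            client_idx[c].extend(part.tolist())
    return client_idx
```

A larger `alpha` (here 5) yields more balanced client shares, so this setting simulates mild rather than extreme heterogeneity.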
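The local training configuration in the Experiment Setup row (MSE loss, Adam optimizer, learning rate 0.01, 25 epochs) corresponds to a loop along these lines. This is a generic PyTorch sketch under the report's stated hyperparameters; the model architecture and the helper name `train_regressor` are assumptions, not the authors' implementation.

```python
import torch
from torch import nn

def train_regressor(model, X, y, lr=0.01, epochs=25):
    """Minimal sketch of the reported local training setup:
    full-batch MSE minimization with Adam (lr=0.01, 25 epochs).
    Batching and the FedLab client/server wiring are omitted."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()
```

In the paper's federated setting this loop would run per client per round, with FedLab handling aggregation of the resulting model updates.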