Addressing Label Shift in Distributed Learning via Entropy Regularization

Authors: Zhiyuan Wu, Changkyu Choi, Xiangcheng Cao, Volkan Cevher, Ali Ramezani-Kebrya

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on MNIST, Fashion MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS, outperforming baselines by up to 20% in imbalanced settings. These results highlight the significant improvements VRLS offers in addressing label shifts. Our theoretical analysis further supports this by establishing high-probability bounds on estimation errors.
Researcher Affiliation | Academia | Zhiyuan Wu (University of Oslo); Changkyu Choi (UiT The Arctic University of Norway); Xiangcheng Cao (EPFL); Volkan Cevher (LIONS, EPFL); Ali Ramezani-Kebrya (Department of Informatics, University of Oslo; Norwegian Centre for Knowledge-driven Machine Learning (Integreat))
Pseudocode | Yes | Algorithm 1: VRLS Importance Ratio Estimation Algorithm... Algorithm 2: IW-ERM with VRLS in Distributed Learning
Open Source Code | Yes | The code is available at https://github.com/zhiyuan-11/VRLS_main/tree/main.
Open Datasets | Yes | Experiments conducted on MNIST, Fashion MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS... We ensured reproducibility with publicly available datasets (MNIST, CIFAR-10) and standard models (e.g., ResNet-18).
Dataset Splits | Yes | For experiments in a federated learning setting, both MNIST (LeCun et al., 1998) and Fashion MNIST (Xiao et al., 2017) datasets are employed, each containing 60,000 training samples and 10,000 test samples... The CIFAR-10 dataset (Krizhevsky) comprises 60,000 colored images, sized 32 by 32 pixels, spread across 10 classes with 6,000 images per class; it is divided into 50,000 training images and 10,000 test images.
Hardware Specification | Yes | All experiments are run on a single GPU within an internal cluster... Experiments were run on NVIDIA 3090 and A100 GPUs, as well as Google Colab, with average results and variances reported across multiple trials.
Software Dependencies | No | The paper does not provide specific software versions for libraries (e.g., PyTorch, TensorFlow) or for Python itself; it only mentions use of the Adam optimizer.
Experiment Setup | Yes | Stochastic gradients for each client are calculated with a batch size of 64 and aggregated on the server using the Adam optimizer. LeNet is used for experiments on MNIST and Fashion MNIST with a learning rate of 0.001 and a weight decay of 1e-6. For CIFAR-10, ResNet-18 is employed with a learning rate of 0.0001 and a weight decay of 0.0001... The regularization coefficient ζ in Equation (4) is set to 1 for all experiments.
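The hyperparameters quoted above can be collected into a small configuration sketch for anyone attempting a reproduction. This is a minimal illustration assembled only from the values reported in the paper (batch size, optimizer, per-dataset model, learning rate, weight decay, and the coefficient ζ); the dictionary layout and the `training_config` helper are our own assumptions, not the authors' released code.

```python
# Hedged sketch of the per-dataset training setup quoted in the report.
# Values come from the paper's text; the structure is illustrative only.

EXPERIMENT_SETUP = {
    "MNIST":         {"model": "LeNet",     "lr": 1e-3, "weight_decay": 1e-6},
    "Fashion-MNIST": {"model": "LeNet",     "lr": 1e-3, "weight_decay": 1e-6},
    "CIFAR-10":      {"model": "ResNet-18", "lr": 1e-4, "weight_decay": 1e-4},
}

# Settings shared across all experiments; "zeta" is the regularization
# coefficient from Equation (4), set to 1 throughout.
COMMON = {"batch_size": 64, "optimizer": "Adam", "zeta": 1.0}

def training_config(dataset: str) -> dict:
    """Merge shared settings with the per-dataset ones (hypothetical helper)."""
    if dataset not in EXPERIMENT_SETUP:
        raise KeyError(f"No reported setup for dataset {dataset!r}")
    return {**COMMON, **EXPERIMENT_SETUP[dataset]}
```

For example, `training_config("CIFAR-10")` yields the Adam/ResNet-18 settings with learning rate 1e-4 and weight decay 1e-4.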