Addressing Label Shift in Distributed Learning via Entropy Regularization

Authors: Zhiyuan Wu, Changkyu Choi, Xiangcheng Cao, Volkan Cevher, Ali Ramezani-Kebrya

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on MNIST, Fashion MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS, outperforming baselines by up to 20% in imbalanced settings. These results highlight the significant improvements VRLS offers in addressing label shifts. Our theoretical analysis further supports this by establishing high-probability bounds on estimation errors.
Researcher Affiliation | Academia | Zhiyuan Wu (University of Oslo); Changkyu Choi (UiT The Arctic University of Norway); Xiangcheng Cao (EPFL); Volkan Cevher (LIONS, EPFL); Ali Ramezani-Kebrya (Department of Informatics, University of Oslo; Norwegian Centre for Knowledge-driven Machine Learning (Integreat))
Pseudocode | Yes | Algorithm 1: VRLS Importance Ratio Estimation Algorithm... Algorithm 2: IW-ERM with VRLS in Distributed Learning
Open Source Code | Yes | The code is available at https://github.com/zhiyuan-11/VRLS_main/tree/main.
Open Datasets | Yes | Experiments conducted on MNIST, Fashion MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS... We ensured reproducibility with publicly available datasets (MNIST, CIFAR-10) and standard models (e.g., ResNet-18).
Dataset Splits | Yes | For experiments in a federated learning setting, both MNIST (LeCun et al., 1998) and Fashion MNIST (Xiao et al., 2017) datasets are employed, each containing 60,000 training samples and 10,000 test samples... The CIFAR-10 dataset (Krizhevsky) comprises 60,000 colored images, sized 32 by 32 pixels, spread across 10 classes with 6,000 images per class; it is divided into 50,000 training images and 10,000 test images.
Hardware Specification | Yes | All experiments are run on a single GPU within an internal cluster... Experiments were run on NVIDIA 3090 and A100 GPUs, as well as Google Colab, with average results and variances reported across multiple trials.
Software Dependencies | No | The paper does not provide specific software versions for libraries (e.g., PyTorch, TensorFlow) or for Python itself; it only mentions use of the Adam optimizer.
Experiment Setup | Yes | Stochastic gradients for each client are calculated with a batch size of 64 and aggregated on the server using the Adam optimizer. LeNet is used for experiments on MNIST and Fashion MNIST with a learning rate of 0.001 and a weight decay of 1e-6. For CIFAR-10, ResNet-18 is employed with a learning rate of 0.0001 and a weight decay of 0.0001... The regularization coefficient ζ in Equation (4) is set to 1 for all experiments.
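The hyperparameters quoted above can be collected into a small configuration sketch for anyone attempting a reproduction. This is a minimal illustration assembled only from the values reported in the paper (batch size, optimizer, per-dataset model, learning rate, weight decay, and the coefficient ζ); the dictionary layout and the `training_config` helper are our own assumptions, not the authors' released code.

```python
# Hedged sketch of the per-dataset training setup quoted in the report.
# Values come from the paper's text; the structure is illustrative only.

EXPERIMENT_SETUP = {
    "MNIST":         {"model": "LeNet",     "lr": 1e-3, "weight_decay": 1e-6},
    "Fashion-MNIST": {"model": "LeNet",     "lr": 1e-3, "weight_decay": 1e-6},
    "CIFAR-10":      {"model": "ResNet-18", "lr": 1e-4, "weight_decay": 1e-4},
}

# Settings shared across all experiments; "zeta" is the regularization
# coefficient from Equation (4), set to 1 throughout.
COMMON = {"batch_size": 64, "optimizer": "Adam", "zeta": 1.0}

def training_config(dataset: str) -> dict:
    """Merge shared settings with the per-dataset ones (hypothetical helper)."""
    if dataset not in EXPERIMENT_SETUP:
        raise KeyError(f"No reported setup for dataset {dataset!r}")
    return {**COMMON, **EXPERIMENT_SETUP[dataset]}
```

For example, `training_config("CIFAR-10")` yields the Adam/ResNet-18 settings with learning rate 1e-4 and weight decay 1e-4.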