DICE: Data Influence Cascade in Decentralized Learning

Authors: Tongtian Zhu, Wenhao Li, Can Wang, Fengxiang He

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This section presents the experimental results, with implementation details outlined in Appendix D.1. We evaluate the alignment between one-hop DICE-GT (see Definition 2) and its first-order approximation, one-hop DICE-E (see Proposition 1). ... Anomaly Detection: DICE identifies malicious neighbors, referred to as anomalies, by evaluating their proximal influence... Influence Cascade: The topological dependency of DICE-E in our theory reveals the power asymmetries..."
Researcher Affiliation | Academia | Tongtian Zhu, Wenhao Li & Can Wang (Zhejiang University); Fengxiang He (University of Edinburgh)
Pseudocode | Yes | Algorithm 1: Decentralized Learning with Flexible Gossip and Optimization
Require: graph G = (V, E), initial parameters {θ^0_k}_{k ∈ V}, optimizer O_k, number of communication rounds T, and mixing matrix distributions W_t (∀ t ∈ [T])
1: for t = 1 to T do, in parallel for all participants k ∈ V
2:   Local Update:
3:     Sample z^t_k ∼ D_k and update parameters with optimizer O_k: θ^{t+1/2}_k ← O_k(θ^t_k, z^t_k)
4:   Gossip Averaging:
5:     Send θ^{t+1/2}_k to {l | W_{l,k} > 0} and receive θ^{t+1/2}_j from {j | W_{k,j} > 0}
6:     Sample W^t ∼ W_t and perform gossip averaging: θ^{t+1}_k ← Σ_{j ∈ N_in(k)} W^t_{k,j} θ^{t+1/2}_j
7: end for
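The Adapt-Then-Communicate loop of Algorithm 1 can be sketched in plain Python/NumPy. This is a minimal illustration, not the authors' code: the ring topology, the uniform 1/3 mixing weights, the scalar least-squares local losses, and all function names are assumptions made for the demo.

```python
import numpy as np

def ring_mixing_matrix(k: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for a ring: weight 1/3 to self and each neighbor."""
    W = np.zeros((k, k))
    for i in range(k):
        W[i, [i, (i - 1) % k, (i + 1) % k]] = 1.0 / 3.0
    return W

def decentralized_sgd(data, rounds=50, lr=0.1, seed=0):
    """Decentralized SGD (Adapt-Then-Communicate) on a scalar toy problem.

    Node k holds samples z from its local distribution D_k and minimizes
    mean((theta - z)^2) locally, then gossip-averages with its ring neighbors.
    """
    rng = np.random.default_rng(seed)
    k = len(data)
    W = ring_mixing_matrix(k)
    theta = np.zeros(k)  # theta[i]: current parameter of node i
    for _ in range(rounds):
        # Local update ("adapt"): sample a minibatch, take one SGD step.
        half = np.empty(k)  # theta^{t+1/2}
        for i in range(k):
            z = rng.choice(data[i], size=8)
            grad = 2.0 * (theta[i] - z).mean()  # d/dtheta of mean((theta - z)^2)
            half[i] = theta[i] - lr * grad
        # Gossip averaging ("communicate"): theta_k <- sum_j W[k, j] * half[j];
        # entries with W[k, j] = 0 contribute nothing, so this matches the
        # in-neighbor sum of Algorithm 1.
        theta = W @ half
    return theta
```

On this toy problem each node's local minimizer is its own data mean, and gossip averaging pulls all nodes toward a consensus near the global mean; with a constant learning rate they settle in a small neighborhood of it rather than exactly on it.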
Open Source Code | No | "Project page is available at DICE. ... The code will be made publicly available."
Open Datasets | Yes | "We employ the vanilla mini-batch Adapt-Then-Communicate version of Decentralized SGD ((Lopes & Sayed, 2008), see Algorithm 1) with commonly used network topologies (Ying et al., 2021) to train three-layer MLPs (Rumelhart et al., 1986), three-layer CNNs (LeCun et al., 1998), and ResNet-18 (He et al., 2016) on subsets of MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015)."
Dataset Splits | No | "Each node uses a 512-sample subset of CIFAR-10. Models are trained for 5 epochs with a batch size of 128 and a learning rate of 0.1." (from Figure 4 caption). The paper specifies per-node data subsets but not overall train/test/validation splits for the datasets.
Hardware Specification | Yes | "The experiments are conducted on a computing facility equipped with 80 GB NVIDIA A100 GPUs."
Software Dependencies | No | "We employ the vanilla mini-batch Adapt-Then-Communicate version of Decentralized SGD ((Lopes & Sayed, 2008), see Algorithm 1) with commonly used network topologies (Ying et al., 2021) to train three-layer MLPs (Rumelhart et al., 1986), three-layer CNNs (LeCun et al., 1998), and ResNet-18 (He et al., 2016) on subsets of MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015)." The paper names algorithms, models, and datasets but does not specify software libraries or version numbers.
Experiment Setup | Yes | "The number of participants (one GPU as a participant) is set to 16 and 32, with each participant holding 512 samples. For sensitivity analysis, we evaluate the stability of results under hyperparameter adjustments: the local batch size is varied over 16, 64, and 128 per participant, and the learning rate is set to 0.1 and 0.01 without decay."
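The reported sweep amounts to a small configuration grid. A hypothetical sketch in Python (the field names are illustrative; the paper does not publish a configuration file):

```python
from itertools import product

# Grid mirroring the reported setup: 16 or 32 participants,
# batch sizes 16/64/128, learning rates 0.1/0.01, no decay,
# 512 samples held by each participant.
n_participants = [16, 32]
batch_sizes = [16, 64, 128]
learning_rates = [0.1, 0.01]

configs = [
    {"nodes": n, "batch_size": b, "lr": lr, "samples_per_node": 512, "lr_decay": None}
    for n, b, lr in product(n_participants, batch_sizes, learning_rates)
]
print(len(configs))  # 2 * 3 * 2 = 12 runs
```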