DICE: Data Influence Cascade in Decentralized Learning
Authors: Tongtian Zhu, Wenhao Li, Can Wang, Fengxiang He
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the experimental results, with implementation details outlined in Appendix D.1. We evaluate the alignment between one-hop DICE-GT (see Definition 2) and its first-order approximation, one-hop DICE-E (see Proposition 1). ... Anomaly Detection: DICE identifies malicious neighbors, referred to as anomalies, by evaluating their proximal influence... Influence Cascade: The topological dependency of DICE-E in our theory reveals the power asymmetries... |
| Researcher Affiliation | Academia | Tongtian Zhu, Wenhao Li & Can Wang (Zhejiang University); Fengxiang He (University of Edinburgh) |
| Pseudocode | Yes | Algorithm 1: Decentralized Learning with Flexible Gossip and Optimization. Require: G = (V, E), initial parameters {θ⁰_k}_{k∈V}, optimizers O_k, number of communication rounds T, and mixing matrix distributions 𝒲^t (∀t ∈ [T]). For t = 1 to T, in parallel for all participants k ∈ V: (1) Local Update: sample z^t_k ∼ D_k and update parameters with optimizer O_k: θ^{t+1/2}_k ← O_k(θ^t_k, z^t_k). (2) Gossip Averaging: send θ^{t+1/2}_k to {l \| W_{l,k} > 0} and receive θ^{t+1/2}_j from {j \| W_{k,j} > 0}; sample W^t ∼ 𝒲^t and average: θ^{t+1}_k ← Σ_{j ∈ N_in(k)} W^t_{k,j} θ^{t+1/2}_j. |
| Open Source Code | No | Project page is available at DICE. ... The code will be made publicly available. |
| Open Datasets | Yes | We employ the vanilla mini-batch Adapt-Then-Communicate version of Decentralized SGD (Lopes & Sayed, 2008; see Algorithm 1) with commonly used network topologies (Ying et al., 2021) to train three-layer MLPs (Rumelhart et al., 1986), three-layer CNNs (LeCun et al., 1998), and ResNet-18 (He et al., 2016) on subsets of MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015). |
| Dataset Splits | No | Each node uses a 512-sample subset of CIFAR-10. Models are trained for 5 epochs with a batch size of 128 and a learning rate of 0.1. (from Figure 4 caption). The paper mentions subsets of data per node but not overall train/test/validation splits for the datasets. |
| Hardware Specification | Yes | The experiments are conducted on a computing facility equipped with 80 GB NVIDIA A100 GPUs. |
| Software Dependencies | No | We employ the vanilla mini-batch Adapt-Then-Communicate version of Decentralized SGD (Lopes & Sayed, 2008; see Algorithm 1) with commonly used network topologies (Ying et al., 2021) to train three-layer MLPs (Rumelhart et al., 1986), three-layer CNNs (LeCun et al., 1998), and ResNet-18 (He et al., 2016) on subsets of MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015). The paper mentions algorithms, models, and datasets but does not specify software libraries with version numbers. |
| Experiment Setup | Yes | The number of participants (one GPU per participant) is set to 16 and 32, with each participant holding 512 samples. For sensitivity analysis, we evaluate the stability of results under hyperparameter adjustments. The local batch size is varied as 16, 64, and 128 per participant, while the learning rate is set to 0.1 or 0.01 without decay. |
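The Adapt-Then-Communicate loop quoted in the Pseudocode row (local optimizer step, then gossip averaging over a mixing matrix) can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: it assumes a fixed ring topology with a uniform doubly stochastic mixing matrix, a least-squares loss, and plain SGD as the local optimizer `O_k`; the paper samples the mixing matrix afresh each round and trains neural networks.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix W for a ring topology:
    each node averages equally with itself and its two neighbors."""
    W = np.zeros((n, n))
    for k in range(n):
        W[k, k] = 1 / 3
        W[k, (k - 1) % n] = 1 / 3
        W[k, (k + 1) % n] = 1 / 3
    return W

def decentralized_sgd(X, y, n_nodes=4, rounds=200, lr=0.1, batch=8, seed=0):
    """Adapt-Then-Communicate Decentralized SGD (sketch of Algorithm 1).

    Each node k holds a shard D_k of (X, y), takes one local SGD step on
    a least-squares loss (the "adapt" half-step theta^{t+1/2}), then
    gossip-averages parameters with its neighbors via W.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = ring_mixing_matrix(n_nodes)
    shards = np.array_split(np.arange(len(X)), n_nodes)  # local datasets D_k
    theta = rng.normal(size=(n_nodes, d))                # per-node parameters
    for _ in range(rounds):
        half = np.empty_like(theta)
        for k in range(n_nodes):
            idx = rng.choice(shards[k], size=batch)      # mini-batch z_k^t
            grad = X[idx].T @ (X[idx] @ theta[k] - y[idx]) / batch
            half[k] = theta[k] - lr * grad               # local update
        theta = W @ half                                 # gossip averaging
    return theta
```

After enough rounds the gossip step drives the nodes toward consensus, so the per-node parameter vectors end up close to each other as well as to the loss minimizer; swapping `ring_mixing_matrix` for another doubly stochastic matrix changes only how fast disagreement contracts.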