Private Multi-Task Learning: Formulation and Applications to Federated Learning
Authors: Shengyuan Hu, Steven Wu, Virginia Smith
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that our method provides improved privacy/utility trade-offs relative to global baselines across common federated learning benchmarks. Finally, we explore the performance of our approach on common federated learning benchmarks (Section 5). Our results show that we can retain the accuracy benefits of MTL in these settings relative to global baselines while still providing meaningful privacy guarantees. We empirically evaluate our private MTL solver on common federated learning benchmarks (Caldas et al., 2018). We first demonstrate the superior privacy-utility trade-off that exists when training our private MTL method compared with training a single global model (Section 5.2). We also compare our method with simple fine-tuning, exploring the results of performing local fine-tuning after learning an MTL objective vs. a global objective (Section 5.3). |
| Researcher Affiliation | Academia | Shengyuan Hu EMAIL Carnegie Mellon University Zhiwei Steven Wu EMAIL Carnegie Mellon University Virginia Smith EMAIL Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 PMTL: Private Mean-Regularized MTL. 1: Input: m, T, λ, η, {w_1^0, …, w_m^0}, w̃^0 = (1/m) Σ_{k=1}^m w_k^0. 2: for t = 0, …, T−1 do. 3: Global learner randomly selects a set of tasks S_t and broadcasts the mean weight w̃^t. 4: for k ∈ S_t in parallel do. 5: Each client updates its weight w_k for E iterations (o_k is the last iteration task k is selected). 6: Each client sends g_k^{t+1} = w_k^{t+1} − w_k^t back to the global learner. 7: end for. 8: Global learner computes a noisy aggregate of the weights: w̃^{t+1} = w̃^t + (1/\|S_t\|) Σ_{k∈S_t} g_k^{t+1} · min(1, γ/‖g_k^{t+1}‖_2) + N(0, σ² I_{d×d}). 9: end for. 10: Output w_1, …, w_m as differentially private personalized models. Client Update(w): for j = 0, …, E−1 do: task learner performs SGD locally, w = w − η(∇_w l_k(w) + λ(w − w̃^t)); end for. |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/s-huu/PMTL |
| Open Datasets | Yes | We empirically evaluate our private MTL solver on common federated learning benchmarks (Caldas et al., 2018). We summarize the details of the datasets and models we used in our empirical study in Table 2. Our experiments include both convex (logistic regression) and non-convex (CNN) loss objectives on both text (Stack Overflow) and image (Celeb A and FEMNIST) datasets. FEMNIST (Cohen et al., 2017; Caldas et al., 2018): 205 clients, 4-layer CNN, 62-class image classification. Stack Overflow (tff): 400 clients, logistic regression, 500-class tag prediction. Celeb A (Liu et al., 2015; Caldas et al., 2018): 515 clients, 4-layer CNN, binary image classification. |
| Dataset Splits | No | The paper mentions using common federated learning benchmarks and discusses partitioning data among clients (tasks), but does not explicitly provide the training/test/validation splits for these datasets. It refers to 'test accuracy' and 'validation accuracy' implying such splits exist, but does not detail them. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions "Tensorflow federated" and that the code makes "use of the FL implementation from the public repo of Laguel et al. (2021) and Li et al. (2021)." However, it does not provide specific version numbers for these software components or any other libraries used. |
| Experiment Setup | Yes | For all experiments, we evaluate the test accuracy and privacy parameter of our private MTL solver given a fixed clipping bound γ, variance of Gaussian noise σ², and number of communication rounds T. All experiments are performed on common federated learning benchmarks as a natural application of multi-task learning. We provide a detailed description of datasets and models in Appendix A.4. In all our experiments, we subsample 100 different tasks for each round, i.e. q = 100, to perform local training and participate in global aggregation. For FEMNIST and Celeb A, we choose σ ∈ {0.02, 0.05, 0.1} and γ ∈ {0.2, 0.5, 1}. For Stack Overflow, we choose σ ∈ {0.01, 0.05, 0.1} and γ ∈ {0.1, 0.5, 1}. We summarize both utility and privacy performance for different hyperparameters below. |
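The mechanism summarized in the Pseudocode row (mean-regularized local SGD, per-client update clipping, and Gaussian noise on the aggregate) can be sketched in a few lines. The following toy simulation is an illustrative assumption, not the authors' implementation (their code is at https://github.com/s-huu/PMTL): the quadratic per-task losses, hyperparameter values, and function names are all hypothetical choices used only to show the clip-average-noise structure of one communication round.

```python
# Sketch of one PMTL-style round: mean-regularized local SGD, clipped
# client deltas, noisy aggregation. Toy losses and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def client_update(w, w_mean, grad_fn, eta=0.1, lam=0.5, E=5):
    """Local SGD on the mean-regularized objective l_k(w) + (lam/2)||w - w_mean||^2."""
    for _ in range(E):
        w = w - eta * (grad_fn(w) + lam * (w - w_mean))
    return w

def pmtl_round(weights, w_mean, grad_fns, selected,
               gamma=1.0, sigma=0.05, eta=0.1, lam=0.5, E=5):
    """One round: each selected client trains locally and sends its clipped
    weight delta; the server averages deltas and adds Gaussian noise."""
    deltas = []
    for k in selected:
        new_w = client_update(weights[k], w_mean, grad_fns[k], eta, lam, E)
        g = new_w - weights[k]
        weights[k] = new_w  # client keeps its personalized model
        g = g * min(1.0, gamma / (np.linalg.norm(g) + 1e-12))  # clip to norm gamma
        deltas.append(g)
    noise = rng.normal(0.0, sigma, size=w_mean.shape)
    return w_mean + np.mean(deltas, axis=0) + noise

# Toy run: 3 clients (tasks) with quadratic losses centered at different optima.
d = 4
targets = [np.full(d, c, dtype=float) for c in (1.0, 2.0, 3.0)]
grad_fns = [lambda w, t=t: w - t for t in targets]  # gradient of 0.5||w - t||^2
weights = [np.zeros(d) for _ in range(3)]
w_mean = np.zeros(d)
for _ in range(50):
    w_mean = pmtl_round(weights, w_mean, grad_fns, selected=[0, 1, 2])
```

After training, each client's model sits between its own optimum and the shared noisy mean, which is the personalization/regularization trade-off the paper's λ controls; only the clipped, noised deltas ever leave a client.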