Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis

Authors: Yuanxing Chen, Qingzhao Zhang, Shuangge Ma, Kuangnan Fang

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Numerical studies, including simulation in Section 4 and data analysis in Section 5, demonstrate the practical utilization and superiority of the proposed approach. ... We conduct abundant simulations to gauge the performance of the proposed approach. ... In this section, we apply the proposed method ... to a bank website logs data, which is stored in multiple interfaces (clients)."
Researcher Affiliation: Academia. Yuanxing Chen, Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen 361005, China; Qingzhao Zhang, Department of Statistics and Data Science, School of Economics, and The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen 361005, China; Shuangge Ma, Department of Biostatistics, Yale University, New Haven, CT 06520, USA; Kuangnan Fang, Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen 361005, China.
Pseudocode: Yes. The proposed proximal ADMM algorithm is summarized as follows.
Step 1. Obtain the initial estimates (θ^0, ξ^0).
Step 2. At iteration t, t = 1, 2, ..., update θ^t as follows.
Step 2.1. Initialize u^t_{1,0} = θ^t_{1,0} = θ^{t-1} and ρ_0 = 1.
Step 2.2. At iteration s, s = 1, 2, ..., compute ...
Step 2.3. Repeat Step 2.2 until convergence, and set θ^t = θ^t_{1,s}.
Step 3. For 1 ≤ k < k' ≤ K, update ω^t_{kk'} = p'_τ(||θ^{(k),t} − θ^{(k'),t}||_2, λ_2).
Step 4. Update ξ^t = prox_{ν h_1^*}(ν θ^t A + ξ^{t−1}).
Step 5. Repeat Steps 2–4 until convergence, and set α^t = prox_{ν^{−1} h_1}(θ^t A + ν^{−1} ξ^t).
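The dual-style update in Step 4 and the recovery in Step 5 are two sides of the Moreau decomposition, prox_{νh*}(x) = x − ν·prox_{ν⁻¹h}(x/ν). A minimal numerical sketch of that identity, assuming for illustration only that h_1 is the ℓ1 norm (the paper's h_1 may differ); `prox_l1` and `prox_l1_conjugate` are hypothetical helper names:

```python
import numpy as np

def prox_l1(x, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_conjugate(x, nu):
    """prox of nu*h1^* computed via Moreau's identity:
    prox_{nu h*}(x) = x - nu * prox_{(1/nu) h}(x / nu)."""
    return x - nu * prox_l1(x / nu, 1.0 / nu)

nu = 0.7
x = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
# For h = ||.||_1, h* is the indicator of the l-inf unit ball, so its prox
# is simply the projection onto [-1, 1] -- a known closed form to check against.
direct = np.clip(x, -1.0, 1.0)
via_moreau = prox_l1_conjugate(x, nu)
print(np.allclose(direct, via_moreau))  # True
```

The check confirms that the conjugate prox in Step 4 can always be evaluated through the primal prox appearing in Step 5, which is why the algorithm never needs h_1^* explicitly.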
Open Source Code: No. The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets: No. Section 4 describes a 'Simulation Study' in which data are generated for the experiments rather than drawn from publicly available datasets. Section 5 describes a 'Data Application' on 'a bank website logs data', which is not stated to be publicly available, and no link or citation is provided for its access.
Dataset Splits: Yes. "Specifically, we randomly select 4/5 of the samples and form the training data. In this selection, the normal:abnormal ratio is retained. The remaining samples form the testing data."
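The split described above is a stratified 4/5 train/test split: sampling is done within each class so the normal:abnormal ratio is preserved. A generic sketch of such a split (not the paper's code; `stratified_split` and the label values are illustrative):

```python
import random

def stratified_split(samples, labels, train_frac=0.8, seed=0):
    """Split into train/test, preserving each label's proportion."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    train, test = [], []
    for y, group in by_label.items():
        rng.shuffle(group)                      # randomize within each class
        cut = int(round(train_frac * len(group)))
        train += [(s, y) for s in group[:cut]]  # 4/5 of this class
        test += [(s, y) for s in group[cut:]]   # remaining 1/5
    return train, test

# Example: 100 "normal" and 20 "abnormal" records keep their 5:1 ratio
samples = list(range(120))
labels = ["normal"] * 100 + ["abnormal"] * 20
train, test = stratified_split(samples, labels)
print(len(train), len(test))  # 96 24
```

Splitting per class, rather than over the pooled sample, is what guarantees the retained ratio mentioned in the quote.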
Hardware Specification: No. "For example, the analysis of one simulated data set under Example 1 with K = 32, p = 100, and 25 candidate tuning parameter values takes about 3 minutes using a desktop with standard configurations (here we note that penalized fusion estimation is in general computationally more expensive)." The phrase "desktop with standard configurations" is too vague and does not provide specific hardware details.
Software Dependencies: No. "For the SK estimator, we adopt two criteria, namely the Hartigan statistic (Hartigan, 1975) and the gap statistic (Tibshirani et al., 2001), to choose the number of clusters; this is realized using the R package sparcl. The corresponding two variants are referred to as SK(har) and SK(gap), respectively. For the CFL estimator, we separately analyze one-shot CFL (OCFL) and iterative CFL with multiple rounds (ICFL), where the number of clusters is specified as the true value for them. Here, both ICFL and OCFL correspond to Algorithm 2 of Ghosh et al. (2020), but the former sets the number of communication rounds as R = 100, while the latter sets R = 1." No version numbers are provided for the mentioned R packages, for the R language itself, or for the 'Skip-gram model (which is a popular model of word2vec)' mentioned in Section 5.
Experiment Setup: Yes. "Tuning parameter selection. Following the literature, we set ν = 1 and the concavity-related parameter τ = 3. Following Yang et al. (2019), we select λ_1 and λ_2 by minimizing the modified BIC defined as

mBIC(λ_1, λ_2) = (1/N) Σ_{k=1}^{K} { [θ̂^{(k)}(λ_1, λ_2)]^⊤ Ṽ^{(k)} [θ̂^{(k)}(λ_1, λ_2)] − 2 [θ̂^{(k)}(λ_1, λ_2)]^⊤ ζ̃^{(k)} } + C_N q̂(λ_1, λ_2),

where q̂(λ_1, λ_2) is the number of nonzero distinct coefficient vectors, and C_N is a positive constant depending on N. Following Ma and Huang (2017), we adopt C_N = log(log(Kp)), which can automatically adapt to a diverging number of parameters."
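The penalty side of the mBIC is easy to compute: C_N = log(log(Kp)), and q̂ counts the distinct nonzero coefficient vectors across the K clients, i.e. the number of fitted clusters. A small sketch of these two ingredients (illustrative only; the quadratic fit term requires the paper's Ṽ^{(k)} and ζ̃^{(k)} quantities, and `mbic_penalty` is a hypothetical name):

```python
import math
import numpy as np

def mbic_penalty(theta_hat, decimals=6):
    """Return (C_N, q_hat) for the mBIC penalty term C_N * q_hat.

    theta_hat: (K, p) array of per-client coefficient estimates.
    q_hat: number of distinct nonzero coefficient vectors (rounding
    merges vectors that agree up to numerical noise)."""
    K, p = theta_hat.shape
    nonzero_rows = {tuple(np.round(row, decimals))
                    for row in theta_hat
                    if np.any(np.abs(row) > 10.0 ** (-decimals))}
    return math.log(math.log(K * p)), len(nonzero_rows)

# K = 4 clients with p = 3 coefficients, falling into two clusters
theta = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 2.0, 0.0]])
C_N, q_hat = mbic_penalty(theta)
print(q_hat)  # 2
```

Because C_N = log(log(Kp)) grows with the total parameter count Kp, the penalty automatically strengthens as the problem dimension diverges, which is the adaptivity the quote attributes to Ma and Huang (2017).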