Distributed Estimation on Semi-Supervised Generalized Linear Model

Authors: Jiyuan Tu, Weidong Liu, Xiaojun Mao

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, several simulation analyses and real data studies are provided to demonstrate the effectiveness of our method.
Researcher Affiliation | Academia | Jiyuan Tu, School of Mathematics, Shanghai University of Finance and Economics, Shanghai, 200433, China; Weidong Liu, School of Mathematical Sciences, MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200240, China; Xiaojun Mao, School of Mathematical Sciences, Ministry of Education Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, Shanghai, 200240, China
Pseudocode | Yes | Algorithm 1: Semi-Supervised Distributed Approximate NEwton Method (SSDANE) ... Algorithm 2: Semi-Supervised Distributed Approximate NEwton with Average (SSDANE-Avg)
Open Source Code | No | The paper does not explicitly state that code for the method is released, nor does it link to a code repository or mention code in supplementary materials.
Open Datasets | Yes | In this section, we analyze the CelebA dataset [2] from the Kaggle website, which is included in LEAF (Caldas et al., 2018), a standard distributed learning benchmark. ... [2] https://www.kaggle.com/datasets/jessicali9530/celeba-dataset
Dataset Splits | Yes | We take the total sample size as 120000, and randomly partition the dataset into 20000 testing data, 20000 labeled training data, and 80000 unlabeled training data.
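The split sizes add up exactly (20000 + 20000 + 80000 = 120000). A minimal sketch of such a random partition, assuming NumPy and a fixed seed; the paper's seed and splitting code are not given, and `split_indices` is a hypothetical helper.

```python
import numpy as np

def split_indices(total=120_000, n_test=20_000, n_labeled=20_000, seed=0):
    """Randomly partition indices 0..total-1 into test, labeled, and unlabeled sets.

    The unlabeled set receives every index left over after the test and
    labeled sets are drawn (80,000 with the default sizes).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(total)  # random ordering of all sample indices
    test = perm[:n_test]
    labeled = perm[n_test:n_test + n_labeled]
    unlabeled = perm[n_test + n_labeled:]
    return test, labeled, unlabeled

test_idx, lab_idx, unlab_idx = split_indices()
print(len(test_idx), len(lab_idx), len(unlab_idx))  # 20000 20000 80000
```

Slicing a single permutation guarantees the three sets are disjoint and together cover every sample.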
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It only mentions generic "computing units" without specific models or specifications.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers used for the experiments.
Experiment Setup | Yes | Parameter Settings: In both models, we assume the i.i.d. covariate vectors $X_i = (X_{i,1}, \ldots, X_{i,p})^\top$ are drawn from a multivariate normal distribution $N(0, \Sigma)$ for $i = 1, \ldots, N$. Here the covariance matrix $\Sigma$ is a $p \times p$ Toeplitz matrix with $(i, j)$-th entry $\Sigma_{ij} = 0.5^{|i-j|}$, where $1 \le i, j \le p$. We fix the dimension $p = 20$ and the true coefficient $\beta = (1, 0.95, 0.9, \ldots, 0.1, 0.05)$. We repeat 100 independent simulations and report the averaged estimation error and the corresponding standard error. ... Effect of the Number of Machines and Local Unlabeled Data: To investigate the effect of the number of machines and local unlabeled data, we fix the labeled local sample size $n$ at 100, and vary the number of machines $m$ over {20, 50, 100} and the unlabeled local sample size over {100, 200, 500}. ... For the choice of the initial estimator, we uniformly use the local estimator on the master machine $H_1$. ... To solve the optimization problem (13) for the logistic regression model, we apply conjugate gradient descent motivated by Minka (2003). ... We consider three cases where $(m, n) = (20, 1000)$, $(40, 500)$, and $(80, 250)$, respectively.
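A minimal NumPy sketch of the simulated design, under stated assumptions: the helper names `toeplitz_cov` and `simulate_machines` are hypothetical, responses follow a standard logistic model (only one of the paper's two models), and machine 0 stands in for the master machine H1. It covers data generation only, not the SSDANE estimator.

```python
import numpy as np

def toeplitz_cov(p, rho=0.5):
    """Toeplitz covariance with entries Sigma_ij = rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# True coefficient (1, 0.95, 0.9, ..., 0.1, 0.05): 20 entries spaced by 0.05.
beta = np.round(np.arange(20, 0, -1) * 0.05, 2)

def simulate_machines(m, n_labeled, n_unlabeled, beta, rho=0.5, seed=0):
    """Draw i.i.d. N(0, Sigma) covariates for m machines.

    Returns labeled covariates of shape (m, n_labeled, p), logistic responses
    of shape (m, n_labeled), and unlabeled covariates of shape
    (m, n_unlabeled, p). Machine 0 is assumed to be the master machine.
    """
    rng = np.random.default_rng(seed)
    p = beta.shape[0]
    Sigma = toeplitz_cov(p, rho)
    mean = np.zeros(p)
    X_lab = rng.multivariate_normal(mean, Sigma, size=(m, n_labeled))
    X_unlab = rng.multivariate_normal(mean, Sigma, size=(m, n_unlabeled))
    # Assumed logistic model: P(Y = 1 | X) = 1 / (1 + exp(-X^T beta)).
    prob = 1.0 / (1.0 + np.exp(-X_lab @ beta))
    Y_lab = rng.binomial(1, prob)
    return X_lab, Y_lab, X_unlab

X_lab, Y_lab, X_unlab = simulate_machines(m=20, n_labeled=100, n_unlabeled=200, beta=beta)
print(X_lab.shape, Y_lab.shape, X_unlab.shape)  # (20, 100, 20) (20, 100) (20, 200, 20)
```

Looping `m` over {20, 50, 100} and the unlabeled size over {100, 200, 500}, with 100 seeds per setting, would mirror the grid described above.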