Distributed Statistical Inference under Heterogeneity
Authors: Jia Gu, Song Xi Chen
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The purpose of this section is to examine the numerical performances of the estimators via both the simulation study in Section 6.1 and the real data analysis in Section 6.2. |
| Researcher Affiliation | Academia | Jia Gu EMAIL Center for Statistical Science Peking University Beijing, China; Song Xi Chen EMAIL School of Mathematical Science, Guanghua School of Management and Center for Statistical Science, Peking University Beijing, China |
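| Pseudocode | Yes | The procedure to obtain the weighted distributed estimator is summarized in Algorithm 1. Input: Distributed datasets: $\{X_{k,i},\ k = 1, \ldots, K;\ i = 1, \ldots, n_k\}$. Output: Weighted distributed estimator: $\hat\phi^{WD}$. (1) In each data block $k$ ($k = 1, 2, \ldots, K$): (2) Solve (2) and obtain $\hat\theta_k = (\hat\phi_k, \hat\lambda_k)$; (3) Calculate $\hat H_k(\hat\theta_k)$, which is the leading principal sub-matrix of order $p_1$ of $(\nabla_{\theta_k}\hat\Psi_{\theta_k})^{-1}\big(n_k^{-1}\sum_{i=1}^{n_k} Z(X_{k,i}; \hat\theta_k)\big)(\nabla_{\theta_k}\hat\Psi_{\theta_k})^{-T}$, where $Z(x, \theta_k)$ is defined in Assumption 6 and $\hat\Psi_{\theta_k} = n_k^{-1}\sum_{i=1}^{n_k}\psi_{\theta_k}(X_{k,i}; \hat\theta_k)$; (4) In a central server: (5) Collect $(\hat\phi_k, \hat H_k(\hat\theta_k)^{-1})$ from all the $K$ data blocks; (6) Calculate $\hat\phi = \big\{\sum_{k=1}^{K} n_k \hat H_k(\hat\theta_k)^{-1}\big\}^{-1}\sum_{k=1}^{K} n_k \hat H_k(\hat\theta_k)^{-1}\hat\phi_k$; (7) $\hat\phi^{WD} = \hat\phi\, I(\hat\phi \in \Phi) + \hat\phi^{SaC} I(\hat\phi \notin \Phi)$, where $\hat\phi^{SaC} = N^{-1}\sum_{k=1}^{K} n_k\hat\phi_k$. Algorithm 1: Weighted Distributed estimator |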
| Open Source Code | No | The paper does not contain any explicit statements about making the code open source, nor does it provide links to a code repository or mention code in supplementary materials for the methodology described. |
| Open Datasets | Yes | The flight data are available from https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2009 and the weather data are obtained from https://cds.climate.copernicus.eu/. |
| Dataset Splits | Yes | We segmented the full data of N = 2412782 according to the airports of departing flights and obtained 10 data segments. For each segment, we split it to data blocks of size n = 5000, while the residual data blocks were discarded, such that the total number of blocks K = 479. |
| Hardware Specification | Yes | Throughout the simulation experiments, the results of each simulation setting were based on B = 500 number of replications and were conducted in R with a 10-core Intel(R) Core(TM) i9-10900K @3.7 GHz processor. |
| Software Dependencies | No | "Throughout the simulation experiments, the results of each simulation setting were based on B = 500 number of replications and were conducted in R with a 10-core Intel(R) Core(TM) i9-10900K @3.7 GHz processor." The paper names R but gives no R version and lists no packages or package versions. |
| Experiment Setup | Yes | For each of $K$ data blocks with $K \in \{10, 50, 100, 250, 500, 1000, 2000\}$, $\{(X_{k,i}; Y_{k,i})\}_{i=1}^{n} \subset \mathbb{R}^p \times \{0, 1\}$ were independently sampled from the following model: $X_{k,i} \sim N(0_{p \times 1}, 0.75^2 I_{p \times p})$ and $P(Y_{k,i} = 1 \mid X_{k,i}) = \exp(X_{k,i}^T\theta_k^*) / \{1 + \exp(X_{k,i}^T\theta_k^*)\}$, where $\theta_k^* = (\phi^{*T}, \lambda_k^{*T})^T$, $\phi^* = 1$, $\lambda_k^* = (\lambda_{k,1}^*, \lambda_{k,2}^*, \ldots, \lambda_{k,p_2}^*)^T$ and $\lambda_{k,j}^* = (-1)^j 10(1 - 2(k-1)/(K-1))$. The sample sizes of the data blocks were equal at $n = NK^{-1}$ with $N = 2 \times 10^6$. Two levels of the dimension $p_2 = 4$ and $10$ of the nuisance parameter $\lambda_k$ were considered. |
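
The central-server part of Algorithm 1 (steps 5 to 7 in the Pseudocode row) can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the authors' implementation (the paper reports using R and releases no code). The function name `weighted_distributed_estimate` and the `in_param_space` callback, which stands in for the check that the weighted estimate lies in the parameter space Φ, are hypothetical names introduced here; the block-level estimates and inverse matrices are assumed to have been computed already.

```python
import numpy as np


def weighted_distributed_estimate(phi_hats, H_invs, n_ks, in_param_space=None):
    """Combine block-level estimates on the central server (hypothetical helper).

    phi_hats: list of K block estimates of the common parameter (length p1 each).
    H_invs:   list of K inverse matrices H_k^{-1} (p1 x p1 each).
    n_ks:     list of K block sample sizes.
    in_param_space: optional callable returning True if the weighted estimate
        lies in the parameter space; an assumption of this sketch.
    """
    phi_hats = [np.atleast_1d(np.asarray(p, dtype=float)) for p in phi_hats]
    H_invs = [np.atleast_2d(np.asarray(h, dtype=float)) for h in H_invs]
    n_ks = np.asarray(n_ks, dtype=float)

    # Step 6: weighted average with weights n_k * H_k^{-1}.
    weight_sum = sum(n * h for n, h in zip(n_ks, H_invs))
    weighted_phi = sum(n * (h @ p) for n, h, p in zip(n_ks, H_invs, phi_hats))
    phi_weighted = np.linalg.solve(weight_sum, weighted_phi)

    # Simple averaging (SaC) estimator, used as the fallback in step 7.
    phi_sac = sum(n * p for n, p in zip(n_ks, phi_hats)) / n_ks.sum()

    # Step 7: keep the weighted estimate if it lies in the parameter space.
    if in_param_space is None or in_param_space(phi_weighted):
        return phi_weighted
    return phi_sac
```

With two equally sized blocks, estimates 1.0 and 3.0, and inverse matrices 1.0 and 3.0, the weighted estimate is pulled toward the second block, whereas the simple average sits at the midpoint.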