reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributed Statistical Inference under Heterogeneity

Authors: Jia Gu, Song Xi Chen

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The purpose of this section is to examine the numerical performances of the estimators via both the simulation study in Section 6.1 and the real data analysis in Section 6.2.
Researcher Affiliation	Academia	Jia Gu EMAIL Center for Statistical Science Peking University Beijing, China; Song Xi Chen EMAIL School of Mathematical Science, Guanghua School of Management and Center for Statistical Science, Peking University Beijing, China
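Pseudocode	Yes	The procedure to obtain the weighted distributed estimator is summarized in Algorithm 1. Input: Distributed datasets $\{X_{k,i},\ k = 1, \ldots, K;\ i = 1, \ldots, n_k\}$. Output: Weighted distributed estimator $\hat\phi^{WD}$. 1 In each data block $k$ ($k = 1, 2, \ldots, K$): 2 Solve (2) and obtain $\hat\theta_k = (\hat\phi_k, \hat\lambda_k)$; 3 Calculate $\hat H_k(\hat\theta_k)$, which is the leading principal sub-matrix of order $p_1$ of $(\nabla_{\theta_k}\hat\Psi_{\theta_k})^{-1}\big(n_k^{-1}\sum_{i=1}^{n_k} Z(X_{k,i}; \hat\theta_k)\big)(\nabla_{\theta_k}\hat\Psi_{\theta_k})^{-T}$, where $Z(x, \theta_k)$ is defined in Assumption 6 and $\hat\Psi_{\theta_k} = n_k^{-1}\sum_{i=1}^{n_k}\psi_{\theta_k}(X_{k,i}; \hat\theta_k)$; 4 In a central server: 5 Collect $(\hat\phi_k, \hat H_k(\hat\theta_k)^{-1})$ from all the $K$ data blocks; 6 Calculate $\hat\phi = \big\{\sum_{k=1}^{K} n_k \hat H_k(\hat\theta_k)^{-1}\big\}^{-1}\sum_{k=1}^{K} n_k (\hat H_k(\hat\theta_k))^{-1}\hat\phi_k$; 7 $\hat\phi^{WD} = \hat\phi\, I(\hat\phi \in \Phi) + \hat\phi^{SaC} I(\hat\phi \notin \Phi)$, where $\hat\phi^{SaC} = N^{-1}\sum_{k=1}^{K} n_k \hat\phi_k$. Algorithm 1: Weighted Distributed estimator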
Open Source Code	No	The paper does not contain any explicit statements about making the code open source, nor does it provide links to a code repository or mention code in supplementary materials for the methodology described.
Open Datasets	Yes	The flight data are available from https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2009 and the weather data are obtained from https://cds.climate.copernicus.eu/.
Dataset Splits	Yes	We segmented the full data of N = 2412782 according to the airports of departing flights and obtained 10 data segments. For each segment, we split it to data blocks of size n = 5000, while the residual data blocks were discarded, such that the total number of blocks K = 479.
Hardware Specification	Yes	Throughout the simulation experiments, the results of each simulation setting were based on B = 500 number of replications and were conducted in R with a 10-core Intel(R) Core(TM) i9-10900K @3.7 GHz processor.
Software Dependencies	No	Throughout the simulation experiments, the results of each simulation setting were based on B = 500 number of replications and were conducted in R with a 10-core Intel(R) Core(TM) i9-10900K @3.7 GHz processor. This only specifies 'R' without a version number or any other software packages with versions.
Experiment Setup	Yes	For each of $K$ data blocks with $K \in \{10, 50, 100, 250, 500, 1000, 2000\}$, $\{(X_{k,i}; Y_{k,i})\}_{i=1}^{n} \subset \mathbb{R}^p \times \{0, 1\}$ were independently sampled from the following model: $X_{k,i} \sim N(0_{p \times 1}, 0.75^2 I_{p \times p})$ and $P(Y_{k,i} = 1 \mid X_{k,i}) = \exp(X_{k,i}^T\theta_k^*) / \{1 + \exp(X_{k,i}^T\theta_k^*)\}$, where $\theta_k^* = (\phi^{*T}, \lambda_k^{*T})^T$, $\phi^* = 1$, $\lambda_k^* = (\lambda_{k,1}^*, \lambda_{k,2}^*, \ldots, \lambda_{k,p_2}^*)^T$ and $\lambda_{k,j}^* = (-1)^j 10(1 - 2(k-1)/(K-1))$. The sample sizes of the data blocks were equal at $n = NK^{-1}$ with $N = 2 \times 10^6$. Two levels of the dimension $p_2 = 4$ and $10$ of the nuisance parameter $\lambda_k$ were considered.
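
The central-server step of Algorithm 1 quoted in the Pseudocode row (lines 5 to 7) can be illustrated with a minimal Python sketch. This is not the authors' code; the paper reports using R and releases no implementation. The sketch assumes a scalar common parameter (p1 = 1), so each variance estimate is a single positive number, and it assumes the parameter space Φ is a closed interval. The function name `weighted_distributed_estimate` and its arguments are hypothetical, introduced only for illustration.

```python
from typing import Sequence, Tuple


def weighted_distributed_estimate(
    phi_hats: Sequence[float],
    h_hats: Sequence[float],
    block_sizes: Sequence[int],
    param_space: Tuple[float, float],
) -> float:
    """Aggregate block-level estimates as in lines 5-7 of Algorithm 1.

    Assumptions (not from the paper): scalar common parameter, and a
    parameter space given as a closed interval (lower, upper).
    """
    if not (len(phi_hats) == len(h_hats) == len(block_sizes)):
        raise ValueError("all inputs must have one entry per data block")
    if any(h <= 0 for h in h_hats):
        raise ValueError("variance estimates must be positive")

    # Line 6: block k gets weight n_k * H_k^{-1}, so more precise and
    # larger blocks contribute more to the pooled estimate.
    weights = [n / h for n, h in zip(block_sizes, h_hats)]
    phi_weighted = sum(w * p for w, p in zip(weights, phi_hats)) / sum(weights)

    # Split-and-conquer (SaC) fallback: sample-size-weighted average.
    total_n = sum(block_sizes)
    phi_sac = sum(n * p for n, p in zip(block_sizes, phi_hats)) / total_n

    # Line 7: keep the weighted estimate only if it lies in the
    # parameter space; otherwise fall back to the SaC estimate.
    lower, upper = param_space
    return phi_weighted if lower <= phi_weighted <= upper else phi_sac


# Two blocks of equal size; the first has the smaller variance estimate.
inside = weighted_distributed_estimate([1.0, 2.0], [1.0, 4.0], [100, 100], (0.0, 5.0))
fallback = weighted_distributed_estimate([1.0, 2.0], [1.0, 4.0], [100, 100], (1.3, 5.0))
```

In this toy call the weights are 100 and 25, so the weighted estimate is pulled toward the more precise first block, while the second call triggers the SaC fallback because the weighted estimate falls outside the assumed interval. For p1 > 1 the divisions would become matrix inverses and the weights would be matrices rather than scalars.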