DAG-Informed Structure Learning from Multi-Dimensional Point Processes

Authors: Chunming Zhang, Muhong Gao, Shengji Jia

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type Experimental Furthermore, simulation studies indicate that our proposed DAG-constrained estimator, when appropriately penalized, yields more accurate graphs compared to unconstrained or unregularized estimators. Finally, we apply the proposed method to two real MuTPP datasets. Keywords: asymptotic consistency; causal structure; constrained optimization; multivariate counting process; Structural Hamming Distance.
Researcher Affiliation Academia Chunming Zhang, Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA; Muhong Gao, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; Shengji Jia, School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai, China
Pseudocode Yes Algorithm 1: Flexible Augmented Lagrangian (Flex-AL) algorithm for solving (12); Algorithm 2: Proximal Quasi-Newton (PXQN) algorithm for solving (25).
Open Source Code No The paper does not contain an explicit statement about releasing its own source code, nor does it provide a link to a code repository for the methodology described.
Open Datasets Yes We analyze the neuronal spike train dataset in Fujisawa et al. (2008), comprising multineuron recordings obtained from rats performing a working memory task. This dataset, available at http://crcns.org/data-sets/pfc/pfc-2/about-pfc-2... We test our proposed method on the IPTV (Internet Protocol television) viewing record dataset, which is publicly available and accessible at https://ieee-dataport.org.
Dataset Splits Yes Following the cleaning process, we partition the data into two sets: the training set encompassing spikes within the time range of [100, 1600] seconds, and the testing set containing spikes between [1600, 2600] seconds... This cleaned data is split into the training set (containing timestamps in the first 400 hours) and the testing set (containing timestamps in the following 368 hours).
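The time-range partition quoted above can be sketched in a few lines. The timestamps below are hypothetical; this is only an illustration of splitting event times by the reported windows ([100, 1600] s for training, (1600, 2600] s for testing on the spike-train data), not the authors' cleaning pipeline.

```python
import numpy as np

# Hypothetical event timestamps in seconds.
timestamps = np.array([150.0, 800.0, 1599.0, 1700.0, 2500.0])

# Training window [100, 1600] s; testing window (1600, 2600] s.
train_set = timestamps[(timestamps >= 100) & (timestamps <= 1600)]
test_set = timestamps[(timestamps > 1600) & (timestamps <= 2600)]
```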
Hardware Specification No The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or cloud resources) used to run the experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers.
Experiment Setup Yes The point process data is generated using model (4) with true covariates: x_i(t) = g(N_i((t − φ, t]) / φ), t ∈ [0, T], i = 1, ..., d, where φ = 1 and g(x) = log{1 + min(x, 10)}. For each node j = 1, ..., d, the true baseline parameters are set as w*_{0,j} = 0.8. For i, j = 1, ..., d, the true interaction parameters from node i to node j are w*_{i,j} = 0.5 for an excitatory effect, w*_{i,j} = −0.5 for an inhibitory effect, and w*_{i,j} = 0 for no effect. The total time length T is selected from grid points ranging between 300 and 1600... Both tuning parameters η1 and η2 are chosen by minimizing the Bayesian Information Criterion (BIC) function (Nishii, 1984)... For each simulation replication, both algorithms are employed using the DAG-wL1 and DAG-unreg methods to estimate Network 3. The total time length of the synthetic point process data was fixed at T = 1200. To ensure a fair comparison, identical step sizes {β_α, β_ρ, γ_α, γ_ρ} were utilized in both algorithms (refer to (20), (21), (23), and (24)), specifically set to β_α = β_ρ = γ_α = γ_ρ = 5. Both algorithms adopted the same stopping rule described in Section 4.2, terminating once h(W^(k)) < ϵ_h at a certain iteration step k = k̂, with ϵ_h = 10⁻⁵.