reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Semi-parametric Learning of Structured Temporal Point Processes

Authors: Ganggang Xu, Ming Wang, Jiangze Bian, Hui Huang, Timothy R. Burch, Sandro C. Andrade, Jingfei Zhang, Yongtao Guan

JMLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Asymptotic properties of the proposed estimators are investigated, and the effectiveness of our procedures is illustrated through a simulation study and an application to a stock trading dataset. Section 5 demonstrates the efficacy of the proposed methods through a simulation study. Section 6 applies the proposed method to the stock trading dataset.
Researcher Affiliation	Academia	Ganggang Xu, Ming Wang: Department of Management Science, University of Miami, Coral Gables, FL 33146, USA. Jiangze Bian: School of Banking and Finance, University of International Business and Economics, Beijing, 100871, P. R. China. Hui Huang: School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, P. R. China. Timothy R. Burch, Sandro C. Andrade: Department of Finance, University of Miami, Coral Gables, FL 33146, USA. Jingfei Zhang, Yongtao Guan: Department of Management Science, University of Miami, Coral Gables, FL 33146, USA.
Pseudocode	No	The paper describes mathematical models, estimation procedures, and theoretical properties. It details steps like calculating marginal intensity functions, estimating covariance functions, and predicting principal component scores through equations and descriptive text, but it does not include a clearly labeled pseudocode or algorithm block with structured steps.
Open Source Code	No	The paper states: "All simulation runs were carried out in the software R on a cluster of 100 Linux machines with a total of 100 CPU cores, with each core running at approximately 2 GFLOPS." However, it does not explicitly provide a link to the source code for the methodology described in the paper, nor does it state that the code is available in supplementary materials or upon request.
Open Datasets	No	As an example, we analyze a dataset that draws from stock trading transactions (recorded at the second level) by more than 300,000 Chinese trading accounts over approximately three years. The paper describes a specific private stock trading dataset from a brokerage house, but no public access information (link, DOI, repository, or citation to a publicly available version) is provided.
Dataset Splits	Yes	The cross-validation score can then be defined as ... The optimal bandwidth $\hat h_x$ is then chosen by minimizing $\mathrm{CV}_X(h)$. Following the same procedure, we can define the cross-validation score $\mathrm{CV}_Y(h)$ using data aggregated over all accounts, $N^Y_1, \ldots, N^Y_m$, as defined in Section 2.5, and choose the optimal $\hat h_y$ accordingly. In our analysis, we have removed accounts with fewer than 30 or more than 2,000 total transactions, which results in retaining approximately 47% of the accounts in the broader dataset, or equivalently, a total of 157,203 accounts.
Hardware Specification	No	All simulation runs were carried out in the software R on a cluster of 100 Linux machines with a total of 100 CPU cores, with each core running at approximately 2 GFLOPS.
Software Dependencies	No	All simulation runs were carried out in the software R on a cluster of 100 Linux machines with a total of 100 CPU cores, with each core running at approximately 2 GFLOPS. The paper mentions the use of 'R' software but does not specify a version number or any other software dependencies with their versions.
Experiment Setup	Yes	For the univariate log-Gaussian process, the data are simulated from the univariate point process model presented in Section 2.1 using the following intensity model: $\lambda_{ij}(t) = \lambda_0(t) \exp\big[\sum_{k=1}^{p_X} \xi^X_{ik}\phi^X_k(t) + \sum_{k=1}^{p_Y} \xi^Y_{jk}\phi^Y_k(t) + \sum_{k=1}^{p_Z} \xi^Z_{ijk}\phi^Z_k(t)\big]$, $t \in [0, 1]$, for $i = 1, \ldots, n$, $j = 1, \ldots, m$. We set $p_X = p_Y = p_Z = 2$ and $\lambda_0(t) = 0.3\cos(2\pi t) + 1$, and simulate the principal component scores $\xi^X_{ik}$, $\xi^Y_{jk}$ and $\xi^Z_{ijk}$ as independent normal random variables with mean 0 and variances equal to the eigenvalues $\eta^X_k$, $\eta^Y_k$ and $\eta^Z_k$, $k = 1, 2$, respectively. ... The Epanechnikov kernel function is used in all simulation studies, and summary statistics based on $B = 500$ simulation runs are calculated. The bandwidths $\hat h_x$, $\hat h_y$, and $\hat h_z$ are selected by the data-driven procedure proposed in Section 2.7.
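The Experiment Setup row describes a log-Gaussian intensity model with $p_X = p_Y = p_Z = 2$ components and baseline $\lambda_0(t) = 0.3\cos(2\pi t) + 1$. The Python sketch below shows one way to simulate from such a model, under loudly stated assumptions: the eigenfunctions $\phi^X_k, \phi^Y_k, \phi^Z_k$ and the eigenvalues $\eta_k$ are elided in the excerpt, so `PHI` and `ETA` are hypothetical placeholders; given the scores, events are assumed to follow an inhomogeneous Poisson process (a Cox-process reading of the model); and event times are generated by Lewis-Shedler thinning, a technique the source does not mention. Indices run from 0 in Python.

```python
import numpy as np

P = 2  # p_X = p_Y = p_Z = 2, as stated in the source

def baseline(t):
    # lambda_0(t) = 0.3 cos(2 pi t) + 1, as stated in the source
    return 0.3 * np.cos(2 * np.pi * t) + 1.0

# HYPOTHETICAL orthonormal eigenfunctions on [0, 1] (the real ones are elided).
# Each maps an array t to shape (P, len(t)); all satisfy |phi_k(t)| <= sqrt(2).
PHI = {
    "X": lambda t: np.stack([np.sqrt(2) * np.sin(2 * np.pi * k * t) for k in range(1, P + 1)]),
    "Y": lambda t: np.stack([np.sqrt(2) * np.cos(2 * np.pi * k * t) for k in range(1, P + 1)]),
    "Z": lambda t: np.stack([np.sqrt(2) * np.sin(np.pi * k * t) for k in range(1, P + 1)]),
}
# HYPOTHETICAL eigenvalues eta_k (variances of the scores) for each component.
ETA = {"X": np.array([0.3, 0.1]), "Y": np.array([0.3, 0.1]), "Z": np.array([0.2, 0.05])}

def intensity(t, xi_x, xi_y, xi_z):
    """lambda_ij(t) for one (i, j) pair, given its three length-P score vectors."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    expo = xi_x @ PHI["X"](t) + xi_y @ PHI["Y"](t) + xi_z @ PHI["Z"](t)
    return baseline(t) * np.exp(expo)

def simulate(n, m, rng):
    """Return {(i, j): sorted event times in [0, 1]} for i in range(n), j in range(m)."""
    # Independent N(0, eta_k) scores; the scale vector broadcasts over the last axis.
    xi_x = rng.normal(0.0, np.sqrt(ETA["X"]), size=(n, P))
    xi_y = rng.normal(0.0, np.sqrt(ETA["Y"]), size=(m, P))
    xi_z = rng.normal(0.0, np.sqrt(ETA["Z"]), size=(n, m, P))
    events = {}
    for i in range(n):
        for j in range(m):
            # Valid upper bound on lambda_ij: lambda_0 <= 1.3 and |phi_k| <= sqrt(2).
            total = np.abs(xi_x[i]).sum() + np.abs(xi_y[j]).sum() + np.abs(xi_z[i, j]).sum()
            bound = 1.3 * np.exp(np.sqrt(2) * total)
            # Thinning: homogeneous Poisson(bound) candidates on [0, 1],
            # each kept with probability lambda_ij(t) / bound.
            cand = rng.uniform(0.0, 1.0, size=rng.poisson(bound))
            accept = rng.uniform(0.0, 1.0, size=cand.size)
            keep = accept < intensity(cand, xi_x[i], xi_y[j], xi_z[i, j]) / bound
            events[(i, j)] = np.sort(cand[keep])
    return events
```

For example, `simulate(3, 2, np.random.default_rng(0))` returns six arrays of event times, one per $(i, j)$ pair. Thinning is used here only because it needs nothing beyond a bound on the intensity; the paper may generate its data differently.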
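The Dataset Splits row mentions removing accounts by transaction count and choosing a bandwidth by minimizing a cross-validation score whose formula is elided. The Python sketch below assumes each account's events are stored as a NumPy array of times rescaled to [0, 1] (a hypothetical layout), applies the 30 to 2,000 transaction filter quoted in the source, and pairs the Epanechnikov kernel named in the Experiment Setup row with a standard least-squares cross-validation (LSCV) score. That score is a stand-in chosen for illustration, not the paper's $\mathrm{CV}_X(h)$, and the estimator omits any edge correction the paper may apply.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def filter_accounts(accounts, lo=30, hi=2000):
    """Keep accounts with lo <= number of transactions <= hi (thresholds from the source).

    `accounts` is a HYPOTHETICAL dict mapping account id -> array of event times.
    """
    return {a: ev for a, ev in accounts.items() if lo <= len(ev) <= hi}

def marginal_intensity(t, event_lists, h):
    """Average over replicates of sum_s K_h(t - s), with K_h(u) = K(u / h) / h; no edge correction."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    total = np.zeros_like(t)
    for ev in event_lists:
        total += epanechnikov((t[:, None] - ev[None, :]) / h).sum(axis=1) / h
    return total / len(event_lists)

def lscv(event_lists, h, grid_size=1024):
    """LSCV score leaving out one whole replicate at a time (needs >= 2 replicates).

    A STAND-IN for illustration, not the paper's CV_X(h).
    """
    n = len(event_lists)
    grid = (np.arange(grid_size) + 0.5) / grid_size  # midpoint rule on [0, 1]
    integral = np.mean(marginal_intensity(grid, event_lists, h) ** 2)
    cross = 0.0
    for ev in event_lists:
        if ev.size == 0:
            continue
        # Kernel sum over all replicates, evaluated at this replicate's events.
        full = n * marginal_intensity(ev, event_lists, h)
        # This replicate's own contribution, removed to get the leave-one-out estimate.
        own = epanechnikov((ev[:, None] - ev[None, :]) / h).sum(axis=1) / h
        cross += np.sum((full - own) / (n - 1))
    return integral - 2.0 * cross / n

def select_bandwidth(event_lists, h_grid):
    """Return the bandwidth in h_grid with the smallest LSCV score."""
    scores = [lscv(event_lists, h) for h in h_grid]
    return h_grid[int(np.argmin(scores))]
```

The LSCV score estimates $\int_0^1 (\hat\lambda_h - \lambda)^2\,dt$ up to a term that does not depend on $h$, because for a left-out replicate $N_i$ the expectation of $\sum_{s \in N_i} \hat\lambda^{(-i)}_h(s)$ equals $\int_0^1 \hat\lambda^{(-i)}_h(t)\lambda(t)\,dt$ when the replicates are independent and share the intensity $\lambda$.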