reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Inference for Nonparametric Hawkes Processes Using Auxiliary Latent Variables

Authors: Feng Zhou, Zhidong Li, Xuhui Fan, Yang Wang, Arcot Sowmya, Fang Chen

JMLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6. Experiments We evaluate the performance of our proposed Gibbs sampler, EM and mean-field (MF) algorithms on both simulated and real-world data. Specifically, we compare our proposed algorithms to the following alternatives. ... Test LL: the log-likelihood of hold-out data using the trained model. This is a metric describing the model prediction ability. Est Err: the mean squared error between the estimated $\hat{\mu}(t)$, $\hat{\phi}(\tau)$ and the ground truth. It is only used for simulated data. Pre Acc: given an event sequence $\{t_n\}_{n=1}^{i-1}$, we wish to predict the time of $t_i$. ... Run Time: the running time of various methods w.r.t. the number of training data. 6.1 Simulated Data Experiments In simulated data experiments, we use the thinning algorithm (Ogata, 1998) to generate 100 sets of training data and 10 sets of test data ... 6.2 Real Data Experiments We compare various methods on two real-world data sets of crime.
Researcher Affiliation	Academia	(1) Data61 CSIRO, 13 Garden Street, Eveleigh, New South Wales, Australia; (2) University of New South Wales, Kensington, New South Wales, Australia; (3) University of Technology Sydney, Ultimo, New South Wales, Australia
Pseudocode	Yes	The final pseudo code is provided in Alg.1. ... The final pseudo code is provided in Alg.2. ... The final pseudo code is provided in Alg.3.
Open Source Code	No	The paper does not provide explicit statements about source code availability or links to a code repository.
Open Datasets	Yes	The data of crimes in Vancouver (footnote 1) comes from the Vancouver Open Data Catalogue. ... Footnote 1: https://www.kaggle.com/wosaku/crime-in-vancouver ... This data set (footnote 2) includes all valid felony, misdemeanour and violation crimes reported to the New York police department (NYPD) for all complete quarters so far in 2017. ... Footnote 2: https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-YTD/5uac-w243
Dataset Splits	Yes	For Crime in Vancouver, the first 519 data points are selected as training set to train the models with the rest being test data (time unit: days); for NYPD Complaint Data, the first 324 data points are selected as training set with the rest being test (time unit: days).
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions using 'numerical packages' for optimization and integral calculations and implicitly Python as the programming language, but does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	Throughout this work, the GP covariance kernel we use is the squared exponential kernel $k(x, x') = \theta_0 \exp\left(-\frac{\theta_1}{2}\|x - x'\|^2\right)$. The hyperparameters $\theta_0$ and $\theta_1$ can be sampled by a Metropolis-Hastings method (Hastings, 1970). ... Therefore, we update them every 20 loops. Additional hyperparameters are the number and location of inducing points... For simulated data experiments, we use the thinning algorithm (Ogata, 1998) to generate 100 sets of training data and 10 sets of test data with $T_\phi = 6$ and $T = 100$ in three cases... For the prediction task, we assume the top 17% of a sequence is observed ($\epsilon = 0.14$ for Crime in Vancouver and 1 for NYPD Complaint Data where the choice of $\epsilon$ only affects the absolute magnitude of prediction accuracy but not the relative magnitude, 400 samples for Monte Carlo integration).
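
The squared exponential kernel quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code (the table above notes that none was released). It assumes scalar inputs such as time points, and it assumes the reading of the formula as theta0 * exp(-(theta1 / 2) * (x - x')^2), a form reconstructed from garbled extracted text. The function names, the hyperparameter values, and the inducing-point locations are hypothetical and chosen only for illustration.

```python
import math


def se_kernel(x, y, theta0, theta1):
    """Squared exponential kernel for scalar inputs.

    Assumed form: k(x, x') = theta0 * exp(-(theta1 / 2) * (x - x')**2).
    """
    return theta0 * math.exp(-0.5 * theta1 * (x - y) ** 2)


def gram_matrix(points, theta0, theta1):
    """Kernel (Gram) matrix over a list of scalar points, as nested lists."""
    return [[se_kernel(a, b, theta0, theta1) for b in points] for a in points]


# Hypothetical inducing-point locations and hyperparameters (not from the paper).
inducing_points = [0.0, 1.5, 3.0, 4.5, 6.0]
K = gram_matrix(inducing_points, theta0=1.0, theta1=2.0)
```

Under this form, theta0 sets the kernel's value at zero distance (the diagonal of `K`), while a larger theta1 makes the correlation between two points decay faster as they move apart.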