String and Membrane Gaussian Processes
Authors: Yves-Laurent Kom Samo, Stephen J. Roberts
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the efficacy of the proposed approach on several synthetic and real-world data sets, including a data set with 6 million input points and 8 attributes. ... 6. Experiments: We now move on to presenting empirical evidence for the efficacy of string GPs in coping with local patterns in data sets, and in doing so in a scalable manner. Firstly we consider maximum marginal likelihood inference on two small scale problems exhibiting local patterns. ... Finally, we illustrate the performance of the previously derived MCMC sampler on two large scale Bayesian inference problems, namely the prediction of U.S. commercial airline arrival delays of Hensman et al. (2013) and a new large scale dynamic asset allocation problem. |
| Researcher Affiliation | Academia | Yves-Laurent Kom Samo EMAIL Stephen J. Roberts EMAIL Department of Engineering Science and Oxford-Man Institute University of Oxford Eagle House, Walton Well Road, OX2 6ED, Oxford, United Kingdom |
| Pseudocode | Yes | Algorithm 1 Simulation of a string derivative Gaussian process ... Algorithm 2 MCMC sampler for nonparametric Bayesian inference of a real-valued latent function under a string GP prior |
| Open Source Code | No | No explicit statement by the authors about releasing their own source code or a link to a repository for the methodology described in this paper is present. The mention of 'The GPy authors (2012–2016)' refers to a third-party library used, not the authors' own implementation. |
| Open Datasets | Yes | We illustrate the efficacy of the proposed approach on several synthetic and real-world data sets, including a data set with 6 million input points and 8 attributes. ... Using the motorcycle data set of Silverman (1985), commonly used for the local patterns and heteroskedasticity it exhibits, we show that our approach outperforms the aforementioned competing alternatives... To illustrate how our approach fares against competing alternatives on a standard large scale problem, we consider predicting arrival delays of commercial flights in the USA in 2008 as studied by Hensman et al. (2013). ... The data originate from the CRSP and Compustat databases. |
| Dataset Splits | Yes | For each data set, we use 2/3 of the records selected uniformly at random for training and we use the remaining 1/3 for testing. In our second experiment, we consider illustrating the advantage of the string GP paradigm over the standard GP paradigm... We ran 50 independent random experiments, leaving out 5 points selected uniformly at random from the data set for prediction, the rest being used for training. |
| Hardware Specification | No | The paper mentions 'on our 8 cores machine' but does not provide specific hardware details such as the CPU model, GPU model, or other specific components required for reproducibility. |
| Software Dependencies | No | The paper mentions 'For SVIGP we use the implementation made available by the The GPy authors (2012–2016)', which names a software package (GPy) but provides a range of years rather than a specific version number. It does not list any other software components or libraries with specific version numbers. |
| Experiment Setup | Yes | For the BCM and rBCM, the number of experts is chosen so that each expert processes 200 training points. ... The parameters α and β are chosen so that the prior mean number of change-points in each input dimension is 5% of the number of distinct training and testing values in that input dimension, and so that the prior variance of the foregoing number of change-points is 50 times the prior mean; the aim is to be uninformative about the number of change-points. We run 10,000 iterations of our RJ-MCMC sampler and discarded the first 5,000 as burn-in. After burn-in we record the states of the Markov chains for analysis using a 1-in-100 thinning rate. ... We chose αe and βe in Equation (54) so that the mean of the Gamma distribution is 10.0 and its variance 0.5, which expresses a very greedy investment target. |
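The split and MCMC post-processing protocols quoted above (a uniform-at-random 2/3–1/3 train/test split; 10,000 RJ-MCMC iterations with the first 5,000 discarded as burn-in and a 1-in-100 thinning rate) can be sketched as follows. This is a minimal illustration of the reported protocol, not the authors' code; the function names and the stand-in chain array are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_test_split_two_thirds(n_records, rng):
    """Select 2/3 of records uniformly at random for training; the rest is for testing."""
    idx = rng.permutation(n_records)
    n_train = (2 * n_records) // 3
    return idx[:n_train], idx[n_train:]

def thin_after_burn_in(samples, burn_in=5000, thin=100):
    """Discard the burn-in iterations, then keep every `thin`-th recorded state."""
    return samples[burn_in::thin]

# Hypothetical sizes: 9,000 records, and a chain of 10,000 recorded states.
train_idx, test_idx = train_test_split_two_thirds(9000, rng)
chain = np.arange(10_000)            # stand-in for the 10,000 sampler states
kept = thin_after_burn_in(chain)
print(len(train_idx), len(test_idx), len(kept))  # → 6000 3000 50
```

With these settings, 10,000 iterations yield 50 retained posterior states per chain, which matches the 1-in-100 thinning of the 5,000 post-burn-in iterations.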