Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Uncovering Sets of Maximum Dissimilarity on Random Process Data
Authors: Miguel de Carvalho, Gabriel Martos
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The numerical studies validate the proposed methods, and we showcase their application with case studies on criminology, finance, and medicine. In Appendix C we assess the performance of the proposed tools via a Monte Carlo study. Next, we showcase real data applications. |
| Researcher Affiliation | Academia | Miguel de Carvalho: School of Mathematics, University of Edinburgh, UK; CIDMA, Universidade de Aveiro, Portugal. Gabriel Martos: Departamento de Matemática y Estadística, Universidad Torcuato Di Tella, Argentina. |
| Pseudocode | Yes | Algorithm 1 INLA-based posterior inference for BMDs |
| Open Source Code | No | The paper mentions using the R-INLA package (Martins et al., 2013; Lindgren et al., 2015) and R (R Development Core Team, 2022), which are third-party tools. There is no explicit statement or direct link provided for the source code of the methodology developed in this paper. |
| Open Datasets | Yes | The data are publicly available online in the city hall web page (https://data.buenosaires.gob.ar), and consist of point process data on the latitude and longitude where thefts occurred during 2019 (D2019) and 2020 (D2020). Data were gathered from Yahoo Finance and consist of monthly values of the NYSE and NASDAQ composite indices. The data are publicly available from the UCR Time Series Classification and Clustering website (http://www.cs.ucr.edu/~eamonn/time_series_data_2018/). |
| Dataset Splits | No | The paper mentions using data from different years for comparison (2019 vs 2020 for thefts) and specific time ranges for stock market data, and lists the number of normal vs. myocardial infarction signals (133 vs. 67) for the ECG200 dataset. However, it does not explicitly describe any training, validation, or test dataset splits or percentages required to reproduce experimental setups. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware (like CPU or GPU models) used for running the experiments. |
| Software Dependencies | Yes | to facilitate its implementation we recommend using the R-INLA package (Martins et al., 2013; Lindgren et al., 2015) from R (R Development Core Team, 2022). In Scenario 2 we use a numerical approximation of $D_H(B_c, \hat{B}_c)$ implemented using Borchers (2021). The bibliography entry for Borchers (2021) states: 'HANS W. BORCHERS. pracma: Practical Numerical Math Functions, 2021. R package version 2.3.6.' |
| Experiment Setup | Yes | for Scenario 1 the identity link function was used in (8) along with B-spline basis, and the number of basis functions was selected using the DIC. The default uninformative priors of R-INLA have been used, which consist of diffuse priors for the $\beta$s, i.e. $\beta_0 \sim N(0, )$ and $\beta_i \sim N(0, 1000)$, and a long-tailed prior for the variance of the error term, i.e. a log gamma distribution, where the gamma distribution has mean $a/b$ and variance $a/b^2$, with $a = 1$ and $b = 10^{-5}$. For Scenario 2 we follow Simpson et al. (2016) and specify a log-Gaussian Cox process using (8) by setting a log link function, that links the intensity function with a Matérn random field using piecewise linear basis functions over a mesh, and where the $\beta$s are Gaussian-distributed. For the parameters of the Matérn covariance function we use the PC prior approach of Fuglstad et al. (2019) setting P(σ > 1) = 0.001 and P(ℓ < 0.05) = 0.001. In all experiments reported below, we draw m = 1,000 times from the posterior distribution of BMDs using Algorithm 1. |
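
To make the prior settings quoted in the Experiment Setup row concrete, the following minimal sketch works out the implied moments. It assumes the garbled rate value reads b = 10^{-5} (the minus sign is inferred from the prior being described as diffuse) and that the gamma distribution uses the shape/rate parameterization implied by "mean a/b and variance a/b^2". The function names are illustrative and are not taken from the paper or from R-INLA.

```python
def gamma_moments(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    mean = shape / rate
    variance = shape / rate ** 2
    return mean, variance


def variance_to_precision(variance):
    """Convert a Gaussian prior variance to the equivalent precision."""
    return 1.0 / variance


# Assumed reading of the quoted hyperparameters: a = 1, b = 10^{-5}.
a, b = 1.0, 1e-5
prior_mean, prior_variance = gamma_moments(a, b)

# The quoted N(0, 1000) prior on beta_i, expressed as a precision.
beta_precision = variance_to_precision(1000.0)

print(f"gamma prior mean     ~ {prior_mean:.3g}")
print(f"gamma prior variance ~ {prior_variance:.3g}")
print(f"beta_i precision     = {beta_precision:.3g}")
```

Under these assumptions the gamma prior has a mean of roughly 10^5 and a variance of roughly 10^10, which is consistent with its description as long-tailed.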
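
The Software Dependencies row mentions a numerical approximation of D_H between a set and its estimate. Assuming D_H denotes the Hausdorff distance (the summary does not say so explicitly), a minimal pure-Python sketch for two finite point sets could look as follows. This illustrates the general technique only; it is not the pracma routine or the authors' implementation, and the example point sets are hypothetical.

```python
import math


def directed_hausdorff(A, B):
    """Largest distance from a point in A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between finite point sets A and B."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# Hypothetical example: corners of a unit square and a copy shifted in x.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 0.5, y) for x, y in square]

distance = hausdorff(square, shifted)
print(distance)
```

In practice the sets would be discretized on a grid, and the approximation improves as the grid is refined, at a cost that is quadratic in the number of points for this brute-force version.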