Collaborative likelihood-ratio estimation over graphs

Authors: Alejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation of the GRULSIF framework is conducted with the objective of estimating the likelihood-ratio r^α_v for each node of a given fixed graph. In Sec. 6.1, we present synthetic experiments where the true likelihood-ratios are known by design. The evaluation on real problems is challenging since the true likelihood-ratios are generally not known; however, in Sec. 6.2 we design a particular setting for seismic data where those true quantities can be safely assumed. In all experiments, both GRULSIF and POOL follow the numerical implementation guidelines described in Sec. 5, which include the CBCGD optimization technique and the Nyström approximation of the RKHS.
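For context on the quantity being estimated: in the RULSIF line of work that GRULSIF builds on, the α-relative likelihood-ratio is r_α(x) = p(x) / (α p(x) + (1−α) q(x)), which is bounded above by 1/α for α > 0. A minimal sketch of this definition on two toy Gaussian densities (the function names and the Gaussian choice are ours, not the paper's):

```python
import numpy as np

def relative_likelihood_ratio(p_x, q_x, alpha):
    """alpha-relative ratio r_alpha = p / (alpha*p + (1-alpha)*q).

    For alpha > 0 the ratio is bounded above by 1/alpha, which is
    what makes larger alpha values easier to estimate.
    """
    return p_x / (alpha * p_x + (1.0 - alpha) * q_x)

def gauss_pdf(x, mu, sigma):
    # standard Gaussian density, used here only as a toy example
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-4, 4, 401)
p = gauss_pdf(x, 0.0, 1.0)   # stand-in for a pre-change density p_v
q = gauss_pdf(x, 1.0, 1.0)   # stand-in for a post-change density q_v

for alpha in (0.01, 0.1, 0.5):
    r = relative_likelihood_ratio(p, q, alpha)
    print(f"alpha={alpha}: max ratio {r.max():.2f} (bound {1.0 / alpha:.0f})")
```

When p = q the ratio is identically 1 for every α, which is the synthetic baseline used in the paper's null-scenario nodes.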
Researcher Affiliation | Academia | Alejandro de la Concha (EMAIL), Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, 91190 Gif-sur-Yvette, France; Department of Mathematics, University of Luxembourg, 4364 Esch-sur-Alzette, Luxembourg. Nicolas Vayatis (EMAIL), Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, 91190 Gif-sur-Yvette, France. Argyris Kalogeratos (EMAIL), Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, 91190 Gif-sur-Yvette, France.
Pseudocode | Yes | Algorithm 1: Model selection for GRULSIF hyperparameters tuning. Algorithm 2: GRULSIF: Collaborative and distributed LRE over a graph. Algorithm 3: Dictionary creation.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | The data is made publicly available by the GeoNet project, which operates a geological hazard monitoring system in that territory. 1. Public data available from the GeoNet project, GNS Science (1970): https://www.geonet.org.nz/earthquake/2021p405872 2. https://www.geonet.org.nz/earthquake/2023p741652 3. https://www.geonet.org.nz/earthquake/2024p817566
Dataset Splits | Yes | To compute this measure, we need both the approximated relative likelihood-ratio f_v and the true r^α_v. The former is given by each LRE estimator trained on 80% of the observations, while by design we have (r^α_v(x))_{v∈V} = (1, ..., 1) ∈ R^N for any α and x ∈ X. This choice is consistent with the sampling from p_v and q_v, which is done such that p_v ≡ q_v. The expected value E_{p_α(y)}[(f_v − r^α_v)²(y)] is then calculated by averaging the estimation result on the remaining 20% of the observations that were not used during the training phase. Algorithm 1 (Model selection for GRULSIF hyperparameters tuning): Randomly split X and X' into R disjoint subsets {X_r}_{r=1}^R and {X'_r}_{r=1}^R.
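The splitting protocol described above combines an 80/20 holdout with a random partition into R disjoint subsets for model selection. A minimal sketch of both steps, with a placeholder held-out error against the known target r^α_v(x) = 1 (function names, the seed, and the constant stand-in estimator are our illustrative choices):

```python
import numpy as np

def holdout_split(n, train_frac=0.8, seed=0):
    """Random 80/20 split: indices used to fit the LRE estimator
    and indices held out to evaluate it."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    cut = int(train_frac * n)
    return perm[:cut], perm[cut:]

def disjoint_subsets(indices, R, seed=0):
    """Randomly partition a set of indices into R disjoint subsets,
    as in the model-selection step of Algorithm 1."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(indices), R)

train, test = holdout_split(600)
subsets = disjoint_subsets(train, R=5)

# Held-out squared error against the known target r^alpha_v(x) = 1
# (the synthetic design where p_v = q_v at every node); a perfect
# estimator would score exactly 0 here.
f_hat = np.ones(len(test))  # stand-in for a fitted estimator's outputs
err = np.mean((f_hat - 1.0) ** 2)
```

The partition step guarantees each observation lands in exactly one of the R subsets, which is what "disjoint" buys over independent resampling.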
Hardware Specification | Yes | On a single machine with a 12th Gen Intel(R) Core(TM) i7-12700H processor and 16GB of RAM.
Software Dependencies | No | We apply a series of standard preprocessing steps used in seismology: the instrument response is deconvolved, the linear trend is removed, the observations are demeaned, a 2–20 band-pass filter is applied, and the filtered data are downsampled by a factor of 5. To reduce the temporal dependency, we fit an autoregressive model of order 1 and keep the residuals for further analysis. The output is then standardized so that it has zero mean and unit variance. After completing these steps, we obtain 1200 observations at each location. We then assign the first 600 observations to p_v and the remaining 600 to q_v. To account for spatial similarity, we generate an unweighted spatial graph G_S = (V, E, W) where the nodes represent the seismic stations and the edges are computed so as to form a spatial 3-nearest-neighbors graph, as visualized in Fig. 17.
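The preprocessing chain quoted above can be sketched with SciPy primitives. This is our reconstruction under stated assumptions, not the authors' code: we assume the 2–20 band edges are in Hz, use a 4th-order Butterworth filter, and omit the instrument-response deconvolution (which needs station metadata, e.g. via ObsPy's `remove_response`):

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate, detrend

def preprocess_trace(x, fs, f_lo=2.0, f_hi=20.0, factor=5):
    """Sketch of the described chain: detrend, demean, band-pass,
    then downsample by `factor` (instrument response step omitted)."""
    x = detrend(x, type="linear")     # remove linear trend
    x = x - x.mean()                  # demean
    b, a = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs)
    x = filtfilt(b, a, x)             # zero-phase 2-20 band-pass (Hz assumed)
    return decimate(x, factor)        # downsample by a factor of 5

def ar1_residuals(x):
    """Fit an AR(1) model by least squares, keep the residuals,
    and standardize them to zero mean and unit variance."""
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    res = x[1:] - phi * x[:-1]
    return (res - res.mean()) / res.std()

# e.g. 6000 raw samples at an assumed 100 Hz -> 1200 observations per station
raw = np.random.default_rng(0).normal(size=6000)
obs = ar1_residuals(preprocess_trace(raw, fs=100.0))
```

The 3-nearest-neighbors graph over station coordinates would then be built on top of these per-station series, e.g. with `sklearn.neighbors.kneighbors_graph`.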
Experiment Setup | Yes | To investigate the sensitivity of GRULSIF and POOL to the regularization parameter α, we complement the previous experiments by reporting in Fig. 7-11 results for α ∈ {0.01, 0.1, 0.5}. We can see that tuning α affects convergence, as suggested by Theorem 3. Low α values make the LRE task harder, and hence lead to estimates with higher bias and variance, and slower convergence to the true target quantities (this is more evident in the box-plots). Moreover, graph regularization leads to more robust node-level estimates, i.e. lower variance within sets of connected nodes and faster convergence, especially for nodes where p_v = q_v. Finally, when α is closer to 1, GRULSIF and POOL get closer, since the target relative likelihood-ratios become easier to estimate even without collaboration. The findings show that the collaborative LRE is more robust as α gets closer to 0, i.e. when targeting a less regularized likelihood-ratio (see also Sec. 5.4).
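The reported bias/variance behaviour has a simple empirical analogue: samples of the α-relative ratio spread out more as α shrinks. A toy Monte Carlo illustration with two Gaussian densities (the densities, sample size, and seed are our choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, 100_000)   # draws from a toy q density

def pdf(x, mu):
    # unit-variance Gaussian density (toy choice)
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

p, q = pdf(y, 0.0), pdf(y, 1.0)
variances = {}
for alpha in (0.01, 0.1, 0.5):
    r = p / (alpha * p + (1.0 - alpha) * q)   # alpha-relative ratio
    variances[alpha] = float(r.var())
# the spread of the target shrinks monotonically as alpha grows,
# mirroring why larger alpha makes the estimation task easier
```

This is only a statement about the target quantity, not about any particular estimator; the paper's Theorem 3 is what ties α to the convergence of the estimates themselves.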