The Graph Cut Kernel for Ranked Data

Authors: Michelangelo Conserva, Marc Peter Deisenroth, K S Sesh Kumar

TMLR 2022

Each reproducibility variable below is listed with its assessed result and the supporting excerpt (LLM response).
Research Type: Experimental
  "We demonstrate that our novel kernel drastically reduces the computational cost while maintaining the same accuracy as state-of-the-art methods. ... The goal of this section is to demonstrate the (i) scalability, (ii) accuracy and (iii) flexibility of the graph cut kernel. We use a downstream classification task which is a traditional testbed for kernel-based methods for rankings (Jiao & Vert, 2015; Lomeli et al., 2019). ... Figure 4 reports the F1-scores on the test sets obtained by the GP classifiers for the different kernels on increasing levels of noise."
Researcher Affiliation: Academia
  "Michelangelo Conserva EMAIL, Department of EECS, Queen Mary University of London; Marc Peter Deisenroth EMAIL, UCL Centre for Artificial Intelligence, University College London; K S Sesh Kumar EMAIL, Data Science Institute, Imperial College London"
Pseudocode: Yes
  "Algorithm 1 PAVA(A, F)"
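The paper's Algorithm 1 references PAVA, the Pool Adjacent Violators Algorithm for isotonic regression. The authors' exact `PAVA(A, F)` signature is not reproduced here; the following is a minimal, generic sketch of the standard weighted-L2 PAVA routine for illustration only.

```python
# Generic Pool Adjacent Violators Algorithm (PAVA) sketch.
# Not the paper's Algorithm 1; a standard isotonic-regression routine.
def pava(y, w=None):
    """Return the non-decreasing sequence closest to y in weighted L2."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    blocks = []  # each block: [mean value, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / wt, wt, c1 + c2])
    out = []
    for v, _, c in blocks:
        out.extend([v] * c)
    return out
```

For example, `pava([3.0, 1.0, 2.0])` pools all three violating values into their mean, giving a constant non-decreasing fit.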
Open Source Code: Yes
  "The code is available at https://github.com/MichelangeloConserva/CutFunctionKernel/."
Open Datasets: Yes
  "Real-world dataset. The state-of-the-art dataset for rankings data is the sushi dataset (Kamishima, 2003)."
Dataset Splits: Yes
  "Samples are split into train (80%) and test (20%) sets. Results are reported in terms of F1-scores (Chinchor, 1992), which provide a reliable estimate of the actual capabilities of the model as they penalize the classifier for false positives and false negatives. ... For the synthetic dataset, we choose a sample size of 250 for full and top-k rankings and, due to the computational cost of the Kendall kernel, 100 for exhaustive interleaving rankings. For both the small and the large sushi datasets, we use 2500 rankings."
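The evaluation protocol described in the excerpt (an 80/20 train/test split scored with F1) can be sketched as follows. The data below is a toy stand-in, not the paper's ranking features.

```python
# Sketch of the reported evaluation protocol: 80/20 split, F1 score.
# Toy stand-in data only; not the authors' pipeline.
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X = [[i] for i in range(100)]     # toy features
y = [i % 2 for i in range(100)]   # toy binary labels
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# F1 penalizes both false positives and false negatives;
# here predictions equal the labels, so F1 is exactly 1.0.
score = f1_score(y_te, y_te)
```

With `test_size=0.2`, the held-out set contains 20 of the 100 samples, matching the paper's 80/20 split.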
Hardware Specification: Yes
  "The values for the empirical time complexity are averaged over seven trials on an Intel Core i7-7700HQ CPU @ 2.80GHz. ... We found that the time required to calculate the feature maps is 6.31s ± 88.6ms, while the time required to compute the Gram matrix from the feature maps is 576ms ± 19.7ms with hardware acceleration using a single Nvidia GeForce GTX 1060 GPU."
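Averaging wall-clock time over repeated trials, as in the excerpt's seven-trial CPU timings, can be sketched with the standard library's `timeit` module. The workload here is a placeholder, not the kernel computation.

```python
# Sketch of averaging runtime over seven trials with timeit.
# The timed workload is a placeholder, not the graph cut kernel.
import timeit

times = timeit.repeat(
    lambda: sum(i * i for i in range(10_000)),  # dummy workload
    repeat=7,   # seven independent trials, as in the paper's timings
    number=1,   # one call per trial
)
avg = sum(times) / len(times)
```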
Software Dependencies: No
  "To train the GP classifier we use the python package GPflow (Matthews et al., 2017), which we extended with custom classes. ... Hardware acceleration for the graph cut kernel is easily accessible using Python packages such as GPflow."
Experiment Setup: Yes
  "To train the GP classifier we use the python package GPflow (Matthews et al., 2017), which we extended with custom classes. The cut function is applied to a graph built from the information on the objects, i.e., the dishes in the synthetic dataset and the sushis in the real dataset. The edges of such a graph are calculated using the squared exponential kernel as the similarity score function... The lengthscale ℓ is selected using the inverse median heuristic (Schölkopf & Smola, 2002). For the large sushi dataset, we use 600 samples from the set of exhaustive ordered partitions R and we retain the top 10% edges of the graph. ... To create a more challenging classification task, it is possible to inject Gaussian noise with σ standard deviation to the scores..."
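The graph construction described in the excerpt (squared-exponential similarities between objects, lengthscale from a median heuristic, keeping only the strongest 10% of edges) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation; in particular, `se_graph` and its `keep_frac` parameter are hypothetical names.

```python
# Sketch of the described graph construction (not the authors' code):
# SE-kernel similarities with a median-heuristic lengthscale,
# then keeping only the top fraction of off-diagonal edges.
import numpy as np

def se_graph(X, keep_frac=0.10):
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between object features.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices_from(d2, k=1)
    # Median heuristic: set the squared lengthscale to the median
    # pairwise squared distance (one common variant of the heuristic).
    ell2 = np.median(d2[iu])
    W = np.exp(-d2 / (2.0 * ell2))      # squared exponential similarities
    # Retain only the strongest keep_frac of off-diagonal edges.
    thresh = np.quantile(W[iu], 1.0 - keep_frac)
    A = np.where(W >= thresh, W, 0.0)
    np.fill_diagonal(A, 0.0)
    return A
```

The resulting adjacency matrix is symmetric and sparse; the cut function in the paper is then defined on this weighted graph.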