Insights into Ordinal Embedding Algorithms: A Systematic Evaluation
Authors: Leena Chennuru Vankadara, Michael Lohaus, Siavash Haghiri, Faiz Ul Wahab, Ulrike von Luxburg
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our paper, we address these questions and provide an extensive and systematic empirical evaluation of existing algorithms as well as a new neural network approach. We find that simple, relatively unknown, non-convex methods consistently outperform all other algorithms across a broad range of tasks, including more recent and elaborate methods based on neural networks or landmark approaches. |
| Researcher Affiliation | Academia | Leena Chennuru Vankadara EMAIL University of Tübingen; Michael Lohaus EMAIL University of Tübingen; Siavash Haghiri EMAIL University of Tübingen; Faiz Ul Wahab EMAIL University of Tübingen; Ulrike von Luxburg EMAIL University of Tübingen and Tübingen AI Center |
| Pseudocode | No | The paper describes algorithms but does not include any explicit pseudocode blocks or algorithms labeled as such. For example, for LLOE, it states: "The full pseudo-code for the algorithm can be found in Anderton and Aslam (2019)." which refers to an external source. |
| Open Source Code | Yes | Private, anonymous link to the code: https://github.com/tml-tuebingen/evaluate-OE. Code will be made public on GitHub after publication. |
| Open Datasets | Yes | To enable better visual evaluation of the embeddings generated by the various algorithms, we consider a variety of 2D datasets from Fränti and Sieranoja (2018). For a thorough evaluation, we also consider a range of high-dimensional real (CHAR (Dua and Graff, 2017), USPS (Hull, 1994), MNIST (Lecun and Cortes, 1998), Fashion MNIST (Xiao et al., 2017), KMNIST (Clanuwat et al., 2018), Forest Covertype (Dua and Graff, 2017)) and synthetic datasets (Uniform, Mixture of Gaussians). |
| Dataset Splits | Yes | In order to capture how well the local structure is preserved, we train a k-nearest neighbour classifier on 70% of the dataset and measure the error on the remaining 30% of the data. We fix the parameter k = log n. To conduct this experiment, we choose the datasets USPS and CHAR. The USPS dataset already has a pre-defined training and test set. In the case of CHAR, we create the test set by randomly splitting the data into a train (80%) and a test set (20%). |
| Hardware Specification | Yes | Hardware Setting 1 for most experiments: Intel Xeon Gold 6240 @ 2.60GHz and NVIDIA GeForce RTX 2080-Ti 11 GB. Hardware Setting 2 for Covtype experiments: Intel Core Processors (Broadwell) @ 2GHz and NVIDIA Tesla V100 SXM2 32GB. Hardware Setting 3 for CPU vs. GPU experiments: Intel Xeon E5-2650 v4 @ 2.2 GHz and NVIDIA GTX 1080-Ti 11 GB. |
| Software Dependencies | Yes | We implement all algorithms in Python 3.7 using the PyTorch library (Paszke et al., 2019), which allows us to use GPU resources for training. |
| Experiment Setup | Yes | We optimize all methods with stochastic gradient descent using adaptive learning rates (Adam; Kingma and Ba, 2014), with the exception of FORTE, which uses a line search based on the implementation by Jain et al. (2016b). The parameter selection for each method is described in the supplementary (Section B). For all methods except OENN, we fixed the batch size to min(#triplets, 1 million) and we performed a grid search for the learning rate and other parameters on 1000 points of uniform data. We shortly summarize the results here. Due to the more complex choice of hyperparameters for OENN, we describe them in detail in Section K.3. GNMDS: For GNMDS we use a regularization of λ = 0 and a learning rate of 10. FORTE: The learning rate is 100. Similar to the implementation of Jain et al. (2016b), we use the backtracking line search parameter ρ = 0.5 and the Armijo stopping condition parameter c1 = 0.0001 for the line-search optimization. CKL: For CKL we use a regularization of λ = 0 and a learning rate of 100. SOE, STE, t-STE, and CKLx use a learning rate of 1. LOE: For LOE, the learning rate is 0.0001. LLOE: The phase one learning rate is 1, and the phase two learning rate is 0.5. We use Adam (Kingma and Ba, 2014) to train our network and use a batch size of min(#triplets, 50,000) and a learning rate of 0.005 for all the experiments. Therefore, in all of our experiments, we set w = max(120, 2d log n). In all the experiments, we set α = log n. |
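The dataset-split protocol quoted above (train a k-nearest-neighbour classifier on 70% of the embedded points, measure the error on the remaining 30%, with k = log n) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `knn_error` and `evaluate_embedding`, the random seed, and the use of plain NumPy with Euclidean distances are assumptions for the sketch.

```python
import math
import numpy as np

def knn_error(train_X, train_y, test_X, test_y, k):
    """Classification error of a majority-vote kNN classifier."""
    errors = 0
    for x, y in zip(test_X, test_y):
        # Euclidean distances from the test point to all training points
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        pred = np.bincount(nearest).argmax()  # majority vote
        errors += int(pred != y)
    return errors / len(test_y)

def evaluate_embedding(X, y, train_frac=0.7, seed=0):
    """70/30 random split with k = log n, as described in the quoted setup."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.permutation(n)
    split = int(train_frac * n)
    k = max(1, int(math.log(n)))  # k = log n, floored, at least 1
    tr, te = idx[:split], idx[split:]
    return knn_error(X[tr], y[tr], X[te], y[te], k)
```

For example, on an embedding with two well-separated clusters, `evaluate_embedding` should return an error close to zero, indicating that the local neighbourhood structure is preserved.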