Ground Metric Learning
Authors: Marco Cuturi, David Avis
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We follow the presentation of our algorithms with promising experimental results which show that this approach is useful both for retrieval and binary/multiclass classification tasks. |
| Researcher Affiliation | Academia | Marco Cuturi and David Avis, Graduate School of Informatics, Kyoto University, 36-1 Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan |
| Pseudocode | Yes | Algorithm 1: Computation of z = Sk(M) and a subgradient γ, where the superscript is either + or −. Algorithm 2: Projected Subgradient Descent to minimize Ck. Algorithm 3: Initial Point M0 to minimize Ck. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions using external tools like "CPLEX Matlab API implementation of network flows" and "metric Nearness toolbox released online by Suvrit Sra" and an "implementation provided by the INRIA-LEAR team" but not its own code. |
| Open Datasets | Yes | We study in this section the performance of ground metric learning when coupled with a nearest neighbor classifier on binary classification tasks generated with the Caltech-256 database. We also consider 6 multiclass classification data sets covering text and image data. The properties of the data sets and parameters used in our experiments are summarized in Table 1. The dimensions of the features have been kept low to ensure that the computation of optimal transports is tractable. We follow the recommended train/test splits for these data sets; if they are not provided, we split the data sets arbitrarily. Features are formed using either LDA (Blei et al., 2003) or SIFT (Lowe, 1999). Table 1: Multiclass classification data sets and their parameters: 20 News Group, Reuters, MIT Scene, UIUC Scene, Oxford Flower, Caltech-101. |
| Dataset Splits | Yes | For each pair, we split the 80 + 80 available points into 30+30 points to train distance parameters and 50+50 points to form a test set. This amounts to having n = 60 training points following the notations introduced in Section 3.1. |
| Hardware Specification | Yes | The algorithm takes about 300 steps to converge (Figures 8 and 9), which, using a single Xeon 2.6 GHz core, 60 training points and d = 128 (the experimental setting considered below), takes about 900 seconds. |
| Software Dependencies | No | The paper mentions using "CPLEX Matlab API implementation of network flows" and the "metric Nearness toolbox released online by Suvrit Sra" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The neighborhood parameter k is set to 3 to be directly comparable to the default parameter setting of ITML and LMNN. In each classification task, and for two images ri and rj, the corresponding weight ωij is set to 1/(nk) if both histograms come from the same class and to −1/(nk) if they come from different classes. The subgradient stepsize t0 of Algorithm 2 is set to 0.1, guided by preliminary experiments and by the fact that, because of the normalization of the weights ωij, both the current iterate Mk in Algorithm 2 and the subgradients γ+ or γ− all have the same 1-norms. We carry out a minimum of 24 subgradient steps in each inner loop and set qmax to 80. Each inner loop is terminated when the objective does not progress more than 0.75% every 8 steps, or when q reaches qmax. We carry out a maximum of 20 outer loop iterations. |
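To make the setup above concrete, the following is a minimal sketch (not the authors' code) of one projected-subgradient step on the ground metric M. It relies on the fact that an optimal transport plan T* is a subgradient of the transport distance with respect to the cost matrix M, and it weights each plan by ωij = ±1/(nk) as in the setup row. The function names are my own, the LP solver is SciPy's `linprog` (the paper used the CPLEX Matlab API), and the projection step here only enforces symmetry, nonnegativity, and a zero diagonal; the paper's true projection onto the cone of metrics also enforces triangle inequalities via the metric nearness toolbox.

```python
import numpy as np
from scipy.optimize import linprog


def transport_plan(r, c, M):
    """Solve min_T <T, M> s.t. T 1 = r, T^T 1 = c, T >= 0 as a small LP."""
    d = len(r)
    A_eq = np.zeros((2 * d, d * d))
    for i in range(d):
        A_eq[i, i * d:(i + 1) * d] = 1.0   # row-sum constraints: T 1 = r
        A_eq[d + i, i::d] = 1.0            # column-sum constraints: T^T 1 = c
    b_eq = np.concatenate([r, c])
    res = linprog(M.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(d, d)


def gml_step(M, pairs, weights, stepsize):
    """One subgradient step: gamma = sum_ij w_ij T*_ij, then a crude projection."""
    gamma = np.zeros_like(M)
    for (r, c), w in zip(pairs, weights):
        gamma += w * transport_plan(r, c, M)
    M = M - stepsize * gamma
    # Crude stand-in for the paper's projection onto the cone of metrics
    # (the paper additionally enforces triangle inequalities).
    M = 0.5 * (M + M.T)          # symmetrize
    M = np.maximum(M, 0.0)       # nonnegativity
    np.fill_diagonal(M, 0.0)     # zero diagonal
    return M


# Toy usage: two 4-bin histograms, ground metric initialized to the 0/1 metric.
r = np.array([0.5, 0.5, 0.0, 0.0])
c = np.array([0.0, 0.0, 0.5, 0.5])
M0 = np.ones((4, 4)) - np.eye(4)
M1 = gml_step(M0, [(r, c)], [1.0 / (2 * 3)], stepsize=0.1)
```

The stepsize 0.1 and weight 1/(nk) mirror the values quoted in the setup row; the inner/outer loop schedule (at least 24 inner steps, qmax = 80, 0.75% progress test every 8 steps, at most 20 outer iterations) would wrap calls to `gml_step`.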
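The dataset-splits row can likewise be sketched: from the 80 + 80 points available per binary task, 30 + 30 go to metric training and 50 + 50 to the test set, giving n = 60 training points. The helper below is a hypothetical illustration of that protocol (the function name, seed handling, and use of NumPy are my assumptions, not the paper's code).

```python
import numpy as np


def split_pair(class_a, class_b, n_train=30, seed=0):
    """Split each class's points into n_train training points and the rest
    for testing, mirroring the 30+30 / 50+50 protocol described above."""
    rng = np.random.default_rng(seed)

    def one(cls):
        idx = rng.permutation(len(cls))
        return cls[idx[:n_train]], cls[idx[n_train:]]

    tr_a, te_a = one(class_a)
    tr_b, te_b = one(class_b)
    return (tr_a, tr_b), (te_a, te_b)


# Toy usage: 80 points per class, 5-dimensional features.
a = np.arange(80 * 5, dtype=float).reshape(80, 5)
b = -a
(train_a, train_b), (test_a, test_b) = split_pair(a, b)
```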