Multiscale Strategies for Computing Optimal Transport

Authors: Samuel Gerber, Mauro Maggioni

JMLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental We provide numerical evidence that this multiscale approach scales approximately linearly, in time and memory, in the number of nodes, instead of quadratically or worse for a direct solution. Empirically, the multiscale approach results in less than one percent relative error in the objective function. Furthermore, the multiscale plans constructed are of interest by themselves as they may be used to introduce novel features and notions of distances between point sets. An analysis of sets of brain MRI based on optimal transport distances illustrates the effectiveness of the proposed method on a real world data set.
Researcher Affiliation Collaboration Samuel Gerber (1), Mauro Maggioni (2,3,4). (1) Kitware, NC, U.S.A. (2) Department of Mathematics, (3) Department of Applied Mathematics, (4) Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, U.S.A.
Pseudocode Yes Algorithm 1: Multiscale Discrete Optimal Transport. Algorithm 2: Point Set based Multiscale Optimal Transport Implementation
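The coarse-to-fine idea behind the paper's Algorithm 1 can be illustrated with a minimal sketch. This is not the paper's mop implementation: it assumes 1-D point sets with uniform mass and squared cost (where exact optimal transport at each scale reduces to monotone, sorted matching), and the function names and contiguous-block "clustering" are stand-ins for the K-means tree used in the paper.

```python
# Hedged sketch of multiscale OT: solve coarsely on clusters, then refine
# the plan inside each matched cluster pair. 1-D / uniform-mass assumption.

def coarse_clusters(points, k):
    """Split sorted points into k contiguous blocks (a stand-in for the
    multiscale tree decomposition used in the paper)."""
    pts = sorted(points)
    size = len(pts) // k
    return [pts[i * size:(i + 1) * size] for i in range(k)]

def multiscale_match(src, dst, k=2):
    """Coarse scale: match clusters; fine scale: refine each matched
    cluster pair. Returns the plan as a list of (x, y) pairs."""
    blocks_a = coarse_clusters(src, k)
    blocks_b = coarse_clusters(dst, k)
    plan = []
    # In 1-D, clusters sorted by position are already optimally matched.
    for block_a, block_b in zip(blocks_a, blocks_b):
        # Refine within the coarse match: monotone matching again.
        plan.extend(zip(sorted(block_a), sorted(block_b)))
    return plan

def total_cost(plan):
    """Squared-cost objective of a matching."""
    return sum((x - y) ** 2 for x, y in plan)

plan = multiscale_match([0.1, 0.4, 0.9, 1.3], [0.2, 0.5, 1.0, 1.6], k=2)
```

In this toy setting the refinement is exact; the paper's contribution is making such coarse-to-fine propagation work for general discrete problems, where the coarse plan only restricts the fine-scale search.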
Open Source Code Yes The R package mop provides a flexible implementation of the multiscale framework and is adaptable to any multiscale decomposition that can be represented by a tree. ... Available at https://bitbucket.org/suppechasper/optimaltransport
Open Datasets Yes For the comparisons we used the OASIS brain database (HHMI et al.). The OASIS database consists of T1 weighted MRI of 416 subjects aged 18 to 96. One hundred of the subjects over the age of 60 are diagnosed with mild to moderate dementia. ... Open access series of imaging studies. http://www.oasis-brains.org.
Dataset Splits No The paper mentions downsampling the volumes to size 44 x 52 x 44 and discarding zero-intensity voxels for the brain MRI data. It also describes building linear regression models and MDS embeddings, but it does not provide explicit details on how the data were split into training, validation, or test sets for these models, or specify a cross-validation strategy.
Hardware Specification No The paper states: 'The network simplex and Sinkhorn approach run out of memory on a 16GB computer for problems larger than around 2 x 10^4 constraints and about 10^8 variables.' While it mentions 16GB of memory, it does not specify any particular CPU or GPU model, or other detailed computer specifications for the experiments conducted.
Software Dependencies No The paper mentions several software tools and libraries: 'R package mop', 'R front end in the package sinkhorn', 'Eigen linear algebra library', 'GLPK (Makhorin, 2013)', 'MOSEK (Andersen and Andersen, 2013)', and 'CPLEX (IBM, 2013)'. However, it does not provide specific version numbers for any of these software components, which are required for a reproducible description.
Experiment Setup Yes For the examples shown we select K = 2^d with d the dimensionality of the data set. ... In these experiments we set the tolerance parameter for Sinkhorn to 1e-5, and tried also 1e-2 to see if this would result in significant speed ups, with minimal accuracy loss, but it resulted in only approximately a 10% speedup, and no fundamental change in the behavior for large problem sizes. The tolerance parameter for CPLEX is set to 1e-8.
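The role of the Sinkhorn tolerance parameter quoted above can be sketched with a generic entropic-regularized Sinkhorn iteration. This is not the paper's sinkhorn R package; the function name, the regularization value, and the choice of stopping criterion (maximum row-marginal violation) are illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, reg=0.1, tol=1e-5, max_iter=10000):
    """Generic Sinkhorn iterations for entropic-regularized OT; `tol`
    bounds the marginal violation, mirroring the 1e-5 setting above."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg).
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(max_iter):
        # Alternate scaling so the plan's marginals approach a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Stop once row marginals match `a` to within `tol`.
        err = max(abs(sum(K[i][j] * u[i] * v[j] for j in range(m)) - a[i])
                  for i in range(n))
        if err < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
```

A looser tolerance (e.g. 1e-2) simply exits the loop earlier with a plan whose marginals are less accurate, which is consistent with the quoted observation that relaxing it bought only a modest speedup.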