Multiscale Strategies for Computing Optimal Transport

Authors: Samuel Gerber, Mauro Maggioni

JMLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental We provide numerical evidence that this multiscale approach scales approximately linearly, in time and memory, in the number of nodes, instead of quadratically or worse for a direct solution. Empirically, the multiscale approach results in less than one percent relative error in the objective function. Furthermore, the multiscale plans constructed are of interest by themselves as they may be used to introduce novel features and notions of distances between point sets. An analysis of sets of brain MRI based on optimal transport distances illustrates the effectiveness of the proposed method on a real world data set.
Researcher Affiliation Collaboration Samuel Gerber (1), Mauro Maggioni (2,3,4). (1) Kitware, NC, U.S.A. (2) Department of Mathematics, (3) Department of Applied Mathematics, (4) Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, U.S.A.
Pseudocode Yes Algorithm 1: Multiscale Discrete Optimal Transport. Algorithm 2: Point Set based Multiscale Optimal Transport Implementation
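The coarse-to-fine idea behind the paper's Algorithm 1 can be illustrated with a minimal sketch. This is not the paper's mop implementation: it assumes 1-D point sets with uniform mass and squared cost (where exact optimal transport at each scale reduces to monotone, sorted matching), and the function names and contiguous-block "clustering" are stand-ins for the K-means tree used in the paper.

```python
# Hedged sketch of multiscale OT: solve coarsely on clusters, then refine
# the plan inside each matched cluster pair. 1-D / uniform-mass assumption.

def coarse_clusters(points, k):
    """Split sorted points into k contiguous blocks (a stand-in for the
    multiscale tree decomposition used in the paper)."""
    pts = sorted(points)
    size = len(pts) // k
    return [pts[i * size:(i + 1) * size] for i in range(k)]

def multiscale_match(src, dst, k=2):
    """Coarse scale: match clusters; fine scale: refine each matched
    cluster pair. Returns the plan as a list of (x, y) pairs."""
    blocks_a = coarse_clusters(src, k)
    blocks_b = coarse_clusters(dst, k)
    plan = []
    # In 1-D, clusters sorted by position are already optimally matched.
    for block_a, block_b in zip(blocks_a, blocks_b):
        # Refine within the coarse match: monotone matching again.
        plan.extend(zip(sorted(block_a), sorted(block_b)))
    return plan

def total_cost(plan):
    """Squared-cost objective of a matching."""
    return sum((x - y) ** 2 for x, y in plan)

plan = multiscale_match([0.1, 0.4, 0.9, 1.3], [0.2, 0.5, 1.0, 1.6], k=2)
```

In this toy setting the refinement is exact; the paper's contribution is making such coarse-to-fine propagation work for general discrete problems, where the coarse plan only restricts the fine-scale search.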
Open Source Code Yes The R package mop provides a flexible implementation of the multiscale framework and is adaptable to any multiscale decomposition that can be represented by a tree. ... Available at https://bitbucket.org/suppechasper/optimaltransport
Open Datasets Yes For the comparisons we used the OASIS brain database (HHMI et al.). The OASIS database consists of T1 weighted MRI of 416 subjects aged 18 to 96. One hundred of the subjects over the age of 60 are diagnosed with mild to moderate dementia. ... Open access series of imaging studies. http://www.oasis-brains.org.
Dataset Splits No The paper mentions downsampling the volumes to size 44 x 52 x 44 and discarding zero-intensity voxels for the brain MRI data. It also describes building linear regression models and MDS embeddings, but it does not provide explicit details on how the data were split into training, validation, or test sets for these models, or specify a cross-validation strategy.
Hardware Specification No The paper states: 'The network simplex and Sinkhorn approach run out of memory on a 16GB computer for problems larger than around 2 x 10^4 constraints and about 10^8 variables.' While it mentions 16GB of memory, it does not specify any particular CPU or GPU model, or other detailed computer specifications for the experiments conducted.
Software Dependencies No The paper mentions several software tools and libraries: 'R package mop', 'R front end in the package sinkhorn', 'Eigen linear algebra library', 'GLPK (Makhorin, 2013)', 'MOSEK (Andersen and Andersen, 2013)', and 'CPLEX (IBM, 2013)'. However, it does not provide specific version numbers for any of these software components, which are required for a reproducible description.
Experiment Setup Yes For the examples shown we select K = 2^d with d the dimensionality of the data set. ... In these experiments we set the tolerance parameter for Sinkhorn to 1e-5, and tried also 1e-2 to see if this would result in significant speed ups, with minimal accuracy loss, but it resulted in only approximately a 10% speedup, and no fundamental change in the behavior for large problem sizes. The tolerance parameter for CPLEX is set to 1e-8.
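The role of the Sinkhorn tolerance parameter quoted above can be sketched with a generic entropic-regularized Sinkhorn iteration. This is not the paper's sinkhorn R package; the function name, the regularization value, and the choice of stopping criterion (maximum row-marginal violation) are illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, reg=0.1, tol=1e-5, max_iter=10000):
    """Generic Sinkhorn iterations for entropic-regularized OT; `tol`
    bounds the marginal violation, mirroring the 1e-5 setting above."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg).
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(max_iter):
        # Alternate scaling so the plan's marginals approach a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Stop once row marginals match `a` to within `tol`.
        err = max(abs(sum(K[i][j] * u[i] * v[j] for j in range(m)) - a[i])
                  for i in range(n))
        if err < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
```

A looser tolerance (e.g. 1e-2) simply exits the loop earlier with a plan whose marginals are less accurate, which is consistent with the quoted observation that relaxing it bought only a modest speedup.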