Optimal Transport: Fast Probabilistic Approximation with Exact Solvers
Authors: Max Sommerfeld, Jörn Schrieber, Yoav Zemel, Axel Munk
JMLR 2019 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical experiments that demonstrate that a very good approximation in typical applications can be obtained in a computation time that is several orders of magnitude smaller than what is required for exact computation of the full problem. |
| Researcher Affiliation | Academia | Max Sommerfeld EMAIL Felix-Bernstein Institute for Mathematical Statistics in the Biosciences University of G ottingen Goldschmidtstr. 7, 37077 G ottingen J orn Schrieber EMAIL Institute for Mathematical Stochastics University of G ottingen Goldschmidtstr. 7, 37077 G ottingen Yoav Zemel EMAIL Felix-Bernstein Institute for Mathematical Statistics in the Biosciences University of G ottingen Goldschmidtstr. 7, 37077 G ottingen Axel Munk EMAIL Institute for Mathematical Stochastics and Felix-Bernstein Institute for Mathematical Statistics in the Biosciences University of G ottingen Goldschmidtstr. 7, 37077 G ottingen and Max-Planck-Institute for Biophysical Chemistry Am Faßberg 11, 37077 G ottingen |
| Pseudocode | Yes | Algorithm 1 Statistical approximation of Wp(r, s) 1: Input: Probability measures r, s PX , sample size S and number of repetitions B 2: for i = 1 . . . B do 3: Sample i.i.d. X1, . . . , XS r and independently Y1, . . . , YS s 4: ˆr S,x # {k : Xk = x} /S for all x X 5: ˆs S,x # {k : Yk = x} /S for all x X 6: Compute ˆW (i) Wp(ˆr S, ˆs S) 7: end for 8: Return: ˆW (S) p (r, s) B 1 PB i=1 ˆW (i) |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository or an explicit statement about releasing the code for the methodology described. It mentions using third-party R packages and solvers (transport, Rcplex, CPLEX) but not its own implementation. |
| Open Datasets | Yes | The instances of optimal transport considered here are discrete instances of two different types: regular grids in two dimensions, that is, images in various resolutions, as well as point clouds in [0, 1]D with dimensions D = 2, 3 and 4. For the image case, from the DOTmark, which contains images of various types intended to be used as optimal transport instances in the form of two-dimensional histograms, three instances were chosen: two images of each of the classes White Noise, Cauchy Density, and Classic Images, which are then treated in the three resolutions 32 32, 64 64 and 128 128. Images are interpreted as finitely supported measures. The mass of a pixel is given by the grayscale value and the support of the measure is the grid {1, . . . , R} {1, . . . , R} for an image with resolution R R. In the White Noise class the grayscale values of the pixels are independent of each other, the Cauchy Density images show bivariate Cauchy densities with random centers and varying scale ellipses, while Classic Images contains grayscale test images. See Schrieber et al. (2016) for further details on the different image classes and example images. |
| Dataset Splits | No | The paper describes selecting specific instances from the DOTmark benchmark and generating point cloud instances for evaluation, but it does not specify train/test/validation splits in the context of machine learning model training or evaluation. The experiments are focused on comparing approximation accuracy rather than training a model on a dataset with traditional splits. |
| Hardware Specification | Yes | One single core of a Linux server (AMD Opteron Processor 6140 from 2011 with 2.6 GHz) was used. |
| Software Dependencies | No | The transportation simplex and the shortlist method the implementations provided in the R package transport (Schuhmacher et al., 2014) were used. The models for the CPLEX solver were created and solved via the R package Rcplex (Bravo and Theussl, 2016). Additionally, the Sinkhorn scaling algorithm (Cuturi, 2013) was tested in our simulation. This method computes an entropy regularized optimal transport distance. No specific version numbers for the R packages or CPLEX are provided. |
| Experiment Setup | Yes | Algorithm 1 was applied to each of these instances with parameters S {100, 500, 1000, 2000, 4000} and B {1, 2, 5}. For every combination of instance and parameters, the subsampling algorithm was run 5 times in order to mitigate the randomness of the results. Since the linear programming solvers had a very similar performance on the grid based instances (see below), only one of them the transportation simplex was tested on the point cloud instances. All original instances were solved by each back-end solver in each resolution for the values p = 1, p = 2, and p = 3 in order to be compared to the approximative results for the subsamples in terms of runtime and accuracy, with the exception of CPLEX, where the 128 128 instances could not be solved due to memory limitations. |