Universal Neural Optimal Transport

Authors: Jonathan Geuter, Gregor Kornhardt, Ingimar Tomasson, Vaios Laschos

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments on Euclidean and non-Euclidean domains, we show that our network not only accurately predicts OT distances and plans across a wide range of datasets, but also captures the geometry of the Wasserstein space correctly. Furthermore, we show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm with speedups of up to 7.4×, significantly outperforming existing approaches. ... We demonstrate the performance of the three models on various tasks, such as predicting transport distances, initializing the Sinkhorn algorithm, computing Sinkhorn divergence barycenters, and approximating Wasserstein geodesics."
Researcher Affiliation | Academia | (1) Harvard John A. Paulson School of Engineering and Applied Sciences; (2) Kempner Institute at Harvard University; (3) Department of Mathematics, Technische Universität Berlin, Germany; (4) Weierstrass Institute, Berlin, Germany.
Pseudocode | Yes | "Algorithm 1 Sinkhorn(µ, ν > 0, K = exp(−C/ϵ), ϵ, v0) ... Algorithm 3 Barycenter Computation"
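The Sinkhorn routine referenced above takes the Gibbs kernel K = exp(−C/ϵ) and a warm-start dual vector v0 (the quantity UNOT predicts) as inputs. A minimal NumPy sketch of the standard Sinkhorn iteration with that signature, written here for illustration and not taken from the authors' code:

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.01, v0=None, n_iters=1000, tol=1e-9):
    """Entropic OT via Sinkhorn iterations on the Gibbs kernel K = exp(-C/eps).

    mu, nu : positive histograms (1-D arrays summing to 1)
    C      : cost matrix of shape (len(mu), len(nu))
    v0     : optional warm start for the dual scaling v; a learned
             initialization (as in UNOT) can be passed here.
    Returns the transport plan P and the transport cost <P, C>.
    """
    K = np.exp(-C / eps)
    v = np.ones_like(nu) if v0 is None else v0
    for _ in range(n_iters):
        u = mu / (K @ v)          # enforce row marginals
        v_new = nu / (K.T @ u)    # enforce column marginals
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    P = u[:, None] * K * v[None, :]   # P = diag(u) K diag(v)
    return P, float(np.sum(P * C))
```

With the default initialization v = 1 this is the baseline the paper's speedup figures compare against; a good v0 simply reduces the number of loop iterations needed to reach the tolerance.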
Open Source Code | Yes | "The implementation and model weights are available at https://github.com/GregorKornhardt/UNOT."
Open Datasets | Yes | "For the Euclidean settings a) and b) (from above), we view images as discrete distributions on the unit square, and test on MNIST (28×28), grayscale CIFAR10 (28×28), the teddy bear class from the Google Quick, Draw! dataset (64×64), and Labeled Faces in the Wild (LFW, 64×64), as well as cross-datasets CIFAR-MNIST and LFW-Bear (where µ comes from one dataset and ν from the other). ... For some of the experiments in the appendix, we included two additional datasets, the cars class which is also from the Quick, Draw! dataset, and the Facial Expressions dataset (Hashan, 2022)"
Dataset Splits | No | "Errors are averaged over 500 samples." This indicates a test-set size, but the paper does not provide the training/validation/test split information needed for reproduction.
Hardware Specification | Yes | "Training takes around 35h on an H100 GPU." ... "We achieve an average speedup of 3.57× on 28×28 datasets and 4.4× on 64×64 datasets." ... "on a batch size of 64 in float32 on an NVIDIA 4090"
Software Dependencies | No | "In Table 2 we show the relative speedup achieved by initializing the Sinkhorn algorithm with UNOT implemented in JAX over the default initialization..." "FNOs process complex numbers, but PyTorch is heavily optimized for real number operations." The paper mentions JAX and PyTorch but does not specify their version numbers.
Experiment Setup | Yes | "Hyperparameters. In Table 3 we present all relevant hyperparameters again for convenience."

Table 3. Training hyperparameters.

Hyperparameter | Value
# params Gθ | 272k
# layers Gϕ | 5
hidden dims Gϕ | (164, 164, 164, 164)
δ (eq. (6)) | 1e-6
λ (eq. (6)) | 1
d (dimension of latent z) | 2 · 10 · 10 = 200
optimizer Gϕ | Adam
activations Gϕ | ELU
β1 (initial learning rate Gθ) | 0.001
learning rate decay Gθ | 1
weight decay Gϕ | 0
# params Sϕ | 26M
number of Fourier layers | 4
dvi (dim. in Fourier blocks) | 64
hidden dim. of Fourier NN | 256
# layers in Fourier NN | 2
Nmodesx (# Fourier modes) | 10
Nmodesy (# Fourier modes) | 10
optimizer Sϕ | AdamW
σ (activation in Sϕ) | GELU
α1 (initial learning rate Sϕ) | 1e-4
learning rate decay Sϕ | 0.9999
weight decay Sθ | 1e-4
minimum training sample size | 10 × 10
maximum training sample size | 64 × 64
# training samples | 200M
batch size | 5000
mini batch size | 64
T (number of batches) | 40k
ϵ (for Sinkhorn targets) | 0.01
k (# Sinkhorn iterations for targets) | 5
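For anyone re-implementing the setup, the quoted hyperparameters can be gathered into a single configuration dict. This is only a convenience sketch: the key names are hypothetical and need not match the released UNOT code, and the grouping into generator (Gθ/Gϕ), solver (Sϕ), and training sections is an assumption based on the symbol prefixes.

```python
# Illustrative training configuration assembled from the paper's Table 3.
# Key names are hypothetical; values are the ones quoted above.
UNOT_TRAIN_CONFIG = {
    "generator": {                    # G, ~272k parameters
        "num_layers": 5,
        "hidden_dims": (164, 164, 164, 164),
        "latent_dim": 2 * 10 * 10,    # d = 200
        "optimizer": "Adam",
        "activation": "ELU",
        "lr": 1e-3,                   # beta_1
        "lr_decay": 1.0,
        "weight_decay": 0.0,
    },
    "solver": {                       # S, an FNO with ~26M parameters
        "num_fourier_layers": 4,
        "fourier_block_dim": 64,      # d_vi
        "fourier_nn_hidden_dim": 256,
        "fourier_nn_layers": 2,
        "n_modes": (10, 10),          # (Nmodesx, Nmodesy)
        "optimizer": "AdamW",
        "activation": "GELU",
        "lr": 1e-4,                   # alpha_1
        "lr_decay": 0.9999,
        "weight_decay": 1e-4,
    },
    "training": {
        "delta": 1e-6,                # eq. (6)
        "lambda": 1.0,                # eq. (6)
        "min_sample_size": 10 * 10,
        "max_sample_size": 64 * 64,
        "num_samples": 200_000_000,
        "batch_size": 5000,
        "mini_batch_size": 64,
        "num_batches": 40_000,        # T
        "sinkhorn_eps": 0.01,         # epsilon for Sinkhorn targets
        "sinkhorn_iters": 5,          # k iterations for targets
    },
}
```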