Improving Neural Optimal Transport via Displacement Interpolation

Authors: Jaemoo Choi, Yongxin Chen, Jaewoong Choi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that DIOTM achieves more stable convergence and superior accuracy in approximating OT maps compared to existing methods. In particular, DIOTM achieves competitive FID scores in image-to-image translation tasks, such as 5.27 for Male → Female (64×64), 7.40 for Male → Female (128×128), and 10.72 for Wild → Cat (64×64), comparable to state-of-the-art results. Our contributions can be summarized as follows: (1) We propose a method to learn the optimal transport map based on displacement interpolation. (2) We derive the dual formulation of displacement interpolation and utilize it to formulate a max-min optimization problem for the transport map and potential function. (3) We introduce a novel regularizer, called the HJB regularizer, derived from the optimality condition of the potential function. (4) Our model significantly improves the training stability and accuracy of existing OT map models that leverage min-max objectives.
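Displacement interpolation, the object the method is built on, moves coupled samples along straight lines between the source and target rather than mixing densities. A minimal pure-Python sketch of McCann's interpolation for an already-coupled pair (the pairing below is illustrative; in DIOTM the coupling comes from the learned transport map, not from this function):

```python
def displacement_interp(x0, x1, t):
    """McCann displacement interpolation for a coupled pair:
    x_t = (1 - t) * x0 + t * x1, applied coordinate-wise."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))

# Hypothetical coupled pair; DIOTM obtains the coupling from the
# learned optimal transport map, not by hand like this.
source = (0.0, 0.0)
target = (4.0, 2.0)
midpoint = displacement_interp(source, target, 0.5)  # -> (2.0, 1.0)
```

At t = 0 the interpolant recovers the source point and at t = 1 the target point, which is the property the paper's dual formulation exploits.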
Researcher Affiliation | Academia | Jaemoo Choi, Georgia Institute of Technology; Yongxin Chen, Georgia Institute of Technology; Jaewoong Choi, Sungkyunkwan University
Pseudocode | Yes | Algorithm 1: Training algorithm of DIOTM
Open Source Code | Yes | To ensure the reproducibility of our work, we submitted the anonymized source code in the supplementary material, provided complete proofs of our theoretical results in Appendix A, and included the implementation and experiment details in Appendix B.
Open Datasets | Yes | We assessed our model on several Image-to-Image (I2I) translation benchmarks: Male → Female (Liu et al., 2015) (64×64), Wild → Cat (Choi et al., 2020) (64×64), and Male → Female (Liu et al., 2015) (128×128).
Dataset Splits | Yes | In the Wild → Cat experiments, we generated ten samples for each source test image. Since the source test dataset consists of approximately 500 samples, this yielded about 5,000 generated samples. We then computed the FID score against the training target dataset, which also contains 5,000 samples. In the CelebA experiment, we computed the FID score using the test target dataset, which includes 12,247 samples; we generated the same number of samples and compared them with the test target dataset.
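FID is the Fréchet distance between Gaussians fitted to feature statistics of the generated and reference sets. In the univariate case it reduces to a simple closed form, which the toy sketch below illustrates; the real metric uses Inception-v3 feature vectors and full covariance matrices, and the function name here is ours, not from the paper:

```python
import statistics

def fid_1d(xs, ys):
    """Frechet distance between 1-D Gaussians fitted to two samples:
    (mu1 - mu2)**2 + (sigma1 - sigma2)**2.
    Toy illustration only; actual FID operates on Inception features
    with full mean vectors and covariance matrices."""
    m1, m2 = statistics.fmean(xs), statistics.fmean(ys)
    s1, s2 = statistics.pstdev(xs), statistics.pstdev(ys)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

fid_1d([0.0, 2.0], [1.0, 3.0])  # same spread, mean shift of 1 -> 1.0
```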
Hardware Specification | No | We thank the Center for Advanced Computation in KIAS for providing computing resources.
Software Dependencies | No | We used the POT library (Flamary et al., 2021) to obtain an accurate transport plan π_pot. We used 1,000 training samples for each dataset in estimating π_pot to sufficiently reduce the gap between the true continuous measure and the empirical measure.
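For uniform empirical measures of equal size, the optimal transport plan that POT computes (e.g. via its exact EMD solver) is a minimal-cost matching. A brute-force pure-Python sketch of that matching under squared-distance cost, usable only for tiny point sets since it searches all permutations (POT solves the same problem efficiently via linear programming):

```python
from itertools import permutations

def toy_ot_matching(xs, ys):
    """Brute-force optimal matching between two equal-size 1-D point
    sets under squared-distance cost. For uniform empirical measures
    this coincides with the exact OT plan, but the factorial search
    only scales to tiny n; POT's exact solver is the practical tool."""
    n = len(xs)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum((xs[i] - ys[perm[i]]) ** 2 for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

toy_ot_matching([0.0, 1.0], [1.1, 0.1])  # pairs 0.0 with 0.1, 1.0 with 1.1
```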
Experiment Setup | Yes | Training Hyperparameters: We use the Adam optimizer with (β1, β2) = (0, 0.9), a learning rate of 1e-4, and 120K training iterations. We set α = 0.1 and λ = 1. Training Hyperparameters: We follow the large neural network architecture introduced in Xiao et al. (2021). We use the Adam optimizer with (β1, β2) = (0, 0.9) and a learning rate of 1e-4, trained for 60K iterations. We use a cosine scheduler to gradually decrease the learning rate from 1e-4 to 5e-5. A batch size of 64 and 32 is employed for 64×64 and 128×128 image datasets, respectively. We use α = 0.001 for the CelebA dataset and α = 0.0005 for the AFHQ dataset. We use an EMA rate of 0.9999 for 64×64 image datasets and 0.999 for 128×128 image datasets.
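The excerpt states the cosine schedule's endpoints (1e-4 down to 5e-5 over 60K iterations) but not its exact implementation. A standard cosine decay matching those endpoints would look like the sketch below; the functional form is an assumption, not taken from the paper:

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-4, lr_min=5e-5):
    """Standard cosine decay from lr_max to lr_min. The endpoints come
    from the paper's stated setup; this exact schedule shape is an
    assumption about how the scheduler is implemented."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

cosine_lr(0, 60_000)       # 1e-4 at the first iteration
cosine_lr(60_000, 60_000)  # 5e-5 at the final iteration
```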