Distance-Based Tree-Sliced Wasserstein Distance

Authors: Viet-Hoang Tran, Minh-Khoi Nguyen-Nhat, Trang Pham, Thanh Chu, Tam Le, Tan Nguyen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show, through a wide range of experiments on gradient flows, image style transfer, and generative models, that Db-TSW significantly improves accuracy over recent SW variants while maintaining low computational cost. The code is publicly available at https://github.com/Fsoft-AIC/DbTSW.
Researcher Affiliation | Collaboration | Viet-Hoang Tran, Department of Mathematics, National University of Singapore (EMAIL); Khoi N.M. Nguyen, FPT Software AI Center (EMAIL); Trang Pham, Qualcomm AI Research (EMAIL); Thanh T. Chu, Department of Computer Science, National University of Singapore (EMAIL); Tam Le, The Institute of Statistical Mathematics & RIKEN AIP (EMAIL); Tan M. Nguyen, Department of Mathematics, National University of Singapore (EMAIL)
Pseudocode | Yes | Algorithm 1: Distance-based Tree-Sliced Wasserstein distance.
Input: probability measures µ and ν in P(R^d); number of tree systems L; number of lines per tree system k; space of tree systems T or T_⊥; splitting maps α with tuning parameter δ ∈ R.
for i = 1 to L do
    Sample x ∈ R^d and θ_1, ..., θ_k i.i.d. from U(S^{d-1}).
    if the space of tree systems is T_⊥ then
        Orthonormalize θ_1, ..., θ_k.
    end if
    Construct the tree system L_i = {(x, θ_1), ..., (x, θ_k)}.
    Project µ and ν onto L_i to obtain R^α_{L_i} µ and R^α_{L_i} ν.
    Compute w_i = W_{d_{L_i}, 1}(R^α_{L_i} µ, R^α_{L_i} ν).
end for
Return: the estimate Db-TSW(µ, ν) = (1/L) Σ_{i=1}^{L} w_i.
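The sampling-and-averaging loop of Algorithm 1 can be illustrated with a loose NumPy sketch. This is not the authors' implementation: the splitting map α and the exact tree metric d_{L_i} are simplified here to averaging 1D Wasserstein-1 distances along the k lines of each sampled tree system, the measures are assumed to be empirical with equally many uniformly weighted support points, and the function name `db_tsw_sketch` is mine.

```python
import numpy as np

def db_tsw_sketch(mu, nu, L=10, k=4, orthogonal=False, rng=None):
    """Monte-Carlo sketch of a distance-based tree-sliced Wasserstein
    estimate between two empirical measures, given as (n, d) arrays of
    support points with uniform weights (both clouds must have the same
    n). Simplified: the tree-Wasserstein distance on each sampled tree
    system is approximated by averaging 1D W1 distances along its k
    lines; the paper's splitting map alpha is not reproduced."""
    rng = np.random.default_rng(rng)
    n, d = mu.shape
    total = 0.0
    for _ in range(L):
        x = rng.standard_normal(d)            # root of the tree system
        theta = rng.standard_normal((k, d))   # k random directions
        if orthogonal:
            # tree systems from the orthogonal space T_perp (needs k <= d)
            q, _ = np.linalg.qr(theta.T)
            theta = q.T[:k]
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)
        for t in theta:
            # project supports onto the line {x + s*t : s in R}
            pm = np.sort((mu - x) @ t)
            pn = np.sort((nu - x) @ t)
            # 1D W1 between uniform measures = mean gap of sorted projections
            total += np.abs(pm - pn).mean() / k
    return total / L
```

The `orthogonal=True` branch corresponds to drawing the k directions orthonormally, mirroring the orthonormalization step in the algorithm.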
Open Source Code | Yes | The code is publicly available at https://github.com/Fsoft-AIC/DbTSW.
Open Datasets | Yes | All datasets used in the paper are published and easy to access on the Internet. Table 1: Results of different DDGAN variants for unconditional generation on CIFAR-10. All models are trained for 1800 epochs on the CIFAR-10 dataset (Krizhevsky et al., 2009). We evaluate our Db-TSW variants and several established techniques, including vanilla SW (Bonneel et al., 2015), Max-SW (Deshpande et al., 2019), LCVSW (Nguyen & Ho, 2024), SWGG (Mahey et al., 2023), and TSW-SL (Tran et al., 2025d), using the Swiss Roll and Gaussian 20d datasets. Table 3: Average Wasserstein distance between source and target distributions over 10 runs on the 25 Gaussians dataset.
Dataset Splits | No | The paper uses well-known datasets such as CIFAR-10 alongside synthetic datasets. While it indicates training on CIFAR-10 for 1800 epochs, it does not provide explicit split percentages, sample counts, or citations to predefined splits for reproduction; for example, it does not state an "80/10/10 split" or refer to a standard train/test split from a named source.
Hardware Specification | Yes | Table 1 presents the Fréchet Inception Distance (FID) scores and per-epoch training times on an NVIDIA V100 GPU for our methods and the baselines. The gradient flow and color transfer experiments were carried out on a single NVIDIA A100 GPU. In contrast, the denoising diffusion experiments were executed in parallel on two NVIDIA A100 GPUs, with each run lasting around 81.5 hours.
Software Dependencies | No | The paper mentions torch.compile (PyTorch documentation), which implies the use of PyTorch. However, it does not provide specific version numbers for PyTorch, Python, or any other critical libraries used in the implementation.
Experiment Setup | Yes | For all our methods, including the Db-TSW-DD variants, we use L = 2500, k = 4, and δ = 10. For vanilla SW and the SW variants, we follow Nguyen et al. (2024b) and use L = 10000. Following Mahey et al. (2023), the global learning rate for all baselines is set to 5e-3 on all datasets. For our methods, we use a global learning rate of 5e-3 for the 25 Gaussians and Swiss Roll datasets and 5e-2 for Gaussian 20d. The total number of iterations for all methods is standardized at 2000. A step size of 1 is used for all baseline methods, while a step size of 17 is adopted for TSW-SL and the Db-TSW-SL variants. For the sliced Wasserstein variants, we set L = 100, whereas for TSW-SL and our proposed methods we use L = 33 and k = 3 to ensure a consistent and fair comparison.
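To make the gradient-flow setup above concrete, here is a hedged sketch of one explicit-Euler step that decreases a sliced Wasserstein-2 objective between an evolving point cloud and a fixed target. It uses plain sliced projections rather than the paper's tree systems; the function name `sw_gradient_step` and the uniform-weight, equal-size assumption are mine. A caller would repeat the step for the 2000 iterations and the learning rates quoted above.

```python
import numpy as np

def sw_gradient_step(src, tgt, lr=5e-3, L=100, rng=None):
    """One gradient-descent step on the Monte-Carlo sliced W2^2 loss
    between an evolving cloud `src` and a fixed target `tgt` (both
    (n, d) arrays with uniform weights and equal n). For each random
    direction, the 1D optimal transport matches sorted projections,
    and the loss gradient is mapped back to the source points."""
    rng = np.random.default_rng(rng)
    n, d = src.shape
    grad = np.zeros_like(src)
    for _ in range(L):
        t = rng.standard_normal(d)
        t /= np.linalg.norm(t)
        ps, pt = src @ t, tgt @ t
        order_s, order_t = np.argsort(ps), np.argsort(pt)
        diff = ps[order_s] - pt[order_t]     # matched 1D transport gaps
        g = np.zeros(n)
        g[order_s] = 2.0 * diff / n          # d/d(ps) of mean squared gap
        grad += np.outer(g, t)               # chain rule back to points
    return src - lr * grad / L
```

Iterating this step moves the source particles toward the target distribution, which is the mechanism behind the gradient-flow experiments on the 25 Gaussians, Swiss Roll, and Gaussian 20d datasets.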