Approximating 1-Wasserstein Distance with Trees

Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

TMLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. ... We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets. ... Tables 1, 2, and 3 present the experimental results for the Twitter, BBCSport, and Amazon datasets, respectively. Evidently, the qTWD and cTWD can obtain a small MAE and accurately approximate the original 1-Wasserstein distance. |
| Researcher Affiliation | Collaboration | Makoto Yamada EMAIL Okinawa Institute of Science and Technology, Kyoto University, RIKEN AIP ... Zornitsa Kozareva EMAIL Meta AI ... Sujith Ravi EMAIL SliceX AI |
| Pseudocode | Yes | Algorithm 1: Sliced weight estimation with trees |
| Open Source Code | Yes | Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. |
| Open Datasets | Yes | We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets [2]. ... [2] https://github.com/gaohuang/S-WMD |
| Dataset Splits | Yes | We evaluated the document classification experiment tasks with the 1-Wasserstein distance (Sinkhorn algorithm), QuadTree, ClusterTree, qTWD, cTWD, and their sliced counterparts. For the proposed method, we set the regularization parameter to λ = 10^{-3}. In this experiment, we used 90% of the samples for training and the remaining samples for the test. We then used the ten nearest neighbor classification methods. |
| Hardware Specification | Yes | We evaluated all the methods using Xeon CPU E5-2690 v4 (2.60 GHz) and Xeon CPU E7-8890 v4 (2.20 GHz). ... For training, we measured the average computational cost using a Xeon CPU E5-2690 v4 (2.60 GHz) ... For the test, we ran all the methods with an A6000 GPU. |
| Software Dependencies | No | For the 1-Wasserstein distance, we used the Python optimal transport (POT) package [1]. ... We used SPAMS to solve the Lasso problem [3]. ... [1] https://pythonot.github.io/index.html ... [3] http://thoth.inrialpes.fr/people/mairal/spams/ |
| Experiment Setup | Yes | For the ClusterTree, we set the number of clusters K = 5 for all experiments. ... For the proposed methods, we selected the regularization parameter from {10^{-3}, 10^{-2}, 10^{-1}}. ... In this experiment, we set the number of slices to T = 3 ... For the Sinkhorn algorithm, we set the regularization parameter with 10^{-2} and the maximum iteration with 100, respectively. ... For the proposed method, we set the regularization parameter as λ = 10^{-3}. ... We then used the ten nearest neighbor classification methods. |
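
To make the evaluated quantity concrete, the following is a minimal Python sketch of a tree-Wasserstein distance (TWD) on a fixed tree, followed by the nearest-neighbor vote used in the classification experiment. It is not the authors' code. It relies on the standard closed form for tree metrics, in which the distance is the sum over edges of the edge weight times the absolute difference in subtree mass. The toy tree, its edge weights, and the names `tree_wasserstein` and `knn_predict` are hypothetical, and the Lasso-based weight estimation that the paper proposes is not shown.

```python
from collections import Counter


def tree_wasserstein(parent, weight, mu, nu):
    """Closed-form 1-Wasserstein distance on a tree metric.

    parent[v] is the parent of node v (root is node 0, parent[0] = -1).
    Assumption of this sketch: nodes are numbered so that parent[v] < v.
    weight[v] is the length of the edge between v and parent[v].
    mu and nu are lists of node masses, each summing to 1.
    """
    n = len(parent)
    # diff[v] accumulates mu(subtree of v) - nu(subtree of v).
    diff = [mu[v] - nu[v] for v in range(n)]
    total = 0.0
    # Visit children before parents, pushing subtree mass upward.
    for v in range(n - 1, 0, -1):
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total


def knn_predict(dists, labels, k=10):
    """Majority vote among the k training items closest to one test item.

    dists[i] is the distance from the test item to training item i.
    """
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy tree: root 0, internal nodes 1 and 2, leaves 3, 4, 5.
parent = [-1, 0, 0, 1, 1, 2]
weight = [0.0, 1.0, 2.0, 0.5, 0.5, 1.0]
mu = [0.0, 0.0, 0.0, 0.5, 0.5, 0.0]  # mass on leaves 3 and 4
nu = [0.0, 0.0, 0.0, 0.5, 0.0, 0.5]  # mass on leaves 3 and 5

print(tree_wasserstein(parent, weight, mu, nu))  # 2.25
```

In the toy example, half a unit of mass has to travel from leaf 4 to leaf 5 along a path of length 0.5 + 1 + 2 + 1 = 4.5, which gives 0.5 × 4.5 = 2.25. In a setup like the one reported above, distances of this kind between a test document and each training document would be passed to `knn_predict` with `k=10`.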