Approximating 1-Wasserstein Distance with Trees
Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. ... We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets. ... Tables 1, 2, and 3 present the experimental results for the Twitter, BBCSport, and Amazon datasets, respectively. Evidently, the qTWD and cTWD can obtain a small MAE and accurately approximate the original 1-Wasserstein distance. |
| Researcher Affiliation | Collaboration | Makoto Yamada EMAIL Okinawa Institute of Science and Technology; Kyoto University; RIKEN AIP ... Zornitsa Kozareva EMAIL Meta AI ... Sujith Ravi EMAIL SliceX AI |
| Pseudocode | Yes | Algorithm 1 Sliced weight estimation with trees |
| Open Source Code | Yes | Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. |
| Open Datasets | Yes | We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets (https://github.com/gaohuang/S-WMD). |
| Dataset Splits | Yes | We evaluated the document classification experiment tasks with the 1-Wasserstein distance (Sinkhorn algorithm), Quad Tree, Cluster Tree, qTWD, cTWD, and their sliced counterparts. For the proposed method, we set the regularization parameter to λ = 10⁻³. In this experiment, we used 90% of the samples for training and the remaining samples for the test. We then used the ten-nearest-neighbor classification method. |
| Hardware Specification | Yes | We evaluated all the methods using Xeon CPU E5-2690 v4 (2.60 GHz) and Xeon CPU E7-8890 v4 (2.20 GHz). ... For training, we measured the average computational cost using a Xeon CPU E5-2690 v4 (2.60 GHz) ... For the test, we ran all the methods with an A6000 GPU. |
| Software Dependencies | No | For the 1-Wasserstein distance, we used the Python Optimal Transport (POT) package (https://pythonot.github.io/index.html). ... We used SPAMS to solve the Lasso problem (http://thoth.inrialpes.fr/people/mairal/spams/). |
| Experiment Setup | Yes | For the Cluster Tree, we set the number of clusters K = 5 for all experiments. ... For the proposed methods, we selected the regularization parameter from {10⁻³, 10⁻², 10⁻¹}. ... In this experiment, we set the number of slices to T = 3 ... For the Sinkhorn algorithm, we set the regularization parameter to 10⁻² and the maximum number of iterations to 100. ... For the proposed method, we set the regularization parameter as λ = 10⁻³. ... We then used the ten-nearest-neighbor classification method. |
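For context on why tree-based methods (Quad Tree, Cluster Tree, qTWD, cTWD) are fast: the 1-Wasserstein distance on a tree metric has a closed form, a weighted sum of absolute subtree mass differences, so no optimal-transport solver is needed. The sketch below is illustrative, not the authors' treeOT code; the tree encoding (a `parent` array with nodes ordered so `parent[i] < i`) and all names are assumptions made for the example.

```python
import numpy as np

def tree_wasserstein(mu, nu, parent, weight):
    """1-Wasserstein distance between two distributions on a rooted tree.

    Uses the closed form  TWD(mu, nu) = sum_e w_e * |mu(G_e) - nu(G_e)|,
    where G_e is the set of nodes below edge e.

    mu, nu : probability vectors over the tree's nodes
    parent : parent[i] = index of node i's parent; parent[root] = -1,
             and nodes are topologically ordered (parent[i] < i)
    weight : weight[i] = length of edge (i, parent[i]); weight[root] unused
    """
    n = len(parent)
    sub = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    # Accumulate mass differences bottom-up; after the loop, sub[i] holds the
    # net mass imbalance of the whole subtree rooted at node i.
    for i in range(n - 1, 0, -1):
        sub[parent[i]] += sub[i]
    # Each non-root edge carries exactly its subtree's imbalance.
    return float(np.sum(weight[1:] * np.abs(sub[1:])))

# Example (hypothetical path tree 0-1-2 with unit edge weights):
parent = np.array([-1, 0, 1])
weight = np.array([0.0, 1.0, 1.0])
mu = np.array([1.0, 0.0, 0.0])   # all mass at node 0
nu = np.array([0.0, 0.0, 1.0])   # all mass at node 2
print(tree_wasserstein(mu, nu, parent, weight))  # prints 2.0
```

On a path the closed form recovers the exact ground distance (here 2.0); for general point clouds, the paper's contribution is learning the edge weights so this cheap quantity tracks the true 1-Wasserstein distance.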