Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein
Authors: Hugues Van Assel, Cédric Vincent-Cuaz, Nicolas Courty, Rémi Flamary, Pascal Frossard, Titouan Vayer
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets. |
| Researcher Affiliation | Academia | Hugues Van Assel (EMAIL), ENS de Lyon, CNRS, UMPA UMR 5669; Cédric Vincent-Cuaz (cedric.vincent-cuaz@epfl.ch), EPFL, LTS4; Nicolas Courty (EMAIL), Université Bretagne Sud, IRISA UMR 6074; Rémi Flamary (remi.flamary@polytechnique.edu), École polytechnique, Institut Polytechnique de Paris, CMAP UMR 7641; Pascal Frossard (pascal.frossard@epfl.ch), EPFL, LTS4; Titouan Vayer (EMAIL), Inria, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP UMR 5668 |
| Pseudocode | Yes | Algorithm 1 (CG solver for srGW): 1: repeat; 2: F^(i) ← gradient w.r.t. T of equation 76; 3: X^(i) ← argmin_{X 1_m = h, X ≥ 0} ⟨X, F^(i)⟩; 4: T^(i+1) ← (1 − γ) T^(i) + γ X^(i) with γ ∈ [0, 1] from exact line search; 5: until convergence. |
| Open Source Code | Yes | Code is provided at https://github.com/huguesva/Distributional-Reduction. |
| Open Datasets | Yes | over 8 labeled datasets detailed in Appendix F including: 3 image datasets (COIL-20, Nene et al., 1996; MNIST & Fashion-MNIST, Xiao et al., 2017) and 5 genomic ones (PBMC, Wolf et al., 2018; SNA 1 & 2, Chen et al., 2019; and ZEISEL 1 & 2, Zeisel et al., 2015). |
| Dataset Splits | No | The paper mentions using well-known datasets like MNIST and Fashion-MNIST, which often have standard splits. However, it does not explicitly state the training, validation, and test splits used for its experiments (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | All experiments were done on a server using a GPU (Tesla V100-SXM2-32GB) and composed of 18 cores Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz. |
| Software Dependencies | No | The paper mentions several software components like PyTorch, POT, scikit-learn, Torchmetrics, Geoopt, and Adam/RAdam optimizers, but it does not provide specific version numbers for these software dependencies. For example, it cites PyTorch (Paszke et al., 2017) but doesn't specify which version was used. |
| Experiment Setup | Yes | For the SEA and UMAP based similarities, we validated perplexity across the set {20, 50, 100, 150, 200, 250}. For all kernels, the number of output samples n spans a set of 10 values, starting at the number of classes in the data and incrementing in steps of 20. For the computation of T in DistR (see Section 4.1), we benchmark our Conditional Gradient solver, and the Mirror Descent algorithm whose hyperparameter ε is validated over the first two values within the set {10^i}_{i=−3}^{3} that lead to stable optimization. |
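The pseudocode row above describes a conditional-gradient (Frank-Wolfe) loop over the coupling T. The following is a minimal sketch of that loop, not the authors' implementation: it assumes a row-marginal constraint X 1_m = h with X ≥ 0, substitutes the classic 2/(k+2) step size for the exact line search used in the paper, and the function names `lmo_row_marginal` and `conditional_gradient` are hypothetical.

```python
import numpy as np

def lmo_row_marginal(F, h):
    # Linear minimization oracle over {X >= 0 : X @ 1_m = h}.
    # The objective <X, F> is linear, so each row i places its full
    # mass h[i] on the column with the smallest gradient entry F[i, :].
    X = np.zeros_like(F)
    X[np.arange(F.shape[0]), F.argmin(axis=1)] = h
    return X

def conditional_gradient(grad_fn, T0, h, n_iter=2000):
    # Generic Frank-Wolfe loop mirroring the extracted Algorithm 1.
    # grad_fn(T) returns the gradient of the objective at T; the fixed
    # step gamma = 2/(k+2) replaces the paper's exact line search.
    T = T0.copy()
    for k in range(n_iter):
        F = grad_fn(T)                 # step 2: gradient w.r.t. T
        X = lmo_row_marginal(F, h)     # step 3: linear subproblem
        gamma = 2.0 / (k + 2.0)
        T = (1 - gamma) * T + gamma * X  # step 4: convex update
    return T
```

On a toy quadratic objective such as 0.5·||T − C||², the loop converges to the feasible point closest to C while preserving the row marginals h at every iterate, which is the property the srGW solver relies on.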
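The hyperparameter grids quoted in the Experiment Setup row can be written out directly. This is an illustrative reconstruction only; the class count of 10 is a stand-in (e.g. MNIST), and the variable names are ours, not the paper's.

```python
# Perplexity grid validated for the SEA and UMAP based similarities.
perplexities = [20, 50, 100, 150, 200, 250]

# Number of output samples n: 10 values, starting at the number of
# classes in the dataset and incrementing in steps of 20.
n_classes = 10  # stand-in value, e.g. MNIST
n_prototypes = [n_classes + 20 * k for k in range(10)]

# Mirror Descent hyperparameter epsilon: candidates from {10^i}, i = -3..3;
# the first two values yielding stable optimization are retained.
epsilons = [10.0 ** i for i in range(-3, 4)]
```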