Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein

Authors: Hugues Van Assel, Cédric Vincent-Cuaz, Nicolas Courty, Rémi Flamary, Pascal Frossard, Titouan Vayer

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets.
Researcher Affiliation | Academia | Hugues Van Assel (EMAIL), ENS de Lyon, CNRS, UMPA UMR 5669; Cédric Vincent-Cuaz (cedric.vincent-cuaz@epfl.ch), EPFL, LTS4; Nicolas Courty (EMAIL), Université Bretagne Sud, IRISA UMR 6074; Rémi Flamary (remi.flamary@polytechnique.edu), École polytechnique, Institut Polytechnique de Paris, CMAP UMR 7641; Pascal Frossard (pascal.frossard@epfl.ch), EPFL, LTS4; Titouan Vayer (EMAIL), Inria, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP UMR 5668
Pseudocode | Yes | Algorithm 1 (CG solver for srGW_L2): 1: repeat; 2: F^(i) ← gradient w.r.t. T of equation 76; 3: X^(i) ← argmin_{X 1_m = h, X ≥ 0} ⟨X, F^(i)⟩; 4: T^(i+1) ← (1 − γ) T^(i) + γ X^(i), with γ ∈ [0, 1] from exact line search; 5: until convergence.
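The conditional-gradient loop quoted above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's released implementation: the gradient follows the standard squared-loss Gromov-Wasserstein tensor product for symmetric cost matrices, the linear minimization over {X ≥ 0, X 1_m = h} (only row sums are constrained in the semi-relaxed problem) puts each row's mass on the column with the smallest gradient entry, and a generic 2/(i+2) Frank-Wolfe step stands in for the exact line search used in the paper. Function names are hypothetical.

```python
import numpy as np

def srgw_l2_grad(C1, C2, T):
    # Gradient of the squared-loss GW objective w.r.t. T, assuming
    # symmetric C1, C2 (standard tensor-product form):
    # 2 * ( (C1**2) p 1_m^T + 1_n q^T (C2**2)^T - 2 C1 T C2 )
    p = T.sum(axis=1)  # row marginal (fixed to h in srGW)
    q = T.sum(axis=0)  # column marginal (left free in srGW)
    const = (C1 ** 2) @ p[:, None] + q[None, :] @ (C2 ** 2).T
    return 2 * (const - 2 * C1 @ T @ C2)

def cg_srgw(C1, C2, h, n_iter=100):
    """Illustrative conditional-gradient (Frank-Wolfe) sketch for srGW_L2.

    The feasible set {T >= 0, T 1_m = h} only constrains row sums, so
    the linear minimization oracle concentrates each row's mass h_i on
    the column with the minimal gradient entry.
    """
    n, m = len(C1), len(C2)
    T = np.outer(h, np.ones(m) / m)  # feasible initialization
    for i in range(n_iter):
        F = srgw_l2_grad(C1, C2, T)
        X = np.zeros_like(T)
        X[np.arange(n), F.argmin(axis=1)] = h  # LMO solution
        gamma = 2.0 / (i + 2.0)  # fixed FW step; the paper uses exact line search
        T = (1 - gamma) * T + gamma * X  # convex combination stays feasible
    return T
```

Because every iterate is a convex combination of feasible points, the returned coupling keeps its row sums equal to h throughout, which is the invariant the semi-relaxed formulation relies on.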
Open Source Code | Yes | Code is provided at https://github.com/huguesva/Distributional-Reduction.
Open Datasets | Yes | Over 8 labeled datasets detailed in Appendix F, including 3 image datasets (COIL-20, Nene et al., 1996; MNIST & Fashion-MNIST, Xiao et al., 2017) and 5 genomic ones (PBMC, Wolf et al., 2018; SNA 1 & 2, Chen et al., 2019; ZEISEL 1 & 2, Zeisel et al., 2015).
Dataset Splits | No | The paper mentions using well-known datasets like MNIST and Fashion-MNIST, which often have standard splits. However, it does not explicitly state the training, validation, and test splits used for its experiments (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | All experiments were run on a server with a GPU (Tesla V100-SXM2-32GB) and an 18-core Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz.
Software Dependencies | No | The paper mentions several software components like PyTorch, POT, scikit-learn, Torchmetrics, Geoopt, and the Adam/RAdam optimizers, but it does not provide specific version numbers for these dependencies. For example, it cites PyTorch (Paszke et al., 2017) but does not specify which version was used.
Experiment Setup | Yes | For the SEA- and UMAP-based similarities, we validated the perplexity across the set {20, 50, 100, 150, 200, 250}. For all kernels, the number of output samples n spans a set of 10 values, starting at the number of classes in the data and incrementing in steps of 20. For the computation of T in DistR (see Section 4.1), we benchmark our Conditional Gradient solver and the Mirror Descent algorithm, whose hyperparameter ε is validated as the first two values within the set {10^i}_{i=-3}^{3} leading to stable optimization.
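The validation grids described in this row can be transcribed into a small configuration sketch. The dictionary keys and the `n_classes` value are illustrative (they do not come from the released code); only the grid values themselves are taken from the setup above.

```python
# Hypothetical configuration transcribing the reported validation grids.
n_classes = 10  # e.g. MNIST; the prototype grid starts at the number of classes

grid = {
    # Perplexity grid for SEA- and UMAP-based input similarities
    "perplexity": [20, 50, 100, 150, 200, 250],
    # 10 values for the number of output samples n, step 20 from n_classes
    "n_prototypes": [n_classes + 20 * k for k in range(10)],
    # Candidate epsilons {10^i}, i = -3..3, for the Mirror Descent solver
    "epsilon": [10.0 ** i for i in range(-3, 4)],
}
```

Per the paper's protocol, only the first two ε values that yield stable optimization are actually retained from the `epsilon` grid.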