Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein
Authors: Hugues Van Assel, Cédric Vincent-Cuaz, Nicolas Courty, Rémi Flamary, Pascal Frossard, Titouan Vayer
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets. |
| Researcher Affiliation | Academia | Hugues Van Assel (EMAIL), ENS de Lyon, CNRS, UMPA UMR 5669; Cédric Vincent-Cuaz (cedric.vincent-cuaz@epfl.ch), EPFL, LTS4; Nicolas Courty (EMAIL), Université Bretagne Sud, IRISA UMR 6074; Rémi Flamary (remi.flamary@polytechnique.edu), École polytechnique, Institut Polytechnique de Paris, CMAP UMR 7641; Pascal Frossard (pascal.frossard@epfl.ch), EPFL, LTS4; Titouan Vayer (EMAIL), Inria, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP UMR 5668 |
| Pseudocode | Yes | Algorithm 1 (CG solver for srGW): 1: repeat; 2: F^(i) ← gradient w.r.t. T of equation 76; 3: X^(i) ← argmin_{X 1_m = h, X ≥ 0} ⟨X, F^(i)⟩; 4: T^(i+1) ← (1 − γ) T^(i) + γ X^(i) with γ ∈ [0, 1] from exact line search; 5: until convergence. |
| Open Source Code | Yes | Code is provided at https://github.com/huguesva/Distributional-Reduction. |
| Open Datasets | Yes | over 8 labeled datasets detailed in Appendix F including: 3 image datasets (COIL-20, Nene et al., 1996; MNIST & Fashion-MNIST, Xiao et al., 2017) and 5 genomic ones (PBMC, Wolf et al., 2018; SNA 1 & 2, Chen et al., 2019; and ZEISEL 1 & 2, Zeisel et al., 2015). |
| Dataset Splits | No | The paper mentions using well-known datasets like MNIST and Fashion-MNIST, which often have standard splits. However, it does not explicitly state the training, validation, and test splits used for its experiments (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | All experiments were done on a server using a GPU (Tesla V100-SXM2-32GB) and composed of 18 cores Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz. |
| Software Dependencies | No | The paper mentions several software components like PyTorch, POT, scikit-learn, Torchmetrics, Geoopt, and Adam/RAdam optimizers, but it does not provide specific version numbers for these software dependencies. For example, it cites PyTorch (Paszke et al., 2017) but doesn't specify which version was used. |
| Experiment Setup | Yes | For the SEA and UMAP based similarities, we validated perplexity across the set {20, 50, 100, 150, 200, 250}. For all kernels, the number of output samples n spans a set of 10 values, starting at the number of classes in the data and incrementing in steps of 20. For the computation of T in DistR (see Section 4.1), we benchmark our Conditional Gradient solver, and the Mirror Descent algorithm whose hyperparameter ε is validated over the first two values within the set {10^i}_{i=−3}^{3} that lead to stable optimization. |
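The pseudocode row above describes a conditional-gradient (Frank-Wolfe) loop over the coupling T. The following is a minimal sketch of that loop, not the authors' implementation: it assumes a row-marginal constraint X 1_m = h with X ≥ 0, substitutes the classic 2/(k+2) step size for the exact line search used in the paper, and the function names `lmo_row_marginal` and `conditional_gradient` are hypothetical.

```python
import numpy as np

def lmo_row_marginal(F, h):
    # Linear minimization oracle over {X >= 0 : X @ 1_m = h}.
    # The objective <X, F> is linear, so each row i places its full
    # mass h[i] on the column with the smallest gradient entry F[i, :].
    X = np.zeros_like(F)
    X[np.arange(F.shape[0]), F.argmin(axis=1)] = h
    return X

def conditional_gradient(grad_fn, T0, h, n_iter=2000):
    # Generic Frank-Wolfe loop mirroring the extracted Algorithm 1.
    # grad_fn(T) returns the gradient of the objective at T; the fixed
    # step gamma = 2/(k+2) replaces the paper's exact line search.
    T = T0.copy()
    for k in range(n_iter):
        F = grad_fn(T)                 # step 2: gradient w.r.t. T
        X = lmo_row_marginal(F, h)     # step 3: linear subproblem
        gamma = 2.0 / (k + 2.0)
        T = (1 - gamma) * T + gamma * X  # step 4: convex update
    return T
```

On a toy quadratic objective such as 0.5·||T − C||², the loop converges to the feasible point closest to C while preserving the row marginals h at every iterate, which is the property the srGW solver relies on.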
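The hyperparameter grids quoted in the Experiment Setup row can be written out directly. This is an illustrative reconstruction only; the class count of 10 is a stand-in (e.g. MNIST), and the variable names are ours, not the paper's.

```python
# Perplexity grid validated for the SEA and UMAP based similarities.
perplexities = [20, 50, 100, 150, 200, 250]

# Number of output samples n: 10 values, starting at the number of
# classes in the dataset and incrementing in steps of 20.
n_classes = 10  # stand-in value, e.g. MNIST
n_prototypes = [n_classes + 20 * k for k in range(10)]

# Mirror Descent hyperparameter epsilon: candidates from {10^i}, i = -3..3;
# the first two values yielding stable optimization are retained.
epsilons = [10.0 ** i for i in range(-3, 4)]
```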