Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Hierarchical Refinement: Optimal Transport to Infinity and Beyond
Authors: Peter Halmos, Julian Gold, Xinhao Liu, Benjamin Raphael
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of Hierarchical Refinement on several datasets, including ones containing over a million points, scaling full-rank OT to problems previously beyond Sinkhorn's reach. We benchmark Hierarchical Refinement (HiRef) against the full-rank OT methods Sinkhorn (Cuturi, 2013), ProgOT (Kassraie et al., 2024), and mini-batch OT (Genevay et al., 2018; Fatras et al., 2020; 2021b). We evaluate the methods with respect to the Wasserstein-1 and Wasserstein-2 distance on an alignment of 1024 pairs of samples on the Checkerboard (Makkuva et al., 2020), MAFMoons and Rings (Buzun et al., 2024), and Half-Moon and S-Curve (Buzun et al., 2024) synthetic datasets. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Princeton University 2Center for Statistics and Machine Learning, Princeton University. Correspondence to: Benjamin J. Raphael <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Hierarchical Refinement |
| Open Source Code | Yes | Our implementation of Hierarchical Refinement is available at https://github.com/raphael-group/HiRef. |
| Open Datasets | Yes | We evaluate the methods with respect to the Wasserstein-1 and Wasserstein-2 distance on an alignment of 1024 pairs of samples on the Checkerboard (Makkuva et al., 2020), MAFMoons and Rings (Buzun et al., 2024), and Half-Moon and S-Curve (Buzun et al., 2024) synthetic datasets (Fig. 3, Table S6). We analyze the mouse organogenesis spatiotemporal transcriptomic atlas (MOSTA) datasets, which include spatial transcriptomics data from mouse embryos at successive 1-day time-intervals... We ran HiRef on two slices of MERFISH Mouse Brain Receptor Map data from Vizgen... The data are available on the Vizgen website for MERFISH Mouse Brain Receptor Map data release (https://info.vizgen.com/mouse-brain-map). aligning 1.281 million images from the ImageNet ILSVRC dataset (Deng et al., 2009; Russakovsky et al., 2015). |
| Dataset Splits | Yes | We subsampled the source dataset to have 84,172 spots also (uniformly at random), removing a total of 1786 spots. For each pair of datasets of size n and m, we sub-sample the datasets so that the size of the two datasets is given as min{n, m}. We then took a 50:50 split of the dataset as the two image datasets X, Y to be aligned, where we used a random permutation of the indices of the dataset using torch.randperm so that the splits approximately represent the same distribution over images. |
| Hardware Specification | Yes | The total runtime was 1 minute 26 seconds on an A100 GPU. The total runtime was 36 minutes 8 seconds on an A100 GPU. Figure S2. Runtime scaling across sample-complexities n of a. Hierarchical Refinement (HiRef) and b. Sinkhorn for Euclidean cost ℓ2 (single CPU core). Hierarchical Refinement exhibits linear scaling for increasing n, whereas Sinkhorn exhibits quadratic scaling and is unable to run beyond 16k points. c. Runtime scaling of HiRef (GPU). |
| Software Dependencies | No | We use the default implementations of Sinkhorn, ProgOT, and LOT in the high-performance ott-jax library (Cuturi et al., 2022). The preprocessing of this dataset is conducted using the standard SCANPY (Wolf et al., 2018) workflow. We used the ResNet50 architecture (He et al., 2016) available at https://download.pytorch.org/models/resnet50-0676ba61.pth to generate embeddings of each image of dimension d = 2048. Explanation: The paper mentions several software libraries (ott-jax, SCANPY, ResNet50) and a programming language (Python implicitly through SCANPY and PyTorch), but does not provide specific version numbers for these components, which is required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | In particular, Sinkhorn is run with the default entropy regularization parameter of ϵ = 0.05. For the low-rank methods LOT and FRLC (Halmos et al., 2024) we fix a constant rank of r = 40 for these experiments. We ran HiRef using the settings max rank = 11 and hierarchy depth = 4, for a total runtime of 10 minutes 6 seconds, on an A100 GPU. The random seed was set to 44. For the FRLC algorithm, we set α = 0, γ = 200, τin = 500, rank r = 500, using 20 outer iterations and 300 inner iterations. For the LOT algorithm, we were unable to pass a low-rank factorization of the distance matrix, so we had to use a smaller rank r = 20 in order to avoid exceeding GPU memory (the choice r = 20 led to memory usage of 30GB). We set ϵ = 0.01 and otherwise used the default settings of the method. |
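The dataset-splitting procedure quoted above (subsample both datasets to min{n, m} points, then take a 50:50 split via a random index permutation) can be sketched as follows. This is a minimal illustration, not the authors' code: the embedding matrix `Z` and the helper `match_sizes` are hypothetical, the dimensions are toy-sized rather than the paper's d = 2048 ImageNet features, and numpy's `permutation` stands in for the `torch.randperm` call the paper reports.

```python
import numpy as np

rng = np.random.default_rng(44)  # seed 44, matching the reported setting


def match_sizes(A, B, rng):
    """Subsample the larger dataset uniformly at random so both have min{n, m} rows."""
    k = min(len(A), len(B))
    idx_a = rng.choice(len(A), size=k, replace=False)
    idx_b = rng.choice(len(B), size=k, replace=False)
    return A[idx_a], B[idx_b]


# Hypothetical embedding matrix standing in for the image features.
Z = rng.normal(size=(1000, 8))

# 50:50 split via a random index permutation (numpy analogue of torch.randperm),
# so both halves approximately represent the same distribution over images.
perm = rng.permutation(len(Z))
half = len(Z) // 2
X, Y = Z[perm[:half]], Z[perm[half:]]
```

Because the permutation is drawn uniformly at random, the two halves are exchangeable samples from the same empirical distribution, which is the property the alignment experiment relies on.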