Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Importance Sparsification for Sinkhorn Algorithm

Authors: Mengyu Li, Jun Yu, Tao Li, Cheng Meng

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various synthetic data demonstrate Spar-Sink outperforms mainstream competitors in terms of both estimation error and speed. A real-world echocardiogram data analysis shows Spar-Sink can effectively estimate and visualize cardiac cycles, from which one can identify heart failure and arrhythmia.
Researcher Affiliation | Academia | Mengyu Li (Institute of Statistics and Big Data, Renmin University of China, Beijing, China); Jun Yu (School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China); Tao Li (Institute of Statistics and Big Data, Renmin University of China, Beijing, China); Cheng Meng (Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China)
Pseudocode | Yes | Algorithm 1 Sinkhorn_OT(K, a, b, δ)...; Algorithm 2 Sinkhorn_UOT(K, a, b, λ, ε, δ)...; Algorithm 3 Spar-Sink algorithm for OT...; Algorithm 4 Spar-Sink algorithm for UOT...; Algorithm 5 IBP({K_k}_{k=1}^m, {b_k}_{k=1}^m, w, δ)...; Algorithm 6 Spar-IBP algorithm for Wasserstein barycenters
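Algorithm 1 above refers to the standard Sinkhorn matrix-scaling iteration for entropic OT, which the paper's Spar-Sink method sparsifies. A minimal sketch of that baseline iteration (function and variable names are ours, not the paper's; the stopping rule on the scaling vector is one common choice):

```python
import numpy as np

def sinkhorn_ot(K, a, b, delta=1e-6, max_iter=1000):
    """Standard Sinkhorn scaling for entropic OT.

    K: Gibbs kernel, K = exp(-C / eps) for cost matrix C.
    a, b: source and target marginals (positive, summing to 1).
    Returns the transport plan diag(u) @ K @ diag(v).
    """
    u = np.ones_like(a)
    for _ in range(max_iter):
        v = b / (K.T @ u)        # match column marginals
        u_new = a / (K @ v)      # match row marginals
        if np.max(np.abs(u_new - u)) < delta:  # converged
            u = u_new
            break
        u = u_new
    return u[:, None] * K * v[None, :]
```

Spar-Sink replaces the dense kernel K in these matrix-vector products with an importance-sparsified version, which is where its speedup comes from.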
Open Source Code | Yes | The implementation code is available at https://github.com/Mengyu8042/Spar-Sink.
Open Datasets | Yes | We consider an echocardiogram videos data set (Ouyang et al., 2020) containing 10,030 apical-four-chamber echocardiogram videos... Experiments on MNIST. Further, we evaluate our Spar-IBP algorithm on the MNIST data set (LeCun et al., 1998) following the work of Cuturi and Doucet (2014).
Dataset Splits | No | The paper describes using 10,030 echocardiogram videos (100 randomly selected for analysis) and uses MNIST data for generative modeling, but does not specify training/validation/test splits with percentages, counts, or predefined split references for reproducibility.
Hardware Specification | Yes | All experiments are implemented on a server with 251 GB RAM, a 64-core Intel(R) Xeon(R) Gold 5218 CPU, and 4 GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the Python Optimal Transport toolbox (Flamary et al., 2021) and the Adam optimizer, but does not provide specific version numbers for any software libraries or dependencies.
Experiment Setup | Yes | We set the stopping threshold δ = 10^-6 for all the algorithms considered in the experiments. The maximum number of iterations is set to be 5n for Greenkhorn and to be 10^3 for all other methods. The decimation factor in Screenkhorn is taken as 3. The regularization parameters are set to be ε = 0.1 and λ = 0.1. For subsampling-based approaches, we set the expected subsample size s = {2, 2^2, 2^3, 2^4}·s0(n) with s0(n) = 10^-3 n log^4(n). For SSAE, the regularization parameters are γ = 0.05 and ε = 0.01; the number of epochs is 40; the batch size n = 500; the learning rate is 0.001; and the subsample parameter s = 10·s0(n).