Intrinsic Dimension Estimation Using Wasserstein Distance

Authors: Adam Block, Zeyu Jia, Yury Polyanskiy, Alexander Rakhlin

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a demonstration, we sample images from MNIST in datasets of size ranging in powers of 2 from 32 to 2048, calculate the Wasserstein distance between these two samples, and plot the resulting trend. In the right plot, we pool all of the data to estimate the manifold distances, and then use these estimated distances to compute the Wasserstein distance between the empirical distributions. In order to better compare these two approaches, we also plot the residuals to the linear fit that we expect in the asymptotic regime. Looking at Figure 1, it is clear that we are not yet in the asymptotic regime if we simply use Euclidean distances; on the other hand, the trend using the manifold distances is much more clearly linear, suggesting that the slope of the best linear fit is meaningful.
Researcher Affiliation | Academia | Adam Block, Department of Mathematics, Massachusetts Institute of Technology; Zeyu Jia, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Yury Polyanskiy, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Alexander Rakhlin, Department of Brain & Cognitive Sciences, Statistics and Data Science Center, Massachusetts Institute of Technology
Pseudocode | No | The paper describes the methods and estimators in prose and mathematical formulations, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to code repositories.
Open Datasets | Yes | consider the case of images of the digit 7 (for example) from MNIST (Le Cun and Cortes, 2010).
Dataset Splits | No | The paper mentions sampling images from MNIST in datasets of various sizes for demonstration purposes but does not specify any training/testing/validation splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments.
Software Dependencies | No | The paper mentions the 'Sinkhorn algorithm (Cuturi, 2013)' for computing Wasserstein distances, but it does not specify any software names with version numbers for reproducibility.
Experiment Setup | No | The paper describes a demonstration using MNIST data to evaluate the behavior of the proposed dimension estimators. However, it does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level training settings.
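Since the paper provides neither pseudocode nor code, the demonstration in the Research Type row can only be sketched under assumptions. The Wasserstein-1 distance between two independent samples of size n from a d-dimensional distribution is known to decay roughly like n^(-1/d) for d > 2, so the slope of a linear fit of log-distance against log-n is approximately -1/d. The minimal Python sketch below illustrates that idea and is not the authors' implementation: it swaps MNIST for synthetic points on a zero-padded cube, uses Euclidean rather than estimated manifold distances, and replaces the Sinkhorn algorithm with an exact assignment solver from SciPy. The function names and the choice of intrinsic dimension 4 are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def w1_equal_size(x, y):
    """Exact W1 between two equal-size point clouds with uniform weights.

    Assumption: exact assignment is used here in place of the Sinkhorn
    algorithm that the paper mentions.
    """
    cost = cdist(x, y)  # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].mean()


def sample_embedded_cube(n, d, ambient_dim, rng):
    """Hypothetical stand-in for MNIST: uniform points on [0, 1]^d,
    zero-padded so they sit in a higher-dimensional ambient space."""
    pts = np.zeros((n, ambient_dim))
    pts[:, :d] = rng.random((n, d))
    return pts


def slope_dimension_estimate(sizes, distances):
    """Fit log(distance) against log(n); the slope is roughly -1/d."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(distances), 1)
    return -1.0 / slope


rng = np.random.default_rng(0)
sizes = [2 ** k for k in range(5, 12)]  # 32, 64, ..., 2048 as in the table row
distances = [
    w1_equal_size(
        sample_embedded_cube(n, 4, 20, rng),
        sample_embedded_cube(n, 4, 20, rng),
    )
    for n in sizes
]
d_hat = slope_dimension_estimate(sizes, distances)
print(f"estimated intrinsic dimension: {d_hat:.2f}")
```

At these sample sizes the estimate is only approximate, which matches the row's remark that the Euclidean version has not yet reached the asymptotic regime; the residuals of the linear fit are the natural diagnostic.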