reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Intrinsic Dimension Estimation Using Wasserstein Distance

Authors: Adam Block, Zeyu Jia, Yury Polyanskiy, Alexander Rakhlin

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	As a demonstration, we sample images from MNIST in datasets of size ranging in powers of 2 from 32 to 2048, calculate the Wasserstein distance between these two samples, and plot the resulting trend. In the right plot, we pool all of the data to estimate the manifold distances, and then use these estimated distances to compute the Wasserstein distance between the empirical distributions. In order to better compare these two approaches, we also plot the residuals to the linear fit that we expect in the asymptotic regime. Looking at Figure 1, it is clear that we are not yet in the asymptotic regime if we simply use Euclidean distances; on the other hand, the trend using the manifold distances is much more clearly linear, suggesting that the slope of the best linear fit is meaningful.
Researcher Affiliation	Academia	Adam Block EMAIL Department of Mathematics Massachusetts Institute of Technology; Zeyu Jia EMAIL Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology; Yury Polyanskiy EMAIL Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology; Alexander Rakhlin EMAIL Department of Brain & Cognitive Sciences Statistics and Data Science Center Massachusetts Institute of Technology
Pseudocode	No	The paper describes the methods and estimators in prose and mathematical formulations, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	No	The paper does not contain any explicit statements about open-sourcing code or provide links to code repositories.
Open Datasets	Yes	consider the case of images of the digit 7 (for example) from MNIST (Le Cun and Cortes, 2010).
Dataset Splits	No	The paper mentions sampling images from MNIST in datasets of various sizes for demonstration purposes but does not specify any training/testing/validation splits.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running experiments.
Software Dependencies	No	The paper mentions the 'Sinkhorn algorithm (Cuturi, 2013)' for computing Wasserstein distances, but it does not specify any software names with version numbers for reproducibility.
Experiment Setup	No	The paper describes a demonstration using MNIST data to evaluate the behavior of the proposed dimension estimators. However, it does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level training settings.
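
The demonstration quoted under Research Type (sample two datasets of size n, compute the Wasserstein distance between them, and fit a line to the trend across sample sizes) can be sketched roughly as follows. This is not the authors' code, which the table notes was not released. It is a minimal illustration under several assumptions: synthetic points on a 3-dimensional cube embedded in R^10 stand in for MNIST, only Euclidean distances are used (not the pooled manifold distances), the sample sizes stop at 256 for speed, and the function names and the Sinkhorn parameters `eps` and `n_iter` are hypothetical choices. It also assumes the usual heuristic that the distance between two independent n-samples decays roughly like n^(-1/d), so the slope of the log-log fit is about -1/d.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_w1(x, y, eps=0.01, n_iter=200):
    """Entropy-regularized approximation of W1 between two empirical samples.

    eps and n_iter are illustrative values, not settings from the paper.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean cost matrix of shape (n, m).
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Log-domain Sinkhorn updates of the dual potentials.
        f = -eps * logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    # Transport cost of the (approximately) optimal plan.
    return float(np.sum(np.exp(log_plan) * cost))


def sample_low_dim(n, rng, basis):
    """Draw n points uniformly from a unit cube, then embed them linearly."""
    return rng.uniform(size=(n, basis.shape[0])) @ basis


def slope_dimension_estimate(intrinsic_dim=3, ambient_dim=10,
                             sizes=(32, 64, 128, 256), repeats=2, seed=0):
    """Fit log W1 against log n and return (slope, -1/slope)."""
    rng = np.random.default_rng(seed)
    # Random orthonormal embedding of R^intrinsic_dim into R^ambient_dim.
    q, _ = np.linalg.qr(rng.normal(size=(ambient_dim, intrinsic_dim)))
    basis = q.T
    log_n, log_w = [], []
    for n in sizes:
        dists = [
            sinkhorn_w1(sample_low_dim(n, rng, basis), sample_low_dim(n, rng, basis))
            for _ in range(repeats)
        ]
        log_n.append(np.log(n))
        log_w.append(np.log(np.mean(dists)))
    slope, _intercept = np.polyfit(log_n, log_w, 1)
    return slope, -1.0 / slope


if __name__ == "__main__":
    slope, d_hat = slope_dimension_estimate()
    print(f"slope = {slope:.3f}, estimated dimension = {d_hat:.2f}")
```

At sample sizes this small the fit is typically not yet in the asymptotic regime, and the entropic regularization adds its own bias, so the printed estimate should be read as a rough indication of the trend and not as a reproduction of the paper's figure.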