On Probabilistic Embeddings in Optimal Dimension Reduction

Authors: Ryan Murray, Adam Pickarski

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work considers a generalized version of multidimensional scaling, which seeks to construct a map from high to low dimension that best preserves pairwise inner products or norms. We investigate the variational properties of this problem, leading to the following insights: 1) particle-wise descent methods implemented in standard libraries can produce non-deterministic embeddings; 2) a probabilistic formulation leads to solutions with interpretable necessary conditions; and 3) the globally optimal solutions to the relaxed, probabilistic problem are attained only by deterministic embeddings. This progression of results mirrors the classical development of optimal transportation, and in a case relating to the Gromov-Wasserstein distance it gives explicit insight into the structure of the optimal embeddings, which are parametrically determined and discontinuous on smooth surfaces. Our results also imply that a standard computational implementation of this problem learns sub-optimal mappings, and we discuss how the embeddings learned in that context have highly misleading clustering structure, underscoring the delicate nature of solving this problem computationally.
Researcher Affiliation | Academia | Ryan Murray (EMAIL), Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA; Adam Pickarski (EMAIL), Division of Mathematics, North Carolina State University, Raleigh, NC 27695, USA
Pseudocode | No | The paper presents mathematical proofs and theoretical concepts; it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code generating this figure is available on the first author's github."
Open Datasets | No | "We consider the problem of embedding a particular point cloud in R2 into R. The point cloud that we choose has 1,000 points placed at (0, .2), as well as 250 points placed randomly upon the unit circle." No concrete access information for a public dataset is provided.
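The synthetic point cloud quoted above is simple enough to reconstruct. A minimal sketch is below; the variable names and random seed are assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption for reproducibility

# 1,000 points all placed at (0, 0.2)
cluster = np.tile([0.0, 0.2], (1000, 1))

# 250 points placed uniformly at random on the unit circle
theta = rng.uniform(0.0, 2.0 * np.pi, size=250)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

# Full point cloud in R^2, to be embedded into R
X = np.vstack([cluster, circle])  # shape (1250, 2)
```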
Dataset Splits | No | The paper presents a theoretical analysis and illustrates concepts with a synthetic point cloud example. There is no mention of dataset splits such as training, validation, or test sets.
Hardware Specification | No | The paper mentions using "the built-in algorithm for metric multidimensional scaling in Scikit-learn (Pedregosa et al., 2011)" for an example, but does not provide any specific hardware details such as CPU or GPU models or memory specifications.
Software Dependencies | No | "When we utilize the built-in algorithm for metric multidimensional scaling in Scikit-learn (Pedregosa et al., 2011)..." The paper mentions Scikit-learn but does not specify a version number or other software dependencies with their versions.
Experiment Setup | No | For the numerical example, the paper describes the point cloud generation and the use of Scikit-learn's MDS algorithm with an initial guess, but it does not provide specific experimental setup details such as hyperparameters, optimization settings, or other training configurations.
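For context on what such an experiment looks like, here is a hedged sketch of running Scikit-learn's built-in metric MDS on the paper's point cloud with a supplied initial guess. The seed, the choice of random initial embedding, and `n_init=1` are assumptions, not settings reported in the paper:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)  # seed is an assumption

# Point cloud from the paper's example: 1,000 points at (0, 0.2)
# plus 250 random points on the unit circle
cluster = np.tile([0.0, 0.2], (1000, 1))
theta = rng.uniform(0.0, 2.0 * np.pi, size=250)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([cluster, circle])

# Metric MDS into one dimension. Passing `init` to fit_transform fixes
# the initial guess; with an explicit init, a single restart is used.
init = rng.normal(size=(X.shape[0], 1))  # assumed random initial guess
mds = MDS(n_components=1, metric=True, n_init=1, random_state=0)
embedding = mds.fit_transform(X, init=init)  # shape (1250, 1)
```

This is only an illustration of the interface the paper refers to; the actual hyperparameters used by the authors are not reported.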