Generalized Dimension Reduction Using Semi-Relaxed Gromov-Wasserstein Distance

Authors: Ranthony A. Clark, Tom Needham, Thomas Weighill

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our computational framework to analyze ensembles of political redistricting plans for states with two Congressional districts, achieving an effective visualization of the ensemble as a distribution on a circle which can be used to characterize typical neutral plans, and to flag outliers. Using experiments on the MNIST dataset, we show that SRGW+GD matches or outperforms SMACOF MDS for Euclidean target spaces. We demonstrate its effectiveness for embeddings into spheres on a dataset of rotated MNIST images, and a set of GPS coordinates of cities.
Researcher Affiliation | Academia | (1) Department of Mathematics, Duke University, Durham, NC; (2) Department of Mathematics, Florida State University, Tallahassee, FL; (3) Department of Mathematics & Statistics, University of North Carolina at Greensboro, Greensboro, NC. EMAIL, EMAIL, EMAIL
Pseudocode | No | The algorithm is straightforward: in short, we solve a srGW problem to embed X into a predefined finite subset of Y, then use this as initialization for gradient descent of the MDS functional (1), with Y as the target metric space, which could be more general than R^m. We now provide some details.
Open Source Code | Yes | github.com/thomasweighill/srgw-embedding
Open Datasets | Yes | We use the MNIST dataset consisting of 70,000 28x28 grayscale images of handwritten digits... We use a list of the 20 largest cities [2], with the geodesic distance on the Earth between every pair of cities as ground truth... [2] https://simplemaps.com/data/world-cities, CC-BY 4.0 license... For each of these states we obtained Census block groups from (Manson et al. 2023) and generated an ensemble of 1,000 redistricting plans using the ReCom algorithm (DeFord, Duchin, and Solomon 2021).
Dataset Splits | No | The paper mentions splitting the MNIST dataset into ten smaller datasets (MNIST0 to MNIST9) based on digits, but does not provide explicit train/test/validation splits for the purpose of model training or evaluation. The experiments described involve embedding these entire datasets and calculating distortion.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions 'compute time' in relation to performance.
Software Dependencies | No | The computation of d_{srGW,2}(X, S) is approximated via the srGW implementation in the Python Optimal Transport package (Flamary et al. 2021)... In our examples below, we use the Adam optimizer (Kingma and Ba 2014)... SRGW+GD is between 3 and 14 times faster than SMACOF MDS (using scikit-learn).
Experiment Setup | Yes | Let X = {x_1, ..., x_n} and initialize a set of n points (y_i)_{1 <= i <= n} in Y via y_i = f(x_i). We then consider the distortion function Y^n -> R defined by

(y_1, \dots, y_n) \mapsto \sum_{i,j=1}^{n} \big( d_X(x_i, x_j) - d_Y(y_i, y_j) \big)^2   (8)

(cf. the MDS functional (1)) and use a gradient-based method on Y^n to find a local minimum (ŷ_1, ŷ_2, ..., ŷ_n). Our embedding is then given by f̂(x_i) = ŷ_i. We refer to this method as SRGW+GD. In our examples below, we use the Adam optimizer (Kingma and Ba 2014). In practice, the desired embedding space Y may only be known up to certain hyperparameters, such as scale. Our main example below will be when Y is a circle of unknown radius. If Y depends on a scale factor (or multiple scale factors), we add the scale factor as an additional variable in (8). ... GD denotes minimizing (8) with a random initialization and the Adam optimizer. We use 10 random initializations and report the min and max distortion over all trials.
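For the gradient-descent stage, here is a minimal self-contained sketch of minimizing the distortion functional (8) with a hand-rolled Adam update. This is an illustration under assumptions: a Euclidean target Y = R^2 is used for simplicity (the paper's main example is a circle, with the radius added as an extra variable), and the step size, step count, and toy data are arbitrary choices, not the paper's settings.

```python
import numpy as np

def stress(Y, DX):
    """Distortion functional (8): sum of squared metric distortions."""
    D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    return ((DX - D) ** 2).sum()

def stress_grad(Y, DX):
    """Analytic gradient of (8) for a Euclidean target R^2."""
    diff = Y[:, None] - Y[None, :]          # y_i - y_j, shape (n, n, 2)
    D = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(D, 1.0)                # guard against /0 on the diagonal
    coef = (DX - D) / D
    np.fill_diagonal(coef, 0.0)
    # grad_i = -4 * sum_j (DX_ij - D_ij) * (y_i - y_j) / D_ij
    return -4.0 * (coef[:, :, None] * diff).sum(axis=1)

def adam_minimize(DX, Y0, lr=0.05, steps=500,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam (Kingma and Ba 2014) on the embedding coordinates."""
    Y = Y0.copy()
    m = np.zeros_like(Y)
    v = np.zeros_like(Y)
    for t in range(1, steps + 1):
        g = stress_grad(Y, DX)
        m = beta1 * m + (1 - beta1) * g            # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        Y -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return Y

# Illustrative run: recover a planar configuration from its distance matrix,
# starting from a random initialization (the paper's GD baseline; SRGW+GD
# would instead initialize from the srGW assignment).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
DX = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y0 = rng.normal(size=(30, 2))
Y = adam_minimize(DX, Y0)
```

Extending this to the paper's circle target would mean parametrizing each y_i by an angle, replacing the Euclidean distance with the arc-length distance scaled by a radius r, and treating r as one more optimization variable alongside the angles.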