Approximating Metric Magnitude of Point Sets

Authors: Rayna Andreeva, James Ward, Primoz Skraba, Jie Gao, Rik Sarkar

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in Section 5 show that the approximation methods are fast and accurate. Iterative Normalization outperforms inversion on larger datasets and converges quickly; among the subset-selection algorithms, Discrete Centers empirically approximates Greedy Maximization at a fraction of the computational cost.
Researcher Affiliation | Academia | Rayna Andreeva (1), James Ward (1), Primoz Skraba (2), Jie Gao (3), Rik Sarkar (1); (1) School of Informatics, University of Edinburgh; (2) Queen Mary University of London; (3) Rutgers University. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Iterative normalization algorithm for the approximation of magnitude; Algorithm 2: Greedy algorithm for the computation of original magnitude; Algorithm 3: Discrete Center Hierarchy construction; Algorithm 4: Magnitude Clusterer
Open Source Code | Yes | https://github.com/rorondre/approx_magnitude
Open Datasets | Yes | Figure 3 shows the performance of the subset-selection algorithms on several scikit-learn datasets (Iris, Breast Cancer, Wine) and on subsamples of MNIST, CIFAR10 (Krizhevsky, Nair, and Hinton 2014), and CIFAR100 (Krizhevsky 2009).
Dataset Splits | No | The paper mentions 'a randomly generated dataset with 10^4 points sampled from N(0, 1) in R^2' and 'subsamples of size 500 for popular image datasets', and it trains on the MNIST dataset for 2000 epochs as well as on CIFAR10, but the main text gives no explicit train/test/validation split percentages, sample counts, or citations for these experiments.
Hardware Specification | Yes | Experiments ran on an NVIDIA 2080Ti GPU with 11 GB RAM and an Intel Xeon Silver 4114 CPU.
Software Dependencies | No | The paper states 'We use PyTorch's GPU implementation for matrix inversion' but does not give version numbers for PyTorch or any other software dependency.
Experiment Setup | Yes | The gradient-descent experiments used a learning rate of 0.005 and momentum of 0.9. Five neural networks, each with two fully connected hidden layers, were trained on MNIST for 2000 epochs with cross-entropy loss. The generalization experiments used the ADAM optimizer with a grid of 6 learning rates in [10^-5, 10^-3] and 6 batch sizes in [8, 256].
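For context on the inversion baseline that Iterative Normalization is compared against: the magnitude of a finite metric space is the total weight of the vector w solving Z w = 1, where Z[i, j] = exp(-d(x_i, x_j)). Below is a minimal NumPy sketch under the Euclidean metric; it is illustrative only (the function name `magnitude` and the use of NumPy rather than the paper's PyTorch GPU implementation are assumptions).

```python
import numpy as np

def magnitude(points):
    """Magnitude of a finite point set under the Euclidean metric.

    Z[i, j] = exp(-||x_i - x_j||); the magnitude is the sum of the
    weights w solving Z w = 1 (equivalently, the sum of all entries
    of Z^{-1}).
    """
    diff = points[:, None, :] - points[None, :, :]
    Z = np.exp(-np.linalg.norm(diff, axis=-1))
    # Solving the linear system is cheaper and more stable than
    # forming the explicit matrix inverse.
    w = np.linalg.solve(Z, np.ones(len(points)))
    return w.sum()

# A small Gaussian sample; the paper's setup uses 10^4 points from
# N(0, 1) in R^2, i.e. rng.normal(size=(10_000, 2)).
rng = np.random.default_rng(0)
print(magnitude(rng.normal(size=(100, 2))))
```

A sanity check on the definition: a single point has magnitude 1, and two points at distance d have magnitude 2/(1 + e^(-d)), which approaches 2 as they move apart.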