Simultaneous Dimensionality Reduction: A Data Efficient Approach for Multimodal Representations Learning
Authors: Eslam Abdelaleem, Ahmed Roman, K. Michael Martini, Ilya Nemenman
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using numerical experiments, we demonstrate that linear SDR methods consistently outperform linear IDR methods and yield higher-quality, more succinct reduced-dimensional representations with smaller datasets. |
| Researcher Affiliation | Academia | Eslam Abdelaleem EMAIL Department of Physics Emory University |
| Pseudocode | No | The paper describes algorithms like PCA, PLS, CCA, and rCCA using mathematical formulations and optimization problems (e.g., Eq. 6, 7, 8, 15) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps. |
| Open Source Code | No | The text states: "We used Python and the scikit-learn (Pedregosa et al., 2011) library for performing PCA, PLS, and CCA, while the cca-zoo (Chapman & Wang, 2021) library was used for rCCA." This indicates the use of existing libraries, not the release of the authors' own implementation code. |
| Open Datasets | Yes | To analyze linear DR methods on nonlinear data, we followed the same procedure as in Fig. 6 for a dataset inspired by the noisy MNIST dataset (LeCun et al., 1998; Wang et al., 2015; 2016; Abdelaleem et al., 2023). |
| Dataset Splits | Yes | For every numerical experiment, we generate training and test data sets (Xtrain, Ytrain) and (Xtest, Ytest) according to Eqs. (1-2)3. ... This resulted in a total dataset size of 56k images for training and 7k images for testing. |
| Hardware Specification | No | The simulations were parallelized and run on Amazon Web Services (AWS) servers of various instance types. |
| Software Dependencies | No | We used Python and the scikit-learn (Pedregosa et al., 2011) library for performing PCA, PLS, and CCA, while the cca-zoo (Chapman & Wang, 2021) library was used for rCCA. For PCA, SVD was performed with default parameters. For PLS, the PLS Canonical method was used with the NIPALS algorithm. For both PLS and CCA, the tolerance was set to 10⁻⁴ with a maximum convergence limit of 5000 iterations. For rCCA, regularization parameters were set as c1 = c2 = 0.1. All other parameters not explicitly mentioned here were set to their default values. |
| Experiment Setup | Yes | For PLS, the PLS Canonical method was used with the NIPALS algorithm. For both PLS and CCA, the tolerance was set to 10⁻⁴ with a maximum convergence limit of 5000 iterations. For rCCA, regularization parameters were set as c1 = c2 = 0.1. |