Inverse Kernel Decomposition

Authors: Chengrui Li, Anqi Wu

TMLR 2024

Reproducibility assessment (Variable: Result — supporting response):
Research Type: Experimental — In the experiment section, we compare IKD against four eigen-decomposition-based and four optimization-based dimensionality reduction methods using synthetic datasets and four real-world datasets. We can summarize four contributions of IKD; as an eigen-decomposition-based method, IKD achieves more reasonable latent representations than other eigen-decomposition-based methods, with better classification accuracy in downstream classification tasks.
Researcher Affiliation: Academia — Chengrui Li (EMAIL), School of Computational Science & Engineering, Georgia Institute of Technology; Anqi Wu (EMAIL), School of Computational Science & Engineering, Georgia Institute of Technology.
Pseudocode: Yes — Algorithm 1: Inverse kernel decomposition.
Open Source Code: Yes — An open-source IKD implementation in Python can be accessed at https://github.com/JerrySoybean/ikd.
Open Datasets: Yes — We compare IKD against alternatives on four real-world datasets:
- Single-cell qPCR (qPCR) (Guo et al., 2010): normalized measurements of 48 genes of a single cell at 10 different stages. There are 437 data points in total, resulting in X ∈ R^(437×48).
- Handwritten digits (digits) (Dua & Graff, 2017): consists of 1797 grayscale images of handwritten digits. Each is an 8×8 image, resulting in X ∈ R^(1797×64).
- COIL-20 (Nene et al., 1996): consists of 1440 grayscale photos; for each of the 20 objects, 72 photos were taken from different angles. Each is a 128×128 image, resulting in X ∈ R^(1440×16384).
- Fashion-MNIST (F-MNIST) (Xiao et al., 2017): consists of 70000 grayscale images of 10 fashion items (clothing, bags, etc.). We use a subset of it, resulting in X ∈ R^(3000×784).
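As a quick sanity check on the reported dimensions, the handwritten-digits dataset is also bundled with scikit-learn (a convenience assumed here; the report cites the UCI repository of Dua & Graff), so the X ∈ R^(1797×64) shape can be verified directly:

```python
from sklearn.datasets import load_digits

# Load the digits dataset: 1797 grayscale images, each an 8x8 image
# flattened to a 64-dimensional row, matching the shape quoted above.
X, y = load_digits(return_X_y=True)
print(X.shape)  # (1797, 64)
```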
Dataset Splits: Yes — Specifically, we apply 5-fold cross-validation k-NN (k ∈ {5, 10, 20}) on the estimated {2, 3, 5, 10}-dimensional latents to evaluate the performance of each method on each dataset.
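The evaluation protocol quoted above can be sketched in a few lines of scikit-learn. PCA stands in for IKD here (an assumption, since the point is the 5-fold cross-validated k-NN protocol, not the embedding method), and only a 2-dimensional latent is shown:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
# A 2-dimensional "latent"; the paper would use the IKD embedding instead.
Z = PCA(n_components=2).fit_transform(X)

# 5-fold cross-validated k-NN accuracy for each k in {5, 10, 20}.
for k in (5, 10, 20):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), Z, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```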
Hardware Specification: No — The paper discusses running times but does not specify any particular hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies: No — The paper mentions the "IKD implementation in Python", the "GPLVM module in the GPy package (GPy, since 2012)", sklearn, and the "official UMAP package (McInnes et al., 2018)", but does not provide specific version numbers for these software components.
Experiment Setup: Yes — For each trial, we generate the true latent variables from
Z_{m,1:T} ~ N(0, K), with K_{ij} = 6 e^(−|i−j|), m ∈ {1, ..., M},   (12)
where M is the latent dimensionality, varying across different datasets. Then, we generate the noiseless data from GP, sinusoidal, and Gaussian-bump mapping functions respectively. Afterward, i.i.d. Gaussian noise is added to form the final noisy observations X. ... In each trial, we generate a 3D latent Z ∈ R^(1000×3) (i.e., M = 3) according to Eq. 12, and generate X ∈ R^(1000×N) according to Eq. 1 with σ² = 1 and l = 3. Then Gaussian noise is added: x_{t,n} ← x_{t,n} + ε_{t,n} for all (t, n) ∈ {1, ..., 1000} × {1, ..., N}, where ε_{t,n} ~ N(0, 0.05²).
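The synthetic-data pipeline described above can be sketched with NumPy. Both kernel forms are assumptions reconstructed from the quoted equations: the latent covariance is read as K_ij = 6·exp(−|i−j|), and Eq. 1's GP mapping is read as a squared-exponential kernel over latent points with σ² = 1 and l = 3. T and N are scaled down from the paper's 1000 points for speed:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 200, 3, 50  # scaled down from T = 1000 in the paper

# Latent prior (Eq. 12, as reconstructed here): each of the M latent
# dimensions is a GP draw over time with covariance K_ij = 6*exp(-|i-j|).
idx = np.arange(T)
K_z = 6.0 * np.exp(-np.abs(idx[:, None] - idx[None, :]))
Z = rng.multivariate_normal(np.zeros(T), K_z, size=M).T  # shape (T, M)

# GP mapping from latent to observations (one plausible reading of Eq. 1):
# squared-exponential kernel over latent points, sigma^2 = 1, l = 3,
# with a small jitter term for numerical stability.
sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
K_x = 1.0 * np.exp(-sq_dist / (2 * 3.0**2)) + 1e-6 * np.eye(T)
X = rng.multivariate_normal(np.zeros(T), K_x, size=N).T  # (T, N) noiseless

# Final noisy observations: x_{t,n} <- x_{t,n} + eps, eps ~ N(0, 0.05^2).
X_noisy = X + rng.normal(0.0, 0.05, size=X.shape)
```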