reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Personalized PCA: Decoupling Shared and Unique Features

Authors: Naichen Shi, Raed Al Kontar

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive numerical experiments highlight Per PCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decouple shared and unique features from heterogeneous datasets, Per PCA finds applications in several tasks, including video segmentation, topic extraction, and feature clustering. Keywords: Principal component analysis, personalization, heterogeneity. ... Numerical results: Empirical evidence on both synthetic and real-life datasets confirms Per PCA's ability to decouple shared and unique features. Also, Per PCA has exciting applications in video segmentation and topic extraction. For instance, on video segmentation tasks, Per PCA has significant advantages over the popular Robust PCA (Candes et al., 2011) when heterogeneity patterns are not sparse.
Researcher Affiliation	Academia	Naichen Shi EMAIL Raed Al Kontar EMAIL Department of Industrial & Operations Engineering University of Michigan Ann Arbor, MI 48109-2117, USA
Pseudocode	Yes	Algorithm 1 An instance of Per PCA using Polar Projection ... Algorithm 2 Per PCA by St-GD
Open Source Code	Yes	An implementation of the proposed method is in the linked Github repository.
Open Datasets	Yes	We also apply our algorithm on FEMNIST (Caldas et al., 2019) and CIFAR10 (Krizhevsky et al., 2009). ... We use a surveillance video example from Vacavant et al. (2012). ... we analyze the presidential debate transcriptions from 1960 to 2020 (Asokan, 2022).
Dataset Splits	Yes	On average, each client has 89 images. We represent an image by a vector in R^784. For these vectors, we randomly choose 80% of them to form the training set and take the rest as the test set. ... To simulate a heterogeneous setting, we separate the training and testing set of CIFAR10 into 20 parts such that each part contains images from only 2 classes.
Hardware Specification	No	The paper does not explicitly state the specific hardware used for running the experiments. It mentions 'edge devices' and 'clients' in a general context but no hardware specifications for the computational experiments themselves.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., programming language versions, library versions, or specific solvers).
Experiment Setup	Yes	We set N = 2, d = 3 and n_{(i)} = 1000. Each client has exactly one global u_1 and one local component v_{(i),1}. ... We run each experiment with the same stepsize η = 0.1 but from 10 different random initializations ... We set N = 100 and n = 100. ... We set d = 30 and r_1 = 1, r_{2,(i)} = r_2 = 1. ... We set r_1 = 50 and r_{2,(1)} = ... = r_{2,(N)} = 50 and apply Algorithm 2 with choice 2. ... We set r_1 = 2 and r_{2,(1)} = ... = r_{2,(N)} = 2.
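
The Dataset Splits row quotes two procedures: a random 80/20 train/test split of each client's image vectors, and a partition of CIFAR10 into 20 parts that each contain images from only 2 classes. The Python sketch below is one possible reading of those two sentences, not the authors' code. The function names, the seeds, the pairing of classes to parts (part p receives classes 2p mod 10 and 2p+1 mod 10), and the equal sharding of each class across the parts that own it are all assumptions made for illustration; the quoted text does not say how the parts were formed.

```python
import numpy as np


def split_client(vectors, train_frac=0.8, seed=0):
    """Randomly split one client's vectors into train and test sets.

    Hypothetical helper: the quoted text only says 80% of the vectors
    are chosen at random for training and the rest are used for testing.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(vectors))
    n_train = int(round(train_frac * len(vectors)))
    return vectors[idx[:n_train]], vectors[idx[n_train:]]


def two_class_parts(labels, n_parts=20, n_classes=10, seed=0):
    """Partition sample indices into n_parts groups with 2 classes each.

    Assumed scheme (not from the paper): part p owns classes
    (2p mod n_classes) and (2p + 1 mod n_classes); the samples of each
    class are shuffled and shared evenly among the parts that own it.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    part_classes = [
        ((2 * p) % n_classes, (2 * p + 1) % n_classes) for p in range(n_parts)
    ]
    parts = [[] for _ in range(n_parts)]
    for c in range(n_classes):
        owners = [p for p, cls in enumerate(part_classes) if c in cls]
        idx = rng.permutation(np.flatnonzero(labels == c))
        # One shard of class c per owning part.
        for p, shard in zip(owners, np.array_split(idx, len(owners))):
            parts[p].append(shard)
    return [np.concatenate(chunks) for chunks in parts], part_classes


# Toy usage with stand-in data: 89 vectors in R^784 for one client,
# and 400 fake labels covering 10 classes.
client_vectors = np.zeros((89, 784))
train, test = split_client(client_vectors)

fake_labels = np.repeat(np.arange(10), 40)
parts, part_classes = two_class_parts(fake_labels)
```

Applying the same partition function separately to the training labels and the test labels would give each of the 20 parts a matching train and test portion, which is how the quoted sentence reads, though the source does not confirm that detail.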