Personalized PCA: Decoupling Shared and Unique Features
Authors: Naichen Shi, Raed Al Kontar
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive numerical experiments highlight PerPCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decouple shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks, including video segmentation, topic extraction, and feature clustering. Keywords: Principal component analysis, personalization, heterogeneity. ... Numerical results: Empirical evidence on both synthetic and real-life datasets confirms PerPCA's ability to decouple shared and unique features. Also, PerPCA has exciting applications in video segmentation and topic extraction. For instance, on video segmentation tasks, PerPCA has significant advantages over the popular Robust PCA (Candès et al., 2011) when heterogeneity patterns are not sparse. |
| Researcher Affiliation | Academia | Naichen Shi EMAIL Raed Al Kontar EMAIL Department of Industrial & Operations Engineering University of Michigan Ann Arbor, MI 48109-2117, USA |
| Pseudocode | Yes | Algorithm 1 An instance of PerPCA using Polar Projection ... Algorithm 2 PerPCA by St-GD |
| Open Source Code | Yes | An implementation of the proposed method is in the linked GitHub repository. |
| Open Datasets | Yes | We also apply our algorithm on FEMNIST (Caldas et al., 2019) and CIFAR10 (Krizhevsky et al., 2009). ... We use a surveillance video example from Vacavant et al. (2012). ... we analyze the presidential debate transcriptions from 1960 to 2020 (Asokan, 2022). |
| Dataset Splits | Yes | On average, each client has 89 images. We represent an image by a vector in $\mathbb{R}^{784}$. For these vectors, we randomly choose 80% of them to form the training set and take the rest as the test set. ... To simulate a heterogeneous setting, we separate the training and testing set of CIFAR10 into 20 parts such that each part contains images from only 2 classes. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running the experiments. It mentions 'edge devices' and 'clients' in a general context but no hardware specifications for the computational experiments themselves. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming language versions, library versions, or specific solvers). |
| Experiment Setup | Yes | We set $N = 2$, $d = 3$ and $n_{(i)} = 1000$. Each client has exactly one global $u_1$ and one local component $v_{(i),1}$. ... We run each experiment with the same stepsize $\eta = 0.1$ but from 10 different random initializations... ... We set $N = 100$ and $n = 100$. ... We set $d = 30$ and $r_1 = 1$, $r_{2,(i)} = r_2 = 1$. ... We set $r_1 = 50$ and $r_{2,(1)} = \cdots = r_{2,(N)} = 50$ and apply Algorithm 2 with choice 2. ... We set $r_1 = 2$ and $r_{2,(1)} = \cdots = r_{2,(N)} = 2$. |
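
The splits quoted in the Dataset Splits row can be illustrated with a short sketch. It is not taken from the paper's repository: the function names `train_test_split_indices` and `partition_by_class_pairs`, the seeds, and the cyclic assignment of class pairs to parts are all assumptions made for illustration. The excerpt only states that 80% of each client's vectors are chosen at random for training and that CIFAR10 is separated into 20 parts containing 2 classes each; it does not say how the class pairs are chosen.

```python
import numpy as np


def train_test_split_indices(n_samples, train_frac=0.8, seed=0):
    """Randomly split range(n_samples) into train/test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return perm[:n_train], perm[n_train:]


def partition_by_class_pairs(labels, n_parts=20, classes_per_part=2, seed=0):
    """Split sample indices into n_parts groups, each holding only
    classes_per_part classes.

    Assumption: class groups are assigned to parts cyclically, so with
    10 classes part 0 gets classes (0, 1), part 1 gets (2, 3), and so on.
    Requires n_parts * classes_per_part >= number of classes.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    part_classes = [
        [classes[(classes_per_part * p + j) % len(classes)]
         for j in range(classes_per_part)]
        for p in range(n_parts)
    ]
    parts = [[] for _ in range(n_parts)]
    for c in classes:
        # Parts that are allowed to hold class c.
        owners = [p for p in range(n_parts) if c in part_classes[p]]
        # Shuffle the samples of class c and deal them out to the owners.
        idx = rng.permutation(np.flatnonzero(labels == c))
        for chunk, p in zip(np.array_split(idx, len(owners)), owners):
            parts[p].extend(chunk.tolist())
    return parts, part_classes


# Example: one client with 89 vectors, and a toy 10-class label array.
train_idx, test_idx = train_test_split_indices(89)
toy_labels = np.repeat(np.arange(10), 100)
parts, part_classes = partition_by_class_pairs(toy_labels)
```

Dealing each class out only to the parts that own it keeps every sample in exactly one part, so the parts together still cover the whole dataset while each part stays restricted to its two classes.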