Contrastive Functional Principal Component Analysis
Authors: Eric Zhang, Didong Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of applications, CFPCA successfully identifies these foreground-specific structures, thereby revealing distinct patterns and trends that traditional FPCA overlooks. All proofs and additional experimental details are available in the same GitHub repository as the code. This section presents a series of four simulations designed to demonstrate the effectiveness of CFPCA over FPCA. Table 1 details the configuration of how the foreground and background datasets were generated, following the model assumptions in Equations (5). Table 2 compares the performance of FPCA applied solely to the foreground, FPCA applied to the union of the foreground and background, and CFPCA. Performance is measured by the bias of the first and second PCs, computed as the L2 norm of the difference between the true and estimated eigenfunctions. Fig. 2 illustrates the first and second FPCs from Simulation 2, as a representative example. In this section, we demonstrate the application of CFPCA on two datasets: the gait cycle and stock market. |
| Researcher Affiliation | Academia | Eric Zhang, Didong Li Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. EMAIL, EMAIL |
| Pseudocode | Yes | CFPCA algorithm is summarized in Algorithm 1. |
| Open Source Code | Yes | Code: https://github.com/ezhang1218/CFPCA |
| Open Datasets | Yes | As an illustrative example, Fig. 1 examines a benchmark dataset from the Berkeley growth study, tracking the heights of boys and girls as a function of age from 0 to 18 (Tuddenham and Snyder 1954). The data were collected by (Shorter et al. 2008), processed by (Helwig et al. 2011), and published by (Helwig et al. 2016). In our second case study, we explore the stock market dynamics, focusing on technology and non-technology companies. We sourced daily closing prices from Yahoo Finance using the yfinance package (Aroussi 2024). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions N, M (size of foreground and background) and n (number of time points) for the datasets used in applications, but not how these might be partitioned for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions that "the implementation in Python uses a different technique (Ramos-Carreño et al. 2019)" and that "We sourced daily closing prices from Yahoo Finance using the yfinance package (Aroussi 2024)". While software names are mentioned with citations, specific version numbers for Python itself or for the cited packages are not explicitly provided in the format requested (e.g., Python 3.8, package X.Y.Z). |
| Experiment Setup | Yes | When implementing CFPCA, two critical hyperparameters must be considered: α > 0, which controls the influence of the background dataset, and L, the number of dimensions for reduction. We typically set α = 1 as suggested by Equations (5); however, the optimal α can vary depending on the level of shared information between the foreground and background datasets. In the situation without such a subgroup structure, tuning α remains a challenging problem (Li, Jones, and Engelhardt 2020), see also Discussion. The choice of the dimensionality, L, arguably an open question (Wang, Chiou, and Müller 2016), is guided by the specific needs of the analysis, with L = 1 or 2 typically sufficient for visualization purposes. |
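
The roles of α and L, and the L2-norm bias metric quoted in the table, can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes a contrastive-PCA-style estimator: curves observed on a common uniform grid, with the leading eigenvectors taken from the foreground sample covariance minus α times the background sample covariance. The function names `contrastive_fpcs` and `l2_bias`, the simulated curves, and the Riemann-sum quadrature are all hypothetical choices made for this example, and the simulation is not one of the configurations in the paper's Table 1.

```python
import numpy as np


def contrastive_fpcs(X, Y, t, alpha=1.0, L=2):
    """Top-L eigenpairs of cov(X) - alpha * cov(Y) on a uniform grid t.

    X: (N, n) foreground curves, Y: (M, n) background curves.
    Assumption: a discretized, contrastive-PCA-style estimator, used here
    only to illustrate the hyperparameters alpha and L.
    """
    dt = t[1] - t[0]
    C = np.cov(X, rowvar=False) - alpha * np.cov(Y, rowvar=False)
    # Riemann-sum discretization of the covariance operator.
    vals, vecs = np.linalg.eigh(C * dt)
    top = np.argsort(vals)[::-1][:L]
    # Rescale so each eigenfunction has unit L2 norm: sum(phi**2) * dt == 1.
    return vals[top], vecs[:, top].T / np.sqrt(dt)


def l2_bias(phi_true, phi_hat, t):
    """L2 norm of the difference between true and estimated eigenfunctions."""
    dt = t[1] - t[0]
    # Eigenfunctions are defined only up to sign, so align signs first.
    if np.sum(phi_true * phi_hat) < 0:
        phi_hat = -phi_hat
    return float(np.sqrt(np.sum((phi_true - phi_hat) ** 2) * dt))


# Hypothetical simulation: a high-variance component shared by both groups,
# plus a lower-variance component present only in the foreground.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
shared = np.sqrt(2) * np.sin(2 * np.pi * t)
specific = np.sqrt(2) * np.cos(2 * np.pi * t)
N = M = 2000
X = (3.0 * rng.standard_normal((N, 1)) * shared
     + 2.0 * rng.standard_normal((N, 1)) * specific
     + 0.1 * rng.standard_normal((N, t.size)))
Y = (3.0 * rng.standard_normal((M, 1)) * shared
     + 0.1 * rng.standard_normal((M, t.size)))

# alpha = 1 subtracts the background covariance; alpha = 0 is used here only
# as a baseline, since it reduces to ordinary FPCA on the foreground.
_, phis_c = contrastive_fpcs(X, Y, t, alpha=1.0, L=2)
_, phis_f = contrastive_fpcs(X, Y, t, alpha=0.0, L=1)

bias_cfpca = l2_bias(specific, phis_c[0], t)
bias_fpca = l2_bias(specific, phis_f[0], t)
print(f"bias of first PC, contrastive (alpha=1): {bias_cfpca:.3f}")
print(f"bias of first PC, foreground-only FPCA:  {bias_fpca:.3f}")
```

In this toy setting the shared component dominates the foreground variance, so foreground-only FPCA recovers it as the first PC, while the contrastive version with α = 1 recovers the foreground-specific component. Other values of α shift that balance, which is consistent with the table's note that the optimal α depends on how much information the two datasets share.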