reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contrastive Functional Principal Component Analysis

Authors: Eric Zhang, Didong Li

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through a series of applications, CFPCA successfully identifies these foreground-specific structures, thereby revealing distinct patterns and trends that traditional FPCA overlooks. All proofs and additional experimental details are available in the same GitHub repository as the code. This section presents a series of four simulations designed to demonstrate the effectiveness of CFPCA over FPCA. Table 1 details the configuration of how the foreground and background datasets were generated, following the model assumptions in Equations (5). Table 2 compares the performance of FPCA applied solely to the foreground, FPCA applied to the union of the foreground and background, and CFPCA, measured by the bias of the first and second PCs, computed as the L2 norm of the difference between the true and estimated eigenfunctions. Fig. 2 illustrates the first and second FPCs from Simulation 2, as a representative example. In this section, we demonstrate the application of CFPCA on two datasets: the gait cycle and stock market.
Researcher Affiliation	Academia	Eric Zhang, Didong Li Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. EMAIL, EMAIL
Pseudocode	Yes	CFPCA algorithm is summarized in Algorithm 1.
Open Source Code	Yes	Code https://github.com/ezhang1218/CFPCA
Open Datasets	Yes	As an illustrative example, Fig. 1 examines a benchmark dataset from the Berkeley growth study, tracking the heights of boys and girls as a function of age from 0 to 18 (Tuddenham and Snyder 1954). The data were collected by (Shorter et al. 2008), processed by (Helwig et al. 2011), and published by (Helwig et al. 2016). In our second case study, we explore the stock market dynamics, focusing on technology and non-technology companies. We sourced daily closing prices from Yahoo Finance using the yfinance package (Aroussi 2024).
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits. It mentions N, M (size of foreground and background) and n (number of time points) for the datasets used in applications, but not how these might be partitioned for evaluation.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies	No	The paper mentions that "the implementation in Python uses a different technique (Ramos-Carreño et al. 2019)" and that "We sourced daily closing prices from Yahoo Finance using the yfinance package (Aroussi 2024)". While software names are mentioned with citations, specific version numbers for Python itself or for the cited packages are not explicitly provided in the format requested (e.g., Python 3.8, package X.Y.Z).
Experiment Setup	Yes	When implementing CFPCA, two critical hyperparameters must be considered: α > 0, which controls the influence of the background dataset, and L, the number of dimensions for reduction. We typically set α = 1 as suggested by Equations (5); however, the optimal α can vary depending on the level of shared information between the foreground and background datasets. In the situation without such a subgroup structure, tuning α remains a challenging problem (Li, Jones, and Engelhardt 2020), see also Discussion. The choice of the dimensionality, L, arguably an open question (Wang, Chiou, and Müller 2016), is guided by the specific needs of the analysis, with L = 1 or 2 typically sufficient for visualization purposes.
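
To illustrate how the two hyperparameters in the Experiment Setup row might enter a computation, here is a minimal sketch. It is not the authors' Algorithm 1 or the code in their repository. It assumes curves observed on a shared, uniformly spaced grid, and it assumes the contrastive components are the leading eigenfunctions of the foreground covariance minus α times the background covariance, by analogy with contrastive PCA. The function name contrastive_fpca and its parameters alpha (for α) and n_components (for L) are hypothetical.

```python
import numpy as np


def contrastive_fpca(fg, bg, grid, alpha=1.0, n_components=2):
    """Sketch of a discretized contrastive FPCA (an assumed formulation).

    fg   : (N, n) array of foreground curves sampled on `grid`
    bg   : (M, n) array of background curves sampled on `grid`
    grid : (n,) uniformly spaced evaluation points (assumption)
    alpha: weight on the background covariance (alpha > 0)
    n_components: number of components L to keep
    """
    fg = np.asarray(fg, dtype=float)
    bg = np.asarray(bg, dtype=float)
    dt = grid[1] - grid[0]  # uniform spacing assumed

    # Center each dataset around its own mean curve.
    fg_c = fg - fg.mean(axis=0)
    bg_c = bg - bg.mean(axis=0)

    # Sample covariance surfaces evaluated on the grid.
    cov_fg = fg_c.T @ fg_c / fg.shape[0]
    cov_bg = bg_c.T @ bg_c / bg.shape[0]

    # Contrastive covariance: alpha controls the background's influence.
    contrast = cov_fg - alpha * cov_bg

    # Rectangle-rule discretization of the integral operator; the matrix
    # is symmetric, so eigh applies.
    evals, evecs = np.linalg.eigh(contrast * dt)
    order = np.argsort(evals)[::-1][:n_components]

    # Rescale eigenvectors so that sum(phi**2) * dt == 1 (unit L2 norm).
    eigfuns = evecs[:, order].T / np.sqrt(dt)
    return evals[order], eigfuns
```

With alpha = 1 and n_components set to 1 or 2, this mirrors the defaults quoted in the row above. The returned eigenfunctions are defined only up to sign, so any comparison against a reference eigenfunction (for example an L2-norm bias) should align signs first.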