Independence Testing for Temporal Data

Authors: Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T Vogelstein

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Numerically, we show that the proposed approach yields satisfactory testing power when applied to simulated time series with small sample sizes. Additionally, we present the results of two real-data experiments, utilizing the proposed method to analyze neural connectivity based on fMRI data, as well as uncovering interesting temporal dependencies between the general stock market and low-beta stocks.
Researcher Affiliation Academia Cencheng Shen (EMAIL), Department of Applied Economics and Statistics, University of Delaware; Jaewon Chung (EMAIL), Department of Biomedical Engineering, Johns Hopkins University; Ronak Mehta (EMAIL), Department of Statistics, University of Washington; Ting Xu (EMAIL), Child Mind Institute; Joshua T. Vogelstein (EMAIL), Department of Biomedical Engineering, Johns Hopkins University
Pseudocode Yes 2.2 Main Algorithm
Input: Two jointly-sampled datasets represented as $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$, a given choice of sample dependence measure $\tau_n(\cdot, \cdot): \mathbb{R}^{p \times n} \times \mathbb{R}^{q \times n} \to \mathbb{R}$, and three positive integers: the lag limit $L$, the number of blocks $B$, and the number of random permutations $R$.
Step 1: Compute the set of cross dependence sample statistics $\{\tau_n(X, Y^l),\ l = 0, \ldots, L\}$. Here, $(X, Y^l)$ denotes the sample data with $l$ lags apart, which consists of $(n - l)$ pairs of observations: $(X, Y^l) = \{(X_{1+l}, Y_1), (X_{2+l}, Y_2), \ldots, (X_n, Y_{n-l})\}$.
Step 2: Estimate the optimal dependence lag: $\hat{L} = \arg\max_{l \in [0, L]} \frac{n-l}{n} \tau_n(X, Y^l)$. Here, the weight $\frac{n-l}{n}$ simply weights each cross dependence statistic based on the number of observations it uses.
Step 3: Compute the temporal dependence sample statistic: $T_n(X, Y) = \sum_{l=0}^{L} \frac{n-l}{n} \tau_n(X, Y^l)$.
Step 4: Compute the p-value using block permutation: $\sum_{r=1}^{R} I\big(T_n(X, Y) > T_n(X, Y_{\pi_B})\big) / R$, where $I(\cdot)$ is the 0-1 indicator function, and $\pi_B$ is a randomly generated block permutation for each $r$.
Output: The temporal dependence statistic $T_n$, the corresponding p-value, and the estimated optimal dependence lag $\hat{L}$.
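The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared Pearson correlation `sq_corr` is a stand-in for the generic dependence measure τn, the series are one-dimensional, blocks are contiguous and roughly equal in length, the statistic is read as the weighted sum over lags, and the p-value counts permuted statistics at least as large as the observed one (the usual permutation-test convention). All function names are hypothetical.

```python
import numpy as np

def sq_corr(x, y):
    """Stand-in dependence measure (assumption): squared Pearson correlation."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def lagged_stats(x, y, max_lag, tau):
    """Steps 1-2: weighted cross dependence statistics for l = 0..max_lag."""
    n = len(x)
    # Lag l pairs (x[l], y[0]), ..., (x[n-1], y[n-l-1]), i.e. n - l pairs.
    return np.array([(n - l) / n * tau(x[l:], y[:n - l])
                     for l in range(max_lag + 1)])

def block_permute(y, n_blocks, rng):
    """Shuffle contiguous blocks of y, keeping the order inside each block."""
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    order = rng.permutation(n_blocks)
    return y[np.concatenate([blocks[i] for i in order])]

def temporal_test(x, y, max_lag=1, n_blocks=20, n_perms=1000,
                  tau=sq_corr, seed=0):
    """Return (statistic, p-value, estimated optimal lag)."""
    rng = np.random.default_rng(seed)
    stats = lagged_stats(x, y, max_lag, tau)
    opt_lag = int(np.argmax(stats))   # Step 2: lag with the largest weighted statistic
    t_obs = float(stats.sum())        # Step 3: weighted sum over lags
    exceed = 0
    for _ in range(n_perms):          # Step 4: block-permutation null
        y_perm = block_permute(y, n_blocks, rng)
        t_perm = lagged_stats(x, y_perm, max_lag, tau).sum()
        exceed += int(t_perm >= t_obs)
    return t_obs, exceed / n_perms, opt_lag
```

For example, if `x[t + 1]` is a noisy copy of `y[t]`, calling `temporal_test(x, y, max_lag=3)` should report an estimated lag of 1 and a small p-value.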
Open Source Code No The paper mentions that "Analysis of Shift HSIC and Wild HSIC was performed using MATLAB code[1] and wild Bootstrap[2]" with footnotes to GitHub links for these methods. However, this refers to third-party tools used for comparison, not the authors' own implementation of the proposed temporal dependence statistic with block permutation. There is no explicit statement or link providing access to the source code for the methodology described in this paper.
Open Datasets Yes This study is based on data from an individual (Subject ID: 100307) of the Human Connectome Project (HCP), which can be downloaded online[3]. (Footnote 3: https://www.humanconnectome.org/study/hcp-young-adult/data-releases) We collected weekly closing stock prices from January 1, 2014, to May 1, 2014, using Yahoo Finance data, for the S&P 500 ETF (the benchmark) and 10 individual stocks, as shown in Figure 8.
Dataset Splits No The paper describes simulations and real-data analysis for which specific dataset splits (e.g., train/test/validation) are not applicable or explicitly mentioned for reproducibility of a model training process. For the Human Connectome Project data, it uses a pre-existing dataset. For stock data, it collects weekly closing prices. No information on splitting these datasets into training, validation, or test sets is provided.
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. It only mentions the time complexity of the method.
Software Dependencies No The paper mentions "Analysis of Shift HSIC and Wild HSIC was performed using MATLAB code[1] and wild Bootstrap[2]" which indicates the use of MATLAB for comparison methods, but it does not specify versions for MATLAB or any other software/libraries used for their own proposed method. No other specific software dependencies with version numbers are provided.
Experiment Setup Yes Each simulation was repeated 300 times, with 1000 permutations and a Type 1 error level of α = 0.05 used to compute the p-values. As for the number of blocks, we used B = 20 in our experiments, which is sufficient for our purposes. For the number of permutations, we used R = 1000 replicates. The sample size is n = 538, and the number of blocks is 20.
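The setup above (300 repetitions, 1000 permutations, α = 0.05) corresponds to a standard Monte Carlo power estimate: the fraction of repetitions whose p-value falls at or below α. The sketch below illustrates only that loop. The test `perm_pvalue` (an ordinary, non-block permutation test on absolute Pearson correlation) and the two simulation functions are hypothetical stand-ins, not the paper's method or its simulation settings.

```python
import numpy as np

def perm_pvalue(x, y, rng, n_perms=1000):
    """Stand-in test (assumption): permutation test on |Pearson correlation|."""
    obs = abs(np.corrcoef(x, y)[0, 1])
    null = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1])
                     for _ in range(n_perms)])
    return float(np.mean(null >= obs))

def estimate_power(simulate, test, n_reps=300, alpha=0.05, seed=0):
    """Fraction of simulated datasets on which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        x, y = simulate(rng)
        rejections += int(test(x, y, rng) <= alpha)
    return rejections / n_reps

def linear_pair(rng, n=50):
    """Hypothetical dependent setting: y is x plus noise."""
    x = rng.standard_normal(n)
    return x, x + 0.5 * rng.standard_normal(n)

def independent_pair(rng, n=50):
    """Hypothetical null setting: x and y are independent."""
    return rng.standard_normal(n), rng.standard_normal(n)
```

Under the independent setting the estimated rejection rate should sit near α, which gives a quick check that a test controls the Type 1 error before its power is compared across methods.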