reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Independence Testing for Temporal Data

Authors: Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T Vogelstein

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerically, we show that the proposed approach yields satisfactory testing power when applied to simulated time series with small sample sizes. Additionally, we present the results of two real-data experiments, utilizing the proposed method to analyze neural connectivity based on fMRI data, as well as uncovering interesting temporal dependencies between the general stock market and low-beta stocks.
Researcher Affiliation	Academia	Cencheng Shen (EMAIL), Department of Applied Economics and Statistics, University of Delaware; Jaewon Chung (EMAIL), Department of Biomedical Engineering, Johns Hopkins University; Ronak Mehta (EMAIL), Department of Statistics, University of Washington; Ting Xu (EMAIL), Child Mind Institute; Joshua T. Vogelstein (EMAIL), Department of Biomedical Engineering, Johns Hopkins University
Pseudocode	Yes	2.2 Main Algorithm. Input: Two jointly sampled datasets represented as $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$, a given choice of sample dependence measure $\tau_n(\cdot,\cdot): \mathbb{R}^{p \times n} \times \mathbb{R}^{q \times n} \to \mathbb{R}$, and three positive integers: the lag limit $L$, the number of blocks $B$, and the number of random permutations $R$. Step 1: Compute the set of cross dependence sample statistics $\{\tau_n(X, Y^{l}),\ l = 0, \ldots, L\}$. Here, $(X, Y^{l})$ denotes the sample data with $l$ lags apart, which consists of $n-l$ pairs of observations: $(X, Y^{l}) = \{(X_{1+l}, Y_1), (X_{2+l}, Y_2), \ldots, (X_n, Y_{n-l})\}$. Step 2: Estimate the optimal dependence lag: $\hat{L} = \arg\max_{l \in [0, L]} \frac{n-l}{n}\,\tau_n(X, Y^{l})$. Here, the weight $\frac{n-l}{n}$ simply weights each cross dependence statistic based on the number of observations it uses. Step 3: Compute the temporal dependence sample statistic: $T_n(X, Y) = \frac{n-\hat{L}}{n}\,\tau_n(X, Y^{\hat{L}})$. Step 4: Compute the p-value using block permutation: $\sum_{r=1}^{R} I\big(T_n(X, Y^{\pi_B}) > T_n(X, Y)\big)/R$, where $I(\cdot)$ is the 0-1 indicator function and $\pi_B$ is a randomly generated block permutation for each $r$. Output: The temporal dependence statistic $T_n(X, Y)$, the corresponding p-value, and the estimated optimal dependence lag $\hat{L}$.
Open Source Code	No	The paper mentions that "Analysis of Shift HSIC and Wild HSIC was performed using MATLAB code¹ and wild Bootstrap²" with footnotes to GitHub links for these methods. However, this refers to third-party tools used for comparison, not the authors' own implementation of the proposed temporal dependence statistic with block permutation. There is no explicit statement or link providing access to the source code for the methodology described in this paper.
Open Datasets	Yes	This study is based on data from an individual (Subject ID: 100307) of the Human Connectome Project (HCP), which can be downloaded online³ (footnote 3: https://www.humanconnectome.org/study/hcp-young-adult/data-releases). We collected weekly closing stock prices from January 1, 2014, to May 1, 2014, using Yahoo Finance data, for the S&P 500 ETF (the benchmark) and 10 individual stocks, as shown in Figure 8.
Dataset Splits	No	The paper describes simulations and real-data analyses that involve no model training, so train/validation/test splits are neither applicable nor mentioned. The Human Connectome Project data come from a pre-existing dataset, and the stock data consist of collected weekly closing prices. No information on splitting either dataset into training, validation, or test sets is provided.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. It only mentions the time complexity of the method.
Software Dependencies	No	The paper mentions "Analysis of Shift HSIC and Wild HSIC was performed using MATLAB code¹ and wild Bootstrap²", which indicates the use of MATLAB for the comparison methods. However, it does not specify versions for MATLAB or for any other software or libraries used for the proposed method. No other specific software dependencies with version numbers are provided.
Experiment Setup	Yes	Each simulation was repeated 300 times, with 1000 permutations and a Type I error level of α = 0.05 used to compute the p-values. As for the number of blocks, we used B = 20 in our experiments, which is sufficient for our purposes. For the number of permutations, we used R = 1000 replicates. The sample size is n = 538, and the number of blocks is 20.
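The algorithm in the Pseudocode row can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation; the page reports no released code. The following choices are assumptions rather than details taken from the page:

- The dependence measure τ_n is the biased sample (squared) distance correlation, although the algorithm accepts any sample dependence measure.
- The block permutation splits the n time indices into B contiguous, nearly equal blocks and shuffles their order.

Inputs follow the algorithm's convention of one observation per column. All helper names (`dcorr`, `temporal_stat`, `block_permute`, `temporal_dependence_test`) are hypothetical.

```python
import numpy as np


def _double_centered_distances(Z):
    """Pairwise Euclidean distances between the rows of Z, double-centered."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()


def dcorr(X, Y):
    """Biased sample squared distance correlation between paired rows of X and Y.

    Assumed choice of tau_n; the algorithm allows any sample dependence measure.
    """
    A, B = _double_centered_distances(X), _double_centered_distances(Y)
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom <= 0 else float((A * B).mean() / denom)


def temporal_stat(X, Y, L, tau=dcorr):
    """Steps 1-3: weighted cross dependence at lags 0..L, maximized over the lag.

    X, Y hold one observation per row here; lag l pairs X[t + l] with Y[t].
    """
    n = X.shape[0]
    weighted = [(n - l) / n * tau(X[l:], Y[: n - l]) for l in range(L + 1)]
    L_hat = int(np.argmax(weighted))
    return weighted[L_hat], L_hat


def block_permute(Y, B, rng):
    """Hypothetical block permutation: shuffle the order of B contiguous row blocks."""
    blocks = np.array_split(np.arange(Y.shape[0]), B)
    order = rng.permutation(len(blocks))
    return Y[np.concatenate([blocks[i] for i in order])]


def temporal_dependence_test(X, Y, L=1, B=20, R=1000, seed=None):
    """X: (p, n), Y: (q, n) arrays with one observation per column."""
    X = np.asarray(X, dtype=float).T  # -> (n, p), one observation per row
    Y = np.asarray(Y, dtype=float).T  # -> (n, q)
    n = X.shape[0]
    if not 0 <= L <= n - 2:
        raise ValueError("lag limit L must satisfy 0 <= L <= n - 2")
    rng = np.random.default_rng(seed)
    T, L_hat = temporal_stat(X, Y, L)
    # Step 4: recompute the statistic R times with Y block-permuted.
    null = np.array([temporal_stat(X, block_permute(Y, B, rng), L)[0] for _ in range(R)])
    # Fraction of permuted statistics exceeding the observed one.
    p_value = float(np.mean(null > T))
    return T, p_value, L_hat


# Usage: X_t depends on Y_{t-1}, so the strongest dependence should be at lag 1.
rng = np.random.default_rng(1)
n = 100
y = rng.standard_normal(n)
x = np.empty(n)
x[0] = rng.standard_normal()
x[1:] = y[:-1] + 0.1 * rng.standard_normal(n - 1)
T, p, L_hat = temporal_dependence_test(x[None, :], y[None, :], L=3, B=10, R=100, seed=0)
print(f"T = {T:.3f}, p-value = {p:.3f}, L_hat = {L_hat}")
```

Shuffling whole blocks rather than individual time points keeps short-range autocorrelation within each block intact under the null. That is the motivation for a block permutation with time series. The block scheme used here (contiguous, no wrap-around) is one simple option, and other schemes are possible.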
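The settings in the Experiment Setup row imply a few quantities worth checking when reproducing the study. The sketch below computes three of them:

- the block lengths for n = 538 and B = 20;
- the granularity of a permutation p-value with R = 1000;
- the Monte Carlo standard error of a rejection rate estimated from 300 repetitions.

It assumes contiguous blocks of nearly equal length, which the page does not specify. It also treats each repetition as an independent trial.

```python
import math

# Settings reported in the Experiment Setup row.
n, B, R, reps, alpha = 538, 20, 1000, 300, 0.05

# Contiguous blocks that are as equal as possible (an assumption; the page
# does not say how unequal block lengths are handled).
q, r = divmod(n, B)
block_sizes = [q + 1] * r + [q] * (B - r)

# A permutation p-value of the form (count)/R moves in steps of 1/R.
p_value_resolution = 1 / R

# Monte Carlo standard error of a rejection rate estimated from `reps`
# independent repetitions: at the nominal level, and in the worst case (rate 0.5).
se_at_alpha = math.sqrt(alpha * (1 - alpha) / reps)
se_worst = math.sqrt(0.5 * 0.5 / reps)

print(block_sizes.count(q + 1), "blocks of", q + 1, "and", block_sizes.count(q), "blocks of", q)
print(f"p-value resolution: {p_value_resolution}")
print(f"SE at alpha: {se_at_alpha:.4f}, worst-case SE: {se_worst:.4f}")
```

Under these assumptions, the 538 time points split into 18 blocks of 27 and 2 blocks of 26. An empirical Type I error near 0.05 from 300 repetitions carries a standard error of roughly 0.013. Estimates within about two standard errors of the nominal level are therefore consistent with a correctly sized test.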