Testing Conditional Independence via Quantile Regression Based Partial Copulas
Authors: Lasse Petersen, Niels Richard Hansen
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5 we examine the proposed test through a simulation study where we assess the level and power properties of the test and benchmark it against existing nonparametric conditional independence tests. |
| Researcher Affiliation | Academia | Lasse Petersen, EMAIL, Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark; Niels Richard Hansen, EMAIL, Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark |
| Pseudocode | No | The paper describes a generic testing procedure (Definition 12) and an out-of-the-box procedure (Section 4.7) in prose, but it does not present any structured pseudocode or algorithm blocks with numbered steps and typical code formatting. |
| Open Source Code | Yes | The implementation and code for producing the simulations can be obtained from https://github.com/lassepetersen/partial-copula-CI-test. |
| Open Datasets | No | This section gives an overview of the data generating processes that we use for benchmarking and comparison. The first category consists of data generating processes of the form $X = f_1(Z) + g_1(Z)\,\varepsilon_1$ and $Y = f_2(Z) + g_2(Z)\,\varepsilon_2$ (H), where $f_1, f_2, g_1, g_2 : \mathbb{R}^d \to \mathbb{R}$ belong to some function class and $\varepsilon_1, \varepsilon_2$ are independent errors. ... The paper describes synthetic data generation methods rather than using pre-existing open datasets. |
| Dataset Splits | No | The paper uses synthetic data generated according to various processes (Section 5.2) and evaluates the test's level and power based on simulations. It does not mention traditional dataset splits like training, validation, or test sets in the context of model training reproducibility, as its focus is on statistical testing rather than machine learning model development on a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the simulations or experiments. |
| Software Dependencies | Yes | The test was implemented in the R language (R Core Team, 2021) using the quantreg package (Koenker, 2021) as the backend for performing quantile regression. |
| Experiment Setup | Yes | To estimate the conditional distribution functions $\hat{F}^{(m,n)}_{X \mid Z}$ and $\hat{F}^{(m,n)}_{Y \mid Z}$ using Definition 2, we suggest choosing $\tau_{\min} = 0.01$ and $\tau_{\max} = 0.99$ and form the equidistant grid $(\tau_k)_{k=1}^{m}$ in $\mathcal{T} = [\tau_{\min}, \tau_{\max}]$ with the number of gridpoints $m = n$. We then suggest using a model of the form (3) for both the quantile regression model $Q_{X \mid Z}(\tau_k \mid \cdot)$ and $Q_{Y \mid Z}(\tau_k \mid \cdot)$ for each $k = 1, \dots, m$, where the bases $h_1$ and $h_2$ can be chosen to be e.g. an additive B-spline basis for each component of $Z$. To test the hypothesis of conditional independence we suggest using the $\hat{\Psi}_n$ from Definition 16 based on the estimated nonparametric residuals $(\hat{U}_{1,i}, \hat{U}_{2,i})_{i=1}^{n}$. To this end we choose $q \geq 1$ and let $\tau_{\min} = \lambda_0 < \dots < \lambda_q = \tau_{\max}$ be an equidistant grid in $\mathcal{T}$. We then define the trimming function $\sigma_k$ to have the form (13) with trimming parameters $\lambda_k$ and $\lambda_{k+1}$ and approximation parameter $\delta = 0.01 \cdot (\lambda_{k+1} - \lambda_k)$ for each $k = 0, \dots, q - 1$. ... The dimension is fixed as $d = 1$, $Z \sim \mathcal{U}([0, 1])$ is uniformly distributed on $[0, 1]$, $\varepsilon_1$, $\varepsilon_2$ and $W$ are independent and $\mathcal{N}(0, 1)$-distributed... We examine level and power by simulating 500 data sets for sample sizes $n \in \{100, 400, 1600\}$ and all combinations of parameters $\beta \in \{0, 1, 5, 10, 15, 20\}$, and local alternatives $\gamma^2 = \gamma_0^2 / n$ for $\gamma_0^2 \in \{0, 50, 100, 150\}$. ...the quantile regression model is fitted using a polynomial basis of degree 2. |
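
The grid construction and the first data generating process quoted in the table can be illustrated with a small Python sketch. This is not the authors' code (their implementation is in R with quantreg and is available at the repository linked above). The function names `equidistant_grid`, `trimming_parameters` and `simulate_h` are hypothetical, as are the example choices of f1, f2, g1, g2 in the usage note, since the quoted text only says these functions belong to some function class. The sketch assumes d = 1, Z uniform on [0, 1] and standard normal errors, as stated in the Experiment Setup row.

```python
import random


def equidistant_grid(lo, hi, num_points):
    """Return num_points equally spaced values from lo to hi (both included)."""
    if num_points < 2:
        raise ValueError("need at least two grid points")
    step = (hi - lo) / (num_points - 1)
    return [lo + i * step for i in range(num_points)]


def trimming_parameters(q, tau_min=0.01, tau_max=0.99):
    """Return (lambda_k, lambda_{k+1}, delta_k) for k = 0, ..., q - 1.

    The lambdas form an equidistant grid from tau_min to tau_max, and
    delta_k = 0.01 * (lambda_{k+1} - lambda_k) as in the quoted setup.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    lam = equidistant_grid(tau_min, tau_max, q + 1)
    return [(lam[k], lam[k + 1], 0.01 * (lam[k + 1] - lam[k])) for k in range(q)]


def simulate_h(n, f1, f2, g1, g2, seed=None):
    """Draw n samples from X = f1(Z) + g1(Z) eps1, Y = f2(Z) + g2(Z) eps2.

    Assumes d = 1, Z ~ U([0, 1]) and independent N(0, 1) errors, so that
    X and Y are conditionally independent given Z.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)
        eps1 = rng.gauss(0.0, 1.0)
        eps2 = rng.gauss(0.0, 1.0)
        xs.append(f1(z) + g1(z) * eps1)
        ys.append(f2(z) + g2(z) * eps2)
        zs.append(z)
    return xs, ys, zs
```

For example, `equidistant_grid(0.01, 0.99, m)` gives the quantile levels for a chosen number of gridpoints `m`, and `simulate_h(100, lambda z: z, lambda z: z ** 2, lambda z: 1.0, lambda z: 1.0 + z, seed=1)` draws one data set under the hypothesis with purely illustrative component functions. Fitting the quantile regressions and computing the test statistic are left out, since they depend on Definitions 2 and 16 and forms (3) and (13) of the paper, which are not reproduced here.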