reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Testing Conditional Independence via Quantile Regression Based Partial Copulas

Authors: Lasse Petersen, Niels Richard Hansen

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 5 we examine the proposed test through a simulation study where we assess the level and power properties of the test and benchmark it against existing nonparametric conditional independence tests.
Researcher Affiliation	Academia	Lasse Petersen, Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark; Niels Richard Hansen, Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark
Pseudocode	No	The paper describes a generic testing procedure (Definition 12) and an out-of-the-box procedure (Section 4.7) in prose, but it does not present any structured pseudocode or algorithm blocks with numbered steps and typical code formatting.
Open Source Code	Yes	The implementation and code for producing the simulations can be obtained from https://github.com/lassepetersen/partial-copula-CI-test.
Open Datasets	No	This section gives an overview of the data generating processes that we use for benchmarking and comparison. The first category consists of data generating processes of the form $X = f_1(Z) + g_1(Z)\,\varepsilon_1$ and $Y = f_2(Z) + g_2(Z)\,\varepsilon_2$ (H), where $f_1, f_2, g_1, g_2 : \mathbb{R}^d \to \mathbb{R}$ belong to some function class and $\varepsilon_1, \varepsilon_2$ are independent errors. ... The paper describes synthetic data generation methods rather than using pre-existing open datasets.
Dataset Splits	No	The paper uses synthetic data generated according to various processes (Section 5.2) and evaluates the test's level and power based on simulations. It does not mention traditional dataset splits like training, validation, or test sets in the context of model training reproducibility, as its focus is on statistical testing rather than machine learning model development on a fixed dataset.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the simulations or experiments.
Software Dependencies	Yes	The test was implemented in the R language (R Core Team, 2021) using the quantreg package (Koenker, 2021) as the backend for performing quantile regression.
Experiment Setup	Yes	To estimate the conditional distribution functions $\hat F^{(m,n)}_{X \mid Z}$ and $\hat F^{(m,n)}_{Y \mid Z}$ using Definition 2, we suggest choosing $\tau_{\min} = 0.01$ and $\tau_{\max} = 0.99$ and form the equidistant grid $(\tau_k)_{k=1}^{m}$ in $T = [\tau_{\min}, \tau_{\max}]$ with the number of gridpoints m = n . We then suggest using a model of the form (3) for both the quantile regression model $Q_{X \mid Z}(\tau_k \mid \cdot)$ and $Q_{Y \mid Z}(\tau_k \mid \cdot)$ for each $k = 1, \ldots, m$, where the bases $h_1$ and $h_2$ can be chosen to be e.g. an additive B-spline basis for each component of $Z$. To test the hypothesis of conditional independence we suggest using the $\hat\Psi_n$ from Definition 16 based on the estimated nonparametric residuals $(\hat U_{1,i}, \hat U_{2,i})_{i=1}^{n}$. To this end we choose $q \geq 1$ and let $\tau_{\min} = \lambda_0 < \cdots < \lambda_q = \tau_{\max}$ be an equidistant grid in $T$. We then define the trimming function $\sigma_k$ to have the form (13) with trimming parameters $\lambda_k$ and $\lambda_{k+1}$ and approximation parameter $\delta = 0.01\,(\lambda_{k+1} - \lambda_k)$ for each $k = 0, \ldots, q - 1$. ... The dimension is fixed as $d = 1$, $Z \sim U([0, 1])$ is uniformly distributed on $[0, 1]$, $\varepsilon_1$, $\varepsilon_2$ and $W$ are independent and $N(0, 1)$-distributed... We examine level and power by simulating 500 data sets for sample sizes $n \in \{100, 400, 1600\}$ and all combinations of parameters $\beta \in \{0, 1, 5, 10, 15, 20\}$, and local alternatives γ2 = γ2 0 / n for γ2 0 ∈ {0, 50, 100, 150}. ...the quantile regression model is fitted using a polynomial basis of degree 2.
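
The setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation, which is written in R and available from the repository listed under Open Source Code. It assumes only what the quoted text states: data of the form (H) with d = 1, Z uniform on [0, 1] and standard normal errors, an equidistant grid on [0.01, 0.99], and δ = 0.01 (λk+1 − λk). The function names `simulate_h`, `equidistant_grid` and `trimming_parameters` are hypothetical, the number of gridpoints is left as an argument, and the choices of f1, g1, f2, g2 in the usage lines are placeholders rather than the function classes used in the paper.

```python
import math
import random


def simulate_h(n, f1, g1, f2, g2, seed=0):
    """Draw n triples (x, y, z) with X = f1(Z) + g1(Z)*eps1 and Y = f2(Z) + g2(Z)*eps2.

    Z is uniform on [0, 1]; eps1 and eps2 are independent N(0, 1) errors,
    so X and Y are conditionally independent given Z.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)
        eps1 = rng.gauss(0.0, 1.0)
        eps2 = rng.gauss(0.0, 1.0)
        rows.append((f1(z) + g1(z) * eps1, f2(z) + g2(z) * eps2, z))
    return rows


def equidistant_grid(lo, hi, num_points):
    """Return num_points equally spaced values from lo to hi inclusive."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    step = (hi - lo) / (num_points - 1)
    return [lo + k * step for k in range(num_points)]


def trimming_parameters(q, tau_min=0.01, tau_max=0.99):
    """Return (lambda_k, lambda_{k+1}, delta_k) for k = 0, ..., q - 1.

    delta_k = 0.01 * (lambda_{k+1} - lambda_k), as in the quoted setup.
    """
    lam = equidistant_grid(tau_min, tau_max, q + 1)
    return [(lam[k], lam[k + 1], 0.01 * (lam[k + 1] - lam[k])) for k in range(q)]


# Placeholder mean and scale functions (assumptions, not taken from the paper).
data = simulate_h(
    n=400,
    f1=lambda z: math.sin(2.0 * math.pi * z),
    g1=lambda z: 1.0 + z,
    f2=lambda z: z ** 2,
    g2=lambda z: 1.0,
)
tau_grid = equidistant_grid(0.01, 0.99, num_points=20)  # quantile levels tau_k
trimming = trimming_parameters(q=5)  # one (lambda_k, lambda_{k+1}, delta) per trimming function
```

The sketch stops before the quantile regression and the test statistic, since those depend on Definitions 2 and 16 and on the model form (3), which are not reproduced in the table above.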