reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Estimation of Derivatives in Nonparametric Regression

Authors: Wenlin Dai, Tiejun Tong, Marc G. Genton

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct simulation studies to assess the finite sample performance of the proposed estimators, $\hat{m}_q^{(p)}$, and make comparisons with the empirical estimator, $\hat{m}_{\mathrm{emp}}^{(p)}$, in De Brabanter et al. (2013) and the least squares estimator, $\hat{m}_{\mathrm{lse}}^{(p)}$, in Wang and Lin (2015). ... The mean absolute error (MAE) is used as a measure of estimation accuracy. ... The simulation results for $\omega = 2$ are reported as box-plot figures.
Researcher Affiliation	Academia	Wenlin Dai EMAIL CEMSE Division King Abdullah University of Science and Technology Saudi Arabia Tiejun Tong EMAIL Department of Mathematics Hong Kong Baptist University Hong Kong Marc G. Genton EMAIL CEMSE Division King Abdullah University of Science and Technology Saudi Arabia
Pseudocode	No	The paper describes methods and proofs primarily through mathematical equations and textual descriptions, without presenting any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository or mention code in supplementary materials for the methodology described.
Open Datasets	No	The paper generates synthetic data for simulations, stating 'random errors are generated from a Gaussian distribution, $N(0, 0.1^2)$' and defining a regression function '$m(x) = \sqrt{x(1 - x)}\,\sin\{2.1\pi/(x + 0.05)\}$'. It does not use or provide access to any external publicly available datasets.
Dataset Splits	No	The paper describes generating synthetic data for simulations using 'n = 100 and 500 sample sizes' and defines how 'design points' are set ($x_i = i/n$). It also defines 'interior (Int) and boundary (Bd) areas' for evaluation based on $k_0 = [n/10]$ of the design points. However, it does not specify explicit training, validation, or test dataset splits in the conventional sense, as all data are generated for simulation.
Hardware Specification	No	The paper describes conducting 'extensive simulation studies' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these experiments.
Software Dependencies	Yes	Here, $\hat{m}^{(q)}(x_i)$ $(1 + k_0 \le i \le n - k_0)$ are calculated with the function locpol in the R package locpol (Ojeda Cabrera, 2012) with the parameter deg = q + 2.
Experiment Setup	Yes	We consider the following regression function, $m(x) = 5\sin(\omega\pi x)$, with $\omega = 1, 2, 4$ corresponding to different levels of oscillations. The n = 100 and 500 sample sizes are investigated. We set the design points as $x_i = i/n$ and generate the random errors, $\varepsilon_i$, independently from $N(0, \sigma^2)$. For each regression function, we consider $\sigma = 0.1$, $0.5$ and $2$... Throughout the simulation, we set $k_0 = [n/10]$... We select the sequence order r from $O = \{2i : 1 \le i \le k_0\}$. We choose the bias-reduction level, q, from $Q = \{p + 2, p + 4, p + 6\}$.... For each run of the simulation, we compute the MAE of the estimators at both Int and Bd and repeat the procedure 1000 times for each setting.
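
The experiment setup quoted above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the paper's simulations rely on R, and no derivative estimator is implemented here. The function names (`make_data`, `tuning_grids`, `interior_boundary`, `mae`) are hypothetical. The sketch assumes that $[n/10]$ denotes the integer part of $n/10$, and that the interior consists of the indices $k_0 + 1, \dots, n - k_0$ with the boundary being the remaining design points; the exact Int/Bd definition is not quoted on this page.

```python
import math
import random


def make_data(n, omega, sigma, seed=0):
    """Simulate one sample: x_i = i/n, m(x) = 5 sin(omega*pi*x), y_i = m(x_i) + eps_i."""
    rng = random.Random(seed)
    x = [i / n for i in range(1, n + 1)]
    m = [5.0 * math.sin(omega * math.pi * xi) for xi in x]
    # Errors are drawn independently from N(0, sigma^2).
    y = [mi + rng.gauss(0.0, sigma) for mi in m]
    return x, m, y


def tuning_grids(n, p):
    """Candidate sets quoted in the setup for derivative order p."""
    k0 = n // 10  # assumption: [n/10] is the integer part of n/10
    orders = [2 * i for i in range(1, k0 + 1)]  # O = {2i : 1 <= i <= k0}
    levels = [p + 2, p + 4, p + 6]              # Q = {p+2, p+4, p+6}
    return k0, orders, levels


def interior_boundary(n, k0):
    """1-based index sets; this Int/Bd split is an assumption, not quoted text."""
    interior = list(range(k0 + 1, n - k0 + 1))
    boundary = list(range(1, k0 + 1)) + list(range(n - k0 + 1, n + 1))
    return interior, boundary


def mae(estimates, truth, indices):
    """Mean absolute error over a set of 1-based design-point indices."""
    return sum(abs(estimates[i - 1] - truth[i - 1]) for i in indices) / len(indices)


if __name__ == "__main__":
    n, omega, sigma, p = 100, 2, 0.5, 1
    x, m, y = make_data(n, omega, sigma)
    k0, orders, levels = tuning_grids(n, p)
    interior, boundary = interior_boundary(n, k0)
    print(k0, len(orders), levels, len(interior), len(boundary))
```

In a full replication, `mae` would be applied to an estimator's output against the true derivative (for $p = 1$, $m'(x) = 5\omega\pi\cos(\omega\pi x)$), separately on the interior and boundary index sets, and repeated over 1000 simulated samples per setting.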