Smoothed Nonparametric Derivative Estimation using Weighted Difference Quotients
Authors: Yu Liu, Kris De Brabanter
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In section 4, we conduct Monte Carlo experiments to compare the proposed methodology with smoothing splines and local polynomial regression. |
| Researcher Affiliation | Academia | Yu Liu, Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Kris De Brabanter, Department of Statistics and Department of Industrial Manufacturing & Systems Engineering, Iowa State University, Ames, IA 50011, USA |
| Pseudocode | No | The paper describes procedures and mathematical derivations but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper mentions using third-party R packages like 'ks', 'locpol', and 'pspline' but does not provide concrete access to source code developed specifically for the methodology described in this paper by the authors. There is no explicit statement of code release or a link to a repository for their own implementation. |
| Open Datasets | No | The paper primarily uses simulated data generated from mathematical functions and distributions (e.g., 'm(X) = cos^2(2πX) + log(4/3 + X) for X ~ U(0, 1)', 'm(X) = 50e^(-8(1-2X)^4(1-2X)) for X ~ beta(2, 2)'). It does not specify or provide access information for any publicly available or open external datasets. |
| Dataset Splits | No | The paper uses simulated data generated for Monte Carlo experiments, describing sample sizes (e.g., 'n = 1000' or 'n = 700') and the number of repetitions ('100 times'). However, it does not describe specific training, validation, or test dataset splits in the conventional sense, as data is generated anew for each simulation run rather than being split from a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used to conduct the simulation experiments. |
| Software Dependencies | Yes | In all simulations, we estimate the density f and distribution F using kernel methods (R package ks (Duong, 2018)). The tuning parameter k is selected based on Corollary 5 over a positive integer set {1, 2, . . . , 499}. We use local cubic regression (p = 3) with bimodal kernel to initially smooth the data. Bandwidths h were selected from the set {0.04, 0.045, . . . , 0.1} for both (23) and (24) and corrected for a unimodal Gaussian kernel. Next, we compare the proposed methodology with several popular methods for nonparametric derivative estimation, i.e. the local slope of the local polynomial regression with p = 2, p = 3 (R package locpol (Ojeda, 2012)) and penalized smoothing splines (R package pspline (Ramsey and Ripley, 2017)). |
| Experiment Setup | Yes | The tuning parameter k is selected based on Corollary 5 over a positive integer set {1, 2, . . . , 499}. We use local cubic regression (p = 3) with bimodal kernel to initially smooth the data. Bandwidths h were selected from the set {0.04, 0.045, . . . , 0.1} for both (23) and (24) and corrected for a unimodal Gaussian kernel. The sample size for both models is n = 1000 with e ~ N(0, 0.1^2) and e ~ N(0, 2^2) for (23) and (24) respectively. For the Monte Carlo study, we constructed data sets of size n = 700 and generated the function... 100 times according to model (2) with e ~ N(0, 0.2^2). Bandwidths were selected from the set {0.03, 0.035, . . . , 0.07} and corrected for a unimodal Gaussian kernel. In this simulation, we choose the grid search space of (k1, k2) to be {1, 2, . . . , 100} x {1, 2, . . . , 100} for all models. Bandwidths h are selected from the set {0.05, 0.055, . . . , 0.1} for both functions (23) and (27). |
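Since the paper relies entirely on simulated data, the first model it describes is easy to regenerate. The sketch below is a minimal illustration, assuming the model m(X) = cos^2(2πX) + log(4/3 + X) with X ~ U(0, 1), e ~ N(0, 0.1^2), and n = 1000 as quoted above; the `diff_quotient` helper is a generic symmetric first-order difference quotient for illustration only, not the authors' weighted estimator.

```python
import numpy as np

# Regenerate simulated data matching the quoted setup for model (23):
# m(X) = cos^2(2*pi*X) + log(4/3 + X), X ~ U(0, 1), e ~ N(0, 0.1^2), n = 1000.
rng = np.random.default_rng(0)
n = 1000
x = np.sort(rng.uniform(0.0, 1.0, n))
m = np.cos(2 * np.pi * x) ** 2 + np.log(4.0 / 3.0 + x)
y = m + rng.normal(0.0, 0.1, n)

def diff_quotient(x, y, k=10):
    """Naive symmetric first-order difference quotient over a gap of k
    neighboring points (a plain baseline, not the paper's weighted version)."""
    num = y[2 * k:] - y[:-2 * k]
    den = x[2 * k:] - x[:-2 * k]
    return x[k:-k], num / den

xd, d_hat = diff_quotient(x, y, k=10)
```

Widening the gap k trades variance for bias, which is exactly the tuning problem the paper's Corollary 5 addresses when selecting k from {1, 2, . . . , 499}.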