Dimensionality Reduction and Wasserstein Stability for Kernel Regression
Authors: Stephan Eckstein, Armin Iske, Mathias Trabs
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 gives numerical examples illustrating the procedure. We mainly aim to shed some light on the comparison of the occurring errors for direct regression versus inclusion of dimensionality reduction, as discussed above. Due to computational constraints, the insight of these examples towards true asymptotic behavior is of course limited, but we believe the illustration for smaller sample sizes and visualization of the absolute errors involved can nevertheless be insightful. [...] Figure 4: Overall estimation error ∫ (f(x) − y)² ρ(dx, dy) for different estimators f and different generation of Y data. |
| Researcher Affiliation | Academia | Stephan Eckstein EMAIL Department of Mathematics, ETH Zürich, Zurich, CH; Armin Iske EMAIL Department of Mathematics, Universität Hamburg, Hamburg, DE; Mathias Trabs EMAIL Department of Mathematics, Karlsruhe Institute of Technology, Karlsruhe, DE |
| Pseudocode | No | The paper describes mathematical methods and derivations. It does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology. |
| Open Datasets | No | In Section 4, the paper states: "The X-data is generated as follows: Consider X̃ to be uniformly distributed on [−1, 1]^d × [−ε, ε]^{D−d}, where we set ε = 0.1." This indicates that the data used for numerical examples is synthetically generated, not a publicly available dataset. |
| Dataset Splits | No | The paper uses synthetically generated data for its numerical examples. It states in Section 4: "Each reported value is an average over 50 independent runs of generating the respective sample." This describes a simulation setup rather than explicit training/test/validation splits of a fixed dataset. |
| Hardware Specification | No | The paper discusses numerical examples in Section 4 but does not specify any hardware (e.g., GPU/CPU models, memory) used for conducting these experiments. |
| Software Dependencies | No | The paper mentions in Section 4 that "We use cross-validation for the choice of λ" and describes different kernel functions, implying the use of software for implementation. However, it does not specify any software libraries, packages, or their version numbers used in the experiments. |
| Experiment Setup | Yes | In Section 4, the paper provides detailed experimental setup information: "Let d = 2 and D = 10. The X-data is generated as follows: Consider X̃ to be uniformly distributed on [−1, 1]^d × [−ε, ε]^{D−d}, where we set ε = 0.1. We define X to be some rotation (with an orthogonal transformation) of X̃, say X = AX̃." It further specifies how the Y-data is generated: "For the first case, we define Y := f^{(1)}(P(X)) + U, and for the second case Y := f^{(1)}(X) + U, where U is uniformly distributed on [−0.1, 0.1] and independent of all other variables." The regression function f^{(1)} : R^D → R is (rather arbitrarily) defined by f^{(1)}(x) := sin(∑_{i=1}^D x_i), and a second example uses f^{(2)}(x) := \|sin(2 ∑_{i=1}^D x_i)\|. It also describes the kernel functions: "Each kernel is of the form K(x, y) = φ(\|x − y\|), where we use φ_∞(r) = exp(−r²) (C^∞, Gaussian kernel), φ_2(r) = max{0, 1 − r}^8 (8r + 1) (C², cf. Zhu, 2012, Table 4.1), φ_0(r) = max{0, 1 − r}^6 (8r + 1) (C⁰, cf. Zhu, 2012, Table 4.1)." Parameters are chosen by cross-validation: "We use cross-validation for the choice of λ" and "The bandwidth h ∈ [10^{−3}, 10] is chosen with cross-validation, too." |
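To make the setup quoted above concrete, here is a minimal Python sketch of the paper's synthetic data generation (uniform X̃ on [−1, 1]^d × [−ε, ε]^{D−d}, rotated by a random orthogonal A, with Y = f^{(1)}(X) + U) followed by Gaussian-kernel ridge regression. The bandwidth parametrization exp(−r²/h²), the sample size, the candidate grids for λ and h, and the simple hold-out selection standing in for the paper's cross-validation are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, eps, n = 2, 10, 0.1, 200  # dimensions and noise half-width from the paper; n is assumed

# X~ uniform on [-1, 1]^d x [-eps, eps]^{D-d}, then rotate: X = A X~
X_tilde = np.hstack([
    rng.uniform(-1, 1, size=(n, d)),
    rng.uniform(-eps, eps, size=(n, D - d)),
])
A, _ = np.linalg.qr(rng.standard_normal((D, D)))  # random orthogonal matrix
X = X_tilde @ A.T

# Y = f1(X) + U with f1(x) = sin(sum_i x_i) and U ~ Uniform[-0.1, 0.1]
f1 = lambda x: np.sin(x.sum(axis=1))
Y = f1(X) + rng.uniform(-0.1, 0.1, size=n)

def gaussian_kernel(X1, X2, h):
    # K(x, y) = exp(-|x - y|^2 / h^2); this bandwidth form is an assumption
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / h**2)

def krr_fit_predict(Xtr, Ytr, Xte, lam, h):
    # Kernel ridge regression: solve (K + lam * n * I) alpha = Y, predict via K(test, train) alpha
    K = gaussian_kernel(Xtr, Xtr, h)
    alpha = np.linalg.solve(K + lam * len(Ytr) * np.eye(len(Ytr)), Ytr)
    return gaussian_kernel(Xte, Xtr, h) @ alpha

# Hold-out selection of (lam, h) as a stand-in for the paper's cross-validation
Xtr, Ytr, Xva, Yva = X[:150], Y[:150], X[150:], Y[150:]
best = min(((lam, h) for lam in [1e-4, 1e-2, 1.0] for h in [0.1, 1.0, 10.0]),
           key=lambda p: np.mean((krr_fit_predict(Xtr, Ytr, Xva, *p) - Yva) ** 2))
pred = krr_fit_predict(Xtr, Ytr, Xva, *best)
mse = np.mean((pred - Yva) ** 2)
print("selected (lambda, h):", best, "hold-out MSE:", round(mse, 4))
```

A full reproduction would average such errors over 50 independent runs, as the paper describes, and compare direct regression against the variant with dimensionality reduction.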