Compact Convex Projections

Authors: Steffen Grünewälder

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate via experiments that the developed regressors are on par with state-of-the-art regression algorithms for large-scale problems. We conducted a set of experiments to gauge the performance of the approach. The first set of experiments was constructed to demonstrate the behavior of the error bounds and to compare the optimization routine with some standard optimization procedures. The second set of experiments compared the regressor with well-established regressors in a small-scale setting. The final set of experiments focused on a large-scale benchmark data set and compared our method to Fast-KRR, which is the state-of-the-art regressor for large-scale problems.
Researcher Affiliation | Academia | Steffen Grünewälder, EMAIL, Department of Mathematics and Statistics, Lancaster University, Lancaster, England
Pseudocode | Yes | Algorithm 1. Input: y ∈ H, T ∈ N, and S ⊆ H. Algorithm 1 (ls). Input: y ∈ H, T ∈ N, and S ⊆ H. Algorithm 2. Input: y ∈ H, T ∈ N, and S ⊆ H. Algorithm 3. Input: y ∈ R^n, T ∈ N, r > 0, a kernel function k and x_1, …, x_n ∈ X. Algorithm 3 (ls). Input: y ∈ R^n, T ∈ N, r > 0, a kernel function k and x_1, …, x_n ∈ X. Algorithm 3 (ls, ae). (replace 8. and 9. in Algorithm 3 (ls)). Input: y ∈ R^n, ε > 0, r > 0, a kernel function k and x_1, …, x_n ∈ X.
Open Source Code | No | The paper does not provide an explicit statement or link to the authors' own open-source code for the described methodology. It mentions third-party tools like the 'cvx Matlab toolbox' and 'SDPT3 package' but does not indicate that the authors' implementation is publicly available.
Open Datasets | Yes | We reproduced the experiment from Zhang et al. (2013), which uses the million songs data set (about 450000 data points and a covariate dimension of 90).
Dataset Splits | Yes | We fitted the maximum a posteriori estimator (MAP, known hyper-parameters) to it (red curve) and then split the data into an 800 and 200 batch to run a cross-validation loop over the hyper-parameters (we also used a Gaussian covariance but with an unknown width parameter). We downsampled the data set to a small subset of 5000 training points and 1000 test points.
Hardware Specification | No | We made 25 GB of memory available to the Fast-KRR method, which allowed us to go down to 26 partitions on our cluster. The memory requirement increases linearly in the number of elements per partition and quadratically in the sample size.
Software Dependencies | No | The cvx toolbox uses the SDPT3 package to solve semidefinite-quadratic-linear programming problems. The details of the SDPT3 package are described in Toh, Todd, and Tütüncü (1999); Tütüncü, Toh, and Todd (2003).
Experiment Setup | Yes | We generated 1000 data points from a Gaussian process (Gaussian covariance function) with normally distributed noise (the right plot in Figure 5). We fitted the maximum a posteriori estimator (MAP, known hyper-parameters) to it (red curve) and then split the data into an 800 and 200 batch to run a cross-validation loop over the hyper-parameters (we also used a Gaussian covariance but with an unknown width parameter). We ran the CCP algorithm for 20 (yellow) and 100 (purple) iterations (without line search). ... We used as a stopping rule for both methods an error below 10^-4 ... We normalized the data as in Zhang et al. (2013) by letting each covariate dimension have standard deviation 1. We also used the same kernel (Gaussian with σ = 6). Finally, we normalized the response variable (year when a song appeared) to lie in [0, 1] by subtracting the minimal year and dividing by (maximum − minimum). ... We used similar partition sizes as in Zhang et al. (2013) for Fast-KRR and we ran the CCP-regressor with r = 100000.
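The algorithm inputs quoted in the Pseudocode row (y ∈ H, an iteration budget T, a set S, optional line search) match the shape of a Frank-Wolfe style greedy projection. The sketch below is only an illustration of that general scheme for a finite set S in R^d; the paper's actual update rules, its Algorithm 2, and the kernelized Algorithm 3 are not reproduced here, and the name `ccp_sketch` is hypothetical.

```python
import numpy as np

def ccp_sketch(y, T, S, line_search=False):
    """Greedy (Frank-Wolfe style) projection of y onto the convex hull of the
    rows of S. Hypothetical sketch: same inputs as the paper's Algorithm 1,
    but not the paper's actual update rules."""
    x = S[0].astype(float).copy()        # start at an arbitrary element of S
    for t in range(1, T + 1):
        grad = x - y                     # gradient of 0.5 * ||x - y||^2
        s = S[np.argmin(S @ grad)]       # linear minimization oracle over S
        d = s - x
        if line_search:                  # exact step for the quadratic ("ls" variant)
            denom = d @ d
            gamma = 0.0 if denom == 0 else float(np.clip(-(grad @ d) / denom, 0.0, 1.0))
        else:
            gamma = 2.0 / (t + 2)        # standard Frank-Wolfe step size
        x = x + gamma * d
    return x
```

With line search the quadratic objective allows an exact step size, which is why the "ls" variants in the paper need no extra tuning parameter for the step.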
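The splits quoted in the Dataset Splits row (an 800/200 cross-validation split of the 1000 generated points, and a 5000 train / 1000 test downsample of the large benchmark) can be sketched as follows; the random seed and the use of uniform sampling are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed is an assumption

# 800/200 cross-validation split of the 1000 generated points
idx = rng.permutation(1000)
train_idx, val_idx = idx[:800], idx[800:]

def downsample(n_total, n_train=5000, n_test=1000):
    """Downsample a data set of n_total points to disjoint train/test index
    sets (5000/1000 as in the quoted experiment)."""
    pick = rng.choice(n_total, size=n_train + n_test, replace=False)
    return pick[:n_train], pick[n_train:]
```

For the million songs experiment this would be called as `downsample(450000)`, matching the data set size quoted in the Open Datasets row.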
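The normalization described in the Experiment Setup row — each covariate dimension scaled to standard deviation 1, and the response (song year) mapped to [0, 1] by subtracting the minimum and dividing by (maximum − minimum) — can be sketched as below. Whether the covariates are also mean-centered is not stated in the quoted text, so this sketch only rescales; the function name is hypothetical.

```python
import numpy as np

def normalize_experiment(X, years):
    """Rescale covariates to unit standard deviation per dimension and map
    the response years to [0, 1], as described in the quoted setup."""
    X_norm = X / X.std(axis=0)            # unit std per covariate dimension
    y_norm = (years - years.min()) / (years.max() - years.min())
    return X_norm, y_norm
```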