SPSD Matrix Approximation vis Column Selection: Theories, Algorithms, and Extensions
Authors: Shusen Wang, Luo Luo, Zhihua Zhang
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6 we conduct experiments to compare among the column sampling algorithms. In Section 8 we empirically evaluate the proposed spectral shifting model. Experiments demonstrate that MEKA can be significantly improved by spectral shifting. We empirically compare three column selection algorithms: uniform sampling, uniform+adaptive², and the near-optimal+adaptive sampling algorithm. We perform experiments on several datasets collected from the LIBSVM website. |
| Researcher Affiliation | Academia | Shusen Wang (Department of Statistics, University of California at Berkeley, Berkeley, CA 94720); Luo Luo and Zhihua Zhang (Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai, China 200240) |
| Pseudocode | Yes | Algorithm 1: Computing the Prototype Model in O(nc + nd) Memory. Algorithm 2: The Adaptive Sampling Algorithm. Algorithm 3: The Uniform+Adaptive² Algorithm. Algorithm 4: The Incomplete Uniform+Adaptive² Algorithm. Algorithm 5: The Spectral Shifting Method. |
| Open Source Code | No | The paper does not provide concrete access to its own source code. It mentions using 'the code released by the authors with default settings' for MEKA (a third-party tool), but no statement or link for the code developed in this paper. |
| Open Datasets | Yes | We perform experiments on several datasets collected from the LIBSVM website: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | Yes | We perform a five-fold cross-validation without using kernel approximation to predetermine the two parameters σ and γ, and the same parameters are used for all the kernel approximation methods. For each of the compared methods, we randomly hold out 80% of the samples for training and the rest for test; we repeat this procedure 50 times and record the average MSE. |
| Hardware Specification | Yes | We run the algorithms on a workstation with Intel Xeon 2.40GHz CPUs, 24GB memory, and a 64-bit Windows Server 2008 system. |
| Software Dependencies | No | The models and algorithms are all implemented in MATLAB. We set MATLAB to single-thread mode via the command maxNumCompThreads(1). The paper mentions MATLAB but does not specify a version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | We set the target rank k to be k = n/100 in all the experiments unless otherwise specified. We evaluate the performance by Approximation Error $= \|K - \tilde{K}\|_F / \|K\|_F$. We set γ in the following way. Letting p = 0.05n, we define $\eta = \sum_{i=1}^{p} \lambda_i^2(K) \big/ \sum_{i=1}^{n} \lambda_i^2(K) = \|K_p\|_F^2 / \|K\|_F^2$, which denotes the ratio of the top 5% squared eigenvalues of the kernel matrix K to all the squared eigenvalues. For each dataset, we use two different settings of γ such that η = 0.5 or η = 0.9. We use the Gaussian RBF kernel and tune two parameters: the variance σ² and the kernel scaling parameter γ. We list the obtained parameters in Table 4. |
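The two quantities quoted in the Experiment Setup row are straightforward to compute. The sketch below is a hypothetical NumPy rendering (not the authors' MATLAB code) of the relative approximation error $\|K - \tilde{K}\|_F / \|K\|_F$ and the eigenvalue-energy ratio $\eta$; the function names and the `frac` parameter are our own for illustration.

```python
import numpy as np

def approximation_error(K, K_approx):
    """Relative Frobenius-norm error: ||K - K_approx||_F / ||K||_F."""
    return np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro")

def eigen_ratio(K, frac=0.05):
    """Ratio eta of the top-frac squared eigenvalues of symmetric K
    to the sum of all squared eigenvalues (= ||K_p||_F^2 / ||K||_F^2)."""
    n = K.shape[0]
    p = max(1, int(frac * n))            # p = frac * n, at least 1
    lam = np.linalg.eigvalsh(K)          # eigenvalues of symmetric K, ascending
    lam2 = np.sort(lam ** 2)[::-1]       # squared eigenvalues, descending
    return lam2[:p].sum() / lam2.sum()
```

In the paper's setup, γ would be tuned so that `eigen_ratio(K)` hits the target value (0.5 or 0.9); since all squared eigenvalues are nonnegative, the ratio always lies in (0, 1].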