On Inference for the Support Vector Machine

Authors: Jakub Rybak, Heather Battey, Wen-Xin Zhou

JMLR 2025

Reproducibility assessment (Variable — Result: LLM Response)
Research Type — Experimental: "5. Simulations: The distributional approximations of both the SVM estimator derived in Koo et al. (2008) and the smooth SVM estimator presented here hinge on the Bahadur representation. The convergence of the distributional approximation is thus governed by the convergence of the Bahadur remainder. We can therefore use the non-asymptotic behaviour of the Bahadur remainder to compare the distributional approximations of the SVM and smooth SVM in non-asymptotic settings. Since a non-asymptotic bound for the Bahadur remainder has not been derived for SVMs, we resort to simulations. ... Figure 2: The L2 norm of the Bahadur remainder based on 100 simulations with n/p = 50 for SVM (plot a) and convolution-smoothed SVM (plot b). ... Figure 3: Median value of Type 1 error for testing the significance of a noise variable, based on 100 simulations for each (n, p) combination. ... Figure 4: Median and standard deviation of coverage ratios for SVM (plot a) and convolution-smoothed SVM (plot b), based on 100 simulations with n = 500."
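The smooth SVM discussed above replaces the hinge loss by its convolution with a kernel. Assuming a Gaussian smoothing kernel with bandwidth h (the kernel used in the paper's simulations), the smoothed hinge loss has the closed form ℓ_h(v) = (1−v)Φ((1−v)/h) + hφ((1−v)/h), since convolving max(0, 1−v) with a N(0, h²) density equals E[max(0, 1−v+hZ)] for Z ~ N(0,1). The function names below are illustrative, a minimal sketch rather than the paper's implementation:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hinge(v):
    """Classical hinge loss max(0, 1 - v)."""
    return max(0.0, 1.0 - v)

def smoothed_hinge(v, h):
    """Gaussian-convolution-smoothed hinge: E[max(0, 1 - v + h*Z)], Z ~ N(0,1)."""
    a = 1.0 - v
    return a * Phi(a / h) + h * phi(a / h)

# As h -> 0 the smoothed loss recovers the hinge loss pointwise,
# but for h > 0 it is everywhere differentiable.
for v in (-0.5, 0.5, 2.0):
    print(v, hinge(v), smoothed_hinge(v, 0.01))
```

The smooth surrogate dominates the hinge loss from above and agrees with it away from the kink at v = 1; this differentiability is what makes Newton-type optimisation and the Bahadur-representation analysis tractable.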
Researcher Affiliation — Academia:
- Jakub Rybak (EMAIL), Department of Mathematics, Imperial College London, London SW7 2AZ, U.K.
- Heather Battey (EMAIL), Department of Mathematics, Imperial College London, London SW7 2AZ, U.K.
- Wen-Xin Zhou (EMAIL), Department of Information and Decision Sciences, University of Illinois Chicago, Chicago, IL 60607, USA
Pseudocode — No: The paper describes methods and theoretical derivations using mathematical notation and descriptive text. It does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code — No: The paper does not contain any statements about making code publicly available, nor does it provide links to any code repositories.
Open Datasets — No: Only simulated data are used: "As in Koo et al. (2008), we consider Gaussian class-conditional densities with a common covariance matrix throughout all simulations, i.e. f(X) = N(µf, Σ0) and g(X) = N(µg, Σ0), with equal class probabilities."
Dataset Splits — No: The paper uses simulated data generated from Gaussian class-conditional densities for its experiments. It does not provide training/validation/test splits, as it does not use a pre-existing dataset that would require such partitioning.
Hardware Specification — No: The paper does not specify any particular hardware (e.g., GPU models, CPU types) used for running the simulations or experiments.
Software Dependencies — No: The paper mentions the Newton-Raphson method in the context of solving an equation, but it does not specify any programming languages, software libraries, or version numbers used for implementation or analysis.
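The only implementation hint in the paper is its use of the Newton-Raphson method. A minimal stdlib sketch of that scheme for a scalar equation follows; the equation solved here is purely illustrative, not the paper's estimating equation:

```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=100):
    """Iterate x <- x - f(x)/f'(x) until |f(x)| falls below tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / df(x)
    raise RuntimeError("Newton-Raphson did not converge")

# Illustrative use: solve x**3 - 2 = 0 (the cube root of 2).
root = newton_raphson(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, x0=1.0)
print(root)
```

The quadratic local convergence of this iteration is one reason a differentiable (smoothed) loss is attractive: the raw hinge loss has no second derivative at its kink, so Newton-type updates are not directly applicable there.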
Experiment Setup — Yes: "As in Koo et al. (2008), we consider Gaussian class-conditional densities with a common covariance matrix throughout all simulations, i.e. f(X) = N(µf, Σ0) and g(X) = N(µg, Σ0), with equal class probabilities. ... We set Σ0 = cp Ip, where Ip denotes the p-dimensional identity matrix, µf = (1, . . . , 1)T, the p-dimensional unit vector, and µg = −µf. The bandwidth is set to the optimal rate from the Bahadur-remainder perspective, h = (p/n)^(1/4). ... Throughout this section we use a Gaussian kernel and set the bandwidth to h = (p/n)^(1/4)."
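The simulation design quoted above can be sketched with the standard library alone. Assumptions flagged here: symmetric class means (class −1 centred at the negative of the unit-vector mean), cp = 1 as a placeholder for the paper's unspecified scaling constant, and the diagonal covariance Σ0 = cp·Ip, which lets each coordinate be drawn independently:

```python
import random

def simulate(n, p, cp=1.0, seed=0):
    """Draw n points from two Gaussian classes with common covariance cp * I_p.

    Class +1 has mean (1, ..., 1); class -1 has mean -(1, ..., 1)
    (an assumed symmetric counterpart of mu_f). Equal class
    probabilities, matching the paper's setup.
    """
    rng = random.Random(seed)
    sd = cp ** 0.5  # per-coordinate standard deviation
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else -1
        X.append([label * 1.0 + rng.gauss(0.0, sd) for _ in range(p)])
        y.append(label)
    return X, y

def bandwidth(n, p):
    """Bahadur-remainder-optimal rate h = (p/n)^(1/4)."""
    return (p / n) ** 0.25

X, y = simulate(n=500, p=10)
print(len(X), len(X[0]), bandwidth(500, 10))
```

With n = 500 and p = 10 this reproduces the scale of the Figure 4 experiments; repeating the draw 100 times with different seeds mirrors the paper's 100-replication Monte Carlo design.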