Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The usefulness of the approximate formula is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository. MATLAB and python codes implementing the approximate formula are distributed in (Obuchi, 2017; Takahashi and Obuchi, 2017). Keywords: classification, multinomial logistic regression, cross-validation, linear perturbation, self-averaging approximation"
Researcher Affiliation | Academia | "Tomoyuki Obuchi EMAIL, Yoshiyuki Kabashima EMAIL, Department of Mathematical and Computing Science, Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-ku, Tokyo, Japan"
Pseudocode | Yes | "Algorithm 1 Approximate CV of the MLR. 1: procedure ACV($\hat{W}(\lambda_1, \lambda_2)$, $D^M$, $\lambda_2$) ... Algorithm 2 Self-averaging approximate CV of the MLR. 1: procedure SAACV($\hat{W}(\lambda_1, \lambda_2)$, $D^M$, $\lambda_2$)"
Open Source Code | Yes | "MATLAB and python codes implementing the approximate formula are distributed in (Obuchi, 2017; Takahashi and Obuchi, 2017)."
Open Datasets | Yes | "The usefulness of the approximate formula is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository."
Dataset Splits | Yes | "In principle, we should compare our approximate result with that of the LOO CV ($k = M$) because our formula approximates it. However, for large $M$, the literal LOO CV requires a huge computational burden, even though its result is empirically not much different from that of $k$-fold CV with moderate $k$. Hence, in some of the following experiments with large $M$, we use 10-fold CV instead of the LOO CV."
Hardware Specification | Yes | "In all of the experiments, we used a single Intel(R) Xeon(R) E5-2630 v3 2.4GHz CPU."
Software Dependencies | Yes | "To solve the optimization problems in eqs. (4, 6), we employed Glmnet (Friedman et al., 2010), which is implemented as a MEX subroutine in MATLAB®."
Experiment Setup | Yes | "Unless explicitly mentioned, we set this as $\delta = 10^{-8}$, which is tighter than the default value. This is necessary since we treat problems of rather large size. A looser choice of $\delta$ rather strongly affects the literal CV result, while it does not change the full solution or the training error as much. As a result, our approximations, which employ only the full solution, are rather robust against the choice of $\delta$ compared to the literal CV."
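The Dataset Splits evidence contrasts leave-one-out CV ($k = M$) with $k$-fold CV at moderate $k$. As an illustration only (this is not the paper's code; the function name and layout are ours), a minimal Python sketch of the fold bookkeeping shows why literal LOO costs $M$ model fits while 10-fold costs just 10:

```python
# Hedged sketch (not from the paper): k-fold CV index splits over M
# samples. Setting k = M recovers leave-one-out (LOO) CV, so LOO
# requires M separate model fits versus k fits for k-fold CV.

def k_fold_splits(M, k):
    """Yield (train_indices, test_indices) pairs for k-fold CV."""
    # Distribute M samples as evenly as possible across k folds.
    fold_sizes = [M // k + (1 if i < M % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, M))
        yield train, test
        start += size

M = 6
loo_folds = list(k_fold_splits(M, M))   # LOO: M folds, one sample each
two_folds = list(k_fold_splits(M, 2))   # 2-fold: 2 folds of 3 samples
print(len(loo_folds), len(two_folds))
```

Each yielded pair would drive one refit of the model on `train` and one evaluation on `test`, which is exactly the per-fold cost the approximate formula avoids.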
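The Experiment Setup row concerns the solver's convergence tolerance $\delta$. The following is a hedged sketch, not the paper's Glmnet setup: a plain ISTA-style solver for binary $\ell_1$-regularized logistic regression on toy data, where $\delta$ plays the role of the stopping threshold. A looser $\delta$ stops earlier and perturbs the fitted weights, which is the sensitivity the quoted passage describes.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1_logistic(X, y, lam, delta=1e-8, lr=0.1, max_iter=200000):
    """ISTA (proximal gradient) for l1-regularized logistic regression.
    Iteration stops when the update norm falls below the tolerance delta,
    mimicking a solver convergence threshold."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted probabilities
        grad = X.T @ (p - y) / len(y)           # gradient of logistic loss
        w_new = soft_threshold(w - lr * grad, lr * lam)
        if np.linalg.norm(w_new - w) < delta:
            return w_new
        w = w_new
    return w

# Toy data: only the first two features carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X @ np.array([2.0, -1.0, 0.0, 0.0, 0.0])
     + 0.1 * rng.standard_normal(200) > 0).astype(float)

w_tight = fit_l1_logistic(X, y, lam=0.05, delta=1e-8)  # tight tolerance
w_loose = fit_l1_logistic(X, y, lam=0.05, delta=1e-2)  # loose tolerance
print(w_tight, w_loose)
```

Comparing `w_tight` and `w_loose` shows how the stopping threshold alone shifts the fitted solution; the paper's point is that its approximate CV formulas, which use only the full-data solution, are less sensitive to this choice than a literal CV loop would be.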