Stochastic Online Optimization using Kalman Recursion
Authors: Joseph de Vilmarest, Olivier Wintenberger
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate numerically the competitiveness of the static EKF for logistic regression in Section 6. ... 6 Experiments: We experiment the static EKF for logistic regression. Precisely, we compare the following sequential algorithms that we all initialize at 0: ... We evaluate the different algorithms with the mean squared error E[‖θ̂_t − θ*‖²] that we approximate by its empirical version on 100 samples. We display the results in Figure 2. ... 6.2 Real Data Sets: To illustrate better the robustness to misspecification, we run the same procedures on real data sets: Forest cover-type (Blackard and Dean, 1999): ... Adult income (Kohavi, 1996): ... We evaluate through an empirical version of E[L(θ̂_n)] − L(θ*) estimated on 100 samples and where L is estimated on the test set, see Figure 3. |
| Researcher Affiliation | Collaboration | Joseph de Vilmarest joseph.de EMAIL Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université, CNRS, 4 place Jussieu, 75005 Paris, France and Électricité de France R&D; Olivier Wintenberger EMAIL Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université, CNRS, 4 place Jussieu, 75005 Paris, France |
| Pseudocode | Yes | Algorithm 1: Static Extended Kalman Filter for Generalized Linear Model, Algorithm 2: Recursive updates of the ONS and the static EKF, Algorithm 3: Truncated Extended Kalman Filter for Logistic Regression |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor any link to a code repository. |
| Open Datasets | Yes | Forest cover-type (Blackard and Dean, 1999): the feature vector is of dimension d = 54, and as it is a multi-class task (7 classes) we focus on classifying class 2 versus all others. There are n = 581012 instances and we randomly split in two halves for training and testing. Adult income (Kohavi, 1996): the objective is to predict whether a person's annual income is smaller or bigger than 50K. There are 14 explanatory variables, and we obtain d = 98 once categorical variables are transformed into binary variables. We use the canonical split between training (32561 instances) and testing (16281 instances). |
| Dataset Splits | Yes | Forest cover-type (Blackard and Dean, 1999): ... There are n = 581012 instances and we randomly split in two halves for training and testing. Adult income (Kohavi, 1996): ... We use the canonical split between training (32561 instances) and testing (16281 instances). |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers. |
| Experiment Setup | Yes | We experiment the static EKF for logistic regression. Precisely, we compare the following sequential algorithms that we all initialize at 0: ... We take the default value P₁ = I_d along with the value β = 0.49 suggested by Bercu et al. (2020). ... The convex region of search is a ball centered in 0 and of radius D_θ = 1.1‖θ*‖, a setting where we have good knowledge of θ*. We consider two choices of the exp-concavity constant on which the ONS crucially relies to define the gradient step size. First, we use the only available bound e^(−D_θ D_X). Second, in the settings where the step size is so small that the ONS doesn't move, we use the exp-concavity constant κ₀ at θ*. ... First we test the choice of the gradient step size γ = 1/(2D_X²√N) denoted by ASGD and a second version with γ = ‖θ*‖/(D_X√N) denoted by ASGD oracle. |
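Since the paper reports pseudocode but no released source code, the recursion of Algorithm 1 (static EKF for a generalized linear model, specialized to logistic regression) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the function name `static_ekf_logistic`, the synthetic-data usage, and the exact update ordering are assumptions based on the standard static-EKF recursion, with the state initialized at 0 and P₁ = I_d as the paper specifies.

```python
import numpy as np

def sigmoid(z):
    """Logistic link function."""
    return 1.0 / (1.0 + np.exp(-z))

def static_ekf_logistic(X, y, P1=None):
    """Sketch of a static EKF recursion for logistic regression.

    State = parameter vector theta (no process noise, hence "static").
    One Kalman-style update per observation. Hypothetical helper,
    reconstructed from the paper's Algorithm 1 description.
    """
    n, d = X.shape
    theta = np.zeros(d)                    # initialized at 0, as in the paper
    P = np.eye(d) if P1 is None else P1.copy()   # default P1 = I_d
    for t in range(n):
        x = X[t]
        p = sigmoid(x @ theta)             # predicted probability
        a = p * (1.0 - p)                  # sigmoid derivative (local "noise variance")
        Px = P @ x
        # Covariance update (rank-one correction, Sherman-Morrison form)
        P = P - np.outer(Px, Px) * (a / (1.0 + a * (x @ Px)))
        # State update with the innovation y_t - p_t
        theta = theta + (P @ x) * (y[t] - p)
    return theta

# Hypothetical usage on synthetic logistic data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = (sigmoid(X @ true_theta) > rng.uniform(size=500)).astype(float)
theta_hat = static_ekf_logistic(X, y)
```

Note the update order: the covariance P is refreshed first, and the new P multiplies the innovation in the state update, which is what distinguishes this recursion from a plain stochastic Newton step with a lagged Hessian estimate.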