Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Double Spike Dirichlet Priors for Structured Weighting

Authors: Huiming Lin, Meng Li

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI).
Researcher Affiliation | Academia | Huiming Lin EMAIL Meng Li EMAIL Department of Statistics, Rice University, Houston, TX 77005, USA
Pseudocode | Yes | The ADSS algorithm includes the following steps:
1. Set t = 0. Initialize (σ⁻²)^(t), γ_i^(t), A_i^(t) for i = 1, ..., K. Set β^(t) = A^(t) / ||A^(t)||_1, where A^(t) = (A_1^(t), ..., A_K^(t)) and ||·||_1 is the ℓ1 norm of a vector.
2. Set t = t + 1. Given γ^(t−1), initialize a candidate vector γ̃ = γ^(t−1). Proceed to one of the following with equal probability: (a) (add) randomly select a j from J = {j : γ_j^(t−1) = 0} and set γ̃_j = 1; (b) (delete) randomly select a j from J^c = {j : γ_j^(t−1) = 1} and set γ̃_j = 0; (c) (swap) randomly select a j1 from J and a j2 from J^c, and set γ̃_{j1} = 1, γ̃_{j2} = 0; (d) (stay) no action. Conditional on γ̃, for i = 1, ..., K, propose a candidate Ã_i ~ Gamma(ρ1 γ̃_i + ρ2 (1 − γ̃_i), 1). Set β̃ = Ã / ||Ã||_1, where Ã = (Ã_1, ..., Ã_K).
3. Accept γ^(t) = γ̃ and β^(t) = β̃ with probability min(1, [{θ/(1−θ)}^{|γ̃|} / {θ/(1−θ)}^{|γ^(t−1)|}] · exp{−(σ⁻²)^(t−1) Σ_{i=1}^n (y_i − x_i^T β̃)² / 2} / exp{−(σ⁻²)^(t−1) Σ_{i=1}^n (y_i − x_i^T β^(t−1))² / 2}). Otherwise, keep γ^(t) = γ^(t−1) and β^(t) = β^(t−1).
4. Draw (σ⁻²)^(t) ~ Gamma(a1 + n/2, a2 + Σ_{i=1}^n (y_i − x_i^T β^(t))² / 2).
5. Repeat Steps 2–4 for niter times.
Open Source Code | Yes | We provide R code at https://github.com/xylimeng/Structured Ensemble for routine implementation.
Open Datasets | Yes | We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI). ... concrete data set (Yeh, 1998) from the UCI repository (Dua and Graff, 2017)
Dataset Splits | Yes | The data set contains 1030 samples. We randomly sample 515 samples as the training set and use the rest for testing.
Hardware Specification | No | The paper does not explicitly state any specific hardware used for running its experiments, only mentioning that the method is "computationally easy to implement".
Software Dependencies | No | We provide R code at https://github.com/xylimeng/Structured Ensemble for routine implementation. ... using the randomForest package (Liaw and Wiener, 2002) in R with the default settings for all arguments.
Experiment Setup | Yes | We choose ρ1 = K² and ρ2 = 1/K, following the theoretical results in Section 4 and particularly the condition α1/2 + α2 1. We put a weak Gamma(a1, a2) prior on σ⁻² with a1 = a2 = 0.01. For posterior sampling for the proposed method and other Bayesian methods, we run niter = 20000 iterations and burn in the first 15000. For the choice of θ, we set it to 1/K, which is supported by Theorem 3, but we note that one can also specify other preferred values or a range of values based on their domain knowledge about the sparsity level. Other than the default choice of fixing θ, we will also compare a fully Bayesian treatment by placing a Beta prior θ ~ Beta(1, 0.5K).
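To make the quoted Pseudocode and Experiment Setup rows concrete, here is a minimal Python/NumPy sketch of the ADSS sampler under the paper's stated hyperparameters (ρ1 = K², ρ2 = 1/K, θ = 1/K, a1 = a2 = 0.01). This is an illustrative reimplementation, not the authors' R code; the function name and all variable names are our own, and details such as initialization are assumptions.

```python
import numpy as np

def adss_sampler(X, y, rho1, rho2, theta, a1=0.01, a2=0.01, n_iter=1000, seed=0):
    """Sketch of the ADSS sampler: spike-and-slab Gamma weights normalized to the simplex."""
    rng = np.random.default_rng(seed)
    n, K = X.shape
    gamma = rng.integers(0, 2, K)                      # inclusion indicators (assumed random init)
    A = rng.gamma(rho1 * gamma + rho2 * (1 - gamma), 1.0)
    beta = A / A.sum()                                  # Step 1: beta = A / ||A||_1
    prec = 1.0                                          # sigma^{-2} (assumed init)
    samples = []
    for _ in range(n_iter):
        # Step 2: propose gamma via add / delete / swap / stay, each with equal probability
        g = gamma.copy()
        move = rng.integers(4)
        zeros, ones = np.flatnonzero(g == 0), np.flatnonzero(g == 1)
        if move == 0 and zeros.size:
            g[rng.choice(zeros)] = 1                    # add
        elif move == 1 and ones.size:
            g[rng.choice(ones)] = 0                     # delete
        elif move == 2 and zeros.size and ones.size:
            g[rng.choice(zeros)] = 1                    # swap
            g[rng.choice(ones)] = 0
        # propose A conditional on candidate gamma, renormalize to the simplex
        A_new = rng.gamma(rho1 * g + rho2 * (1 - g), 1.0)
        b_new = A_new / A_new.sum()
        # Step 3: Metropolis-Hastings acceptance, computed on the log scale for stability
        log_odds = np.log(theta / (1.0 - theta))
        ssr_new = np.sum((y - X @ b_new) ** 2)
        ssr_old = np.sum((y - X @ beta) ** 2)
        log_ratio = log_odds * (g.sum() - gamma.sum()) - prec * (ssr_new - ssr_old) / 2.0
        if np.log(rng.uniform()) < log_ratio:
            gamma, beta = g, b_new
        # Step 4: Gibbs update of the precision sigma^{-2} ~ Gamma(a1 + n/2, a2 + SSR/2)
        prec = rng.gamma(a1 + n / 2.0, 1.0 / (a2 + np.sum((y - X @ beta) ** 2) / 2.0))
        samples.append(beta)
    return np.array(samples)
```

In the paper's setup one would run niter = 20000 and discard the first 15000 draws as burn-in, e.g. `samples[15000:]`; every retained draw lies on the probability simplex by construction.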