Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Double Spike Dirichlet Priors for Structured Weighting
Authors: Huiming Lin, Meng Li
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI). |
| Researcher Affiliation | Academia | Huiming Lin EMAIL Meng Li EMAIL Department of Statistics Rice University Houston, TX 77005, USA |
| Pseudocode | Yes | The ADSS algorithm includes the following steps: 1. Set t = 0. Initialize (σ⁻²)^(t), γ_i^(t), A_i^(t) for i = 1, …, K. Set β^(t) = A^(t)/‖A^(t)‖₁, where A^(t) = (A_1^(t), …, A_K^(t)) and ‖·‖₁ is the ℓ₁ norm of a vector. 2. Set t = t + 1. Given γ^(t−1), initialize a candidate vector γ̃ = γ^(t−1). Proceed to one of the following with equal probability: (a) (add) randomly select a j from J = {j : γ_j^(t−1) = 0} and set γ̃_j = 1; (b) (delete) randomly select a j from J^c = {j : γ_j^(t−1) = 1} and set γ̃_j = 0; (c) (swap) randomly select a j₁ from J and a j₂ from J^c, and set γ̃_{j₁} = 1, γ̃_{j₂} = 0; (d) (stay) no action. Conditional on γ̃, for i = 1, …, K, propose a candidate Ã_i ~ Gamma(ρ₁γ̃_i + ρ₂(1 − γ̃_i), 1). Set β̃ = Ã/‖Ã‖₁, where Ã = (Ã_1, …, Ã_K). 3. Accept γ^(t) = γ̃ and β^(t) = β̃ with probability 1 ∧ [{θ/(1 − θ)}^\|γ̃\| / {θ/(1 − θ)}^\|γ^(t−1)\|] · [exp{−(σ⁻²)^(t−1) Σ_{i=1}^n (y_i − x_iᵀ β̃)²/2} / exp{−(σ⁻²)^(t−1) Σ_{i=1}^n (y_i − x_iᵀ β^(t−1))²/2}]; otherwise keep γ^(t) = γ^(t−1), β^(t) = β^(t−1). 4. Draw (σ⁻²)^(t) ~ Gamma(a₁ + n/2, a₂ + Σ_{i=1}^n (y_i − x_iᵀ β^(t))²/2). 5. Repeat Steps 2–4 for n_iter times. |
| Open Source Code | Yes | We provide R code at https://github.com/xylimeng/Structured Ensemble for routine implementation. |
| Open Datasets | Yes | We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI). ... concrete data set (Yeh, 1998) from the UCI repository (Dua and Graff, 2017) |
| Dataset Splits | Yes | The data set contains 1030 samples. We randomly sample 515 samples as the training set and use the rest for testing. |
| Hardware Specification | No | The paper does not explicitly state any specific hardware used for running its experiments, only mentioning that the method is "computationally easy to implement". |
| Software Dependencies | No | We provide R code at https://github.com/xylimeng/Structured Ensemble for routine implementation. ... using the randomForest package (Liaw and Wiener, 2002) in R with the default settings for all arguments. |
| Experiment Setup | Yes | We choose ρ₁ = K² and ρ₂ = 1/K, following the theoretical results in Section 4 and particularly the condition α₁/2 + α₂ ≥ 1. We put a weak Gamma(a₁, a₂) prior on σ⁻² with a₁ = a₂ = 0.01. For posterior sampling for the proposed method and other Bayesian methods, we run n_iter = 20000 iterations and burn in the first 15000. For the choice of θ, we set it to 1/K, which is supported by Theorem 3, but we note that one can also specify other preferred values or a range of values based on their domain knowledge about the sparsity level. Other than the default choice of fixing θ, we will also compare a fully Bayesian treatment by placing a Beta prior on θ: θ ~ Beta(1, 0.5K). |
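The ADSS steps quoted in the Pseudocode row can be sketched as a short add/delete/swap/stay Metropolis sampler with a Gibbs update for the precision σ⁻². This is an illustrative Python translation, not the authors' released R implementation; the function name `adss`, the seed handling, and the keep-at-least-one-active safeguard on the delete move are assumptions, while the hyperparameter defaults (ρ₁ = K², ρ₂ = 1/K, θ = 1/K, a₁ = a₂ = 0.01) follow the values quoted in the table.

```python
import numpy as np

def adss(X, y, rho1=None, rho2=None, theta=None,
         a1=0.01, a2=0.01, n_iter=20000, burn_in=15000, seed=0):
    """Sketch of the ADSS sampler; defaults follow the paper's quoted choices."""
    rng = np.random.default_rng(seed)
    n, K = X.shape
    rho1 = K**2 if rho1 is None else rho1
    rho2 = 1.0 / K if rho2 is None else rho2
    theta = 1.0 / K if theta is None else theta

    # Step 1: initialise gamma, A, beta = A / ||A||_1, and the precision.
    gamma = rng.integers(0, 2, size=K)
    if gamma.sum() == 0:
        gamma[rng.integers(K)] = 1
    A = rng.gamma(1.0, 1.0, size=K)
    beta = A / A.sum()
    prec = 1.0  # sigma^{-2}

    def sse(b):
        r = y - X @ b
        return r @ r

    log_odds = np.log(theta / (1 - theta))
    draws = []
    for t in range(n_iter):
        # Step 2: propose gamma via add/delete/swap/stay, then A | gamma.
        cand = gamma.copy()
        zeros = np.flatnonzero(cand == 0)
        ones = np.flatnonzero(cand == 1)
        move = rng.integers(4)
        if move == 0 and zeros.size:                   # add
            cand[rng.choice(zeros)] = 1
        elif move == 1 and ones.size > 1:              # delete (assumed safeguard)
            cand[rng.choice(ones)] = 0
        elif move == 2 and zeros.size and ones.size:   # swap
            cand[rng.choice(zeros)] = 1
            cand[rng.choice(ones)] = 0
        # move == 3: stay
        A_cand = rng.gamma(rho1 * cand + rho2 * (1 - cand), 1.0)
        beta_cand = A_cand / A_cand.sum()

        # Step 3: Metropolis accept/reject with probability 1 ^ ratio.
        log_ratio = (log_odds * (cand.sum() - gamma.sum())
                     - 0.5 * prec * (sse(beta_cand) - sse(beta)))
        if np.log(rng.uniform()) < log_ratio:
            gamma, beta = cand, beta_cand

        # Step 4: Gibbs draw of sigma^{-2} ~ Gamma(a1 + n/2, a2 + SSE/2).
        prec = rng.gamma(a1 + n / 2, 1.0 / (a2 + sse(beta) / 2))

        if t >= burn_in:
            draws.append(beta)
    return np.asarray(draws)
```

Each retained draw of β lies on the simplex (non-negative entries summing to one), matching the structured-weighting constraint; posterior summaries are then averages over the post-burn-in draws.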