reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Differentially private methods for managing model uncertainty in linear regression

Authors: Víctor Peña, Andrés F. Barrientos

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate the performance of our methods in Sections 4.4 and 5.2. We include additional results from the simulation studies in the Appendix. [...] We evaluate the performance of the methods described in this section in a simulation study and an application.
Researcher Affiliation	Academia	Víctor Peña EMAIL Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya, Barcelona, Spain; Andrés F. Barrientos EMAIL Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
Pseudocode	No	The paper references existing algorithms like "Algorithm 1 in Balle and Wang (2018)" and "Algorithm 2 in Sheffet (2019)", but it does not include any structured pseudocode or algorithm blocks for its own methodology.
Open Source Code	No	The paper states: "They are also conveniently implemented in the R package library(BAS) (Clyde, 2020)." and "We implement the methods with the R package BAS (Clyde, 2020)." This indicates the authors used an existing R package for their implementation but does not provide explicit access to their own specific source code for the methodology described in this paper.
Open Datasets	Yes	We analyze a random sample of 200 students from the High School and Beyond survey, which was conducted by the National Center of Education Statistics. We obtained the data from Diez et al. (2012). In R, they are available as data(hsb2) in library(openintro). [...] The data set includes n = 49,436 heads of households with non-negative incomes. We consider 6 predictors: age in years (β1), age squared (β2), marital status (β3), sex (β4), education (β5), and race (β6). All predictors are numeric or binary except for education, which is an ordinal variable. To reduce the number of coefficients in the model, we treat education as numeric, ranging from 1 (for less than 1st grade) to 16 (for doctoral degree). The binary predictors are: marital status (1: civilian spouse present; 0: otherwise), sex (1: male; 0: female), and race (1: white; 0: otherwise). The response variable is income. In this application, the non-private inclusion probabilities are all close to one. To provide a more challenging benchmark for our methods, we permute the rows for marital status and education in the design matrix to artificially make the inclusion probabilities for β3 and β5 close to zero. The predictors and the response are centered and rescaled to the interval (−0.5, 0.5). Figure 5 displays the posterior expected values of β1, β3, and β4 with the Zellner-Siow prior and ε = 0.9. We use the histograms described in Section 5.1 to define approximate 95% confidence sets for T(G) = E(β_j | G). Our choice of matrix norm is the Frobenius norm. Specifically, we run our procedure 250 times and, for each run and a fixed collection of bins B_1, ..., B_K, we summarize each T(Ĉ_{1−α}) with its corresponding histogram Hist(T, Ĉ_{0.95}) = {(B_k, d_k)}_{k=1}^{K}.
Dataset Splits	No	The paper mentions "splitting the data into M disjoint subgroups" for the differential privacy mechanism and "simulating random data splits" or "simulate 1,000 data sets" for simulation studies. However, it does not specify any standard training/test/validation splits for the real-world datasets used in applications.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments or simulations.
Software Dependencies	Yes	Our work is based on mixtures of g-priors because, when combined with right-Haar priors on the common parameters, they satisfy a list of appealing criteria proposed in Bayarri et al. (2012). They are also conveniently implemented in the R package library(BAS) (Clyde, 2020).
Experiment Setup	Yes	The subsample and aggregate technique requires the specification of censoring limits L < U and a number of subgroups M. These choices affect the performance of the methods [...] We consider ε ∈ {0.5, 0.9} and, in the case of the Wishart mechanism, we set δ = 1/n. [...] In the simulation study, we found Bayes factors with the Zellner-Siow prior (ZS) and information criteria with BIC. The prior distribution on the model space π(γ) is the hierarchical uniform prior proposed in Scott and Berger (2010). [...] In all cases, we add a regularization parameter r to the diagonal entries of G. For the Laplace mechanism, we set r to be the 99-th percentile of eigmin(E), which we find via simulation. For the Wishart mechanism, we use the analytical expression in Remark 2 of Sheffet (2019). [...] We simulate data from a normal linear model with p predictors, where p is set to 2, 6, or 9. The sample size n (in thousands) varies from 5 to 10,000. The number of active predictors in the true model |T| depends on the value of p and ranges from 0 (null model is true) to p (full model is true). Specifically, if p = 2, we set |T| ∈ {0, 1, 2}; if p = 6, we set |T| ∈ {0, 3, 6}; and if p = 9, we set |T| ∈ {0, 4, 9}. The predictors are independently drawn from the uniform distribution on (-2, 2). Following Hastie et al. (2017), we define the signal-to-noise ratio (SNR) as the variance of the regression mean (which is random, since we are simulating predictors and β) divided by σ². In our simulations, we assume that the intercept is zero and β is a p-dimensional vector equal to b[1, ..., 1]. We use optimization to find σ² and b such that SNR = 0.5 and the response falls within (-2, 2) with high probability. For each combination of |T| and n, we simulate 1,000 data sets. All the data sets we simulated are such that the response falls in (-2, 2). We consider ε ∈ {0.5, 0.9} and, in the case of the Wishart mechanism, we set δ = 1/n. We assess the performance of the methods by tracking Monte Carlo averages of predictive mean squared errors and the posterior probability of the true model. [...] The predictors and the response are centered and rescaled to the interval (-0.5, 0.5). We use the histograms described in Section 5.1 to define approximate 95% confidence sets for T(G) = E(β_j | G).
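
Illustrative sketch (not from the paper): the simulation design quoted under Experiment Setup can be mimicked in a few lines of Python. This is a rough reading under stated assumptions, not the authors' code, which is described as using R and BAS. The function names, the choice b = 0.2, and the closed-form noise variance are hypothetical: the paper finds σ² and b by optimization, whereas the sketch fixes b by hand and solves SNR = Var(x'β) / σ² for σ², using the fact that a Uniform(-2, 2) predictor has variance 4/3. Coefficients outside the active set are taken to be zero.

```python
import math
import random

# Variance of a Uniform(-2, 2) predictor: (width ** 2) / 12 = 16 / 12.
UNIFORM_VAR = (2 - (-2)) ** 2 / 12


def noise_variance(b, n_active, snr=0.5):
    """Return sigma^2 such that Var(x' beta) / sigma^2 equals snr.

    Assumes independent Uniform(-2, 2) predictors and n_active coefficients
    equal to b. For the null model (n_active == 0) the SNR is zero for any
    sigma^2, so the noise variance has to be chosen some other way.
    """
    if n_active == 0:
        raise ValueError("SNR is zero under the null model; set sigma^2 directly")
    return b ** 2 * n_active * UNIFORM_VAR / snr


def simulate(n, p, n_active, b, sigma2, rng):
    """Draw n rows from a zero-intercept normal linear model."""
    beta = [b] * n_active + [0.0] * (p - n_active)
    sigma = math.sqrt(sigma2)
    X, means, y = [], [], []
    for _ in range(n):
        x = [rng.uniform(-2, 2) for _ in range(p)]
        mean = sum(bj * xj for bj, xj in zip(beta, x))
        X.append(x)
        means.append(mean)
        y.append(mean + rng.gauss(0, sigma))
    return X, means, y


def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(2024)          # hypothetical seed, for repeatability only
b, p, n_active = 0.2, 6, 3         # b is an illustrative value, not the paper's
sigma2 = noise_variance(b, n_active)
X, means, y = simulate(20_000, p, n_active, b, sigma2, rng)

# Empirical check of the SNR definition, plus the share of responses in (-2, 2).
empirical_snr = sample_variance(means) / sigma2
share_in_range = sum(-2 < v < 2 for v in y) / len(y)
```

With these choices empirical_snr should land close to the target of 0.5, and share_in_range shows how often the response stays inside (-2, 2), the constraint the paper enforces through its choice of σ² and b.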