Linear Regression With Unmatched Data: A Deconvolution Perspective
Authors: Mona Azadkia, Fadoua Balabdaoui
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several applications with synthetic and real data sets are considered to illustrate the theory. |
| Researcher Affiliation | Academia | Mona Azadkia, EMAIL, Department of Statistics, London School of Economics and Political Science, London, United Kingdom; Fadoua Balabdaoui, EMAIL, Department of Mathematics, ETH Zürich, Zürich, Switzerland |
| Pseudocode | No | The paper describes mathematical derivations and methodological steps in prose but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using specific R functions and packages (e.g., 'optim from the package stats of the open software R', 'function density from R package stats with hyper-parameter SJ') but does not state that the authors are releasing their own code for the methodology developed in the paper. |
| Open Datasets | Yes | We apply our method to data from 1850 to 1930 decennial censuses of the United States studied in Olivetti and Paserman (2015); D'Haultfoeuille et al. (2022) using the 1 percent IPUMS samples (Ruggles et al., 2010). We consider the Power Plant data set from UCI Machine Learning Repository (https://archive.ics.uci.edu/). |
| Dataset Splits | No | The paper describes how samples were generated or sub-sampled for experiments (e.g., '1000 independent samples... of size n = 4000', 'select a subset of size 4000', 'select a sub-sample of matched data of size m = 30'), but it does not provide specific training/validation/test splits, split percentages, or cross-validation strategies needed for reproducibility. |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., CPU, GPU models, memory, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'the function optim from the package stats of the open software R' and 'function density from R package stats with hyper-parameter SJ'. While 'R' is a programming environment and 'stats' is a package, specific version numbers for R or the 'stats' package are not provided. |
| Experiment Setup | Yes | We use the default setting of the function optim from the package stats of the open software R. The default method of optimization is the method introduced by Nelder and Mead (1965). We consider two different families of centred distributions, Normal and Laplace. We consider different possible values for their scale parameters so that the standard deviation (sd) of the noise varies in the set {0.1, 0.2, ..., 1}. For the DLSE estimator $\hat{\beta}_n$, we need an estimate of the noise distribution, and for this, we use a kernel density estimator based on the residuals of the OLS estimator $\beta_m$ obtained using the matched data. We use a Gaussian kernel and select the bandwidth according to Sheather and Jones (1991). |
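
The noise configuration and kernel density step quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the paper works in R, while this sketch uses Python's standard library only. The names `SD_GRID`, `sample_noise`, and `gaussian_kde` are hypothetical, the residuals are simulated stand-ins rather than OLS residuals from matched data, and the bandwidth uses a normal-reference rule of thumb as a placeholder for the Sheather and Jones (1991) selector, which is not implemented here.

```python
import math
import random
import statistics

# Standard deviations considered for the noise: {0.1, 0.2, ..., 1}.
SD_GRID = [k / 10 for k in range(1, 11)]


def sample_noise(family, sd, n, rng):
    """Draw n centred noise values whose standard deviation is sd."""
    if family == "normal":
        return [rng.gauss(0.0, sd) for _ in range(n)]
    if family == "laplace":
        # Laplace(0, b) has variance 2 * b**2, so b = sd / sqrt(2).
        b = sd / math.sqrt(2.0)
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        return [b * (rng.expovariate(1.0) - rng.expovariate(1.0))
                for _ in range(n)]
    raise ValueError(f"unknown family: {family}")


def gaussian_kde(points, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                   for p in points) / norm

    return density


rng = random.Random(0)

# Assumption: simulated Laplace noise stands in for the OLS residuals.
residuals = sample_noise("laplace", 0.5, 1000, rng)

# Assumption: normal-reference rule of thumb, NOT the Sheather-Jones selector.
bandwidth = 1.06 * statistics.stdev(residuals) * len(residuals) ** (-0.2)

noise_density = gaussian_kde(residuals, bandwidth)
```

Calling `noise_density(x)` returns the estimated noise density at `x`. The conversion `b = sd / sqrt(2)` is what lets the Normal and Laplace families be compared at the same standard deviation across `SD_GRID`.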