On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models

Authors: Yining Wang, Adams Wei Yu, Aarti Singh

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Formal approximation guarantees are established for both algorithms, and numerical results on both synthetic and real-world data confirm the effectiveness of the proposed methods.
Researcher Affiliation | Academia | Yining Wang EMAIL, Adams Wei Yu EMAIL, Aarti Singh EMAIL; Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Pseudocode | Yes | Figure 1: Sampling-based experiment selection (expected size constraint); Figure 2: Sampling-based experiment selection (deterministic size constraint); Figure 3: Greedy experiment selection; Figure 4: The projected gradient descent algorithm.
Open Source Code | No | The paper does not provide a link to open-source code or state that code for the described methodology is publicly available.
Open Datasets | No | The paper mentions several datasets (material synthesis, CPU performance, Minnesota wind) and cites relevant works (Reeja-Jayan et al., 2012; Nakamura et al., 2017; Ein-Dor and Feldmesser, 1987), but gives no explicit statements of public availability, direct access links, or formal repositories for these datasets. For the material synthesis and Minnesota wind datasets, the acknowledgments thank collaborators for sharing the data or for a pre-processed version, implying private access rather than public availability.
Dataset Splits | No | The paper discusses selecting a subset of k experiments or design points for measurement-constrained regression, not conventional training/validation/test splits for model evaluation. For example, it states that the number of selected design points k ranges from 2p to 10p, and "We consider selection of subsets of k experiments, with k ranging from 21 to 30"; these refer to the size of the selected subset, not data partitioning for model training and evaluation.
Hardware Specification | No | The paper reports experimental results and running times but does not specify the hardware (e.g., CPU or GPU models, memory, or cloud instances) used for the experiments.
Software Dependencies | No | The paper describes algorithms and numerical results but does not list software dependencies with version numbers (e.g., language, framework, or solver versions) used for implementation or experiments.
Experiment Setup | Yes | In all simulations, the experimental pool (number of given design points) n is set to 1000, the number of variables p is set to 50, and the number of selected design points k ranges from 2p to 10p. Randomized methods are run for 20 independent trials under each setting and the median is reported. The proposed methods are also applied to an experimental design problem on a data set consisting of 133 experiments, with selection of subsets of k experiments for k ranging from 21 to 30. A generalized quadratic regression model was employed: CP = β1 + β2T + β3H + β4P + β5R + β6T² + β7H² + β8P² + β9R². To use this data set as a benchmark for evaluating the experiment selection methods, labels Y are synthesized using model Eq. (10) with standard Gaussian noise, and the difference between the fitted β̂ and the true model β0 = [0.49; 0.30; 0.19; 3.78] is measured. The ratio of the mean-square prediction error (1/n)‖Vθ̂ − y‖₂² to that of the full-sample OLS estimator (1/n)‖Vθ̂_ols − y‖₂² is reported.
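The evaluation protocol above can be sketched in a few lines: fit OLS on a selected subset of k design points, then report its mean-square prediction error over the full pool relative to the full-sample OLS fit. This is a minimal illustration, not the paper's code: the design matrix `V`, the true coefficients `beta0`, and the uniform-random `subset` are stand-ins (the paper's methods select the subset via sampling, greedy, or projected-gradient algorithms).

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 133, 9   # pool size and feature count, matching the quadratic model above
k = 25          # size of the selected experiment subset (paper uses k in 21..30)

V = rng.standard_normal((n, p))          # stand-in for the real design matrix
beta0 = rng.standard_normal(p)           # stand-in for the true model
y = V @ beta0 + rng.standard_normal(n)   # synthetic labels with N(0, 1) noise

# Placeholder for an experiment-selection algorithm: pick k rows uniformly.
subset = rng.choice(n, size=k, replace=False)

# OLS fit on the selected subset vs. OLS fit on the full sample.
theta_sub, *_ = np.linalg.lstsq(V[subset], y[subset], rcond=None)
theta_ols, *_ = np.linalg.lstsq(V, y, rcond=None)

mse_sub = np.mean((V @ theta_sub - y) ** 2)
mse_ols = np.mean((V @ theta_ols - y) ** 2)
ratio = mse_sub / mse_ols  # reported metric; >= 1, closer to 1 is better
```

Since full-sample OLS minimizes the mean-square prediction error over the pool by construction, the ratio is always at least 1; a good selection method keeps it close to 1 with only k of the n measurements.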