Robust Sparsification via Sensitivity

Authors: Chansophea Wathanak In, Yi Li, David Woodruff, Xuan Wu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In Section 6, we conduct experiments on real-world datasets to demonstrate that our coreset constructions are effective in approximating the loss function and considerably reduce the running time for robust regression problems while maintaining a good approximation of the objective function. We conduct the experiment on two real-world datasets from the UCI Machine Learning Repository: Appliances Energy Prediction (referred to as Energy) and Gas Turbine Emission (Emission)."
Researcher Affiliation: Academia. "School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore; College of Computing and Data Sciences, Nanyang Technological University, Singapore; Department of Computer Science, Carnegie Mellon University, USA. Correspondence to: Yi Li <EMAIL>."
Pseudocode: Yes. Algorithms presented: Algorithm 1 Uniform(A, ε, m); Algorithm 2 Refine(D, ε, m); Algorithm 3 Coreset(A, ε, m); Algorithm 4 Robust Regression(A, b, ε, m); Algorithm 5 Robust PCA(A, ε, m).
Open Source Code: No. The paper contains no explicit statement about releasing the source code for its methodology, nor a link to a code repository. It mentions using a third-party heuristic, Fast LTS, but not the authors' own implementation.
Open Datasets: Yes. "We conduct the experiment on two real-world datasets from the UCI Machine Learning Repository: Appliances Energy Prediction (referred to as Energy) and Gas Turbine Emission (Emission)." Dataset links: https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction and https://archive.ics.uci.edu/dataset/551/gas_turbine_co_and_nox_emission_data_set
Dataset Splits: No. The paper uses real-world datasets and mentions running 1000 trials and drawing samples for coreset verification, but it does not specify training, validation, or test splits (e.g., percentages, sample counts, or predefined splits) for the main experimental evaluation.
Hardware Specification: Yes. "All experiments were run on a machine with an Intel i5-1165G7 @ 2.80GHz CPU and 16 GB memory using Python version 3.12.8."
Software Dependencies: Yes. "All experiments were run on a machine with an Intel i5-1165G7 @ 2.80GHz CPU and 16 GB memory using Python version 3.12.8."
Experiment Setup: Yes. "We verify that Algorithm 3 produces an effective coreset for subspace embedding with p = 2. Fixing parameters ε and m, we independently run Algorithm 3 1000 times. ... We perform 1000 trials with different coresets, solving the regression problem using Fast LTS for each coreset, yielding solutions xj for j = 1, ..., 1000. ... Table 1: Runtimes (in seconds) for robust regression on our coresets and the whole dataset, with m = 10, for the Energy dataset (top) and the Emission dataset (bottom)."
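The verification loop described above (repeatedly build a coreset, then compare the coreset's objective to the full-data objective) can be sketched as follows. This is a minimal illustration, not the paper's code: it uses a plain uniform-sampling coreset in the spirit of Algorithm 1 rather than the sensitivity-based Algorithm 3, synthetic Gaussian data in place of the UCI datasets, and hypothetical sizes n, d, m, and trial count.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_coreset(A, m, rng):
    """Sample m rows of A uniformly with replacement, reweighted by n/m
    so the weighted coreset sum is an unbiased estimate of the full sum."""
    n = A.shape[0]
    idx = rng.integers(0, n, size=m)
    weights = np.full(m, n / m)
    return A[idx], weights

# Synthetic stand-in for the real datasets (hypothetical sizes).
n, d, m = 5000, 10, 200
A = rng.standard_normal((n, d))

# Empirically check subspace-embedding quality for p = 2 over repeated
# trials, mirroring the paper's 1000-trial verification (fewer trials here).
ratios = []
for _ in range(100):
    S, w = uniform_coreset(A, m, rng)
    x = rng.standard_normal(d)
    full = np.sum((A @ x) ** 2)        # objective on the whole dataset
    core = np.sum(w * (S @ x) ** 2)    # weighted objective on the coreset
    ratios.append(core / full)

print(f"mean ratio: {np.mean(ratios):.3f}")  # close to 1 if the coreset embeds well
```

A sensitivity-based construction would replace the uniform row distribution with one proportional to per-row sensitivity scores, which tightens the worst-case ratio over all directions x rather than just the average.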