Robust Sparsification via Sensitivity

Authors: Chansophea Wathanak In, Yi Li, David Woodruff, Xuan Wu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In Section 6, we conduct experiments on real-world datasets to demonstrate that our coreset constructions are effective in approximating the loss function and considerably reduce the running time for robust regression problems while maintaining a good approximation of the objective function. We conduct the experiment on two real-world datasets from the UCI Machine Learning Repository: Appliances Energy Prediction (referred to as Energy) and Gas Turbine Emission (Emission)."
Researcher Affiliation: Academia. "School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore; College of Computing and Data Sciences, Nanyang Technological University, Singapore; Department of Computer Science, Carnegie Mellon University, USA. Correspondence to: Yi Li <EMAIL>."
Pseudocode: Yes. Algorithms presented: Algorithm 1 Uniform(A, ε, m); Algorithm 2 Refine(D, ε, m); Algorithm 3 Coreset(A, ε, m); Algorithm 4 Robust Regression(A, b, ε, m); Algorithm 5 Robust PCA(A, ε, m).
Open Source Code: No. The paper contains no explicit statement about releasing the source code for its methodology, nor a link to a code repository. It mentions using a third-party heuristic, Fast LTS, but not the authors' own implementation.
Open Datasets: Yes. "We conduct the experiment on two real-world datasets from the UCI Machine Learning Repository: Appliances Energy Prediction (referred to as Energy) and Gas Turbine Emission (Emission)." Dataset links: https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction and https://archive.ics.uci.edu/dataset/551/gas_turbine_co_and_nox_emission_data_set
Dataset Splits: No. The paper uses real-world datasets and mentions running 1000 trials and drawing samples for coreset verification, but it does not specify training, validation, or test splits (e.g., percentages, sample counts, or predefined splits) for the main experimental evaluation.
Hardware Specification: Yes. "All experiments were run on a machine with an Intel i5-1165G7 @ 2.80GHz CPU and 16 GB memory using Python version 3.12.8."
Software Dependencies: Yes. "All experiments were run on a machine with an Intel i5-1165G7 @ 2.80GHz CPU and 16 GB memory using Python version 3.12.8."
Experiment Setup: Yes. "We verify that Algorithm 3 produces an effective coreset for subspace embedding with p = 2. Fixing parameters ε and m, we independently run Algorithm 3 1000 times. ... We perform 1000 trials with different coresets, solving the regression problem using Fast LTS for each coreset, yielding solutions xj for j = 1, ..., 1000. ... Table 1: Runtimes (in seconds) for robust regression on our coresets and the whole dataset, with m = 10, for the Energy dataset (top) and the Emission dataset (bottom)."
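The verification loop described above (repeatedly build a coreset, then compare the coreset's objective to the full-data objective) can be sketched as follows. This is a minimal illustration, not the paper's code: it uses a plain uniform-sampling coreset in the spirit of Algorithm 1 rather than the sensitivity-based Algorithm 3, synthetic Gaussian data in place of the UCI datasets, and hypothetical sizes n, d, m, and trial count.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_coreset(A, m, rng):
    """Sample m rows of A uniformly with replacement, reweighted by n/m
    so the weighted coreset sum is an unbiased estimate of the full sum."""
    n = A.shape[0]
    idx = rng.integers(0, n, size=m)
    weights = np.full(m, n / m)
    return A[idx], weights

# Synthetic stand-in for the real datasets (hypothetical sizes).
n, d, m = 5000, 10, 200
A = rng.standard_normal((n, d))

# Empirically check subspace-embedding quality for p = 2 over repeated
# trials, mirroring the paper's 1000-trial verification (fewer trials here).
ratios = []
for _ in range(100):
    S, w = uniform_coreset(A, m, rng)
    x = rng.standard_normal(d)
    full = np.sum((A @ x) ** 2)        # objective on the whole dataset
    core = np.sum(w * (S @ x) ** 2)    # weighted objective on the coreset
    ratios.append(core / full)

print(f"mean ratio: {np.mean(ratios):.3f}")  # close to 1 if the coreset embeds well
```

A sensitivity-based construction would replace the uniform row distribution with one proportional to per-row sensitivity scores, which tightens the worst-case ratio over all directions x rather than just the average.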