Certified Robustness to Data Poisoning in Gradient-Based Training
Authors: Philip Sosnin, Mark Niklas Mueller, Maximilian Baader, Calvin Tsay, Matthew Robert Wicker
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive empirical evaluation demonstrating the effectiveness of our approach. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving. In this section we experimentally validate the effectiveness of our proposed approach. We provide complete details of hyper-parameters and run-times of our experiments in Appendix G. For classification tasks, we report the certified accuracy, which is our certified lower bound on the accuracy of any model poisoned with the given attack; likewise, certified mean squared error refers to an upper bound on the loss in regression tasks. |
| Researcher Affiliation | Collaboration | Philip Sosnin EMAIL Department of Computing, Imperial College London, United Kingdom Mark N. Müller EMAIL Department of Computer Science, ETH Zurich, Switzerland Logic Star.ai, Switzerland Maximilian Baader EMAIL Department of Computer Science, ETH Zurich, Switzerland Calvin Tsay EMAIL Department of Computing, Imperial College London, United Kingdom Matthew Wicker EMAIL Department of Computing, Imperial College London, United Kingdom The Alan Turing Institute, United Kingdom |
| Pseudocode | Yes | Algorithm 1 Abstract Gradient Training for Computing Valid Parameter-Space Bounds |
| Open Source Code | Yes | A code repository to reproduce our experiments can be found at: https://github.com/psosnin/AbstractGradientTraining |
| Open Datasets | Yes | UCI-houseelectric dataset from the UCI repository (Hebrail & Berard, 2012)... retinal OCT dataset (OCTMNIST) (Yang et al., 2021)... Udacity self-driving car dataset (github.com/udacity/self-driving-car/tree/master) |
| Dataset Splits | No | The paper mentions data usage in batches (e.g., "a mix of 50% Drusen samples (b = 6000 with 3000 Drusen) per batch") and that a "test set" is used, but it does not give explicit train/validation/test split percentages or sample counts for the datasets, which would be needed to reproduce the splits. |
| Hardware Specification | Yes | All experiments were run on a server equipped with 2x AMD EPYC 9334 CPUs and 2x NVIDIA L40 GPUs using an implementation of Algorithm 1 written in Python using Pytorch. |
| Software Dependencies | No | The paper states that the implementation uses "Python using Pytorch" but does not provide specific version numbers for either Python or PyTorch. |
| Experiment Setup | Yes | Figure 2 (bottom) shows the progression of bounds on the MSE (computed for the test set) over the training procedure for a fixed poisoning attack (n = 100, ϵ = 0.01) and various hyperparameters of the regression model. Where not stated, p = q = ∞, d = 1, w = 50, b = 10000, α = 0.02. Table 2: Datasets and Hyperparameter Settings [includes] #Epochs, α learning rate, η decay rate and b batch size. |
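The Pseudocode row above cites Algorithm 1, "Abstract Gradient Training for Computing Valid Parameter-Space Bounds." To make the idea concrete, the sketch below shows the general flavor of interval-based parameter bounding for a 1-D linear regression. This is not the authors' implementation (see their repository for that): it uses plain interval arithmetic, assumes every input may be perturbed within ±ϵ in ℓ∞ (a coarser threat model than the paper's bound on at most n poisoned points), and all function names are illustrative.

```python
# Hedged sketch: interval ("abstract") gradient descent for 1-D linear
# regression. Sound but loose: the returned interval is guaranteed to
# contain every parameter reachable when each x_i is shifted by at most
# eps, at the cost of growing width over epochs.

def imul(a, b):
    """Interval multiplication [a0,a1] * [b0,b1]: min/max over corners."""
    c = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(c), max(c))

def isub(a, b):
    """Interval subtraction [a0,a1] - [b0,b1]."""
    return (a[0] - b[1], a[1] - b[0])

def abstract_gd(xs, ys, eps=0.0, lr=0.05, epochs=200, w0=0.0):
    """Return [w_lo, w_hi] containing every weight reachable by gradient
    descent on 0.5*(w*x - y)^2 when each x_i is perturbed within +/- eps."""
    W = (w0, w0)
    n = len(xs)
    for _ in range(epochs):
        g_lo = g_hi = 0.0
        for x, y in zip(xs, ys):
            X = (x - eps, x + eps)            # perturbed-input interval
            Y = (y, y)
            G = imul(isub(imul(W, X), Y), X)  # interval over (w*x - y)*x
            g_lo += G[0]
            g_hi += G[1]
        # Sound descent step: subtracting the gradient interval swaps
        # the endpoints (lower bound moves by the UPPER gradient bound).
        W = (W[0] - lr * g_hi / n, W[1] - lr * g_lo / n)
    return W
```

With eps = 0 the intervals stay degenerate and the procedure reduces to ordinary gradient descent; with eps > 0 the width of the parameter interval grows with the number of epochs, which mirrors why the paper's certified bounds are reported as a function of training hyperparameters (Figure 2).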