Training Robust Ensembles Requires Rethinking Lipschitz Continuity
Authors: Ali Ebrahimpour Boroojeny, Hari Sundaram, Varun Chandrasekaran
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through various experiments, we show LOTOS increases the robust accuracy of ensembles of ResNet-18 models by 6 percentage points (p.p.) against black-box attacks on CIFAR-10. It is also capable of combining with prior state-of-the-art methods for training robust ensembles to enhance their robust accuracy by 10.7 p.p. |
| Researcher Affiliation | Academia | Ali Ebrahimpour Boroojeny (UIUC), Hari Sundaram (UIUC), Varun Chandrasekaran (UIUC) |
| Pseudocode | No | The paper describes methods and equations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is publicly available at https://github.com/Ali-E/LOTOS. |
| Open Datasets | Yes | Datasets and Models: In all the experiments for evaluating the efficacy of our model, either in isolation or in combination with prior methods, we use both CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | In all the experiments for evaluating the efficacy of our model, either in isolation or in combination with prior methods, we use both CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). For the black-box attacks, an independently trained source (surrogate) model (of the same type as the models in the ensemble) is used to generate the adversarial examples; we then measure the robust accuracy of the ensembles against these adversarial examples, i.e., robust accuracy is the accuracy on the adversarial samples for which the model correctly predicts the original versions. |
| Hardware Specification | Yes | Compute Infrastructure: We used NVIDIA A40 GPUs for our experiments, except for the experiments in Section 5.5 involving training with the TRS method, where we used NVIDIA A100 GPUs. Using 32GB of RAM was enough for performing our experiments. This work used Delta computing resources at the National Center for Supercomputing Applications through allocation CIS240316 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program (Boerner et al., 2023). |
| Software Dependencies | No | The paper does not provide specific version numbers for software libraries, programming languages, or other ancillary tools used in the experiments. |
| Experiment Setup | Yes | The attack performed on the source models is PGD-50 with ϵ = 0.04, unless stated otherwise. We use both white-box attacks and black-box attacks in our experiments... We found the value of 0.8 to be a good trade-off between the two for increasing the robustness of ensembles and used that for our experiments. We try different layer-wise clipping values (0.8, 1.0, 1.2, and 1.5). |
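The evaluation protocol described in the table (a PGD-50 transfer attack with ϵ = 0.04 generated on a surrogate model, scored via robust accuracy over originally-correct samples) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `pgd_attack`, `grad_fn`, and `robust_accuracy` are hypothetical names, and the toy logistic-regression surrogate stands in for the actual ResNet-18 models.

```python
import numpy as np

def pgd_attack(x, y, grad_fn, eps=0.04, steps=50, alpha=None):
    """L-inf PGD: repeatedly step along the sign of the input-gradient of
    the loss, projecting back into the eps-ball around the clean input."""
    if alpha is None:
        alpha = 2.5 * eps / steps  # common step-size heuristic (assumed here)
    x_adv = x + np.random.uniform(-eps, eps, x.shape)  # random start
    for _ in range(steps):
        g = grad_fn(x_adv, y)                  # dLoss/dx at the current iterate
        x_adv = x_adv + alpha * np.sign(g)     # gradient-sign ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

def robust_accuracy(pred_fn, x_clean, x_adv, y):
    """Accuracy on adversarial samples, counted only over the inputs the
    model classifies correctly in their original (clean) form."""
    clean_ok = pred_fn(x_clean) == y
    adv_ok = pred_fn(x_adv) == y
    return (clean_ok & adv_ok).sum() / max(clean_ok.sum(), 1)

# Toy surrogate: logistic regression with an analytic input-gradient of
# the logistic loss, so the sketch stays self-contained without autograd.
w = np.array([2.0, -1.0])

def grad_fn(x, y):
    margin = y * (x @ w)
    return (-y * w) * (1.0 / (1.0 + np.exp(margin)))

np.random.seed(0)
x, y = np.array([0.6, 0.4]), 1.0
x_adv = pgd_attack(x, y, grad_fn)
assert np.max(np.abs(x_adv - x)) <= 0.04 + 1e-9  # perturbation bounded by eps
```

In a black-box transfer setting, `grad_fn` would come from the independently trained surrogate, while `pred_fn` in `robust_accuracy` would query the target ensemble; only the projection radius ϵ and the step count tie this sketch to the paper's stated setup.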