Certified Robustness Under Bounded Levenshtein Distance

Authors: Elias Abad Rocamora, Grigorios Chrysos, Volkan Cevher

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. LLM Response: Our experiments on the AG-News, SST-2, Fake-News and IMDB datasets show non-trivial certificates at distances 1 and 2, taking 4 to 7 orders of magnitude less time to verify. Furthermore, our method is the only one able to verify under Levenshtein distance larger than 1.
Researcher Affiliation: Academia. LLM Response: Elias Abad Rocamora, Grigorios G. Chrysos, Volkan Cevher. Affiliations: LIONS, École Polytechnique Fédérale de Lausanne, Switzerland; Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA. (Email addresses redacted: EMAIL, EMAIL.)
Pseudocode: No. LLM Response: The paper describes mathematical derivations and experimental procedures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. LLM Response: Our implementation is available at github.com/LIONS-EPFL/LipsLev. We will make the code publicly available upon the publication of this work; our implementation is attached with this submission.
Open Datasets: Yes. LLM Response: We train and verify our models on the sentence classification datasets AG-News (Gulli, 2005; Zhang et al., 2015), SST-2 (Wang et al., 2019), IMDB (Maas et al., 2011) and Fake-News (Lifferth, 2018).
Dataset Splits: Yes. LLM Response: For every experiment, we report the average results over three random seeds, evaluated on the first 1,000 samples of the test set. Due to the extreme time cost of the brute-force and IBP approaches on the Fake-News dataset, we reduce this to 50 samples. We measure the final Lipschitz constant of each model and its clean and verified accuracies on a validation set of 1,000 samples held out from the training set.
Hardware Specification: Yes. LLM Response: All of our experiments are conducted on a single machine with an NVIDIA A100 SXM4 40 GB GPU.
Software Dependencies: No. LLM Response: The paper mentions using the SGD optimizer but does not provide specific version numbers for any software libraries, programming languages, or other dependencies.
Experiment Setup: Yes. LLM Response: For all models and datasets, following Huang et al. (2019), we train models with a single convolutional layer, an embedding size of 150, a hidden size of 100, and a kernel size of 5 for the SST-2 dataset and 10 for the remaining datasets. Following the setup used by Andriushchenko and Flammarion (2020) for adversarial training, we use the SGD optimizer with batch size 128 and a 30-epoch cyclic learning rate scheduler with a maximum value of 100.0. We train single-layer models with a regularization parameter λ ∈ {0, 0.001, 0.01, 0.1}, where λ = 0 is equivalent to standard training. We initialize the weights of each layer so that their Lipschitz constant is 1. We use a learning rate of 0.01.
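The certificates discussed above are stated at Levenshtein distances 1 and 2. As a reference for that threat model, here is a minimal sketch of the standard dynamic-programming edit distance; this helper is illustrative only and is not taken from the paper's code:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # dp[j] holds the distance between the current prefix of a
    # and the first j characters of b.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev, dp[0] = dp[0], i  # prev tracks the (i-1, j-1) cell
        for j, cb in enumerate(b, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,          # delete ca from a
                dp[j - 1] + 1,      # insert cb into a
                prev + (ca != cb),  # substitute (free if chars match)
            )
    return dp[-1]
```

For example, `levenshtein("word", "sword")` is 1, so a certificate at distance 1 covers any single character insertion, deletion or substitution in the input.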
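The training recipe quoted above uses a 30-epoch cyclic learning rate schedule with a maximum value of 100.0, in the style of Andriushchenko and Flammarion (2020). A minimal sketch, assuming the common triangular form that ramps linearly up to the peak at mid-training and back down to zero; the exact shape is not specified in this excerpt:

```python
def cyclic_lr(epoch: float, total_epochs: int = 30, peak: float = 100.0) -> float:
    """Triangular cyclic schedule: 0 -> peak -> 0 over total_epochs.

    peak=100.0 mirrors the quoted maximum value; the triangular
    shape is an assumption for illustration.
    """
    mid = total_epochs / 2
    if epoch <= mid:
        return peak * epoch / mid          # linear warm-up to the peak
    return peak * (total_epochs - epoch) / mid  # linear decay back to 0
```

In practice such a schedule is queried once per epoch (or per step) and the returned value is assigned to the optimizer's learning rate.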