Quadratic Upper Bound for Boosting Robustness
Authors: Euijin You, Hyang-Won Lee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that applying QUB loss to the existing methods yields significant improvement of robustness. Furthermore, using various metrics, we demonstrate that this improvement is likely to result from the smoothed loss landscape of the resulting models. We conduct experiments to compare the performance of models trained with QUB loss to existing methods adopting the traditional AT loss in (4). We visualize the loss landscape on the CIFAR-10 dataset using the ResNet18 architecture. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, Konkuk University, Seoul, South Korea. Correspondence to: Hyang-Won Lee <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 AT with Static QUB Loss Algorithm 2 AT w/ Decreasing Weight on QUB Loss |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To evaluate robustness in image classification, we use three datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Netzer et al., 2011). |
| Dataset Splits | No | The paper mentions using CIFAR-10, CIFAR-100, and Tiny ImageNet, and refers to a "validation set" and "test dataset", but it does not specify exact split percentages, sample counts, or explicitly reference predefined splits with citations for reproducibility. It states "Following common settings in adversarial training" but does not elaborate on these settings for data splitting. |
| Hardware Specification | Yes | We use a single NVIDIA GeForce RTX 4090 GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions software components implicitly through discussions of models (e.g., ResNet18) and optimizers (SGD) but does not provide specific version numbers for any programming languages, libraries, or frameworks used. |
| Experiment Setup | Yes | The optimizer is SGD (Ruder, 2016), with a learning rate of 0.1, weight decay of 5e-4, and momentum of 0.9. The batch size is set to 128. Training is conducted over 100 epochs, and we utilize a multistep learning rate scheduler that scales the learning rate by 0.1 at epochs 70 and 85. We set the attack budget ϵ to 8/255 and use a step size α of 2/255 for multi-step attacks such as PGD and TRADES. |
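The reported experiment setup can be collected into a single configuration sketch. This is an illustrative reconstruction of the hyperparameters quoted above, not the authors' code; the constant and function names are our own, and the multistep schedule is implemented directly rather than via a specific framework's scheduler class.

```python
# Hypothetical reconstruction of the paper's reported training setup.
# Values are taken from the quoted setup; names are illustrative.

BASE_LR = 0.1          # SGD learning rate
WEIGHT_DECAY = 5e-4    # SGD weight decay
MOMENTUM = 0.9         # SGD momentum
BATCH_SIZE = 128
EPOCHS = 100
MILESTONES = (70, 85)  # epochs at which the LR is scaled
GAMMA = 0.1            # multistep scaling factor

EPSILON = 8 / 255      # attack budget for adversarial training
ALPHA = 2 / 255        # step size for multi-step attacks (PGD, TRADES)

def learning_rate(epoch: int) -> float:
    """Multistep schedule: scale BASE_LR by GAMMA at each milestone reached."""
    lr = BASE_LR
    for milestone in MILESTONES:
        if epoch >= milestone:
            lr *= GAMMA
    return lr
```

Under this schedule the learning rate is 0.1 for epochs 0-69, 0.01 for epochs 70-84, and 0.001 from epoch 85 onward, matching the described multistep scheduler.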