Blending adversarial training and representation-conditional purification via aggregation improves adversarial robustness
Authors: Emanuele Ballarin, Alessio Ansuini, Luca Bortolussi
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation on a well-established benchmark of strong adaptive attacks, across different image datasets, shows that CARSO is able to defend itself against adaptive end-to-end white-box attacks devised for stochastic defences. With a modest clean-accuracy penalty, the method improves by a significant margin the state of the art for CIFAR-10, CIFAR-100, and TinyImageNet-200 ℓ∞-robust classification accuracy against AutoAttack. |
| Researcher Affiliation | Academia | Emanuele Ballarin EMAIL AIlab University of Trieste; Alessio Ansuini EMAIL Data Engineering Laboratory AREA Science Park; Luca Bortolussi EMAIL AIlab University of Trieste. All authors are affiliated with public research institutions (University of Trieste and AREA Science Park, which is a national public research institution in Italy). |
| Pseudocode | No | The paper describes the method and architecture in detail through text and diagrams (Figure 1), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Implementation of the method and code for the experiments (based on PyTorch (Paszke et al., 2019), AdverTorch (Ding et al., 2019), and ebtorch (Ballarin, 2025)) can be found at: https://github.com/emaballarin/CARSO. |
| Open Datasets | Yes | Data: the CIFAR-10 (Krizhevsky, 2009) dataset is used in scenario (a), the CIFAR-100 (Krizhevsky, 2009) dataset is used in scenario (b), whereas the TinyImageNet-200 (Chrabaszcz et al., 2017) dataset is used in scenario (c). |
| Dataset Splits | Yes | The CIFAR-10 (Krizhevsky, 2009) dataset is used in scenario (a), the CIFAR-100 (Krizhevsky, 2009) dataset is used in scenario (b), whereas the TinyImageNet-200 (Chrabaszcz et al., 2017) dataset is used in scenario (c). These are well-established benchmark datasets that come with predefined, standard training and testing splits. |
| Hardware Specification | Yes | All experiments were performed on an NVIDIA DGX A100 system. Training in scenarios (a) and (c) was run on 8 NVIDIA A100 GPUs with 40 GB of dedicated memory each; in scenario (b) 4 of such devices were used. |
| Software Dependencies | No | Implementation of the method and code for the experiments (based on PyTorch (Paszke et al., 2019), AdverTorch (Ding et al., 2019), and ebtorch (Ballarin, 2025)). While software names are provided with corresponding publication years, explicit version numbers such as 'PyTorch 1.9' are not given. |
| Experiment Setup | Yes | The purifier is trained on the VAE loss, using summed pixel-wise channel-wise binary cross-entropy as the reconstruction cost. Optimisation is performed by RAdam+Lookahead (Liu et al., 2020; Zhang et al., 2019b) with a learning rate schedule that presents a linear warm-up, a plateau phase, and a linear annealing (Smith, 2017). ... The initial and final epochs of such modulation are reported in Table 16. Table 16 provides detailed hyperparameters for training, including optimiser settings (RAdam β1, β2, ϵ, Weight Decay, Lookahead steps), initial and final learning rates, LR schedule epochs (warm-up, plateau, annealing), batch size, and adversarial batch fraction. |
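The learning-rate schedule quoted in the Experiment Setup row (linear warm-up, plateau, linear annealing) can be sketched as a plain piecewise-linear function of the epoch index. This is a minimal illustration only: the boundary epochs and rates used below are hypothetical placeholders, not the actual values from the paper's Table 16.

```python
def lr_at_epoch(epoch: int, lr_init: float, lr_peak: float, lr_final: float,
                warmup_end: int, plateau_end: int, total_epochs: int) -> float:
    """Piecewise-linear learning rate: warm-up -> plateau -> annealing."""
    if epoch < warmup_end:
        # Linear warm-up from lr_init towards lr_peak
        return lr_init + (lr_peak - lr_init) * epoch / warmup_end
    if epoch < plateau_end:
        # Constant plateau at the peak learning rate
        return lr_peak
    # Linear annealing from lr_peak down to lr_final
    frac = (epoch - plateau_end) / (total_epochs - plateau_end)
    return lr_peak + (lr_final - lr_peak) * frac


# Illustrative values (NOT from Table 16): warm-up over 10 epochs,
# plateau until epoch 50, annealing until epoch 100.
schedule = [lr_at_epoch(e, 0.0, 0.1, 0.001, 10, 50, 100) for e in range(101)]
```

In a PyTorch training loop such a function would typically be wrapped in a `torch.optim.lr_scheduler.LambdaLR`; here it is kept framework-free for clarity.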