Adversarial Inputs for Linear Algebra Backends

Authors: Jonas Möller, Lukas Pirch, Felix Weissberg, Sebastian Baunsgaard, Thorsten Eisenhofer, Konrad Rieck

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We begin our empirical evaluation by investigating the existence of Chimera examples in practice. Our goal is to assess whether we can construct corresponding feasible inputs for a learning model and sufficiently amplify their effect to cause conflicting predictions." |
| Researcher Affiliation | Academia | "¹Berlin Institute for the Foundations of Learning and Data (BIFOLD), Germany; ²TU Berlin, Germany." |
| Pseudocode | Yes | "The resulting method is described in Algorithm 1. We search for an input x_k ∈ S that satisfies the Chimera conditions (Definition 3.1). The loop terminates when a Chimera is found or the maximum of N = 3000 iterations is reached. Note that we express the calculation of the aggregated perturbation as a for-loop, as it depends on an architecture capable of simultaneously obtaining results from multiple backend instances, such as virtual machines or containers." |
| Open Source Code | Yes | "To facilitate future work, we have uploaded our source code to https://github.com/mlsec-group/dila" |
| Open Datasets | Yes | "We consider three datasets: FMNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009)." |
| Dataset Splits | Yes | "FMNIST (Xiao et al., 2017) ... consists of 60,000 grayscale 28×28 images of fashion items for training and 10,000 for testing. ... CIFAR-10 (Krizhevsky et al., 2009) is a benchmark dataset consisting of color images of size 32×32 pixels, with 50,000 images for training and 10,000 for testing." |
| Hardware Specification | Yes | "P1: an Intel Xeon Gold 6326 CPU @ 2.90 GHz, 16 cores, and 24 MB L3 cache (Ice Lake); P2: an Intel Xeon Silver 4114 CPU @ 2.20 GHz with an Nvidia RTX 3090 24 GB GPU; P3: a MacBook Air M2, running macOS Sonoma 14.6.1." |
| Software Dependencies | Yes | "For all of our experiments, we execute the same code on top of PyTorch v2.5.1 with the different BLAS backends. ... We use CUDA 12.4." |
| Experiment Setup | Yes | "We use 32-bit floats with a 23-bit significand in all experiments. ... All libraries use the default number of threads, as would be employed in a practical scenario. ... we default to inference batches of size one. ... For FMNIST, we use a fully connected network with two layers. ... For CIFAR, we employ a convolutional neural network with three VGG blocks (Simonyan & Zisserman, 2015) and three dense layers. We train both models to achieve a test accuracy of 82.32% and 80.75%, respectively. For ImageNet, we use a pre-trained EfficientNetV2-S (Tan & Le, 2021) with a test accuracy of 84.2%. Refer to Appendix B for more details." (Appendix B): "After training for 10 epochs, the model achieves an accuracy of 80.75% on the test set." |
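The loop quoted above for Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the per-backend prediction functions and the perturbation direction (`np.sign` of the logits here) are hypothetical placeholders for the paper's aggregated perturbation.

```python
import numpy as np

def chimera_search(x0, backends, step=1e-3, max_iters=3000):
    """Sketch of the Algorithm 1 loop described above: perturb x until
    the backends report conflicting predictions (a Chimera), or the
    iteration budget (N = 3000 in the paper) is exhausted."""
    x = x0.astype(np.float32)
    for _ in range(max_iters):
        # Predicted class on each backend instance.
        preds = {int(np.argmax(b(x))) for b in backends}
        if len(preds) > 1:  # conflicting predictions: Chimera found
            return x
        # Aggregate the perturbation over all backends; the paper expresses
        # this as a for-loop over backend instances (e.g. VMs or containers).
        delta = np.zeros_like(x)
        for b in backends:
            delta += np.sign(b(x))  # stand-in for a per-backend direction
        x = (x + step * delta / len(backends)).astype(np.float32)
    return None  # budget exhausted, no Chimera found
```

Here each element of `backends` is any callable mapping an input to logits; in the paper's setting these would be the same model executed on different BLAS backends.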
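The setup's choice of 32-bit floats matters because float32 addition is not associative: backends that accumulate dot products in a different order can round to different results, which is the numerical slack the attack amplifies. A minimal NumPy illustration of this (not taken from the paper):

```python
import numpy as np

# Float32 has a 23-bit significand, so adding 1.0 to 1e8 is below
# the rounding granularity (the spacing between floats near 1e8 is 8).
a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

left = (a + b) + c   # 0.0 + 1.0  -> 1.0
right = a + (b + c)  # c is absorbed into b, so 1e8 + (-1e8) -> 0.0
```

Two BLAS backends summing the same products in these two orders would disagree by a full unit here; the paper's Chimera inputs are constructed so that such discrepancies flip the predicted class.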