Position: Machine Learning Models Have a Supply Chain Problem

Authors: Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We implemented model signing in Python, on top of the sigstore-python library. Our implementation is 4500 lines of code and available as an open-source library. We benchmarked our hashing code, using SHA-256 and both the naïve and list-based approaches, for file sizes ranging from 1 B to 1 TB, on three machines. We also summarize in Table 1 the costs associated with hashing and signing a variety of large open models (as obtained from Hugging Face). To benchmark the costs associated with this type of training data commitment, we use the available Rust code for the Parakeet verifiable registry (Malvai et al., 2023). We measured the costs of computing a commitment and proving and verifying against it for datasets ranging from 1000 to 10 billion data points.
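The two hashing strategies the paper benchmarks can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the "naïve" approach hashes the whole file in one pass, while the "list-based" approach hashes fixed-size chunks independently (which can be parallelized) and then hashes the list of chunk digests. The 1 GiB default chunk size comes from the paper; the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 1 << 30  # 1 GiB, the paper's default chunk size


def naive_hash(path: str) -> bytes:
    """'Naïve' approach: a single SHA-256 pass over the whole file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.digest()


def list_hash(path: str, chunk_size: int = CHUNK_SIZE) -> bytes:
    """'List-based' approach: hash each fixed-size chunk independently,
    then hash the concatenation of the chunk digests."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digests.append(hashlib.sha256(chunk).digest())
    return hashlib.sha256(b"".join(digests)).digest()
```

The list-based variant changes the digest (it no longer equals plain SHA-256 of the file), but in exchange each chunk can be hashed on a separate core, which is what makes hashing terabyte-scale models tractable.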
Researcher Affiliation: Industry. Google; Google DeepMind. Correspondence to: Sarah Meiklejohn <EMAIL>.
Pseudocode: Yes. The algorithm for forming this type of commitment, ZKS.Commit, can be found in Figure 3. (Figure 3: Algorithms for our zero-knowledge set, assuming an underlying accumulator Acc and VRF VRF.)
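The shape of such a commitment can be illustrated with a toy sketch. This is not the paper's Figure 3 construction: here HMAC-SHA256 stands in for the VRF and a hash over the sorted tag list stands in for the accumulator, and this toy supports no efficient membership proofs; it only shows the compose-a-PRF-with-an-accumulator pattern. All names are hypothetical.

```python
import hashlib
import hmac
import os


def zks_commit(elements, key=None):
    """Toy sketch: map each set element through a keyed pseudorandom
    function (VRF stand-in), then accumulate the sorted outputs into a
    single digest. Returns (commitment, key); the key would be needed
    later to open or prove statements about the committed set."""
    key = key or os.urandom(32)
    tags = sorted(hmac.new(key, e, hashlib.sha256).digest() for e in elements)
    commitment = hashlib.sha256(b"".join(tags)).digest()
    return commitment, key
```

Because the tags are sorted before accumulation, the commitment is independent of insertion order, and because each tag is a keyed PRF output, the commitment reveals nothing about the elements to anyone without the key.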
Open Source Code: Yes. We have released this work as an open-source library and are working to integrate it into existing model hubs. Our implementation is 4500 lines of code and available as an open-source library (https://github.com/sigstore/model-transparency/).
Open Datasets: Yes. Smaller image models might be trained on well-known datasets like CIFAR-10 (which has 50K rows) or MNIST (60K rows), while larger datasets like YouTube-Commons (400K rows) are used for finetuning language models for Q&A tasks. (The footnotes point to Hugging Face datasets: https://huggingface.co/datasets/cifar10, https://huggingface.co/datasets/mnist, https://huggingface.co/datasets/PleIAs/YouTube-Commons)
Dataset Splits: No. The paper mentions datasets like CIFAR-10 (50K rows), MNIST (60K rows), and YouTube-Commons (400K rows) in Section 6.5, but does not provide specific training/test/validation splits used for any experimental setup within this paper.
Hardware Specification: Yes. We benchmarked our hashing code... on three machines: (1) M1 with 24 vCPUs running on AMD EPYC 7B12 at 2.25 GHz and 96 GB of RAM; (2) M2 with 64 vCPUs running on AMD EPYC 7B13 at 2.45 GHz and with 120 GB of RAM; and (3) M3 with 128 vCPUs running on AMD EPYC 7B13 CPUs at 2.45 GHz and with 240 GB of RAM.
Software Dependencies: No. The paper mentions implementing model signing in Python on top of the sigstore-python library and using available Rust code for the Parakeet verifiable registry, but does not provide specific version numbers for any of these software components.
Experiment Setup: No. The paper details the setup for benchmarking its cryptographic signing and verification process, including the hashing approaches (naïve and list-based), file sizes (1 B to 1 TB), chunk size (default 1 GB), and signature scheme (ECDSA P-256). However, it does not specify machine learning hyperparameters or training configurations such as learning rate, batch size, or epochs, as the core experiment evaluates the cryptographic transparency solution rather than ML model training itself.
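The signature scheme used in the benchmarks, ECDSA over the P-256 curve, can be sketched with the third-party `cryptography` package (a stand-in assumption; the paper's implementation builds on sigstore-python rather than raw key handling):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec


def sign_and_verify(message: bytes) -> bool:
    """Sketch of an ECDSA P-256 sign/verify round trip over a message
    (e.g., a model digest produced by the hashing step)."""
    # Generate an ephemeral key on the P-256 (SECP256R1) curve.
    private_key = ec.generate_private_key(ec.SECP256R1())
    # Sign the message; SHA-256 is applied internally by the scheme.
    signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))
    # Verification raises InvalidSignature on failure.
    private_key.public_key().verify(signature, message, ec.ECDSA(hashes.SHA256()))
    return True
```

Because signing operates on a short digest rather than the model itself, the signing cost is essentially constant, and the hashing step above dominates the end-to-end cost for large models.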