Position: Machine Learning Models Have a Supply Chain Problem
Authors: Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented model signing in Python, on top of the sigstore-python library. Our implementation is 4500 lines of code and available as an open-source library. We benchmarked our hashing code, using SHA-256 and both the naïve and list-based approaches, for file sizes ranging from 1 B to 1 TB, on three machines. We also summarize in Table 1 the costs associated with hashing and signing a variety of large open models (as obtained from Hugging Face). To benchmark the costs associated with this type of training data commitment, we use the available Rust code for the Parakeet verifiable registry (Malvai et al., 2023). We measured the costs of computing a commitment and proving and verifying against it for datasets ranging from 1000 to 10 billion data points. |
| Researcher Affiliation | Industry | 1Google 2Google DeepMind. Correspondence to: Sarah Meiklejohn <EMAIL>. |
| Pseudocode | Yes | The algorithm for forming this type of commitment, ZKS.Commit, can be found in Figure 3. (Figure 3: Algorithms for our zero-knowledge set, assuming an underlying accumulator Acc and VRF VRF.) |
| Open Source Code | Yes | We have released this work as an open-source library and are working to integrate it into existing model hubs. Our implementation is 4500 lines of code and available as an open-source library.12 (footnote 12: https://github.com/sigstore/model-transparency/) |
| Open Datasets | Yes | smaller image models might be trained on well-known datasets like CIFAR-10 (which has 50K rows)13 or MNIST (60K rows)14, while larger datasets like YouTube-Commons (400K rows)15 are used for finetuning language models for Q&A tasks. (Footnotes 13, 14, 15 point to Hugging Face datasets: https://huggingface.co/datasets/cifar10, https://huggingface.co/datasets/mnist, https://huggingface.co/datasets/PleIAs/YouTube-Commons) |
| Dataset Splits | No | The paper mentions datasets like CIFAR-10 (which has 50K rows), MNIST (60K rows), and YouTube-Commons (400K rows) in Section 6.5, but does not provide specific training/test/validation splits used for any experimental setup within this paper. |
| Hardware Specification | Yes | We benchmarked our hashing code... on three machines: (1) M1 with 24 vCPUs running on AMD EPYC 7B12 at 2.25 GHz and 96 GB of RAM; (2) M2 with 64 vCPUs running on AMD EPYC 7B13 at 2.45 GHz and with 120 GB of RAM; and (3) M3 with 128 vCPUs running on AMD EPYC 7B13 CPUs at 2.45 GHz and with 240 GB of RAM. |
| Software Dependencies | No | The paper mentions implementing model signing in Python on top of the sigstore-python library and using available Rust code for the Parakeet verifiable registry, but does not provide specific version numbers for any of these software components. |
| Experiment Setup | No | The paper details the setup for benchmarking its cryptographic signing and verification process, including hashing approaches ('naïve' and 'list-based'), file sizes (1 B to 1 TB), chunk size (default 1 GB), and signature scheme (ECDSA P256). However, it does not specify machine learning hyperparameters or training configurations for models like learning rate, batch size, or epochs, as the core experiment is on the cryptographic transparency solution rather than ML model training itself. |
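The list-based hashing approach that the paper benchmarks can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' implementation (which lives in the sigstore/model-transparency repository): the file is read in fixed-size chunks (the paper's default chunk size is 1 GB), each chunk is hashed with SHA-256, and the concatenation of the chunk digests is hashed once more. The function name and signature below are invented for this sketch.

```python
import hashlib

def list_based_digest(path: str, chunk_size: int = 1 << 30) -> str:
    """Illustrative list-based file digest: hash each fixed-size chunk,
    then hash the concatenated chunk digests. Chunks can be hashed in
    parallel in a real implementation, which is the point of the scheme."""
    chunk_digests = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            chunk_digests.append(hashlib.sha256(chunk).digest())
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()
```

Unlike the naïve approach (a single SHA-256 pass over the whole file), the chunk digests here are independent, so a real implementation can fan the chunks out across the vCPUs of the benchmark machines.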
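The signature scheme named in the Experiment Setup row, ECDSA over P-256, can be exercised with the pyca/cryptography library. This is a generic sign/verify sketch over a manifest of model-file digests, not the paper's Sigstore-based flow, in which keys and certificates are managed by the Sigstore infrastructure rather than generated locally.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Ephemeral P-256 key pair for illustration only.
private_key = ec.generate_private_key(ec.SECP256R1())
public_key = private_key.public_key()

# In the paper's setting this payload would be the serialized manifest of
# per-file SHA-256 digests produced by the hashing step.
manifest = b"placeholder manifest of model file digests"
signature = private_key.sign(manifest, ec.ECDSA(hashes.SHA256()))

# verify() returns None on success and raises InvalidSignature on failure.
public_key.verify(signature, manifest, ec.ECDSA(hashes.SHA256()))
```

This also makes concrete why the paper's signing cost is dominated by hashing: the ECDSA operation itself signs only a short digest, regardless of model size.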
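The training-data commitment that the paper benchmarks is a zero-knowledge set built from an accumulator and a VRF (its ZKS.Commit algorithm appears in Figure 3 of the paper, and the benchmarks use the Parakeet registry's Rust code). As a much simpler stand-in that conveys the same commit/prove/verify flow, the sketch below commits to a dataset with a Merkle tree; it is not the authors' construction and offers none of the zero-knowledge properties.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(leaves: list[bytes]) -> bytes:
    """Merkle-root commitment to a list of data points."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # pad odd levels by duplicating the last node
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def prove(leaves: list[bytes], index: int) -> list[tuple[bytes, int]]:
    """Membership proof: sibling hashes from the leaf up to the root."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))   # (sibling, am-I-right-child)
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, proof: list[tuple[bytes, int]]) -> bool:
    node = _h(leaf)
    for sibling, is_right in proof:
        node = _h(sibling + node) if is_right else _h(node + sibling)
    return node == root
```

Proof size and verification cost grow logarithmically in the dataset size, which is why commitments of this general shape stay practical even at the 10-billion-point scale the paper measures.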