The 2023 Foundation Model Transparency Index

Authors: Rishi Bommasani, Kevin Klyman, Shayne Longpre, Sayash Kapoor, Nestor Maslej, Betty Xiong, Daniel Zhang, Percy Liang

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To assess the transparency of the foundation model ecosystem and help improve transparency over time, we introduce the Foundation Model Transparency Index. The 2023 Foundation Model Transparency Index specifies 100 fine-grained indicators that comprehensively codify transparency for foundation models... We score 10 major foundation model developers (e.g. OpenAI, Google, Meta) against the 100 indicators to assess their transparency. ...We present 10 top-level findings about the foundation model ecosystem..."
Researcher Affiliation | Academia | Rishi Bommasani* (Stanford University), Kevin Klyman* (Stanford University), Shayne Longpre (Massachusetts Institute of Technology), Sayash Kapoor (Princeton University), Nestor Maslej (Stanford University), Betty Xiong (Stanford University), Daniel Zhang (Stanford University), Percy Liang (Stanford University)
Pseudocode | No | The paper introduces a framework and indicators for evaluating transparency in foundation models but does not present any structured pseudocode or algorithm blocks for its methodology.
Open Source Code | Yes | "To facilitate further research, and reproduce our scoring and analyses, we make all core materials (e.g. indicators, scores, justifications, visuals) publicly available." (https://www.github.com/stanford-crfm/fmti)
Open Datasets | Yes | "To facilitate further research, and reproduce our scoring and analyses, we make all core materials (e.g. indicators, scores, justifications, visuals) publicly available." (https://www.github.com/stanford-crfm/fmti)
Dataset Splits | No | The paper focuses on evaluating the transparency of foundation model developers against a set of defined indicators, rather than conducting machine learning experiments that would typically involve training, validation, and test splits.
Hardware Specification | No | The paper discusses the hardware specifications (Compute indicators) that foundation model developers *should* disclose for their models (Section 4.1), but it does not specify any particular hardware used by the authors themselves for the analysis presented in this paper.
Software Dependencies | No | The paper outlines a search protocol (Appendix C) that involves using Google search to find information, but it does not specify any software dependencies (e.g., programming languages, libraries, or frameworks with version numbers) used by the authors for the analysis presented in this paper.
Experiment Setup | Yes | "To ensure our scoring is consistent, we identify information using a rigorous search protocol (see Appendix C). To ensure our scoring is accurate, we notified developers and provided them the opportunity to contest any scores prior to the release of this work (all 10 responded and 8 of the 10 explicitly contested some scores). ...Having identified the information basis for scoring an indicator, 2 researchers on the team independently scored the developer on the indicator. This entails specifying a score (i.e. 0 or 1), source used in arriving at that score (e.g. one or more webpages), and a textual justification for how the evidence from sources is weighed against the criteria for the indicator in determining the score. ...Overall, across all 100 × 10 (indicator, developer) pairs, the agreement rate was 85.2% (Cohen's κ = 0.67, indicating substantial agreement; Landis & Koch, 1977). To resolve disagreements, the researchers discussed and jointly came to a resolution."
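The agreement statistic quoted above (85.2% raw agreement, Cohen's κ = 0.67) combines observed agreement with chance-expected agreement from each rater's marginals. A minimal sketch of that computation for two raters assigning binary (0/1) indicator scores; the score lists below are made-up toy data, not the paper's:

```python
# Illustrative sketch (not the authors' code): Cohen's kappa for two raters
# assigning binary scores, as in the paper's inter-rater agreement check.

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of 0/1 labels."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement under independence, from each rater's rate of 1s.
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e)

# Toy scores for 10 hypothetical (indicator, developer) pairs:
a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(round(cohen_kappa(a, b), 2))  # ≈ 0.58: 80% raw agreement, corrected for chance
```

Note that raw agreement (here 8/10) always exceeds κ, since κ discounts the agreement two independent raters would reach by chance given their marginal scoring rates.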