The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Authors: Shayne Longpre, Stella Biderman, Alon Albalak, Hailey Schoelkopf, Daniel McDuff, Sayash Kapoor, Kevin Klyman, Kyle Lo, Gabriel Ilharco, Nay San, Maribeth Rauh, Aviya Skowron, Bertie Vidgen, Laura Weidinger, Arvind Narayanan, Victor Sanh, David Ifeoluwa Adelani, Percy Liang, Rishi Bommasani, Peter Henderson, Sasha Luccioni, Yacine Jernite, Luca Soldaini

TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The process of curating this list enabled us to review the AI development ecosystem, revealing which tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed for capabilities to be assessed in context.
Researcher Affiliation | Collaboration | Shayne Longpre, MIT; Stella Biderman, EleutherAI; Alon Albalak, UC Santa Barbara, Synth Labs; Hailey Schoelkopf, EleutherAI; Daniel McDuff, University of Washington; Sayash Kapoor, Princeton University; Kevin Klyman, Stanford University, Harvard University; Kyle Lo, Allen Institute for AI; Gabriel Ilharco, University of Washington; Nay San, Stanford University; Maribeth Rauh, Google DeepMind; Aviya Skowron, EleutherAI; Bertie Vidgen, MLCommons, Contextual AI; Laura Weidinger, Google DeepMind; Arvind Narayanan, Princeton University; Victor Sanh, Hugging Face; David Adelani, University College London, Masakhane; Percy Liang, Stanford University; Rishi Bommasani, Stanford University; Peter Henderson, Princeton University; Sasha Luccioni, Hugging Face; Yacine Jernite, Hugging Face; Luca Soldaini, Allen Institute for AI
Pseudocode | No | The paper is a survey and review of tools and resources for foundation model development. It does not present new algorithms or include pseudocode for its own methodology.
Open Source Code | Yes | We release the Foundation Model Development Cheatsheet, the repository of annotated tools for text, speech, and vision models, and open it for public contributions.
Open Datasets | Yes | In the text domain, web scrapes from Common Crawl (commoncrawl.org) or OSCAR (https://oscar-project.org/) (Suárez et al., 2019; Laippala et al., 2022) are the base ingredient for most pretraining corpora.
Dataset Splits | No | The paper is a review of tools and resources for foundation model development and does not conduct its own experiments with specific training, validation, and test dataset splits.
Hardware Specification | No | The paper is a review and survey of existing tools and resources and does not describe specific hardware used for its own research or analysis.
Software Dependencies | No | The paper reviews various software tools and frameworks but does not specify the software dependencies with version numbers used for its own analysis or methodology.
Experiment Setup | No | The paper is a review of tools and resources and does not present an experimental setup with specific hyperparameters or training configurations for its own work.