Language-Models-as-a-Service: Overview of a New Paradigm and its Challenges

Authors: Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

JAIR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper has two goals: on the one hand, it delineates how the aforementioned challenges act as impediments to the accessibility, reproducibility, reliability, and trustworthiness of LMaaS. We systematically examine the issues that arise from a lack of information about language models for each of these four aspects, conduct a detailed analysis of existing solutions, put forth a number of recommendations, and highlight directions for future advancement. On the other hand, the paper serves as a synthesized overview of the licences and capabilities of the most popular LMaaS.
Researcher Affiliation | Academia | Emanuele La Malfa (EMAIL): Department of Computer Science, University of Oxford, Oxford, OX1 3QG, UK; The Alan Turing Institute, London, NW1 2DB, UK. Aleksandar Petrov (EMAIL): Department of Engineering, University of Oxford, Oxford, OX1 3PJ, UK. Simon Frieder (EMAIL): Department of Computer Science, University of Oxford, Oxford, OX1 3QG, UK; Faculty of Informatics, Vienna University of Technology, Vienna 1040, Austria. Christoph Weinhuber (EMAIL): Department of Computer Science, University of Oxford, Oxford, OX1 3QG, UK. Ryan Burnell (EMAIL): The Alan Turing Institute, London, NW1 2DB, UK. Raza Nazar (EMAIL): Faculty of Law, University of Oxford, Oxford, OX1 3UL, UK. Anthony G. Cohn (EMAIL): School of Computing, University of Leeds, Leeds, LS2 9JT, UK; The Alan Turing Institute, London, NW1 2DB, UK. Nigel Shadbolt (EMAIL) and Michael Wooldridge (EMAIL): Department of Computer Science, University of Oxford, Oxford, OX1 3QG, UK; The Alan Turing Institute, London, NW1 2DB, UK.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; it is a review and analysis paper.
Open Source Code | No | The paper discusses open-source language models and their licensing (e.g., BLOOM, Alpaca, LLaMA) but does not provide any links or statements about making code for its own methodology publicly available.
Open Datasets | No | The paper discusses dataset-related challenges (e.g., dataset contamination, test beds, and benchmarks such as BIG-Bench, the IMDB Dataset, the SST-2 Dataset, and the Large Movie Review Dataset, cited as examples in Figure 4) in the context of Language Models-as-a-Service, but it does not use or provide access to a specific dataset for its own research.
Dataset Splits | No | The paper is a conceptual overview and does not conduct experiments on specific datasets, so no dataset split information is provided.
Hardware Specification | No | The paper does not describe any specific hardware used for its own experiments. It mentions hardware only in the context of general LM training and inference (e.g., 'dedicated data centres and supercomputers', 'single GPU with 28GB of RAM'), not for its own methodology.
Software Dependencies | No | The paper discusses various software and models (e.g., Transformer-based LMs, Hugging Face, AllenNLP) in a general context but does not list specific software dependencies with version numbers for its own work.
Experiment Setup | No | The paper is an overview and analysis of the Language-Models-as-a-Service paradigm and its challenges; it does not describe an experimental setup, hyperparameters, or training configurations for its own research.