Augmented Language Models: a Survey
Authors: Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ramakanth Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom
TMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. ... In this work, after reviewing current advances in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues. ... Table 1: Evaluation of different reasoning methods on GSM8K, a popular reasoning benchmark. FT denotes fine-tuning and CoT denotes chain-of-thought. The reported accuracies are based on [1]: (Wei et al., 2022c); [2]: (Cobbe et al., 2021); [3]: (Chowdhery et al., 2022); and [4]: (Gao et al., 2022). |
| Researcher Affiliation | Industry | Grégoire Mialon EMAIL Roberto Dessì EMAIL Maria Lomeli EMAIL Christoforos Nalmpantis EMAIL Ram Pasunuru EMAIL Roberta Raileanu EMAIL Baptiste Rozière EMAIL Timo Schick EMAIL Jane Dwivedi-Yu EMAIL Asli Celikyilmaz EMAIL Edouard Grave EMAIL Yann LeCun EMAIL Thomas Scialom EMAIL Meta AI Universitat Pompeu Fabra |
| Pseudocode | No | The paper is a survey and does not present a new algorithm or method that would require pseudocode. Figures 4 and 6 show snippets of Python code as examples from other papers being reviewed, not as pseudocode for this survey's own methodology. |
| Open Source Code | No | This paper is a survey of existing works and does not describe a novel methodology requiring its own source code release. There is no statement about the release of code for this survey paper. |
| Open Datasets | Yes | Using few-shot CoT prompting, Minerva (Lewkowycz et al., 2022) achieves excellent performance on math benchmarks such as GSM8K (Cobbe et al., 2021). ... Wei et al. (2022b) demonstrate that LLMs become able to perform some BIG-bench tasks3 via few-shot prompting once a certain scale is attained. 3https://github.com/google/BIG-bench |
| Dataset Splits | No | The paper is a survey and describes various research methodologies, including few-shot and zero-shot settings, which concern how models are evaluated on data. However, it does not provide specific dataset split details (e.g., percentages, counts, or explicit standard splits) for any of the datasets discussed, as it reviews other works rather than conducting its own experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to conduct its own research or analysis. It mentions 'software and hardware innovations' in a general context but no specific models or configurations. |
| Software Dependencies | No | The paper is a survey and does not present an implementation that would require specific software dependencies with version numbers. While it mentions tools like 'python interpreter' and 'faiss' in the context of the reviewed works, it does not specify software dependencies for its own methodology. |
| Experiment Setup | No | The paper is a survey and discusses experimental setups and training procedures of various research papers it reviews, such as 'fine-tuning with behavior cloning' or 'RLHF'. However, it does not provide specific experimental setup details (e.g., hyperparameters, training configurations) for its own research or analysis, as it is a review paper. |