On The Landscape of Spoken Language Models: A Comprehensive Survey

Authors: Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-yi Lee, Karen Livescu, Shinji Watanabe

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | This paper aims to contribute an improved understanding of SLMs via a unifying literature survey of recent work in the context of the evolution of the field. The survey categorizes work in this area by model architecture, training, and evaluation choices, and describes key challenges and directions for future work. |
| Researcher Affiliation | Academia | 1 Carnegie Mellon University, USA; 2 National Taiwan University, Taiwan; 3 Toyota Technological Institute at Chicago, USA; 4 Hebrew University of Jerusalem, Israel; 5 ENS PSL, EHESS, CNRS, France |
| Pseudocode | No | The paper describes methods and architectures using textual explanations and diagrams (e.g., Figure 2: overview of the SLM architecture; Figure 3: a general pipeline for speech encoders; Figure 4: hierarchical generation strategies). It does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | As a survey, the paper does not introduce a new method that would come with its own source code. While it notes the availability of open-source models and toolkits among the surveyed SLMs, it does not provide source code for the survey itself. |
| Open Datasets | No | The paper is a comprehensive survey of spoken language models. It discusses datasets used by the surveyed models for training and evaluation (e.g., LLaMA-Questions, WebQuestions (Berant et al., 2013), TriviaQA (Joshi et al., 2017), and Dynamic-SUPERB (Huang et al., 2024)), but the survey authors do not present or provide access to a dataset of their own. |
| Dataset Splits | No | As a survey, the paper conducts no experiments of its own and introduces no new datasets, so it provides no training/validation/test splits. |
| Hardware Specification | No | The paper does not describe experimental work by its authors and therefore specifies no hardware for its own experiments. The GPUs listed in a latency comparison (e.g., A40 and L40 in Table 3) refer to hardware used by the surveyed works, not by the authors of this paper. |
| Software Dependencies | No | As a survey, the paper presents no experiments of its own that would require specific software dependencies. It mentions software and models from the surveyed literature (e.g., BERT (Devlin et al., 2019), GPT-2, LLaMA (Touvron et al., 2023a)), but lists no versioned software dependencies for its own work. |
| Experiment Setup | No | The paper involves no original experimental work by its authors, so it provides no hyperparameters or system-level training settings. |