Open-Source Conversational AI with SpeechBrain 1.0

Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Ha Nguyen, Xuechen Liu, Sangeet Sagar, Jarod Duret, Salima Mdhaffar, Gaëlle Laperrière, Mickael Rouvier, Renato De Mori, Yannick Estève

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.
Researcher Affiliation | Collaboration | 1 Concordia University, 2 Mila-Quebec AI Institute, 3 Avignon University, 4 Samsung AI Center Cambridge, 5 Université de Montréal, 6 University of Cambridge, 7 Laval University, 8 Zaion, 9 Fondazione Bruno Kessler, 10 University of Bologna, 11 Telecom Paris, 12 University of Edinburgh, 13 Inria, 14 Aalto University, 15 University of Trento, 16 Saarland University, 17 National Institute of Informatics Tokyo, 18 Silo AI, 19 KU Leuven, 20 Idiap, 21 EPFL, 22 McGill University
Pseudocode | No | The paper describes methods and procedures in narrative text, but does not include any clearly labeled pseudocode blocks or algorithms formatted as such.
Open Source Code | Yes | SpeechBrain[1] is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete recipes of code and algorithms required for training them. (...) [1] https://speechbrain.github.io/
Open Datasets | Yes | Moreover, 95% of our recipes utilize freely available data and include comprehensive training logs, checkpoints, and other essential information. (...) For the EEG modality, we rely on two key dependencies: MOABB (Aristimunha et al., 2024) and MNE (Gramfort et al., 2014).
Dataset Splits | No | First, users need to specify the data for training, validation, and testing using CSV or JSON files. These formats are supported because they allow flexible and intuitive declaration of input files and annotations.
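The quoted passage describes declaring train/validation/test data through CSV or JSON manifests. As a hedged illustration, the sketch below writes and reads back a minimal JSON manifest of the general shape such files take (utterance ID mapped to audio path, duration, and transcript); the field names here are illustrative assumptions, not a fixed schema from the paper.

```python
import json
import os
import tempfile

# Hypothetical minimal JSON data manifest: one entry per utterance.
# Field names ("wav", "duration", "words") are illustrative only.
manifest = {
    "utt_0001": {"wav": "data/utt_0001.wav", "duration": 3.2, "words": "hello world"},
    "utt_0002": {"wav": "data/utt_0002.wav", "duration": 1.9, "words": "open source"},
}

# A separate file would be declared for each split (train/valid/test).
path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(path, "w") as f:
    json.dump(manifest, f, indent=2)

# Reading the manifest back, as a training script would before building its dataset.
with open(path) as f:
    loaded = json.load(f)
print(len(loaded))  # number of utterances declared for this split
```

The same information could equally be laid out as CSV rows with one column per field; JSON simply makes nested or optional annotations easier to express.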
Hardware Specification | Yes | This enabled efficient training of a 1-billion-parameter SSL model for French on 14,000 hours of speech using over 100 A100 GPUs, showcasing the scalability of SpeechBrain (Parcollet et al., 2024).
Software Dependencies | Yes | SpeechBrain is an open-source Conversational AI toolkit based on PyTorch (...) Torchaudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.
Experiment Setup | No | Next, users must design a model and define its hyperparameters using a modified YAML format known as HyperPyYAML. This format facilitates complex yet elegant parameter configurations, defining objects and their associated arguments. (...) The improvement primarily originated from a more robust data augmentation strategy and a more careful selection of the training hyperparameters.
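The quote above says HyperPyYAML lets a YAML file define "objects and their associated arguments" — e.g. a tag like `!new:some.module.SomeClass` instantiates a Python class from a dotted path, with its constructor arguments listed beneath it. The stdlib-only sketch below illustrates that idea; it is a toy re-implementation for intuition, not the actual HyperPyYAML code.

```python
import importlib

# Toy illustration of the idea behind a HyperPyYAML "!new:" tag:
# a line such as `counter: !new:collections.Counter` names a class by its
# dotted path, which is resolved and instantiated with the given arguments.
def new_tag(dotted_path, *args, **kwargs):
    """Resolve a dotted class path and instantiate it (illustrative only)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(*args, **kwargs)

# Instantiating a stdlib class the same way a config line would.
counter = new_tag("collections.Counter", "speechbrain")
print(counter["e"])  # 'e' occurs twice in "speechbrain"
```

Because the configuration names classes rather than hard-coding them, swapping a model or optimizer becomes a one-line edit to the YAML file instead of a code change.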