openXBOW -- Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit
Authors: Maximilian Schmitt, Björn Schuller
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The capabilities of the tool have been exemplified in different scenarios: sentiment analysis in tweets, classification of snore sounds, and time-dependent emotion recognition based on acoustic, linguistic, and visual information, where improved results over other feature representations were observed. [...] Results are presented in terms of the concordance correlation coefficient (CCC) between the prediction and the annotation. For the acoustic domain, the 65 LLDs from the ComParE feature set have been extracted using the toolkit openSMILE (Eyben et al., 2013). [...] The results reported in Table 1 show that both early fusion and late fusion, by training another SVM based on the predictions from the single modalities, are suitable for the prediction of emotional speaker states. |
| Researcher Affiliation | Academia | Maximilian Schmitt EMAIL Björn Schuller EMAIL Chair of Complex and Intelligent Systems, University of Passau, 94032 Passau, Germany |
| Pseudocode | No | The paper includes Figure 1 titled 'Overview of the basic workflow of openXBOW', which is a diagram describing the process, but it does not contain any structured pseudocode or algorithm blocks with step-by-step instructions in a code-like format. |
| Open Source Code | Yes | openXBOW is implemented in Java and can thus be used on any common platform. It has been published on GitHub as a public repository1, including both the source code and a compiled jar file for users who do not have a Java Development Kit installed. The software and the source code are published under GPLv3. [...] 1. https://github.com/openXBOW/openXBOW |
| Open Datasets | Yes | Emotion recognition in speech has been conducted on the SEWA2 (Automatic Sentiment Analysis in the Wild) corpus, more specifically, on video chat recordings of 64 German subjects. [...] 2. http://sewaproject.eu/ |
| Dataset Splits | Yes | The overall length of this audio-visual data is approximately 89 minutes, the data was split into subject-independent training, development (devel), and test partitions (34/14/16 subjects). |
| Hardware Specification | Yes | On a system with an Intel Core i7-4770 (3.4 GHz) CPU, 16 GB RAM, Windows 10 operating system, and Java Version 8, Update 121, the computation of the aforementioned crossmodal BoW (with early fusion) took 263 s for training and 67 s for prediction. |
| Software Dependencies | No | The paper mentions 'Java Version 8, Update 121' for the system running the software. However, it does not provide specific version numbers for other ancillary software components or libraries like LIBSVM, LIBLINEAR, or openSMILE that are referenced as being used in the experiments. |
| Experiment Setup | Yes | The codebook size for numeric features is 1,000, the number of assignments is 10. The dictionary consists of 346 words. [...] As the target of the prediction is time-dependent, the input was segmented into overlapping blocks of 8 s width and 0.1 s hop size, as described by Schmitt et al. (2016b). For decoding of the BoW, LIBLINEAR was used; the complexity parameter was optimised on the development set. |
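The Research Type row notes that results are reported as the concordance correlation coefficient (CCC) between prediction and annotation. For readers checking reported scores, a minimal sketch of Lin's CCC (the function name and NumPy-based implementation here are illustrative, not from the paper):

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variances
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```

Unlike Pearson correlation, CCC penalises both scale and location shifts, which is why it is preferred for continuous emotion annotation tasks.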
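The Experiment Setup row describes segmenting the input into overlapping blocks of 8 s width and 0.1 s hop size. A sketch of how such block boundaries could be generated (the helper name and tolerance handling are assumptions, not part of openXBOW's documented interface):

```python
def overlapping_blocks(duration, width=8.0, hop=0.1):
    """Return (start, end) times in seconds for overlapping analysis
    blocks of the given width, advanced by the given hop size."""
    blocks = []
    t = 0.0
    # Small tolerance guards against floating-point drift from repeated addition.
    while t + width <= duration + 1e-9:
        blocks.append((round(t, 3), round(t + width, 3)))
        t += hop
    return blocks
```

For a 10 s recording this yields blocks (0.0, 8.0), (0.1, 8.1), ..., (2.0, 10.0), i.e. each BoW is computed over a heavily overlapping window, which smooths the time-dependent prediction target.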
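The same row states that the numeric codebook has 1,000 entries with 10 assignments per frame. The sketch below illustrates multi-assignment quantization in that style: each LLD frame increments the counts of its N nearest codewords rather than only the single closest one. This is a simplified stand-in (names and shapes are hypothetical); openXBOW's actual Java implementation may differ in details such as distance metric and normalisation:

```python
import numpy as np

def bow_multi_assign(lld_frames, codebook, num_assignments=10):
    """Build a bag-of-words histogram where each frame votes for its
    num_assignments closest codebook entries (Euclidean distance)."""
    hist = np.zeros(len(codebook), dtype=int)
    for frame in np.asarray(lld_frames, dtype=float):
        dists = np.linalg.norm(np.asarray(codebook, dtype=float) - frame, axis=1)
        for idx in np.argsort(dists)[:num_assignments]:
            hist[idx] += 1
    return hist
```

With this scheme the histogram always sums to `n_frames * num_assignments`, making the soft assignment a straightforward generalisation of hard vector quantization.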