Symbol Grounding Association in Multimodal Sequences with Missing Elements
Authors: Federico Raue, Andreas Dengel, Thomas M. Breuel, Marcus Liwicki
JAIR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed extension in the following scenarios: missing elements in one modality (visual or audio) and missing elements in both modalities (visual and sound). The performance of our extension reaches better results than the original model and similar results to individual LSTM trained in each modality. |
| Researcher Affiliation | Academia | EMAIL Computer Science Department, TU Kaiserslautern, Gottlieb-Daimler Str. 1, 67663 Kaiserslautern, Germany; EMAIL Smart Data and Knowledge Services, German Research Center for Artificial Intelligence (DFKI), Trippstadter Str. 122, 67663 Kaiserslautern, Germany; EMAIL Computer Science Department, TU Kaiserslautern, Gottlieb-Daimler Str. 1, 67663 Kaiserslautern, Germany; EMAIL Mind Garage, TU Kaiserslautern, Davenportpl. 11, 67663 Kaiserslautern, Germany |
| Pseudocode | No | The paper describes methods using mathematical formulas and conceptual diagrams (e.g., Figure 1, Figure 4, Figure 5) but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We used a subset of 30 objects from COIL-100 (Nene, Nayar, & Murase, 1996) that is a standard dataset of 100 isolated objects. |
| Dataset Splits | Yes | We follow a 5-fold cross-validation scheme where for each run eleven subjects are selected for training and the remaining two subjects are used for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | In contrast, the audio component was converted to Mel-Frequency Cepstral Coefficients (MFCC) using the HTK toolkit. The HTK toolkit is mentioned, but no version number is provided. |
| Experiment Setup | Yes | The parameters of the visual LSTM were: 40 memory cells, learning rate 0.0001, and momentum 0.9. On the other hand, the audio LSTM had 100 memory cells, and the learning rate and momentum are the same as in the visual LSTM. Furthermore, the learning rate in the statistical constraint was set to 0.001. |
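The dataset-split evidence describes runs in which eleven subjects are selected for training and the remaining two for testing. A minimal sketch of that subject-level split is below; the subject IDs, the total of 13 subjects, and the random seed are assumptions for illustration, not details confirmed by the paper.

```python
import random

def subject_splits(subjects, n_runs=5, n_test=2, seed=0):
    """Yield (train, test) subject lists per run: n_test subjects are
    held out and the rest are used for training, matching the reported
    11-train / 2-test scheme."""
    rng = random.Random(seed)
    for _ in range(n_runs):
        test = rng.sample(subjects, n_test)
        train = [s for s in subjects if s not in test]
        yield train, test

# Hypothetical subject IDs; 13 subjects assumed from 11 + 2.
subjects = [f"subj{i:02d}" for i in range(13)]
for train, test in subject_splits(subjects):
    assert len(train) == 11 and len(test) == 2
```

Note this is a plain random hold-out per run; whether the paper's "5-fold cross-validation scheme" enforces disjoint test subjects across folds is not stated in the quoted text.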
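The experiment-setup row lists concrete hyperparameters for the two modality networks and the statistical constraint. A small configuration sketch collecting them is shown below; the dictionary layout and key names are assumptions for illustration, since the paper does not specify a framework or configuration format.

```python
# Hyperparameters as reported in the paper; structure is hypothetical.
visual_lstm = {
    "memory_cells": 40,       # visual LSTM: 40 memory cells
    "learning_rate": 1e-4,    # learning rate 0.0001
    "momentum": 0.9,
}
audio_lstm = {
    "memory_cells": 100,      # audio LSTM: 100 memory cells
    "learning_rate": 1e-4,    # same learning rate as visual LSTM
    "momentum": 0.9,          # same momentum as visual LSTM
}
statistical_constraint = {
    "learning_rate": 1e-3,    # constraint learning rate 0.001
}
```

Keeping the two per-modality configurations side by side makes the only reported difference (memory-cell count) explicit.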