Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task

Authors: Jannik Kossen, Cătălina Cangea, Eszter Vértes, Andrew Jaegle, Viorica Patraucean, Ira Ktena, Nenad Tomasev, Danielle Belgrave

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our agents are able to solve a novel synthetic scenario requiring practically relevant cross-modal reasoning skills. On two large-scale, real-world datasets, Kinetics-700 and Audio Set, our agents successfully learn cost-reactive acquisition behavior. However, an ablation reveals they are unable to learn adaptive acquisition strategies, emphasizing the difficulty of the task even for state-of-the-art models.
Researcher Affiliation | Collaboration | Jannik Kossen (1), Cătălina Cangea (2), Eszter Vértes (2), Andrew Jaegle (2), Viorica Patraucean (2), Ira Ktena (2), Nenad Tomasev (2), Danielle Belgrave (2). (1) OATML, Department of Computer Science, University of Oxford; (2) Google DeepMind
Pseudocode | Yes | Algorithm 1 (A2MT)
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Further, we propose to study A2MT on audio-visual datasets, concretely Audio Set (Gemmeke et al., 2017) and Kinetics-700-2020 (Smaira et al., 2020). These provide a challenging testbed for A2MT and avoid some of the complications of working with medical data.
Dataset Splits | Yes | We split the training set of each dataset into a subset used for model pretraining and a subset used exclusively for agent training, taking up 80% and 20% of the original training set, respectively.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as Perceiver IO and the Adam optimizer but does not specify their version numbers or other library dependencies needed to replicate the experiments.
Experiment Setup | Yes | We train using a batch size of 256. We use the Adam optimizer with an initial learning rate of 3×10⁻⁴, weight decay of 1×10⁻⁶, and a cosine annealing schedule. For the Perceiver IO encoder we use a single cross-attend block with 4 self-attention operations per Perceiver IO block; we use 128 queries, and the hidden dimension is 128. For the Perceiver IO decoder, we use a single head with 128 queries and a hidden dimension of 128. We train for a total of 2×10⁵ steps. We set the discount factor to γ = 1.
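The reported schedule (initial learning rate 3×10⁻⁴ annealed over 2×10⁵ steps) can be re-created from the standard cosine-annealing formula. The sketch below is an illustration only: the paper does not publish code, so the function name, the minimum learning rate of 0, and the framework-free formulation are assumptions, not the authors' implementation.

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
BASE_LR = 3e-4         # initial Adam learning rate
TOTAL_STEPS = 200_000  # 2 x 10^5 training steps

def cosine_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given training step.

    Interpolates from base_lr at step 0 down to min_lr at total_steps
    along a half-cosine curve (min_lr = 0 is an assumption here).
    """
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))            # starts at 3e-4
print(cosine_lr(100_000))      # halfway: 1.5e-4
print(cosine_lr(TOTAL_STEPS))  # fully annealed to 0
```

In a training loop, the returned value would be written into the optimizer's learning rate once per step, which matches the per-step cosine annealing behavior of common framework schedulers.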