Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task
Authors: Jannik Kossen, Cătălina Cangea, Eszter Vértes, Andrew Jaegle, Viorica Patraucean, Ira Ktena, Nenad Tomasev, Danielle Belgrave
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our agents are able to solve a novel synthetic scenario requiring practically relevant cross-modal reasoning skills. On two large-scale, real-world datasets, Kinetics-700 and Audio Set, our agents successfully learn cost-reactive acquisition behavior. However, an ablation reveals they are unable to learn adaptive acquisition strategies, emphasizing the difficulty of the task even for state-of-the-art models. |
| Researcher Affiliation | Collaboration | Jannik Kossen¹, Cătălina Cangea², Eszter Vértes², Andrew Jaegle², Viorica Patraucean², Ira Ktena², Nenad Tomasev², Danielle Belgrave²; ¹OATML, Department of Computer Science, University of Oxford; ²Google DeepMind |
| Pseudocode | Yes | Algorithm 1 A2MT |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Further, we propose to study A2MT on audio-visual datasets, concretely Audio Set (Gemmeke et al., 2017) and Kinetics-700 2020 (Smaira et al., 2020). These provide a challenging testbed for A2MT and avoid some of the complications of working with medical data. |
| Dataset Splits | Yes | We split the training set of each dataset into a subset used for model pretraining and a subset used exclusively for agent training, taking up 80% and 20% of the original training set respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like Perceiver IO and ADAM optimizer but does not specify their version numbers or other library dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We train using a batch size of 256. We use the ADAM optimizer with initial learning rate of 3 × 10⁻⁴, weight decay of 1 × 10⁻⁶, and a cosine annealing schedule. For the Perceiver IO encoder we use a single cross-attend block with 4 self-attention operations per Perceiver IO block; we use 128 queries, and the hidden dimension is 128. For the Perceiver IO decoder, we use a single head with 128 queries and a hidden dimension of 128. We train for a total of 2 × 10⁵ steps. We set the discount factor to γ = 1. |
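The dataset split described above (80% of the original training set for model pretraining, 20% reserved for agent training) can be sketched as follows. The function name, the seeded shuffle, and the use of index lists are our own illustrative assumptions; the paper does not specify how the split was implemented.

```python
import random

def split_training_set(indices, pretrain_frac=0.8, seed=0):
    """Split training-example indices into a pretraining subset (80%)
    and an agent-training subset (20%), as described in the paper.
    The seeded shuffle is an assumption made here for reproducibility."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * pretrain_frac)
    return shuffled[:cut], shuffled[cut:]

# Example: split 100 training examples into 80 / 20 disjoint subsets.
pretrain_idx, agent_idx = split_training_set(range(100))
```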
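The reported training hyperparameters can be collected into a single configuration sketch. The `cosine_lr` helper below is our own minimal implementation of a cosine annealing schedule; the paper does not state whether warmup is used or what the final learning rate is, so we assume annealing from the initial rate down to zero over the full 2 × 10⁵ steps.

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "batch_size": 256,
    "optimizer": "Adam",
    "learning_rate": 3e-4,     # initial LR, 3 x 10^-4
    "weight_decay": 1e-6,      # 1 x 10^-6
    "total_steps": 200_000,    # 2 x 10^5 training steps
    "discount_factor": 1.0,    # gamma = 1
}

def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine-annealed learning rate at a given training step,
    decaying from base_lr at step 0 to 0 at total_steps (assumed)."""
    progress = min(step / total_steps, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In a real training loop this value would be assigned to the optimizer's learning rate each step (e.g. via a scheduler in whatever framework is used).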