Tracking objects that change in appearance with phase synchrony

Authors: Sabine Muzellec, Drew Linsley, Alekh Ashok, Ennio Mingolla, Girik Malik, Rufin VanRullen, Thomas Serre

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we describe a novel deep learning circuit that can learn to precisely control attention to features separately from their location in the world through neural synchrony: the complex-valued recurrent neural network (CV-RNN). Next, we compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs), using Feature Tracker: a large-scale challenge that asks observers to track objects as their locations and appearances change in precisely controlled ways. While humans effortlessly solved Feature Tracker, state-of-the-art DNNs did not. In contrast, our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization as a neural substrate for tracking appearance-morphing objects as they move about. Through a series of behavioral and computational experiments using Feature Tracker, we discover the following: Humans are exceptionally accurate at tracking objects in the Feature Tracker challenge... On the other hand, DNNs struggle on Feature Tracker... The CV-RNN approaches human performance and decision-making on Feature Tracker.
Researcher Affiliation | Academia | Sabine Muzellec (CerCo, CNRS, Université de Toulouse, France; Carney Institute for Brain Science, Brown University, USA); Drew Linsley (Carney Institute for Brain Science, Department of Cognitive & Psychological Sciences, Brown University, USA); Alekh Ashok (Carney Institute for Brain Science, Department of Cognitive & Psychological Sciences, Brown University, USA); Ennio Mingolla (Northeastern University, Boston, MA, USA); Girik Malik (Northeastern University, Boston, MA, USA); Rufin VanRullen (CerCo, CNRS, Université de Toulouse, France); Thomas Serre (Carney Institute for Brain Science, Department of Cognitive & Psychological Sciences, Brown University, USA)
Pseudocode | No | The paper describes methods using mathematical equations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We release Feature Tracker data, code, and human psychophysics at https://github.com/S4b1n3/feature_tracker to help the field investigate this gap between human and machine vision.
Open Datasets | Yes | We release Feature Tracker data, code, and human psychophysics at https://github.com/S4b1n3/feature_tracker to help the field investigate this gap between human and machine vision.
Dataset Splits | Yes | We use an identical training pipeline for all the models. This pipeline includes a training set composed of 100,000 videos of 32 frames and 32x32 spatial resolution. Model performance was evaluated on a held-out set of 10,000 videos at the end of every epoch of training, and training was stopped early if accuracy on this set decreased for five straight epochs. We then took the weights of each model that performed best on this hold-out set and evaluated them on 10,000 videos from each condition depicted in Fig. 5.
Hardware Specification | Yes | All the experiments of this paper have been performed using Quadro RTX 6000 GPUs with 16 GB of memory.
Software Dependencies | No | The paper mentions using PyTorch and the Adam optimizer, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We employ the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 3e-04, a batch size of 64 during 200 epochs with a Binary Cross-Entropy loss. Model performance was evaluated on a held-out set of 10,000 videos at the end of every epoch of training, and training was stopped early if accuracy on this set decreased for five straight epochs.
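The early-stopping and checkpoint-selection protocol quoted above (halt once held-out accuracy decreases for five straight epochs; keep the weights that scored best on the hold-out set) can be sketched as follows. This is a minimal illustration, not the authors' code: the function and variable names are ours, and the hyperparameter values in the comments are taken from the quoted setup (Adam, lr 3e-04, batch size 64, 200 epochs, BCE loss).

```python
def should_stop(accuracy_history, patience=5):
    """Early-stopping rule described in the paper's setup: stop training
    once held-out accuracy has decreased for `patience` consecutive epochs."""
    if len(accuracy_history) <= patience:
        return False  # not enough epochs yet to observe `patience` drops
    recent = accuracy_history[-(patience + 1):]
    # True only if each of the last `patience` epochs was worse than the one before
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


def select_best_epoch(accuracy_history):
    """Checkpoint selection described in the setup: keep the weights from
    the epoch with the highest hold-out accuracy."""
    return max(range(len(accuracy_history)), key=lambda i: accuracy_history[i])


# Hypothetical hold-out accuracies over epochs (paper trains up to 200 epochs
# with Adam, lr 3e-04, batch size 64, and a Binary Cross-Entropy loss):
history = [0.60, 0.72, 0.81, 0.80, 0.78, 0.75, 0.70, 0.66, 0.61]
stopped = should_stop(history)      # accuracy fell for 5 straight epochs
best = select_best_epoch(history)   # epoch index with peak accuracy
```

In a full PyTorch loop, `should_stop` would be checked after each epoch's evaluation on the 10,000-video held-out set, and the model's `state_dict` would be saved whenever `select_best_epoch` advances.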