Benchmarking Unsupervised Object Representations for Video Sequences
Authors: Marissa A. Weis, Kashyap Chitta, Yash Sharma, Wieland Brendel, Matthias Bethge, Andreas Geiger, Alexander S. Ecker
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To close this gap, we design a benchmark with four data sets of varying complexity and seven additional test sets featuring challenging tracking scenarios relevant for natural videos. Using this benchmark, we compare the perceptual abilities of four object-centric approaches... Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking than the spatial transformer based architectures. We also observe that none of the methods are able to gracefully handle the most challenging tracking scenarios despite their synthetic nature, suggesting that our benchmark may provide fruitful guidance towards learning more robust object-centric video representations. |
| Researcher Affiliation | Academia | 1Institute of Computer Science, University of Göttingen, Germany 2Campus Institute Data Science, Göttingen, Germany 3Department of Computer Science, University of Tübingen, Germany 4Institute for Theoretical Physics, University of Tübingen, Germany 5Bernstein Center for Computational Neuroscience, Tübingen, Germany 6Max Planck Institute for Intelligent Systems, Tübingen, Germany 7Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany |
| Pseudocode | No | The paper describes the methods (MONet, ViMON, TBA, IODINE, OP3, SCALOR) using prose and mathematical equations in Section C 'Methods' and its subsections. It does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, data, as well as a public leaderboard of results is available at https://eckerlab.org/code/weis2021/. |
| Open Datasets | Yes | Our code, data, as well as a public leaderboard of results is available at https://eckerlab.org/code/weis2021/. ... Data sets are available at this URL. |
| Dataset Splits | Yes | The training set consists of 10,000 examples whereas the validation set as well as the test set contain 1,000 examples each. ... We generate a training set consisting of 9600 examples, validation set of 384 samples and test set of 1,000 examples ... The training set consists of 10,000 sequences whereas the validation set and the test set contain 1,000 sequences each. |
| Hardware Specification | Yes | Runtime analysis (using a single RTX 2080 Ti GPU). |
| Software Dependencies | No | MONet and ViMON are implemented in PyTorch (Paszke et al., 2019)... k-Means algorithm as implemented by sklearn (Pedregosa et al., 2011). The paper mentions software such as PyTorch and sklearn, but does not provide specific version numbers for these or other key libraries. |
| Experiment Setup | Yes | MONet and ViMON are implemented in PyTorch (Paszke et al., 2019) and trained with the Adam optimizer (Kingma and Ba, 2015) with a batch size of 64 for MONet and 32 for ViMON, using an initial learning rate of 0.0001. ... MONet is trained with β = 0.5 and γ = 1 and ViMON is trained with β = 1 and γ = 2. K = 5 for SpMOT, K = 6 for VMDS and K = 8 for VOR. ... We train SCALOR with a batch size of 16 for 300 epochs using a learning rate of 0.0001 for SpMOT and VOR and for 400 epochs for VMDS. |
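The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is illustrative only: the dictionary keys and structure are assumptions, not the authors' code; the values (optimizer, learning rate, batch sizes, β/γ weights, and slot counts K per data set) are taken from the paper's reported setup.

```python
# Hedged sketch of the reported training configuration for MONet and ViMON.
# Keys and layout are hypothetical; values come from the quoted setup.
monet_config = {
    "optimizer": "Adam",      # Adam (Kingma and Ba, 2015)
    "learning_rate": 1e-4,    # initial learning rate 0.0001
    "batch_size": 64,         # 64 for MONet
    "beta": 0.5,              # MONet: beta = 0.5
    "gamma": 1.0,             # MONet: gamma = 1
}

# ViMON shares the optimizer and learning rate but differs in
# batch size and loss weights.
vimon_config = {**monet_config, "batch_size": 32, "beta": 1.0, "gamma": 2.0}

# Number of slots K per data set, as reported.
num_slots = {"SpMOT": 5, "VMDS": 6, "VOR": 8}

print(vimon_config["batch_size"], num_slots["VOR"])
```

Keeping such settings in explicit per-model configurations is one common way to make the reported setup reproducible at a glance.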