Published on Jun 16, 2020
Keynote presented on June 19th, 2020 at CVPR in the
Large Scale Holistic Video Understanding Tutorial
Slides: http://www.cvlibs.net/talks/talk_cvpr...
Paper: https://arxiv.org/abs/2006.07034
Abstract: Perceiving the world in terms of objects is a crucial prerequisite for reasoning and scene understanding. Recently, several methods have been proposed for unsupervised learning of object-centric representations. However, because these models have been evaluated with respect to different downstream tasks, it remains unclear how they compare in terms of basic perceptual abilities such as detection, figure-ground segmentation and tracking of individual objects. In this talk, I will argue that the established evaluation protocol of multi-object tracking tests precisely these perceptual qualities and propose a benchmark dataset based on procedurally generated video sequences. Using this benchmark, I will compare the perceptual abilities of three state-of-the-art unsupervised object-centric learning approaches: OP3, TBA and a video extension of MONet.
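To make concrete why a multi-object tracking protocol probes detection and tracking at once, here is a minimal sketch of the standard CLEAR MOT accuracy score (MOTA). The abstract does not say which metrics the benchmark reports, so MOTA is an assumption chosen as a representative example. The `FrameCounts` container and the `mota` function are hypothetical names for illustration, and the per-frame counts are assumed to come from a matching step between predicted and ground-truth objects (for example by mask overlap) that is not shown.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Per-frame matching outcome (hypothetical container, not from the talk)."""
    num_gt: int           # ground-truth objects present in the frame
    misses: int           # ground-truth objects with no matched prediction
    false_positives: int  # predictions matched to no ground-truth object
    id_switches: int      # matched objects whose predicted track ID changed


def mota(frames: Iterable[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / GT objects."""
    frames = list(frames)
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    errors = sum(f.misses + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


# Made-up counts for a three-frame clip with four objects per frame.
clip = [
    FrameCounts(num_gt=4, misses=1, false_positives=0, id_switches=0),
    FrameCounts(num_gt=4, misses=0, false_positives=1, id_switches=1),
    FrameCounts(num_gt=4, misses=0, false_positives=0, id_switches=0),
]
score = mota(clip)  # 3 errors over 12 ground-truth objects
```

A missed object is a detection failure, a false positive is a spurious detection, and an ID switch is a tracking failure, so a single score of this kind penalizes several of the perceptual errors the abstract lists.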