Attention-based computer vision

A fundamental aspect of human vision is attention, i.e. the process by which we focus the computational resources of our brain's visual system on specific regions of the visual field. Depending on the task at hand, we can then explore the visual field selectively and ignore irrelevant visual information. Maliciously exploiting this aspect can even yield surprising results...

Yet few computer vision systems incorporate this notion. Instead, they tend to systematically explore the entire visual field, sometimes at a resolution low enough to make this exploration tractable. I am therefore interested in developing new vision systems that rely on attention mechanisms to allocate their computational power intelligently.

In Learning to combine foveal glimpses with a third-order Boltzmann machine, Geoffrey Hinton and I developed learning algorithms for a Boltzmann machine capable of integrating the information gathered over several fixations generated by a simulated multi-resolution retina. This system can classify images based on the information contained in a limited number of fixations.
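To make the idea of a simulated multi-resolution retina more concrete, here is a minimal sketch of how a single glimpse could be extracted: a full-resolution patch at the fixation point, plus progressively wider patches that are averaged down to the same size, so that resolution decreases away from the fixation. This is an illustration under stated assumptions, not the retina used in the paper: the square patches, the factor-of-2 pooling between levels, the zero-padding outside the image and the names `crop`, `avg_pool` and `glimpse` are all hypothetical choices made for this example.

```python
from typing import List

Image = List[List[float]]


def crop(image: Image, cy: int, cx: int, size: int) -> Image:
    """Return a size x size window centred on (cy, cx); pixels outside the image are 0."""
    h, w = len(image), len(image[0])
    top, left = cy - size // 2, cx - size // 2
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(left, left + size)]
            for r in range(top, top + size)]


def avg_pool(patch: Image, factor: int) -> Image:
    """Downsample a square patch by averaging non-overlapping factor x factor blocks."""
    n = len(patch) // factor
    return [[sum(patch[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(n)]
            for r in range(n)]


def glimpse(image: Image, cy: int, cx: int, fovea: int = 4, levels: int = 3) -> List[Image]:
    """Hypothetical multi-resolution retina.

    Level k covers a (fovea * 2**k)-wide window around the fixation (cy, cx) and is
    pooled down to fovea x fovea, so every level has the same number of values but
    wider levels see the periphery at a coarser resolution.
    """
    patches = []
    for k in range(levels):
        factor = 2 ** k
        patches.append(avg_pool(crop(image, cy, cx, fovea * factor), factor))
    return patches


if __name__ == "__main__":
    img = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
    g = glimpse(img, 8, 8)
    print([(len(p), len(p[0])) for p in g])  # prints [(4, 4), (4, 4), (4, 4)]
```

Because every level has the same size, a model consuming such glimpses sees fine detail only near the fixation point and has to move its fixations to obtain detail elsewhere, which is what makes the choice of where to look matter.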


When applied to the problem of recognizing facial expressions, this system even learns to focus its attention on informative facial features, mainly the mouth and eyes, and to ignore irrelevant features such as the nose or hair (see the faces video linked in the references).

Following this work, in a collaboration with Loris Bazzani, Nando de Freitas, Vittorio Murino and Jo-Anne Ting, we developed a similar Boltzmann machine for an object recognition and tracking system, described in Learning Attentional Policies for Tracking and Recognition in Video with Deep Networks. Loris put together several videos of the system in action, demonstrating its ability to track faces and hockey players (see the YouTube link in the references).


References


  • Learning Attentional Policies for Tracking and Recognition in Video with Deep Networks [pdf] [talk] [youtube]
    Loris Bazzani, Nando de Freitas, Hugo Larochelle, Vittorio Murino and Jo-Anne Ting,
    Proceedings of the International Conference on Machine Learning (ICML), 2011

  • Learning to combine foveal glimpses with a third-order Boltzmann machine [pdf] [supp] [talk] [faces video]
    Hugo Larochelle and Geoffrey Hinton,
    Advances in Neural Information Processing Systems 23, 2010