A fundamental aspect of human vision is that of attention,
i.e. the process by which we focus the computational
resources of our brain's visual system on specific regions
of the visual field. Based on the nature of the task to
solve, we can then ignore irrelevant visual information by
intelligently exploring the visual field. Maliciously exploiting this
aspect can even yield
Yet few computer vision systems incorporate this
notion. They instead tend to systematically explore all
of the visual field, sometimes at a resolution low enough
to make this exploration tractable.
I'm hence interested in developing new vision systems that
do rely on attention mechanisms in order to allocate their
computational power intelligently.
When applied to the problem of recognizing
facial expressions, this system is even able to
learn to focus its attention on informative facial
features, mainly the mouth and eyes, and ignore
irrelevant features such as the nose or hair (see video above).
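To give a rough idea of what looking at only part of the visual field can mean computationally, here is a minimal sketch of a multi-resolution "foveal glimpse" around a fixation point. It is an illustration only, not the retina model used in the papers below: the function name `foveal_glimpse`, its parameters (`size`, `scales`), the zero padding and the average pooling are all assumptions made for this example.

```python
def foveal_glimpse(image, cy, cx, size=4, scales=3):
    """Return `scales` patches of size x size values centred on (cy, cx).

    Patch k covers a window of side size * 2**k in the image and is
    average-pooled down to size x size, so resolution falls off with
    distance from the fixation point. Pixels outside the image count as 0.
    (Hypothetical helper, written for illustration only.)
    """
    height, width = len(image), len(image[0])

    def pixel(y, x):
        # Zero padding outside the image bounds.
        if 0 <= y < height and 0 <= x < width:
            return image[y][x]
        return 0.0

    patches = []
    for k in range(scales):
        factor = 2 ** k           # pooling factor for this scale
        side = size * factor      # window side length in image pixels
        top, left = cy - side // 2, cx - side // 2
        patch = []
        for i in range(size):
            row = []
            for j in range(size):
                # Average the factor x factor block that maps to cell (i, j).
                total = 0.0
                for dy in range(factor):
                    for dx in range(factor):
                        total += pixel(top + i * factor + dy,
                                       left + j * factor + dx)
                row.append(total / (factor * factor))
            patch.append(row)
        patches.append(patch)
    return patches


# Toy data: a 32 x 32 image of constant intensity 1.0.
image = [[1.0] * 32 for _ in range(32)]
glimpse = foveal_glimpse(image, cy=16, cx=16)

# Number of values the system has to process per glimpse vs. per full image.
values_in_glimpse = sum(len(p) * len(p[0]) for p in glimpse)
values_in_image = len(image) * len(image[0])
```

With these toy settings a glimpse holds far fewer values than the full image, which is the kind of saving an attentional system can trade for having to decide where to look next. How that decision is learned is the subject of the papers listed below and is not covered by this sketch.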
Learning Attentional Policies for Tracking and Recognition in Video with Deep Networks [pdf] [talk] [youtube]
Loris Bazzani, Nando de Freitas, Hugo Larochelle, Vittorio Murino and Jo-Anne Ting, International Conference on Machine Learning proceedings, 2011
Learning to combine foveal glimpses with a third-order Boltzmann machine [pdf] [supp] [talk] [faces video]
Hugo Larochelle and Geoffrey Hinton, Advances in Neural Information Processing Systems 23, 2010