What and where: A Bayesian inference theory of attention

Size: px

Start display at page:

Download "What and where: A Bayesian inference theory of attention"

Shana Osborne
5 years ago
Views:

1 What and where: A Bayesian inference theory of attention Sharat Chikkerur, Thomas Serre, Cheston Tan & Tomaso Poggio CBCL, McGovern Institute for Brain Research, MIT

2 Preliminaries Outline Perception & Bayesian inference Background & motivation Theory Attention as inference Bayesian model Computational model Model properties Applications on real-world images Predicting human eye movements Improving object recognition

Perception as Bayesian inference Mumford and Lee, Hierarchical Bayesian Inference in the Visual Cortex, JOSA, 20(7), 2003 x IT Recurrent feed-forward/feedback loops integrate bottom up information

3 Perception as Bayesian inference Mumford and Lee, Hierarchical Bayesian Inference in the Visual Cortex, JOSA, 20(7), 2003 x IT Recurrent feed-forward/feedback loops integrate bottom up information with top down priors Bottom-up signals : Data dependent Top-down signals : Task dependent Top down signals provide context information and help to disambiguate bottom-up signals x V4 x V2 x V1 x 0

4 Bottom up vs. top-down Hegde J. and Felleman J., Reappraising the functional implications of the primate visual anatomical hierarchy, Neuroscientist, 13(5), 2007

5 Bottom up vs. top-down

6 Mathematical framework

7 Bayesian generative models X System () f ( data ) Y ( state ) X ( data ) Y Statistical learning view: Generative model view: Y = f(x), X-data, Y-class Y X-random, Y-random X~P(X Y), ( Y X )! P( X Y) P( Y) P Y X X

8 x IT x V Perception: bottom-up & topdown Recall, P( A,B) = P( A B) P( B) = P( B A) P( A) P( A B) P( B) P( B A) = P( A) P( A) = P( A,B)! For the given network, B ( x,x,x ) = P( x x ) P( x x ) P( x ) P P IT V 0 0 ( xv,x0 xit ) = P( x0 xv,xit ) P( xv xit ) = P( x x ) P( x x ) 0 V V V V IT IT IT x 0 Inference: P P ( ) ( x0 xv,xit ) P( xv xit ) xv x0,xit = P( x0 xit ) P( x x ) P( x x ) 0 V V IT Bottom-up Top-down

9 Belief propagation ( + π(e Prior e + ( + π(x)=p(x e e+ ( + )P(x e π(e + ( + π(y)=p(y e X Y ( π(x b(x)α λ(x) λ y ( x (x)=p(e - Y ( λ(y)p(y x ( y λ(y)=p(e - e - ( - λ(e Evidence

10 Biological plausibility George D. and Hawkins J., A hierarchical Bayesian model of invariant pattern recognition in the visual cortex, IJCNN, 2005

11 U X Y 1 Y 2 Y 3 ( ) ( ) ( ) ( ) x ë x ë x = ë x ë y y 1 y 3 2 ( ) ( ) ( )! x u x P x ë = u ë ( ) ( ) ( )! u u D u x P = x D ( ) ( ) ( ) ( ) ( ) x y P x ë x ë x D = c y D y y x ! Trees

12 U 1 X Y 1 Y 2 Y 3 ( ) ( ) ( ) ( )! x ë x ë x ë = x ë y y y ( ) ( ) ( ) ( )!! x, u u D u u x P x ë = u ë ( ) ( ) ( ) ( )! u u D u D u u x P = x D u ( ) ( ) ( ) ( ) ( ) x y P x ë x ë x D = c y D y y x ! U 2 Polytrees

13 Attention Background & motivation

Visual processing: what and Ventral stream Dorsal stream where Ventral ( what ) stream: Processes shape information Responsible for object recognition Progressive loss of location information Dorsal

14 Visual processing: what and Ventral stream Dorsal stream where Ventral ( what ) stream: Processes shape information Responsible for object recognition Progressive loss of location information Dorsal ( where ) stream: Processes location and motion information Progressive loss of form information Form and location is processed concurrently and (almost) independently of each other How does the brain combine form and location information?

15 Ventral stream: invariant recognition Psychophysics V4 IT Reynolds, Chelazzi & Desimone 99 Serre Oliva Poggio 2007 Zoccolan Kouh Poggio DiCarlo 2007 How does the brain recognize objects under clutter? Figures from Serre et al, Hung et al.

16 Parallel vs. serial processing Attention is needed to recognize objects under clutter

17 Filter theory (Broadbent) Biased competition (Desimone) Feature integration theory (Treisman) Guided search (Wolfe) Scanpath theory (Noton) Bayesian surprise (Itti) Bottleneck (Tsotsos) Computational Role Biology V1 V4 MT LIP FEF Attention Everybody knows what Contrast attention gain is -William James, 1907 Response gain Modulation under spatial attention Modulation under feature attention Pop-out Serial vs. Parallel Bottom-up vs. Top-down Effects

18 Bridging the gap Conceptual models (theories) Provide justifications not implementations Computational models Model behavior (eye-movements) Cannot model physiological effects Phenomenological models Model specific physiological effects Cannot provide theory Bridging the gap Phenomenological, predicts behavior, theory

19 A theoretical framework

20 Bayesian model Kersten & Yuille 04 Assumption: visual system selects and localizes objects, one at a time

21 Bayesian model Assumption: object location and identity are marginally independent of each other

22 Bayesian model Assumption: Every object is generated using a set of N complex features each of which may be present or absent

23 Bayesian model F i : Location/scale invariant features

24 Computational model

25 Relation to biology PFC LIP/FEF IT V4 V2 Feature-based Spatial attention: attention: What is Where at location is object L? O?

26 Model description Where What L F i x i Feature-maps N Feature-maps Image

27 Model properties: invariance Where What L F i x i N

28 Model properties: crowding Where What L F i x i N

29 Model: spatial attention L F i X F i l N * * * What is at location X?

30 Model: feature-based attention X L F i F i l * * * N Where is object X?

31 Model properties

32 Spatial Invariance

33 Spatial Attention

34 Feature Attention

35 Feature Popout

36 Parallel vs. Serial Search

37 Application I: Predicting eye-movements

38 Predicting eye movements Eye movements can be considered as a proxy for attention Cues influencing eye-movements Bottom-up image saliency Top-down feature biases Top-down spatial bias Evaluation

39 Model can predict human eye-movements Top-down Bottom-up spatialattention and feature attention Method Method ROC area (Cars) ROC area (Pedestrian) ROC area (absolute) Bruce and Tsotos 06 Itti et al % Torralba et al. Itti et al % Proposed Proposed 80.4% Humans 87.8% 72.8% 42.3% 72.7% 77.1% 77.9% 80.1% 87.4%

40 Application II: Improving recognition

41 Effect of clutter on detection recognition without attention recognition under attention

42 Recognition performance improves with attention Chikkerur, Serre, Tan & Poggio (in submission, Vision Research)

43 Recognition performance improves with attention Chikkerur, Serre, Tan & Poggio (in prep)

44 Theory Summary Attention is part of the inference process that solves the problem of what is where. Computational model We describe a computational model and relate it to functional anatomy of attention. Attentional phenomena (pop-out, multiplicative modulation, contrast response) are predicted by the model. Applications Predicting human eye movements. Improving object recognition

45 Thank you!

47 Relation to prior work

48 Feedback and its role Reconstruction Mental Imagery Murray J. F., Visual recognition, inference and coding using learned sparse overcomplete representations, PhD thesis, UCSD, 2005 Hinton G. E, Osindero S. and Teh. Y, A fast learning algorithm for deep belief nets, Neural computation, vol. 18, 2006

49 Feedback and its role Visual attention/segmentation Murray J. F., Visual recognition, inference and coding using learned sparse overcomplete representations, PhD thesis, UCSD, 2005 Fukushima K., A neural network model for selective attention in visual pattern recognition, Biological Cybernetics, vol. 55, 1986

50 Predicting physiological effects

51 Spatial attention McAdams and Maunsell 99 Model Unattended Attended

52 Feature-based attention Bichot and Desimone 05 Model

53 Contrast gain vs. Response gain Trujillo and Treue 02 Mc Adams and Maunsell 99

54 Attentional effects in MT: popout

55 MT: Feature based attention

56 MT: Multi-modal interaction

Deep learning in the visual cortex

Deep learning in the visual cortex Thomas Serre Brown University. Fundamentals of primate vision. Computational mechanisms of rapid recognition and feedforward processing. Beyond feedforward processing: