12/2/15. G Perception. Bayesian Decision Theory. Laurence T. Maloney. Perceptual Tasks. Testing hypotheses. Estimation

Size: px

Start display at page:

Download "12/2/15. G Perception. Bayesian Decision Theory. Laurence T. Maloney. Perceptual Tasks. Testing hypotheses. Estimation"

Morris Murphy
5 years ago
Views:

1 G Perception Bayesian Decision Theory Laurence T. Maloney Perceptual Tasks Testing hypotheses signal detection theory psychometric function Estimation previous lecture Selection of actions this lecture Criteria Estimation Maximize information Selection of actions Maximize reward (value) 1

Girschick 1954 The Three Elements of SDT

2 What is the action that maximizes your expected reward (value)? Statistical decision theory is a model of action selection Statistical Decision Theory Abraham Wald John von Neumann David Blackwell Oskar Morgenstern M. A. Girschick 1954 The Three Elements of SDT W A X { } { } { } = w 1,w 2,...,w m = a 1,a 2,...,a p = x 1,x 2...,x n possible states of the world possible actions possible sensory events 2

3 Bayesian Decision Theory (BDT) gain G( a, w ) Consequence W P(w) prior likelihood p( x w ) Perception A decision d ( x ) Action X Fig. 1 Goal: select d : X A Origin of SDT: WW2 radar operator Are the blobs enemy aircraft? Or just noise (e.g. clouds)? Decision has consequences: If you miss an aircraft, people might get killed If you mistake noise for aircraft, fuel, manpower & resources are wasted Radar screen 3

4 Decision outcomes SIGNAL: are the blobs real enemy aircraft? S S DECISION: should you alert the air force? Y N Hit Miss False alarm Correct reject SDT p X S p[ X S] probability noise aircraft X Blob size (stimulus intensity) SDT S S probability noise aircraft X X Blob size (stimulus intensity) 4

5 How should we set criterion? W A X = = { S, S} { Y, N} (, ) = π ( S), π ( S) [ ] prior p X S, p X S likelihood Y N S V V YS NS S VYS V NS gain How should we set criterion? ( ) = YS [ ] YS ( ) = NS [ ] + ( ) ( ) EBG Y V p S X V p S X EBG N V p S X V p S X NS Y EBG Y > EBG N [ ] ps X VYS + V > p S X V + V YS NS NS How should we set criterion? Y posterior odds [ ] ps X VYS + V > p S X V + V YS NS NS Y likelihood ratio prior odds [ ] π ( S) p X S VYS + V NS > p X S π ( S) V + V YS NS 5

6 How should we set criterion? Y Y [ ] π ( S) p X S VYS + V NS log > log + log p X S π ( S) V + V YS NS p[ X S] log > log β p X S log likelihood ratio Compare the LLR to a criterion, log β How should we set criterion? Y log p X S p X S > logβ log likelihood ratio Compare the LLR to a criterion, log β log p X S p X S > logβ p X S p X S > β S S probability p X S p X S noise aircraft X Blob size (stimulus intensity) 6

7 Suppose we vary prior odds but set the value term to 0. ( ) ( ) Y log p X S > log π S p X S π S Y p X S > π ( S ) p X S π ( S) + log V +V YS NS V YS +V NS Y p X S p X S S > π S π S S ( ) ( ) probability p X S p X S noise aircraft X Blob size (stimulus intensity) Y p X S p X S S ( ) ( ) > π S π S S We can fit data to estimate β We can compare to optimal β probability p X S p X S noise aircraft X Blob size (stimulus intensity) 7

8 Is the brain Bayesian? 4 Vary prior odds Estimate criterion b R 2 : logβˆ 0 1:1 9:1-2 1:9 γ : 0.36 p 0 : logβ Tanner, Swets, & Green (1956) Conservatism 4 R 2 : logβˆ 0 1:1 9:1-2 1:9 γ : 0.36 p 0 : logβ Tanner, Swets, & Green (1956) 8

reasoning. We can test SDT/BDT as a framework for modeling perception and action (Maloney & Mamassian, 2009) Aside The gain function is the organism s link to the environment.

9 Themes Managing uncertainty to maximize gain is the central task of a biological organism. The use of explicit cost and rewards allow us to probe a much wider range of behavior than previously explored (Trommershäuser et al, 2008) Movement planning is counterfactual reasoning. We can test SDT/BDT as a framework for modeling perception and action (Maloney & Mamassian, 2009) Aside The gain function is the organism s link to the environment. It represents a problem, posed by the environment, a problem that can rapidly change. A G( a, w ) Consequence gain W decision d ( x ) likelihood L ( w x ) X Only the luckiest organism can choose its gain function. Bayesian Decision Theory and the Planning of Movements L 9

10 Experimental Task Start of trial: display of fixation cross (1.5 s) L Experimental Task Display of response area, 500 ms before target onset (114.2 mm x 80.6 mm) L 10

11 Experimental Task Target display (700 ms) L Experimental Task L Experimental Task 100 L The green target is hit: +100 points

12 Experimental Task L -500 Experimental Task L The red target is hit: -500 points -500 Experimental Task L 12

13 Experimental Task L Scores add if both targets are hit: Experimental Task L Experimental Task You are too slow: -700 The screen is hit later than 700 ms after target display: -700 points. If you are on time but Miss the targets, 0. 13

14 Experimental Task Current score: 500 End of trial Choice among Movement Strategies mm 0 0 What should Paulina do? y hit -y mean (mm) Distribution of movement end points x hit -x mean (mm) Subject S4, = 3.62 mm, 72x15 = 1080 end points Expected Normal Value Q-Q Plot x hit -x mean (mm) y hit -y mean (mm) Observed Value 14

variance Harris & Wolpert (1998) Choice among Movement Strategies 0 0 0

15 If there were no red penalty circle. Aim for center Select perceptual-motor strategies that minimize variance Harris & Wolpert (1998) Choice among Movement Strategies mm 0 0 What should Paulina do? Thought Experiment : -500 : 100 points (2.5 ) 100 points = 4.83 mm 15

16 Thought Experiment : -500 : 100 points (2.5 ) 100 points 100 points 200 points = 4.83 mm Thought Experiment : -500 : 100 points (2.5 ) 100 points 100 points 100 points 300 points = 4.83 mm Thought Experiment : -500 : 100 points (2.5 ) 100 points 100 points 100 points -400 points -100 points = 4.83 mm 16

3 pts. per trial = 4.83 mm 5 ) - 0.3 pts. per trial 30.

7 pts. per trial 25.5 pts. per trial = 4.83 mm 17

17 Thought Experiment : -500 : 100 points (2.5 ) 100 points 100 points 100 points -400 points pts. per trial = 4.83 mm Thought Experiment : -500 : 100 points (2.5 ) pts. per trial 30.7 pts. per trial = 4.83 mm Thought Experiment : -500 : 100 points (2.5 ) pts. per trial 30.7 pts. per trial 25.5 pts. per trial = 4.83 mm 17

Thought Experiment : -500 : 100 points (2.5 ) - 0.3 pts.

(x,y): 10 5-0 -5-10 -10-5 0 5 10 15 20 = 4.

-500 points per trial Thought Experiment penalty: 0 penalty:

18 Thought Experiment : -500 : 100 points (2.5 ) pts. per trial 30.7 pts. per trial 25.5 pts. per trial 22.6 pts. per trial = 4.83 mm Expected value as function of mean movement end point (x,y): = 4.83 mm points per trial <-30 target: 100 penalty: -500 points per trial Thought Experiment penalty: 0 penalty: 100 penalty: 500 y x y x y x <-60 x, y: mean movement end point [mm] y [mm] y [mm] y [mm] = 4.83 mm x [mm] x [mm] x [mm] 18

19 Movement plans as lotteries (p 1, -500; p 2, -400; p 3, 100; p 4, 0) Movement plans as lotteries = 4 mm Lottery: (1.3%, -500; 30.3%, -400; 60.9%, 100; 7.5%, 0) Movement plans as lotteries Optimal aim point: lottery with MEV = 4 mm (6.6%, -500; 52.3%, -400; 37.0%, 100; 4.0%, 0) (1.3%, -500; 30.3%, -400; 60.9%, 100; 7.5%, 0) ( 0%, -500; 4.6%, -400; 62.6%, 100; 32.8%, 0) ( 0%, -500; 0.7%, -400; 37.6%, 100; 61.7%, 0) 19

Experiment 1 Reaching with Asymmetric Gain/Loss Julia Trommershäuser Trommershäuser, Maloney, Landy (2003) JOSA A Test of the model: Experiment 1 4 stimulus configurations: (varied within

20 Experiment 1 Reaching with Asymmetric Gain/Loss Julia Trommershäuser Trommershäuser, Maloney, Landy (2003) JOSA A Test of the model: Experiment 1 4 stimulus configurations: (varied within block) 2 penalty conditions: 0 and -500 points (varied between blocks) 5 practiced movers 1 session of data collection: 360 trials 24 data points per condition R = 9 mm General Methods: Training For all experiments: All subjects practice the task for 360 trials or more until their variance stabilizes. The timeout limit is gradually decreased to 700 ms during training. There are no penalties during training (the concept is never mentioned). We verify that each subject s movement variance has stabilized. They are told only to (0) +100 make money. 20

21 General Methods: Training For all experiments: All subjects practice the task for 360 trials or more until their variance stabilizes. The timeout limit is gradually decreased to 700 ms during training. There are no penalties during training (the concept is never mentioned). We verify that each subject s movement variance has stabilized. They are told only to (0) +100 make money. Model prediction: Results: Experiment 1 model, penalty = 0 Subject S5, = 2.99 mm Results: Experiment 1 Model prediction: configuration 1 model, penalty = 0 x model, penalty = 500 Subject S5, = 2.99 mm 21

22 Results: Experiment 1 Model prediction: configuration 2 model, penalty = 0 x model, penalty = 500 Subject S5, = 2.99 mm Results: Experiment 1 Model prediction: configuration 3 model, penalty = 0 x model, penalty = 500 Subject S5, = 2.99 mm Results: Experiment 1 Model prediction: configuration 4 model, penalty = 0 x model, penalty = 500 Subject S5, = 2.99 mm 22

23 Results: Experiment 1 Comparison with experiment exp., penalty = 0 exp., penalty = 500 x model, penalty = 500 Subject S5, = 2.99 mm Results: Experiment 1 S1 S2 S3 S4 S5 exp., penalty = 0 exp., penalty = 500 x model, penalty = 500 Results: Experiment 1 S % S % S % S % S % exp., penalty = 0 exp., penalty = 500 x model, penalty =

24 Two tasks 1. Movement task: Aim for a point on the screen Extensive training 2. Decision task: Select which point to aim for No explicit training Learning? Hill-Climbing? mean end point offset (mm) Hypothetical Trend trial number 24

25 Conclusions The use of explicit cost and rewards allow us to probe a much wider range of behavior than previously explored. Managing uncertainty to maximize gain is the central task of a biological organism. 25

Statistical Decision Theory II: Costs, Rewards, & Ideal Actors

Statistical Decision Theory II: Costs, Rewards, & Ideal Actors Loss Functions Statistical decision theory involves determining which decisions (or actions) are best It formalizes good versus bad outcomes