Predictive Processing in Planning:

Size: px

Start display at page:

Download "Predictive Processing in Planning:"

Vernon Williamson
5 years ago
Views:

for Human Neuroimaging, UCL The Promise of Predictive

1 Predictive Processing in Planning: Choice Behavior as Active Bayesian Inference Philipp Schwartenbeck Wellcome Trust Centre for Human Neuroimaging, UCL The Promise of Predictive Processing: A Critical Evaluation of its Prospects Tufts University 2018

2 Inferring the latent structure of the world v. Helmholtz, 1867 Active inference: to make sense of the world, we need to infer its latent structure (hidden causes) This is not only relevant for perception, but also for adaptive behavior!

. o t O Central idea: Action fulfils prior expectations = preferences o F = ln P(o t ) D[Q(s t ) P(s t o

3 What is active inference? world s a t A agent sƹ Bayesian Brain, Good Regulator Theorem (Conant & Ashby, 1947).. o t O Central idea: Action fulfils prior expectations = preferences o F = ln P(o t ) D[Q(s t ) P(s t o t )] Qs ( ) Evidence Divergence Free energy Taken from Friston, 2017 minimise surprise predictive processing Action Changing sensations to maximise evidence Perception Changing beliefs to minimise divergence

4 Outline 1. Active Inference: basic building blocks i. Markov Decision Processes ii. Generative models iii. Information theory iv. Variational Bayes/free energy world s a t A o t O agent sƹ 2. Active inference based on a discrete Markovian generative model 3. Some applications and predictions

5 Active inference: basic building blocks

6 I. Markov Decision Processes Theoretical framework for decision-making: partially observable Markov decision processes (POMDP) Key ingredients: 1,, T discrete time-steps Hidden states s t 1 s s t t 1

7 I. Markov Decision Processes Theoretical framework for decision-making: partially observable Markov decision processes (POMDP) Key ingredients: 1,, T discrete time-steps s t vs. P o t s t ) not trivial o 1 o 2 o3

8 I. Markov Decision Processes Theoretical framework for decision-making: partially observable Markov decision processes (POMDP) Key ingredients: 1,, T discrete time-steps a t 1 s t+1 P o t s t ) not trivial a t 2 a t 1 Hidden states s t 2 t 1 s t+1 s s t a t a t 2 3 s t+1 Markov-property: P s t+1 s t, a t = P s t+1 s t, a t, s t 1, a t 1, s t 2, a t 2,

9 II. Generative models and hidden states Generative models are probabilistic mappings based on a likelihood function and a prior density: p y, θ m = p y θ, m p θ, m [ p o t, s t = p o t s t p s t ] Allows to generate data by sampling from prior and plugging into likelihood Call θ/s t hidden state or latent variable. Perform inference on hidden states by applying Bayes rule ( model inversion ): p θ y, m = p y θ, m p θ m p(y m) Active inference emphasizes that agents build generative models on the world, and use them to perform inference on perception and action.

III. Information theory Information theory quantifies the information content of a signal Unlikely events are more informative than likely

called (Shannon) entropy: H y = E I y Expected amount of information of an event (important later!

10 III. Information theory Information theory quantifies the information content of a signal Unlikely events are more informative than likely events This can be quantified as the self-information or surprise of a signal (Shannon, 1948): I y = logp(y m) Expected value of surprise is called (Shannon) entropy: H y = E I y Expected amount of information of an event (important later!) Also: measure of uncertainty = p y i m log p(y i m) i Active inference is heavily grounded in information theory, and argues that it has a lot to say about how brains work and how we behave.

11 IV. Variational Bayes Exact inference is generally intractable p θ y, m = p y θ, m p(θ m) p(y m) = p y θ, m p(θ m) p y θ, m p(θ m) Variational Bayes allows to cast an inference problem (difficult) as a bound optimization problem (easier) (cf., Beal, 2003; Bogacz 2017) Aim: approximate model evidence (denominator in Bayes) (Negative) variational free energy provides a lower bound on model evidence General formula: F = log p y m D KL q θ, p θ y, m Accuracy Complexity Note: negative log-model evidence is the same as information-theoretic surprise log P y m! log p y m = F D KL q θ, p θ y, m

12 Active inference: building blocks I. Choice behavior is modeled as partially observable Markov Decision Process Highlights importance of inferring hidden states and adequate choices II. III. Inference on hidden states and choices is based on an agent s generative model of the world Invert models via Bayesian inference Information theory allows to quantify how informative/expected an observation is Surprise = negative (log-) model evidence! IV. Inference is approximated based on (negative) variational free energy Approximates model evidence (surprise!)

13 Active Inference: model structure

14 ǁ ǁ A Markovian generative model p y, θ = p y θ p θ P o, s, ǁ π, γ = P o s P s π P(π γ)p(γ) P o s ǁ = P o 0 s 0 P o 1 s 1 P o t s t P o t s t =: A Likelihood P sǁ π = P s t s t 1, π P s 1 s 0, π P(s 0 ) P s t+1 s t, π =: B(u = π(t)) P s 0 P o t =: D =: σ(c) Empirical priors hidden states d Precision C P(π γ) = σ( γ Q) control states Policy Q = E[ln P o t s t ln Q o t π + ln P(o t )] t D B P A = Dir(a) P B = Dir(b) P D = Dir(d) P γ = Γ(1, β) Full priors Hidden states s t 1 st st 1 A Q s, ǁ π, A, B, D, γ = Q s 1 π Q s T π Q π Q A Q B Q D Q(γ) Q s t π = Cat(s t ) Q(π) = Cat(π) ot 1 o t ot 1 Q(A) = Dir(a) Q(B) = Dir(b) Q(D) = Dir(d) Q(γ) = Γ(1, β) Approximate posterior

15 A-matrix: observation model P o, s, ǁ π, γ = P o sǁ P sǁ π P(π γ)p(γ) Hidden states s t 1 st st 1 P o s ǁ = P o 0 s 0 P o 1 s 1 P o t s t A P o t s t =: A Likelihood o o t 1 t o t 1 P

16 B-Matrix: Markovian transition probabilities P o, s, ǁ π, γ = P o sǁ P sǁ π P(π γ)p(γ) Policy P sǁ π = P s t s t 1, π P s 1 s 0, π P(s 0 ) P s t+1 s t, π =: B(u = π(t)) P s 0 P o t =: D =: σ(c) Empirical priors hidden states Hidden states s t 1 st st 1 B P, gamble

17 Prior expectations over outcomes (c-vector) P o, s, ǁ π, γ = P o sǁ P sǁ π P(π γ)p(γ) P sǁ π = P s t s t 1, π P s 1 s 0, π P(s 0 ) P s t+1 s t, π =: B(u = π(t)) P s 0 P o t =: D =: σ(c) Empirical priors hidden states d D Policy C Hidden states s t 1 s s t t 1

18 Precision γ P o, s, ǁ π, γ = P o sǁ P sǁ π P(π γ)p(γ) P(π γ) = σ( γ Q) control states Q = E [H[P o t s t ]] + H Q o t π + E[lnP(o t )] t P A = Dir(a) P D = Dir(d) P B = Dir(b) P γ = Γ(1, β) Full priors Precision Policy gamble vs. don t gamble?

19 Precision and Uncertainty Precision encoded by neuromodulators? ACh? NA? DA? Schwartenbeck et al., 2015 Parr & Friston, 2017

20 Inference Belief updating Maths is not trivial but only has to be done once! (By mathematicians, evolution, ) Then implement resulting update equations. Perception Policy selection Precision π sƹ t = σ(log A o t + log B t 1 sƹ t 1 ) π = σ( γ Q) መβ = β π Q s F π F γ F Q Derive via variational free energy with respect to hidden states x = {s t, π, β}: Q x = arg min Q(x) F(o, x, Q) P(x o) F o, x, Q = E Q st,π,β ln P o, s t, π, β m E Q st,π,β [ln Q s t, π, β ] π = s t (ln sƹ t ln A o t ln B t 1 sƹ t 1 ) + cf., Friston et al., 2015, 2017; Bogacz, 2017

21 Inference: a closer (more intuitive) look Belief updating Perception π sƹ t = σ(log A o t + log B t 1 sƹ t 1 ) Likely hidden states given what I observe Based on what I believe about the previous state Policy selection π = σ( γ Q) Confidence Value of policies Q Precision መβ = β π Q Prior on precision Value of inferred policy cf., Friston et al., 2015, 2017

22 Beyond (active) inference: learning Belief updating Perception Policy selection Precision sƹ t = σ(log A o t + log B π sƹ t 1 ) π = σ( γ Q) መβ = β π Q???? Q A = ψ a ψ(a 0 ) a = a + o t D = ψ d ψ(d 0 ) d = d + s t Learning cf., Friston et al., 2016

log-expectations over outcomes: P(o t ) log P(o t ) Fulfilling these preferences maximizes model

23 Policy selection as predictive processing Policies become valuable if they Maximize reward/utility Keep options open Allow us to minimize uncertainty about the world Utilities or preferences are defined as log-expectations over outcomes: P(o t ) log P(o t ) Fulfilling these preferences maximizes model evidence log P o t (i.e., minimizes surprise log P(o t ))! This can be approximated with variational free energy!

24 Policy selection as predictive processing Value of policies Q(π) defined as negative free energy: Q s, ǁ π, A, B, D, γ = Q s 1 π Q s T π Q π Q A Q B Q D Q(γ) Approximate posterior Q(π) = E[H[P o t s t ]] D KL [Q o t π P(o t )] t Friston et al., 2015 Q(π) = E [H[P o t s t ]] + H Q o t π + E[lnP(o t )] t Expected uncertainty (H = 0 o t = s t ) Entropy over outcomes Expected utility Epistemic or intrinsic value Extrinsic value Friston et al., 2015; 2017; Parr & Friston, 2017

25 Some Applications and Predictions

Inference: a closer look on policy selection Imagine you are a Tennis player and your opponent serves You want to find a position where Expected utility: E[lnP(o t )].

26 Inference: a closer look on policy selection Imagine you are a Tennis player and your opponent serves You want to find a position where Expected utility: E[lnP(o t )]...there is a high chance of getting the ball, Entropy over outcomes: H Q o t π... you have a good chance of getting a ball that was unexpected,...and where you can minimize uncertainty about the character of your opponent. Ambiguity: E[H P o t s t ]

27 Policy selection: expected utility vs. entropy over outcomes Expected Utility E lnp o t = o t π U t Entropy over outcomes Expected Utility H Q o t π + E lnp o t = o t π U t o t π log o t π Schwartenbeck et al., 2015

28 Policy selection: foraging Friston,, FitzGerald & Pezzulo, 2015 foraging trials Schwartenbeck & Friston, 2016

29 Policy selection: curiosity and exploration P( o, s, ǁ A, π, γ) Q π = E Q ln P o t E Q [D KL [P(s t o t, π) Q s t π + D KL [P(A o t, π) Q A π ] cf., Parr & Friston, 2017 Schwartenbeck et al. in prep Expected update Exploration? {, }? Exploitation

p(risky) p(no reward risky option) p(risky) p(no reward risky option) p(risky) Policy selection: curiosity and exploration Active learning: Goal-directed uncertainty reduction in beginning

30 p(risky) p(no reward risky option) p(risky) p(no reward risky option) p(risky) Policy selection: curiosity and exploration Active learning: Goal-directed uncertainty reduction in beginning of experiment p(reward risky option) trials Random learning: Random uncertainty reduction p(reward risky option) trials No learning: No uncertainty reduction trials Schwartenbeck et al. in prep

State estimation: Planning as inference s π t = σ(log A o t + log B π sƹ π t 1 ) sƹ t = State estimation as Bayesian model averaging of likely states under different policies: i π i s t π p s t = p s

31 State estimation: Planning as inference s π t = σ(log A o t + log B π sƹ π t 1 ) sƹ t = State estimation as Bayesian model averaging of likely states under different policies: i π i s t π p s t = p s t π i p(π i ) i This predicts an optimism bias for estimates of current (and future states) This accounts for trade-offs and hybrid models of different modes of behaviour, e.g. model-free vs. model-based behaviour Predicts accuracy-complexity trade-off: FitzGerald, Dolan & Friston, 2014

32 Building models of the world world s a t A agent sƹ o t O???? Q FitzGerald, Dolan & Friston, 2014

33 Summary Active inference offers probabilistic treatment of planning and decision-making Highlights importance of developing a (generative) model to perform inference Introduce predictive processing to decision-making: Choices fulfil preferences minimise surprise maximise model evidence Approximated by variational free energy Approximate inference takes place based on variational updates of the current state, policy and confidence Applies the same currency to information-gain and reward, such that agents Maximize utility Keep options open Minimize uncertainty ( goal-directed exploration)

34 Thank you!

Computational Biology Lecture #3: Probability and Statistics. Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept

Computational Biology Lecture #3: Probability and Statistics Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept 26 2005 L2-1 Basic Probabilities L2-2 1 Random Variables L2-3 Examples