Modelling behavioural data


1 Modelling behavioural data Quentin Huys MA PhD MBBS MBPsS Translational Neuromodeling Unit, ETH Zürich Psychiatrische Universitätsklinik Zürich

2 Outline An example task Why build models? What is a model? Fitting models Validating & comparing models Model comparison issues in psychiatry

3 Example task Go/Nogo task: actions (go, nogo) crossed with outcomes (rewarded, avoids loss). [Figure: task schematic.] Guitart-Masip, Huys et al., submitted

4-6 Example task [Figure: probability of a correct response in each of the four conditions (Go to win, Go to avoid, Nogo to win, Nogo to avoid), and probability(Go) over trials in the go-rewarded, go-punished, nogo-rewarded and nogo-punished conditions.] Think of it as four separate two-armed bandit tasks. Guitart-Masip, Huys et al., submitted

7-8 Analysing behaviour Standard approach: decide which feature of the data you care about, then run descriptive statistical tests, e.g. an ANOVA on the probability of a correct response in the four conditions. [Figure: probability correct in the four conditions (Go to win, Go to avoid, Nogo to win, Nogo to avoid).] This has many strengths, but also weaknesses: it is piecemeal rather than holistic/global, descriptive rather than generative, and has no internal variables.

9 Models Holistic: aim to model the process by which the data came about in its entirety. Generative: they can be run on the task to generate data as if a subject had done the task. Inference process: capture the inference process subjects have to make to perform the task, in sufficient detail to replicate the data. Parameters replace test statistics: their meaning is explicit in the model, and their contribution to the data is assessed in a holistic manner.

10 A simple Rescorla-Wagner model Q values: $Q_{t+1}(a_t,s_t) = Q_t(a_t,s_t) + \epsilon\,(r_t - Q_t(a_t,s_t))$, where $a_t$ is the action on trial $t$ (either go or nogo), $s_t$ is the stimulus presented on trial $t$, and $\epsilon$ is the learning rate. [Figure: rewards and the resulting value estimates over time for several learning rates.] Key points: Q is the key part of the hypothesis; it formally states the learning process in quantitative detail and formalizes internal quantities that are used in the task.

11 Actions Action probabilities: softmax of the Q values. With $Q_{t+1}(a_t,s_t) = Q_t(a_t,s_t) + \epsilon\,(r_t - Q_t(a_t,s_t))$, $p(a_t \mid s_t, h_t, \theta) = p(a_t \mid Q_t(a_t,s_t), \beta) = \frac{e^{\beta Q_t(a_t,s_t)}}{\sum_{a'} e^{\beta Q_t(a',s_t)}}$, and $0 \le p(a) \le 1$. Features: this links the learning process to the observations (choices, RTs, or any other data); it plays the role of the link function in GLMs; many other forms are possible.
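Read together, the two equations above can already be run generatively. The following MATLAB sketch (MATLAB because the slides later refer to fminunc) simulates choices for a single stimulus with two actions; the reward probabilities and trial count are made-up illustration values, not taken from the task.

% Minimal generative sketch of the RW/softmax model (assumed setup:
% one stimulus, two actions go/nogo, made-up reward probabilities).
T       = 200;                  % trials
epsilon = 0.1;                  % learning rate
beta    = 2;                    % softmax inverse temperature
pReward = [0.8 0.2];            % P(reward | go), P(reward | nogo) -- assumption
Q = zeros(1,2);                 % Q(go), Q(nogo)
a = zeros(T,1); r = zeros(T,1);
for t = 1:T
    p    = exp(beta*Q) / sum(exp(beta*Q));         % softmax action probabilities
    a(t) = 1 + (rand > p(1));                      % sample action: 1 = go, 2 = nogo
    r(t) = double(rand < pReward(a(t)));           % binary reward
    Q(a(t)) = Q(a(t)) + epsilon*(r(t) - Q(a(t)));  % Rescorla-Wagner update
end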

12 Fitting models I Maximum likelihood (ML) parameters: $\hat\theta = \operatorname{argmax}_\theta L(\theta)$, where the likelihood of all choices is $L(\theta) = \log p(\{a_t\}_{t=1}^T \mid \{s_t\}_{t=1}^T, \{r_t\}_{t=1}^T, z, \theta) = \log p(\{a_t\}_{t=1}^T \mid \{Q_t(s_t,a_t;\theta)\}_{t=1}^T, \beta) = \log \prod_{t=1}^T p(a_t \mid Q_t(s_t,a_t;\theta), \beta) = \sum_{t=1}^T \log p(a_t \mid Q_t(s_t,a_t;\theta), \beta)$
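As a concrete (hypothetical) instance of this sum, the likelihood for the Rescorla-Wagner/softmax model can be written as one MATLAB function; the function name and the restriction to a single stimulus are assumptions for illustration, not part of the slides.

function nll = rwNegLogLik(epsilon, beta, a, r)
% Negative summed log-likelihood of choices a (1 = go, 2 = nogo) given
% rewards r under the Rescorla-Wagner / softmax model of slides 10-12.
Q   = zeros(1,2);                              % one stimulus assumed
nll = 0;
for t = 1:length(a)
    p   = exp(beta*Q) / sum(exp(beta*Q));      % softmax choice probabilities
    nll = nll - log(p(a(t)));                  % accumulate -log p(a_t | Q_t, beta)
    Q(a(t)) = Q(a(t)) + epsilon*(r(t) - Q(a(t)));
end

Minimising this negative log-likelihood over (epsilon, beta) gives the ML estimate of slide 12.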

13 Fitting models II No closed form: use your favourite method (gradients, fminunc / fmincon, ...). Gradients for the RW model: $\frac{dL(\theta)}{d\epsilon} = \sum_t \frac{d}{d\epsilon} \log p(a_t \mid Q_t(a_t,s_t;\theta), \beta) = \sum_t \beta\Big[\frac{d}{d\epsilon} Q_t(a_t,s_t;\theta) - \sum_a p(a \mid Q_t(a,s_t;\theta), \beta)\,\frac{d}{d\epsilon} Q_t(a,s_t;\theta)\Big]$, with the recursion $\frac{d}{d\epsilon} Q_{t+1}(a_t,s_t;\epsilon) = (1-\epsilon)\,\frac{d}{d\epsilon} Q_t(a_t,s_t;\epsilon) + \big(r_t - Q_t(a_t,s_t;\epsilon)\big)$.

14 Little tricks Transform your variables: $\beta = e^{\vartheta_\beta} \Leftrightarrow \vartheta_\beta = \log(\beta)$ and $\epsilon = \frac{1}{1+e^{-\vartheta_\epsilon}} \Leftrightarrow \vartheta_\epsilon = \log\frac{\epsilon}{1-\epsilon}$, and optimise $\frac{d \log L(\theta)}{d\vartheta}$ in the unconstrained space. Avoid over/underflow: with $y(a) = \beta Q(a)$ and $y_m = \max_a y(a)$, $p(a) = \frac{e^{y(a)}}{\sum_b e^{y(b)}} = \frac{e^{y(a)-y_m}}{\sum_b e^{y(b)-y_m}}$.
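Putting these tricks together with the likelihood sketch above: a possible ML fit in the transformed (unconstrained) space, here using base-MATLAB fminsearch (fminunc, as on slide 13, works the same way but needs the Optimization Toolbox). The placeholder data and starting values are illustration assumptions; rwNegLogLik is the hypothetical function from the previous sketch.

a = randi(2, 200, 1);                       % placeholder choices (1/2)
r = double(rand(200,1) < 0.5);              % placeholder rewards
% optimise over unconstrained x, mapping back inside the objective
obj = @(x) rwNegLogLik(1/(1+exp(-x(1))), exp(x(2)), a, r);
x0  = [0 0];                                % epsilon = 0.5, beta = 1
xML = fminsearch(obj, x0);
epsilonML = 1/(1+exp(-xML(1)));             % map back to (0,1)
betaML    = exp(xML(2));                    % map back to (0,Inf)

% overflow-safe softmax: subtract the maximum before exponentiating
Q  = [800 0];                               % deliberately extreme values
y  = betaML*Q;  y = y - max(y);
p  = exp(y) / sum(exp(y));                  % no Inf/NaN even for huge Q values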

15-19 ML characteristics ML is asymptotically consistent, but its variance is high. [Figure: ML estimates across repeated simulated experiments (1 stimulus, 2 actions, learning rate .5, β = 2), and the likelihood surface for a bandit task in which β and ε are inferred (after Daw 2011): lighter colours denote higher data likelihood, the maximum likelihood estimate is surrounded by an ellipse of one standard error, and the parameters from which the data were generated are marked by an x.] The Hessian $\frac{\partial^2 L(\theta)}{\partial\theta_i\,\partial\theta_j}$ can be used to derive confidence intervals and to identify poorly constrained estimates; β and ε can trade off against each other. ML can overfit... more on this later.

20-22 Priors [Figure: two example prior distributions over a parameter, one not so smooth and one smooth, together with an empirical example relating memory and switches (Dombrovski et al.).]

23 Maximum a posteriori estimate $P(\theta) = p(\theta \mid a_{1...T}) = \frac{p(a_{1...T} \mid \theta)\, p(\theta)}{\int d\theta'\, p(a_{1...T} \mid \theta')\, p(\theta')}$, so $\log P(\theta) = \sum_{t=1}^T \log p(a_t \mid \theta) + \log p(\theta) + \mathrm{const.}$ and $\frac{d \log P(\theta)}{d\theta} = \frac{d \log L(\theta)}{d\theta} + \frac{d \log p(\theta)}{d\theta}$. If the likelihood is strong, the prior will have little effect; it mainly has influence on poorly constrained parameters. If a parameter is strongly constrained to be outside the typical range of the prior, then it will win over the prior.
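A sketch of the corresponding MAP objective: the same negative log-likelihood plus the negative log of a Gaussian prior on the unconstrained parameters. The function name, and mu and nu2 as prior means and variances, are assumptions for illustration; rwNegLogLik is the hypothetical likelihood sketch from slide 12.

function nlp = rwNegLogPost(x, a, r, mu, nu2)
% Negative log posterior = -log likelihood - log prior (up to a constant),
% with a Gaussian prior N(mu, nu2) on the unconstrained parameters x.
epsilon = 1/(1+exp(-x(1)));   beta = exp(x(2));
nll   = rwNegLogLik(epsilon, beta, a, r);
nlpri = sum(0.5*log(2*pi*nu2(:)) + (x(:) - mu(:)).^2 ./ (2*nu2(:)));
nlp   = nll + nlpri;

Minimising this gives the MAP estimate; as stated above, the prior term barely moves well-constrained parameters but regularizes poorly constrained ones.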

24 Maximum a posteriori estimate [Figure: MAP estimates across simulated experiments (learning rate .5, β = 2) under a Gaussian prior with fixed prior mean parameters.]

25 But... what prior parameters should I use?

26-34 ML characteristics: group data [Figure: graphical models for the different approaches, with subject-level parameters θ_i generating each subject's data A_i and, in the hierarchical case, group-level parameters μ, Σ generating the θ_i.] Fixed effect: conflates within- and between-subject variability. Average behaviour: disregards between-subject variability; the model needs to be adapted. Summary statistic: treat the parameters as random variables, one for each subject; this overestimates the group variance because the ML estimates are noisy. Random effects: prior mean = group mean, and $p(A_i \mid \mu, \Sigma) = \int d\theta_i\, p(A_i \mid \theta_i)\, p(\theta_i \mid \mu, \Sigma)$.

35 Estimating the hyperparameters MAP: $\log P(\theta) = L(\theta) + \log p(\theta \mid \phi) + \mathrm{const.}$, with prior $p(\theta) = p(\theta \mid \phi)$. Empirical Bayes: set the prior parameters to their ML estimate, $\hat\phi = \operatorname{argmax}_\phi p(A \mid \phi)$, where we use all the actions by all the K subjects, $A = \{a^k_{1...T}\}_{k=1}^K$.

36 ML estimate of top-level parameters [Figure: hierarchical graphical model with top-level parameters φ, subject-level parameters, and actions A for K subjects and T trials, alongside the resulting estimates.] $\hat\phi = \operatorname{argmax}_\phi p(A \mid \phi)$

37 Estimating the hyperparameters Effectively we now want to do gradient ascent on $\frac{d}{d\phi} p(A \mid \phi)$. But this contains an integral over the individual parameters: $p(A \mid \phi) = \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi)$. So we need $\hat\phi = \operatorname{argmax}_\phi p(A \mid \phi) = \operatorname{argmax}_\phi \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi)$.

38 Expectation Maximisation $\log p(A \mid \phi) = \log \int d\theta\, p(A, \theta \mid \phi) = \log \int d\theta\, q(\theta)\, \frac{p(A,\theta \mid \phi)}{q(\theta)} \ge \int d\theta\, q(\theta) \log \frac{p(A,\theta \mid \phi)}{q(\theta)}$ (Jensen's inequality). $k$th E step: $q^{(k+1)}(\theta) \leftarrow p(\theta \mid A, \phi^{(k)})$. $k$th M step: $\phi^{(k+1)} \leftarrow \operatorname{argmax}_\phi \int d\theta\, q(\theta) \log p(A, \theta \mid \phi)$. There are other approaches: Monte Carlo, analytical (conjugate priors), variational Bayes. Iterate between estimating MAP parameters given the prior parameters and estimating the prior parameters from the MAP parameters.

39 EM with Laplace approximation E step: we only need sufficient statistics to perform the M step. Approximate $p(\theta \mid A, \phi^{(k)}) \approx \mathcal{N}(m_k, S_k)$ and hence $q^{(k+1)}(\theta) \approx p(\theta \mid A, \phi^{(k)})$. E step: $q_k(\theta) = \mathcal{N}(m_k, S_k)$, with $m_k \leftarrow \operatorname{argmax}_\theta p(A_k \mid \theta)\, p(\theta \mid \phi^{(i)})$ and $S_k^{-1} \leftarrow -\frac{\partial^2}{\partial\theta^2} \log\big[p(A_k \mid \theta)\, p(\theta \mid \phi^{(i)})\big]\Big|_{\theta = m_k}$. matlab: [m,nlp,~,~,~,H] = fminunc(...), where the sixth output is the Hessian of the negative log posterior at the minimum. Just what we had before: MAP inference given some prior parameters.

40 EM with Laplace approximation Next update the prior. Prior mean = mean of the MAP estimates. M step: $\mu^{(i+1)} = \frac{1}{K}\sum_k m_k$, $\big(\sigma^2\big)^{(i+1)} = \frac{1}{K}\sum_k \big[m_k^2 + S_k\big] - \big(\mu^{(i+1)}\big)^2$. The prior variance depends on the inverse Hessians $S_k$ and on the variance of the MAP estimates, i.e. it takes the uncertainty of the estimates into account. And now iterate until convergence.
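The whole scheme of slides 38-40 then fits into a short loop. This is only a sketch under stated assumptions: the per-subject data (a cell array D with fields a and r) and the function rwNegLogPost from the MAP sketch are hypothetical, and fminunc's sixth output is used as the Hessian of the negative log posterior.

K  = numel(D);  Np = 2;                        % D{k}.a, D{k}.r assumed to exist
mu = zeros(Np,1);  nu2 = ones(Np,1);           % initial prior parameters
m  = zeros(Np,K);  s2 = zeros(Np,K);
opt = optimoptions('fminunc','Display','off');
for it = 1:50
    for k = 1:K                                % E step (Laplace): MAP + Hessian
        f = @(x) rwNegLogPost(x, D{k}.a, D{k}.r, mu, nu2);
        [m(:,k),~,~,~,~,H] = fminunc(f, mu, opt);
        s2(:,k) = diag(inv(H));                % diagonal of S_k = H^-1
    end
    mu  = mean(m, 2);                          % M step: prior mean
    nu2 = mean(m.^2 + s2, 2) - mu.^2;          % prior variance incl. uncertainty
end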

41 Hierarchical / random effects models Advantages: accurate group-level mean and variance; outliers due to a weak likelihood are regularized, while strong outliers are not; useful for model selection. Disadvantages: individual estimates $\theta_i$ depend on the other subjects' data $A_{j \neq i}$, so care is needed when interpreting them as summary statistics; error bars on group parameters (especially the group variance) are difficult to obtain; more involved, less transparent.

42 Link functions Sigmoid / softmax: $p(a \mid s) = \frac{e^{\beta Q(a,s)}}{\sum_{a'} e^{\beta Q(a',s)}}$. Greedy: $p(a \mid s) = c$ if $a = \operatorname{argmax}_{a'} Q(a',s)$, with the remaining probability $1-c$ shared among the other actions. Irreducible noise: $p(a \mid s) = \frac{g}{2} + (1-g)\,\frac{e^{\beta Q(a,s)}}{\sum_{a'} e^{\beta Q(a',s)}}$. [Figure: the resulting choice curves P(a) as a function of Q(a), and predicted vs. empirical probability(Go).] Critical sanity check: is the link function reasonable? Other link functions for other observations.
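Sketches of the three link functions above, with made-up example values for Q, β, the greedy probability c and the lapse rate g.

Q = [1.0 -0.5];   beta = 3;   c = 0.95;   g = 0.1;     % example values
% softmax (sigmoid for two actions)
pSoft = exp(beta*Q) / sum(exp(beta*Q));
% greedy: probability c on the best action, rest shared equally
pGreedy = repmat((1-c)/(numel(Q)-1), 1, numel(Q));
[~, best] = max(Q);
pGreedy(best) = c;
% softmax with irreducible noise (lapses)
pLapse = g/numel(Q) + (1-g)*pSoft;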

43 Model comparison A fit by itself is not meaningful. Generative test: qualitative. Comparisons: vs. random, vs. other models -> test specific hypotheses and isolate particular effects in a generative setting.

44 Model fit: likelihood How well does the model do? Choice probabilities: typically modest, even for a 2-way choice as in the bandit example. Pseudo-$R^2$: $1 - L/R$, where $R$ is the log-likelihood under random choice. Better than chance? The average predictive probability is $E[p(\mathrm{correct})] = e^{L(\hat\theta)/(KT)} = \Big(\prod_{k,t} p(a_{k,t} \mid \theta_k)\Big)^{1/KT}$; the expected number of correct predictions per subject is $E[N_k(\mathrm{correct})] = E[p_k(\mathrm{correct})]\,T$, which can be compared with the observed count using a binomial test. [Figure: histogram of per-subject predictive probabilities.]
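A sketch of these diagnostics, assuming a matrix pChoice(k,t) holding the model's predicted probability of each subject's actual choice on each trial (the placeholder values below are random, purely for illustration); binocdf needs the Statistics Toolbox.

pChoice = 0.4 + 0.4*rand(30, 120);            % placeholder predicted probabilities
[K, T]  = size(pChoice);
L       = sum(log(pChoice), 2);               % per-subject log-likelihood
R       = T*log(0.5);                         % chance log-likelihood, 2 options
pseudoR2 = 1 - L./R;                          % 0 = chance, 1 = perfect
avgPredP = exp(L/T);                          % geometric-mean predictive probability
nCorrect = sum(pChoice > 0.5, 2);             % trials where the model "predicts" the choice
pBinom   = 1 - binocdf(nCorrect - 1, T, 0.5); % better than chance, per subject?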

45 Generative test The model specifies probability(actions): simply draw from this distribution and see what happens. [Figure: probability(Go) over trials, generated vs. observed, in the go-rewarded, go-punished, nogo-rewarded and nogo-punished conditions.] Critical sanity test: is the model meaningful? Caveat: overfitting.

46 Overfitting [Figure: illustration of overfitting a flexible curve (Y vs. X) to noisy data.]

47 Model comparison [Figure: data histogram compared with the predictive densities P(x) of two models, Model 1 and Model 2.]

48 Model comparison Averaged over its parameter settings, how well does the model fit the data? $p(A \mid M) = \int d\theta\, p(A \mid \theta)\, p(\theta \mid M)$. Model comparison: Bayes factors, $BF = \frac{p(M_1 \mid A)}{p(M_2 \mid A)} = \frac{p(A \mid M_1)\, p(M_1)}{p(A \mid M_2)\, p(M_2)}$. Problem: the integral is rarely solvable; approximations: Laplace, sampling, variational...

49-51 Why integrals? The God Almighty test: 1. a powerful model, 2. a weak model. [Figure: predictive distributions of a powerful and a weak model over possible datasets.] $p(A \mid M) \approx \frac{1}{N}\big(p(A \mid \theta_1) + p(A \mid \theta_2) + \cdots\big)$: these two factors fight it out, model complexity vs. model fit.

52-55 Bayesian Information Criterion Laplace's approximation (saddle-point method): approximate the integrand by a Gaussian around its peak. [Figure: a peaked function Y and log(Y) plotted against X.] Just a Gaussian: $\int dx\, f(x) \approx f(x_0)\,\sqrt{2\pi\sigma^2}$, where $x_0$ is the peak of $f$ and $\sigma^2$ is given by the curvature of $\log f$ at $x_0$.

56-63 Bayesian Information Criterion: one subject $p(A \mid M) = \int d\theta\, p(A \mid \theta, M)\, p(\theta \mid M) \approx p(A \mid \theta_{ML}, M)\, p(\theta_{ML} \mid M)\, \sqrt{(2\pi)^N |\Sigma|}$, assuming that $p(A \mid \theta)\, p(\theta \mid M)$ is proportional to a Gaussian and that $p(\theta \mid M) = \mathrm{const.}$ (the model does not prefer a particular $\theta$). Then $\log p(A \mid M) \approx \log p(A \mid \theta_{ML}, M) + \tfrac{1}{2}\log|\Sigma| + \tfrac{N}{2}\log(2\pi)$. Since $\Sigma_{ii} \propto 1/T$, $\tfrac{1}{2}\log|\Sigma| \approx -\tfrac{N}{2}\log(T)$: the Bayesian Information Criterion (BIC). Replacing this penalty by $-N$ gives the Akaike Information Criterion (AIC). Model fit vs. model complexity.
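In the usual -2 log-likelihood convention this is one line per criterion. The values below are arbitrary example numbers, not taken from the slides.

L  = -150;     % maximised log-likelihood of one subject (example value)
Np = 2;        % number of parameters
T  = 240;      % number of choices
BIC = -2*L + Np*log(T);     % approximates -2 log p(A|M)
AIC = -2*L + 2*Np;
% an approximate log Bayes factor between two models is (BIC2 - BIC1)/2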

64 Group data Multiple subjects, multiple models: do they use the same model? If not, parameters are not comparable. Which model best accounts for all of them? Multiple groups: difference in models? difference in parameters? Multiple parameters: for models with k parameters there are $2^k$ possible model comparisons and $2^k$ possible correlations with any one psychometric measure.

65 Group data: approaches Summary statistic: treat an individual model comparison measure as a summary statistic and do an ANOVA or t-test, e.g. on $\sum_i \mathrm{BIC}_i$. Fixed-effect analysis: subject data treated as independent, $\log p(A \mid M) = \sum_i \log p(A_i \mid M) = \sum_i \log \int d\theta_i\, p(A_i \mid \theta_i)\, p(\theta_i \mid M)$. Random-effects analyses: hierarchical prior on the group parameters, $p(A \mid M) = \int d\phi \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi)\, p(\phi \mid M)$; hierarchical prior on the models, $p(A, M_k, r) = p(A \mid M_k)\, p(M_k \mid r)\, p(r)$.

66 Group-level likelihood Contains two integrals, over the subject parameters and over the prior parameters: $p(A \mid M) = \int d\phi \Big[\int d\theta\, p(A \mid \theta, M)\, p(\theta \mid \phi)\Big]\, p(\phi \mid M)$. [Figure: hierarchical graphical model with M, prior parameters φ, subject parameters, and actions A for K subjects and T trials.]

67-77 Evaluating p(A|M) $p(A \mid M) = \int d\phi \Big[\int d\theta\, p(A \mid \theta, M)\, p(\theta \mid \phi)\Big]\, p(\phi \mid M)$: two integrals, tricky. Step by step, approximating the levels separately; approximate at the top level first (there is less action there). $p(A \mid M) = \int d\phi\, p(A \mid \phi, M)\, p(\phi \mid M) \approx p(A \mid \phi_{ML}, M)\, p(\phi_{ML} \mid M)\, \sqrt{(2\pi)^{N} |\Sigma|}$, assuming that $p(A \mid \phi, M)\, p(\phi \mid M)$ is proportional to a Gaussian and that $p(\phi \mid M) = \mathrm{const.}$ (the model does not prefer a particular $\phi$), so $\log p(A \mid M) \approx \log p(A \mid \phi_{ML}, M) + \tfrac{1}{2}\log|\Sigma| + \tfrac{N}{2}\log(2\pi)$: just as before, a top-level BIC.

78 Approximating level 1 Still leaves the first level. Approximate the integral by sampling, e.g. importance sampling for a small number of dimensions: $\log p(A \mid \phi_{ML}, M) = \log \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi_{ML}) \approx \log \frac{1}{B} \sum_{b=1}^{B} p(A \mid \theta^b), \qquad \theta^b \sim p(\theta \mid \phi_{ML})$.

79 Group-level BIC $\log p(A \mid M) = \log \int d\phi\, p(A \mid \phi)\, p(\phi \mid M) \approx -\tfrac{1}{2}\,\mathrm{BIC}_{\mathrm{int}}$, with $\mathrm{BIC}_{\mathrm{int}} = -2 \log \hat p(A \mid \hat\phi_{ML}) + |M| \log(|A|)$, where $|M|$ is the number of prior parameters and $|A|$ the total number of observations.
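A sketch combining the sampling approximation of slide 78 with the penalty of slide 79. It reuses the hypothetical names from the earlier sketches (mu, nu2, D, rwNegLogLik) and assumes every subject has the same number of trials; none of this is prescribed by the slides.

B  = 2000;   K = numel(D);   Np = numel(mu);
logpk = zeros(K,1);
for k = 1:K
    th = mu*ones(1,B) + (sqrt(nu2)*ones(1,B)).*randn(Np,B);  % samples from fitted prior
    ll = zeros(B,1);
    for b = 1:B
        e = 1/(1+exp(-th(1,b)));  bta = exp(th(2,b));
        ll(b) = -rwNegLogLik(e, bta, D{k}.a, D{k}.r);
    end
    mx = max(ll);                              % log-sum-exp for numerical stability
    logpk(k) = mx + log(mean(exp(ll - mx)));   % log (1/B) sum_b p(A_k | theta_b)
end
Nprior = 2*Np;                                 % prior means and variances
T      = numel(D{1}.a);                        % trials per subject (assumed equal)
BICint = -2*sum(logpk) + Nprior*log(K*T);      % group-level integrated BIC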

80-81 Example task (revisited) [Figure: the Go/Nogo task with its rewarded and loss-avoiding conditions, and the probability of correct responses in the four conditions (Go to win, Go to avoid, Nogo to win, Nogo to avoid).] Guitart-Masip, Huys et al., submitted

82-85 Model validation: generating data [Figure: probability(Go) over trials, generated vs. observed, in the go-rewarded, go-punished, nogo-rewarded and nogo-punished conditions.] Successively richer models: $p(go \mid s_t) \propto Q_t(go, s_t) + \mathrm{bias}(go)$, and with an additional Pavlovian value term $p(go \mid s_t) \propto Q_t(go, s_t) + \mathrm{bias}(go) + V_t(s_t)$, where $V_{t+1}(s_t) = V_t(s_t) + \epsilon\,(r_t - V_t(s_t))$. [Figure: P(go) as a function of the value of the stimulus.] Guitart-Masip et al.; Guitart-Masip, Huys et al., submitted

86-89 Model comparison: overfitting? [Figure: BIC_int for a sequence of models (RW; RW + noise; RW + noise + Q; RW + noise + bias; RW (rew/pun) + noise + bias; RW + noise + bias + Pav), together with the probability(Go) time courses they generate in the four conditions.] Note: same number of parameters.

90-91 How does it do? [Figure: model comparison by summed log-likelihood, $\sum_i \mathrm{BIC}_i$, and integrated BIC across a family of model variants (separate or shared β and α, Go-only and other variants).] Fitted by EM... too nice?

92 Top-level Laplacian approximation Estimating the top-level determinant using second-order finite differences: $\frac{\partial^2}{\partial\phi_i^2}\, p(A \mid \phi)\Big|_{\hat\phi_{ML}} \approx \frac{p(A \mid \hat\phi_{ML} + \delta e_i) - 2\,p(A \mid \hat\phi_{ML}) + p(A \mid \hat\phi_{ML} - \delta e_i)}{\delta^2}$; the shifted likelihoods can be evaluated by shifting the samples.
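A sketch of the diagonal second-order finite differences. Here f is assumed to be a handle returning the group log-likelihood (for instance the sampled quantity above) and phiHat the fitted top-level parameters; the placeholder definitions and the step size are illustration assumptions, and a diagonal approximation to the determinant is used.

phiHat = [0; 0];                               % placeholder fitted parameters
f = @(phi) -sum((phi - 1).^2);                 % placeholder group log-likelihood
delta = 1e-3;                                  % finite-difference step (assumption)
Np    = numel(phiHat);
d2    = zeros(Np,1);
f0    = f(phiHat);
for i = 1:Np
    e     = zeros(Np,1);  e(i) = delta;
    d2(i) = (f(phiHat + e) - 2*f0 + f(phiHat - e)) / delta^2;
end
logDetH = sum(log(abs(d2)));                   % log |Hessian| under a diagonal approximation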

93 Group-level errors [Figure: true values and estimates ± 95% CI of the group-level prior mean and prior variance (for β and the learning rate), as a function of the number of subjects N, the number of trials T, and the total number of observations N x T.]

94 Posterior distribution on models Generative model for models. Stephan et al. 2009

95 Bayesian model selection: equations Write down the joint distribution of the generative model; variational approximations lead to a set of very simple update equations. Start with a flat prior over model probabilities, $\alpha = \alpha_0$; then update $u_{ik} = \int d\theta_i\, p(A_i \mid \theta_i, M_k)\, \exp\!\big(\Psi(\alpha_k) - \Psi(\textstyle\sum_{k'} \alpha_{k'})\big)$ and $\alpha_k \leftarrow \alpha_{0,k} + \sum_i \frac{u_{ik}}{\sum_{k'} u_{ik'}}$.
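A sketch of these updates, following Stephan et al. 2009: the input is a matrix of per-subject log model evidences (e.g. a per-subject -BIC_int/2); the placeholder values below are random and purely illustrative. MATLAB's psi is the digamma function.

logEv = randn(20, 3);                          % placeholder log evidences (subjects x models)
[Ns, Nm] = size(logEv);
alpha0 = ones(1, Nm);                          % flat Dirichlet prior over models
alpha  = alpha0;
for it = 1:200
    logu = bsxfun(@plus, logEv, psi(alpha) - psi(sum(alpha)));
    logu = bsxfun(@minus, logu, max(logu, [], 2));   % stabilise before exponentiating
    u    = exp(logu);
    u    = bsxfun(@rdivide, u, sum(u, 2));     % posterior over models per subject
    alpha = alpha0 + sum(u, 1);                % update Dirichlet counts
end
expectedFreq = alpha / sum(alpha);             % expected model frequencies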

96 Group Model selection Integrate out your parameters

97 Questions in psychiatry I: regression Parametric relationship with other variables: do standard second-level analyses; the Hessians can be used to determine weights (from the E step, $q_k(\theta) = \mathcal{N}(m_k, S_k)$ with $m_k$ and $S_k$ as on slide 39). Better: compare two models. Model 1: $\prod_i p(A_i \mid \theta_i)\, p(\theta_i \mid \mu, \Sigma)$, i.e. $\theta_i \sim \mathcal{N}(\mu, \Sigma)$. Model 2: $\prod_i p(A_i \mid \theta_i)\, p(\theta_i \mid \mu, c, \Sigma, r_i)$, i.e. $\theta_i \sim \mathcal{N}(\mu + c\, r_i, \Sigma)$.

98 Regression Standard regression analysis: $m_i = C r_i + \Sigma^{1/2}\xi_i$. Including uncertainty about each subject's inferred parameters: $m_i = C r_i + \big(\Sigma^{1/2} + S_i^{1/2}\big)\,\xi_i$, where $\xi_i$ is standard normal noise. Careful: finite-difference estimates of $S_i$ can be noisy! Regularize...
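A sketch of one precision-weighted version of this regression: each subject's MAP estimate of a parameter (m) is regressed on a covariate (x), down-weighting subjects with a large estimation variance s2 from the inverse Hessian. All data below are placeholders, and the specific weighting scheme is an illustration, not the slides' prescription.

K  = 40;
x  = randn(K,1);                               % placeholder covariate
m  = 0.5*x + randn(K,1);                       % placeholder MAP estimates
s2 = 0.2 + rand(K,1);                          % placeholder per-subject variances
groupVar = 1;                                  % placeholder group-level variance
X  = [ones(K,1) x];                            % design matrix with intercept
w  = 1 ./ (groupVar + s2);                     % precision weights
W  = diag(w);
Chat = (X'*W*X) \ (X'*W*m);                    % weighted least-squares [intercept; slope]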

99 Questions in psychiatry II: group differences Do groups differ in terms of parameter(s)? Parameters cannot be compared across different models: even very similar parameters can account for different effects. [Figure: probability(Go) in the four task conditions.] For models with k parameters there are $2^k$ possible comparisons: multiple comparisons? Use a posterior over models (Stephan et al. 2009).

100 Group differences in parameters Are two groups similar in parameter x? ANOVA: compare the likelihood of two means to the likelihood of one global mean, taking the degrees of freedom into account. But this tries to account for the parameters with one or two groups, not for the data. Better: compare models with separate or joint parameters and priors. Model 1: ε, β1, β2; Model 2: ε, β.

101 Questions in psychiatry III: Classification Who belongs to which of two groups? How many groups are there?

102 Model comparison again What is significant? $BF = \frac{p(A \mid M_1)}{p(A \mid M_2)}$ (cf. $p < .05$). Interpretation as evidence against H0 (Kass and Raftery 1995):
log10(BF) 0 to 1/2 (BF 1 to 3.2): not worth more than a bare mention
log10(BF) 1/2 to 1 (BF 3.2 to 10): substantial
log10(BF) 1 to 2 (BF 10 to 100): strong
log10(BF) > 2 (BF > 100): decisive
Spread of effect in group comparisons: a better model does not mean that a behavioural effect is concentrated in one parameter; obvious raw differences can be spread between parameters.

103 Models are no panacea They are statistics about specific aspects of the decision machinery, and only account for part of the variance. The model needs to match the experiment: ensure subjects actually do the task the way you wrote it in the model (model comparison). Model = quantitative hypothesis: a strong test; you need to compare models, not parameters; it includes all consequences of a hypothesis for choice.

104 Modelling in psychiatry Hypothesis testing: otherwise untestable hypotheses, internal processes. Limited by data quality: look for strong behaviours, not noisy ones. Holistic testing of hypotheses. Marr's levels: computational, algorithmic, physical.
