Modelling behavioural data


1 Modelling behavioural data Quentin Huys MA PhD MBBS MBPsS Translational Neuromodeling Unit, ETH Zürich Psychiatrische Universitätsklinik Zürich

2 Outline An example task Why build models? What is a model? Fitting models Validating & comparing models Model comparison issues in psychiatry

3 Example task Go/Nogo task: actions (go, nogo) crossed with outcomes (rewarded, avoids loss). [Figure: task schematic.] Guitart-Masip, Huys et al., submitted

4-6 Example task [Figure: probability of a correct response in each of the four conditions (Go to win, Go to avoid, Nogo to win, Nogo to avoid), and probability(Go) over trials in the go-rewarded, go-punished, nogo-rewarded and nogo-punished conditions.] Think of it as four separate two-armed bandit tasks. Guitart-Masip, Huys et al., submitted

7-8 Analysing behaviour Standard approach: decide which feature of the data you care about, then run descriptive statistical tests, e.g. an ANOVA on the probability of a correct response in the four conditions. [Figure: probability correct in the four conditions (Go to win, Go to avoid, Nogo to win, Nogo to avoid).] This has many strengths, but also weaknesses: it is piecemeal rather than holistic/global, descriptive rather than generative, and has no internal variables.

9 Models Holistic: aim to model the process by which the data came about in its entirety. Generative: they can be run on the task to generate data as if a subject had done the task. Inference process: capture the inference process subjects have to make to perform the task, in sufficient detail to replicate the data. Parameters replace test statistics: their meaning is explicit in the model, and their contribution to the data is assessed in a holistic manner.

10 A simple Rescorla-Wagner model Q values: $Q_{t+1}(a_t,s_t) = Q_t(a_t,s_t) + \epsilon\,(r_t - Q_t(a_t,s_t))$, where $a_t$ is the action on trial $t$ (either go or nogo), $s_t$ is the stimulus presented on trial $t$, and $\epsilon$ is the learning rate. [Figure: rewards and the resulting value estimates over time for several learning rates.] Key points: Q is the key part of the hypothesis; it formally states the learning process in quantitative detail and formalizes internal quantities that are used in the task.

11 Actions Action probabilities: softmax of the Q values. With $Q_{t+1}(a_t,s_t) = Q_t(a_t,s_t) + \epsilon\,(r_t - Q_t(a_t,s_t))$, $p(a_t \mid s_t, h_t, \theta) = p(a_t \mid Q_t(a_t,s_t), \beta) = \frac{e^{\beta Q_t(a_t,s_t)}}{\sum_{a'} e^{\beta Q_t(a',s_t)}}$, and $0 \le p(a) \le 1$. Features: this links the learning process to the observations (choices, RTs, or any other data); it plays the role of the link function in GLMs; many other forms are possible.
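Read together, the two equations above can already be run generatively. The following MATLAB sketch (MATLAB because the slides later refer to fminunc) simulates choices for a single stimulus with two actions; the reward probabilities and trial count are made-up illustration values, not taken from the task.

% Minimal generative sketch of the RW/softmax model (assumed setup:
% one stimulus, two actions go/nogo, made-up reward probabilities).
T       = 200;                  % trials
epsilon = 0.1;                  % learning rate
beta    = 2;                    % softmax inverse temperature
pReward = [0.8 0.2];            % P(reward | go), P(reward | nogo) -- assumption
Q = zeros(1,2);                 % Q(go), Q(nogo)
a = zeros(T,1); r = zeros(T,1);
for t = 1:T
    p    = exp(beta*Q) / sum(exp(beta*Q));         % softmax action probabilities
    a(t) = 1 + (rand > p(1));                      % sample action: 1 = go, 2 = nogo
    r(t) = double(rand < pReward(a(t)));           % binary reward
    Q(a(t)) = Q(a(t)) + epsilon*(r(t) - Q(a(t)));  % Rescorla-Wagner update
end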

12 Fitting models I Maximum likelihood (ML) parameters: $\hat\theta = \operatorname{argmax}_\theta L(\theta)$, where the likelihood of all choices is $L(\theta) = \log p(\{a_t\}_{t=1}^T \mid \{s_t\}_{t=1}^T, \{r_t\}_{t=1}^T, z, \theta) = \log p(\{a_t\}_{t=1}^T \mid \{Q_t(s_t,a_t;\theta)\}_{t=1}^T, \beta) = \log \prod_{t=1}^T p(a_t \mid Q_t(s_t,a_t;\theta), \beta) = \sum_{t=1}^T \log p(a_t \mid Q_t(s_t,a_t;\theta), \beta)$
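As a concrete (hypothetical) instance of this sum, the likelihood for the Rescorla-Wagner/softmax model can be written as one MATLAB function; the function name and the restriction to a single stimulus are assumptions for illustration, not part of the slides.

function nll = rwNegLogLik(epsilon, beta, a, r)
% Negative summed log-likelihood of choices a (1 = go, 2 = nogo) given
% rewards r under the Rescorla-Wagner / softmax model of slides 10-12.
Q   = zeros(1,2);                              % one stimulus assumed
nll = 0;
for t = 1:length(a)
    p   = exp(beta*Q) / sum(exp(beta*Q));      % softmax choice probabilities
    nll = nll - log(p(a(t)));                  % accumulate -log p(a_t | Q_t, beta)
    Q(a(t)) = Q(a(t)) + epsilon*(r(t) - Q(a(t)));
end

Minimising this negative log-likelihood over (epsilon, beta) gives the ML estimate of slide 12.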

13 Fitting models II No closed form: use your favourite method (gradients, fminunc / fmincon, ...). Gradients for the RW model: $\frac{dL(\theta)}{d\epsilon} = \sum_t \frac{d}{d\epsilon} \log p(a_t \mid Q_t(a_t,s_t;\theta), \beta) = \sum_t \beta\Big[\frac{d}{d\epsilon} Q_t(a_t,s_t;\theta) - \sum_a p(a \mid Q_t(a,s_t;\theta), \beta)\,\frac{d}{d\epsilon} Q_t(a,s_t;\theta)\Big]$, with the recursion $\frac{d}{d\epsilon} Q_{t+1}(a_t,s_t;\epsilon) = (1-\epsilon)\,\frac{d}{d\epsilon} Q_t(a_t,s_t;\epsilon) + \big(r_t - Q_t(a_t,s_t;\epsilon)\big)$.

14 Little tricks Transform your variables: $\beta = e^{\vartheta_\beta} \Leftrightarrow \vartheta_\beta = \log(\beta)$ and $\epsilon = \frac{1}{1+e^{-\vartheta_\epsilon}} \Leftrightarrow \vartheta_\epsilon = \log\frac{\epsilon}{1-\epsilon}$, and optimise $\frac{d \log L(\theta)}{d\vartheta}$ in the unconstrained space. Avoid over/underflow: with $y(a) = \beta Q(a)$ and $y_m = \max_a y(a)$, $p(a) = \frac{e^{y(a)}}{\sum_b e^{y(b)}} = \frac{e^{y(a)-y_m}}{\sum_b e^{y(b)-y_m}}$.
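Putting these tricks together with the likelihood sketch above: a possible ML fit in the transformed (unconstrained) space, here using base-MATLAB fminsearch (fminunc, as on slide 13, works the same way but needs the Optimization Toolbox). The placeholder data and starting values are illustration assumptions; rwNegLogLik is the hypothetical function from the previous sketch.

a = randi(2, 200, 1);                       % placeholder choices (1/2)
r = double(rand(200,1) < 0.5);              % placeholder rewards
% optimise over unconstrained x, mapping back inside the objective
obj = @(x) rwNegLogLik(1/(1+exp(-x(1))), exp(x(2)), a, r);
x0  = [0 0];                                % epsilon = 0.5, beta = 1
xML = fminsearch(obj, x0);
epsilonML = 1/(1+exp(-xML(1)));             % map back to (0,1)
betaML    = exp(xML(2));                    % map back to (0,Inf)

% overflow-safe softmax: subtract the maximum before exponentiating
Q  = [800 0];                               % deliberately extreme values
y  = betaML*Q;  y = y - max(y);
p  = exp(y) / sum(exp(y));                  % no Inf/NaN even for huge Q values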

15-19 ML characteristics ML is asymptotically consistent, but its variance is high. [Figure: ML estimates across repeated simulated experiments (1 stimulus, 2 actions, learning rate .5, β = 2), and the likelihood surface for a bandit task in which β and ε are inferred (after Daw 2011): lighter colours denote higher data likelihood, the maximum likelihood estimate is surrounded by an ellipse of one standard error, and the parameters from which the data were generated are marked by an x.] The Hessian $\frac{\partial^2 L(\theta)}{\partial\theta_i\,\partial\theta_j}$ can be used to derive confidence intervals and to identify poorly constrained estimates; β and ε can trade off against each other. ML can overfit... more on this later.

20-22 Priors [Figure: two example prior distributions over a parameter, one not so smooth and one smooth, together with an empirical example relating memory and switches (Dombrovski et al.).]

23 Maximum a posteriori estimate $P(\theta) = p(\theta \mid a_{1...T}) = \frac{p(a_{1...T} \mid \theta)\, p(\theta)}{\int d\theta'\, p(a_{1...T} \mid \theta')\, p(\theta')}$, so $\log P(\theta) = \sum_{t=1}^T \log p(a_t \mid \theta) + \log p(\theta) + \mathrm{const.}$ and $\frac{d \log P(\theta)}{d\theta} = \frac{d \log L(\theta)}{d\theta} + \frac{d \log p(\theta)}{d\theta}$. If the likelihood is strong, the prior will have little effect; it mainly has influence on poorly constrained parameters. If a parameter is strongly constrained to be outside the typical range of the prior, then it will win over the prior.
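A sketch of the corresponding MAP objective: the same negative log-likelihood plus the negative log of a Gaussian prior on the unconstrained parameters. The function name, and mu and nu2 as prior means and variances, are assumptions for illustration; rwNegLogLik is the hypothetical likelihood sketch from slide 12.

function nlp = rwNegLogPost(x, a, r, mu, nu2)
% Negative log posterior = -log likelihood - log prior (up to a constant),
% with a Gaussian prior N(mu, nu2) on the unconstrained parameters x.
epsilon = 1/(1+exp(-x(1)));   beta = exp(x(2));
nll   = rwNegLogLik(epsilon, beta, a, r);
nlpri = sum(0.5*log(2*pi*nu2(:)) + (x(:) - mu(:)).^2 ./ (2*nu2(:)));
nlp   = nll + nlpri;

Minimising this gives the MAP estimate; as stated above, the prior term barely moves well-constrained parameters but regularizes poorly constrained ones.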

24 Maximum a posteriori estimate [Figure: MAP estimates across simulated experiments (learning rate .5, β = 2) under a Gaussian prior with fixed prior mean parameters.]

25 But... what prior parameters should I use?

26-34 ML characteristics: group data [Figure: graphical models for the different approaches, with subject-level parameters θ_i generating each subject's data A_i and, in the hierarchical case, group-level parameters μ, Σ generating the θ_i.] Fixed effect: conflates within- and between-subject variability. Average behaviour: disregards between-subject variability; the model needs to be adapted. Summary statistic: treat the parameters as random variables, one for each subject; this overestimates the group variance because the ML estimates are noisy. Random effects: prior mean = group mean, and $p(A_i \mid \mu, \Sigma) = \int d\theta_i\, p(A_i \mid \theta_i)\, p(\theta_i \mid \mu, \Sigma)$.

35 Estimating the hyperparameters MAP: $\log P(\theta) = L(\theta) + \log p(\theta \mid \phi) + \mathrm{const.}$, with prior $p(\theta) = p(\theta \mid \phi)$. Empirical Bayes: set the prior parameters to their ML estimate, $\hat\phi = \operatorname{argmax}_\phi p(A \mid \phi)$, where we use all the actions by all the K subjects, $A = \{a^k_{1...T}\}_{k=1}^K$.

36 ML estimate of top-level parameters [Figure: hierarchical graphical model with top-level parameters φ, subject-level parameters, and actions A for K subjects and T trials, alongside the resulting estimates.] $\hat\phi = \operatorname{argmax}_\phi p(A \mid \phi)$

37 Estimating the hyperparameters Effectively we now want to do gradient ascent on $\frac{d}{d\phi} p(A \mid \phi)$. But this contains an integral over the individual parameters: $p(A \mid \phi) = \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi)$. So we need $\hat\phi = \operatorname{argmax}_\phi p(A \mid \phi) = \operatorname{argmax}_\phi \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi)$.

38 Expectation Maximisation $\log p(A \mid \phi) = \log \int d\theta\, p(A, \theta \mid \phi) = \log \int d\theta\, q(\theta)\, \frac{p(A,\theta \mid \phi)}{q(\theta)} \ge \int d\theta\, q(\theta) \log \frac{p(A,\theta \mid \phi)}{q(\theta)}$ (Jensen's inequality). $k$th E step: $q^{(k+1)}(\theta) \leftarrow p(\theta \mid A, \phi^{(k)})$. $k$th M step: $\phi^{(k+1)} \leftarrow \operatorname{argmax}_\phi \int d\theta\, q(\theta) \log p(A, \theta \mid \phi)$. There are other approaches: Monte Carlo, analytical (conjugate priors), variational Bayes. Iterate between estimating MAP parameters given the prior parameters and estimating the prior parameters from the MAP parameters.

39 EM with Laplace approximation E step: we only need sufficient statistics to perform the M step. Approximate $p(\theta \mid A, \phi^{(k)}) \approx \mathcal{N}(m_k, S_k)$ and hence $q^{(k+1)}(\theta) \approx p(\theta \mid A, \phi^{(k)})$. E step: $q_k(\theta) = \mathcal{N}(m_k, S_k)$, with $m_k \leftarrow \operatorname{argmax}_\theta p(A_k \mid \theta)\, p(\theta \mid \phi^{(i)})$ and $S_k^{-1} \leftarrow -\frac{\partial^2}{\partial\theta^2} \log\big[p(A_k \mid \theta)\, p(\theta \mid \phi^{(i)})\big]\Big|_{\theta = m_k}$. matlab: [m,nlp,~,~,~,H] = fminunc(...), where the sixth output is the Hessian of the negative log posterior at the minimum. Just what we had before: MAP inference given some prior parameters.

40 EM with Laplace approximation Next update the prior. Prior mean = mean of the MAP estimates. M step: $\mu^{(i+1)} = \frac{1}{K}\sum_k m_k$, $\big(\sigma^2\big)^{(i+1)} = \frac{1}{K}\sum_k \big[m_k^2 + S_k\big] - \big(\mu^{(i+1)}\big)^2$. The prior variance depends on the inverse Hessians $S_k$ and on the variance of the MAP estimates, i.e. it takes the uncertainty of the estimates into account. And now iterate until convergence.
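The whole scheme of slides 38-40 then fits into a short loop. This is only a sketch under stated assumptions: the per-subject data (a cell array D with fields a and r) and the function rwNegLogPost from the MAP sketch are hypothetical, and fminunc's sixth output is used as the Hessian of the negative log posterior.

K  = numel(D);  Np = 2;                        % D{k}.a, D{k}.r assumed to exist
mu = zeros(Np,1);  nu2 = ones(Np,1);           % initial prior parameters
m  = zeros(Np,K);  s2 = zeros(Np,K);
opt = optimoptions('fminunc','Display','off');
for it = 1:50
    for k = 1:K                                % E step (Laplace): MAP + Hessian
        f = @(x) rwNegLogPost(x, D{k}.a, D{k}.r, mu, nu2);
        [m(:,k),~,~,~,~,H] = fminunc(f, mu, opt);
        s2(:,k) = diag(inv(H));                % diagonal of S_k = H^-1
    end
    mu  = mean(m, 2);                          % M step: prior mean
    nu2 = mean(m.^2 + s2, 2) - mu.^2;          % prior variance incl. uncertainty
end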

41 Hierarchical / random effects models Advantages: accurate group-level mean and variance; outliers due to a weak likelihood are regularized, while strong outliers are not; useful for model selection. Disadvantages: individual estimates $\theta_i$ depend on the other subjects' data $A_{j \neq i}$, so care is needed when interpreting them as summary statistics; error bars on group parameters (especially the group variance) are difficult to obtain; more involved, less transparent.

42 Link functions Sigmoid / softmax: $p(a \mid s) = \frac{e^{\beta Q(a,s)}}{\sum_{a'} e^{\beta Q(a',s)}}$. Greedy: $p(a \mid s) = c$ if $a = \operatorname{argmax}_{a'} Q(a',s)$, with the remaining probability $1-c$ shared among the other actions. Irreducible noise: $p(a \mid s) = \frac{g}{2} + (1-g)\,\frac{e^{\beta Q(a,s)}}{\sum_{a'} e^{\beta Q(a',s)}}$. [Figure: the resulting choice curves P(a) as a function of Q(a), and predicted vs. empirical probability(Go).] Critical sanity check: is the link function reasonable? Other link functions for other observations.
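Sketches of the three link functions above, with made-up example values for Q, β, the greedy probability c and the lapse rate g.

Q = [1.0 -0.5];   beta = 3;   c = 0.95;   g = 0.1;     % example values
% softmax (sigmoid for two actions)
pSoft = exp(beta*Q) / sum(exp(beta*Q));
% greedy: probability c on the best action, rest shared equally
pGreedy = repmat((1-c)/(numel(Q)-1), 1, numel(Q));
[~, best] = max(Q);
pGreedy(best) = c;
% softmax with irreducible noise (lapses)
pLapse = g/numel(Q) + (1-g)*pSoft;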

43 Model comparison A fit by itself is not meaningful. Generative test: qualitative. Comparisons: vs. random, vs. other models -> test specific hypotheses and isolate particular effects in a generative setting.

44 Model fit: likelihood How well does the model do? Choice probabilities: typically modest, even for a 2-way choice as in the bandit example. Pseudo-$R^2$: $1 - L/R$, where $R$ is the log-likelihood under random choice. Better than chance? The average predictive probability is $E[p(\mathrm{correct})] = e^{L(\hat\theta)/(KT)} = \Big(\prod_{k,t} p(a_{k,t} \mid \theta_k)\Big)^{1/KT}$; the expected number of correct predictions per subject is $E[N_k(\mathrm{correct})] = E[p_k(\mathrm{correct})]\,T$, which can be compared with the observed count using a binomial test. [Figure: histogram of per-subject predictive probabilities.]
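A sketch of these diagnostics, assuming a matrix pChoice(k,t) holding the model's predicted probability of each subject's actual choice on each trial (the placeholder values below are random, purely for illustration); binocdf needs the Statistics Toolbox.

pChoice = 0.4 + 0.4*rand(30, 120);            % placeholder predicted probabilities
[K, T]  = size(pChoice);
L       = sum(log(pChoice), 2);               % per-subject log-likelihood
R       = T*log(0.5);                         % chance log-likelihood, 2 options
pseudoR2 = 1 - L./R;                          % 0 = chance, 1 = perfect
avgPredP = exp(L/T);                          % geometric-mean predictive probability
nCorrect = sum(pChoice > 0.5, 2);             % trials where the model "predicts" the choice
pBinom   = 1 - binocdf(nCorrect - 1, T, 0.5); % better than chance, per subject?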

45 Generative test The model specifies probability(actions): simply draw from this distribution and see what happens. [Figure: probability(Go) over trials, generated vs. observed, in the go-rewarded, go-punished, nogo-rewarded and nogo-punished conditions.] Critical sanity test: is the model meaningful? Caveat: overfitting.

46 Overfitting [Figure: illustration of overfitting a flexible curve (Y vs. X) to noisy data.]

47 Model comparison [Figure: data histogram compared with the predictive densities P(x) of two models, Model 1 and Model 2.]

48 Model comparison Averaged over its parameter settings, how well does the model fit the data? $p(A \mid M) = \int d\theta\, p(A \mid \theta)\, p(\theta \mid M)$. Model comparison: Bayes factors, $BF = \frac{p(M_1 \mid A)}{p(M_2 \mid A)} = \frac{p(A \mid M_1)\, p(M_1)}{p(A \mid M_2)\, p(M_2)}$. Problem: the integral is rarely solvable; approximations: Laplace, sampling, variational...

49-51 Why integrals? The God Almighty test: 1. a powerful model, 2. a weak model. [Figure: predictive distributions of a powerful and a weak model over possible datasets.] $p(A \mid M) \approx \frac{1}{N}\big(p(A \mid \theta_1) + p(A \mid \theta_2) + \cdots\big)$: these two factors fight it out, model complexity vs. model fit.

52-55 Bayesian Information Criterion Laplace's approximation (saddle-point method): approximate the integrand by a Gaussian around its peak. [Figure: a peaked function Y and log(Y) plotted against X.] Just a Gaussian: $\int dx\, f(x) \approx f(x_0)\,\sqrt{2\pi\sigma^2}$, where $x_0$ is the peak of $f$ and $\sigma^2$ is given by the curvature of $\log f$ at $x_0$.

56-63 Bayesian Information Criterion: one subject $p(A \mid M) = \int d\theta\, p(A \mid \theta, M)\, p(\theta \mid M) \approx p(A \mid \theta_{ML}, M)\, p(\theta_{ML} \mid M)\, \sqrt{(2\pi)^N |\Sigma|}$, assuming that $p(A \mid \theta)\, p(\theta \mid M)$ is proportional to a Gaussian and that $p(\theta \mid M) = \mathrm{const.}$ (the model does not prefer a particular $\theta$). Then $\log p(A \mid M) \approx \log p(A \mid \theta_{ML}, M) + \tfrac{1}{2}\log|\Sigma| + \tfrac{N}{2}\log(2\pi)$. Since $\Sigma_{ii} \propto 1/T$, $\tfrac{1}{2}\log|\Sigma| \approx -\tfrac{N}{2}\log(T)$: the Bayesian Information Criterion (BIC). Replacing this penalty by $-N$ gives the Akaike Information Criterion (AIC). Model fit vs. model complexity.
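In the usual -2 log-likelihood convention this is one line per criterion. The values below are arbitrary example numbers, not taken from the slides.

L  = -150;     % maximised log-likelihood of one subject (example value)
Np = 2;        % number of parameters
T  = 240;      % number of choices
BIC = -2*L + Np*log(T);     % approximates -2 log p(A|M)
AIC = -2*L + 2*Np;
% an approximate log Bayes factor between two models is (BIC2 - BIC1)/2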

64 Group data Multiple subjects, multiple models: do they use the same model? If not, parameters are not comparable. Which model best accounts for all of them? Multiple groups: difference in models? difference in parameters? Multiple parameters: for models with k parameters there are $2^k$ possible model comparisons and $2^k$ possible correlations with any one psychometric measure.

65 Group data: approaches Summary statistic: treat an individual model comparison measure as a summary statistic and do an ANOVA or t-test, e.g. on $\sum_i \mathrm{BIC}_i$. Fixed-effect analysis: subject data treated as independent, $\log p(A \mid M) = \sum_i \log p(A_i \mid M) = \sum_i \log \int d\theta_i\, p(A_i \mid \theta_i)\, p(\theta_i \mid M)$. Random-effects analyses: hierarchical prior on the group parameters, $p(A \mid M) = \int d\phi \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi)\, p(\phi \mid M)$; hierarchical prior on the models, $p(A, M_k, r) = p(A \mid M_k)\, p(M_k \mid r)\, p(r)$.

66 Group-level likelihood Contains two integrals, over the subject parameters and over the prior parameters: $p(A \mid M) = \int d\phi \Big[\int d\theta\, p(A \mid \theta, M)\, p(\theta \mid \phi)\Big]\, p(\phi \mid M)$. [Figure: hierarchical graphical model with M, prior parameters φ, subject parameters, and actions A for K subjects and T trials.]

67-77 Evaluating p(A|M) $p(A \mid M) = \int d\phi \Big[\int d\theta\, p(A \mid \theta, M)\, p(\theta \mid \phi)\Big]\, p(\phi \mid M)$: two integrals, tricky. Step by step, approximating the levels separately; approximate at the top level first (there is less action there). $p(A \mid M) = \int d\phi\, p(A \mid \phi, M)\, p(\phi \mid M) \approx p(A \mid \phi_{ML}, M)\, p(\phi_{ML} \mid M)\, \sqrt{(2\pi)^{N} |\Sigma|}$, assuming that $p(A \mid \phi, M)\, p(\phi \mid M)$ is proportional to a Gaussian and that $p(\phi \mid M) = \mathrm{const.}$ (the model does not prefer a particular $\phi$), so $\log p(A \mid M) \approx \log p(A \mid \phi_{ML}, M) + \tfrac{1}{2}\log|\Sigma| + \tfrac{N}{2}\log(2\pi)$: just as before, a top-level BIC.

78 Approximating level 1 Still leaves the first level. Approximate the integral by sampling, e.g. importance sampling for a small number of dimensions: $\log p(A \mid \phi_{ML}, M) = \log \int d\theta\, p(A \mid \theta)\, p(\theta \mid \phi_{ML}) \approx \log \frac{1}{B} \sum_{b=1}^{B} p(A \mid \theta^b), \qquad \theta^b \sim p(\theta \mid \phi_{ML})$.

79 Group-level BIC $\log p(A \mid M) = \log \int d\phi\, p(A \mid \phi)\, p(\phi \mid M) \approx -\tfrac{1}{2}\,\mathrm{BIC}_{\mathrm{int}}$, with $\mathrm{BIC}_{\mathrm{int}} = -2 \log \hat p(A \mid \hat\phi_{ML}) + |M| \log(|A|)$, where $|M|$ is the number of prior parameters and $|A|$ the total number of observations.
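A sketch combining the sampling approximation of slide 78 with the penalty of slide 79. It reuses the hypothetical names from the earlier sketches (mu, nu2, D, rwNegLogLik) and assumes every subject has the same number of trials; none of this is prescribed by the slides.

B  = 2000;   K = numel(D);   Np = numel(mu);
logpk = zeros(K,1);
for k = 1:K
    th = mu*ones(1,B) + (sqrt(nu2)*ones(1,B)).*randn(Np,B);  % samples from fitted prior
    ll = zeros(B,1);
    for b = 1:B
        e = 1/(1+exp(-th(1,b)));  bta = exp(th(2,b));
        ll(b) = -rwNegLogLik(e, bta, D{k}.a, D{k}.r);
    end
    mx = max(ll);                              % log-sum-exp for numerical stability
    logpk(k) = mx + log(mean(exp(ll - mx)));   % log (1/B) sum_b p(A_k | theta_b)
end
Nprior = 2*Np;                                 % prior means and variances
T      = numel(D{1}.a);                        % trials per subject (assumed equal)
BICint = -2*sum(logpk) + Nprior*log(K*T);      % group-level integrated BIC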

80-81 Example task (revisited) [Figure: the Go/Nogo task with its rewarded and loss-avoiding conditions, and the probability of correct responses in the four conditions (Go to win, Go to avoid, Nogo to win, Nogo to avoid).] Guitart-Masip, Huys et al., submitted

82-85 Model validation: generating data [Figure: probability(Go) over trials, generated vs. observed, in the go-rewarded, go-punished, nogo-rewarded and nogo-punished conditions.] Successively richer models: $p(go \mid s_t) \propto Q_t(go, s_t) + \mathrm{bias}(go)$, and with an additional Pavlovian value term $p(go \mid s_t) \propto Q_t(go, s_t) + \mathrm{bias}(go) + V_t(s_t)$, where $V_{t+1}(s_t) = V_t(s_t) + \epsilon\,(r_t - V_t(s_t))$. [Figure: P(go) as a function of the value of the stimulus.] Guitart-Masip et al.; Guitart-Masip, Huys et al., submitted

86-89 Model comparison: overfitting? [Figure: BIC_int for a sequence of models (RW; RW + noise; RW + noise + Q; RW + noise + bias; RW (rew/pun) + noise + bias; RW + noise + bias + Pav), together with the probability(Go) time courses they generate in the four conditions.] Note: same number of parameters.

90-91 How does it do? [Figure: model comparison by summed log-likelihood, $\sum_i \mathrm{BIC}_i$, and integrated BIC across a family of model variants (separate or shared β and α, Go-only and other variants).] Fitted by EM... too nice?

92 Top-level Laplacian approximation Estimating the top-level determinant using second-order finite differences: $\frac{\partial^2}{\partial\phi_i^2}\, p(A \mid \phi)\Big|_{\hat\phi_{ML}} \approx \frac{p(A \mid \hat\phi_{ML} + \delta e_i) - 2\,p(A \mid \hat\phi_{ML}) + p(A \mid \hat\phi_{ML} - \delta e_i)}{\delta^2}$; the shifted likelihoods can be evaluated by shifting the samples.
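A sketch of the diagonal second-order finite differences. Here f is assumed to be a handle returning the group log-likelihood (for instance the sampled quantity above) and phiHat the fitted top-level parameters; the placeholder definitions and the step size are illustration assumptions, and a diagonal approximation to the determinant is used.

phiHat = [0; 0];                               % placeholder fitted parameters
f = @(phi) -sum((phi - 1).^2);                 % placeholder group log-likelihood
delta = 1e-3;                                  % finite-difference step (assumption)
Np    = numel(phiHat);
d2    = zeros(Np,1);
f0    = f(phiHat);
for i = 1:Np
    e     = zeros(Np,1);  e(i) = delta;
    d2(i) = (f(phiHat + e) - 2*f0 + f(phiHat - e)) / delta^2;
end
logDetH = sum(log(abs(d2)));                   % log |Hessian| under a diagonal approximation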

93 Group-level errors [Figure: true values and estimates ± 95% CI of the group-level prior mean and prior variance (for β and the learning rate), as a function of the number of subjects N, the number of trials T, and the total number of observations N x T.]

94 Posterior distribution on models Generative model for models. Stephan et al. 2009

95 Bayesian model selection: equations Write down the joint distribution of the generative model; variational approximations lead to a set of very simple update equations. Start with a flat prior over model probabilities, $\alpha = \alpha_0$; then update $u_{ik} = \int d\theta_i\, p(A_i \mid \theta_i, M_k)\, \exp\!\big(\Psi(\alpha_k) - \Psi(\textstyle\sum_{k'} \alpha_{k'})\big)$ and $\alpha_k \leftarrow \alpha_{0,k} + \sum_i \frac{u_{ik}}{\sum_{k'} u_{ik'}}$.
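A sketch of these updates, following Stephan et al. 2009: the input is a matrix of per-subject log model evidences (e.g. a per-subject -BIC_int/2); the placeholder values below are random and purely illustrative. MATLAB's psi is the digamma function.

logEv = randn(20, 3);                          % placeholder log evidences (subjects x models)
[Ns, Nm] = size(logEv);
alpha0 = ones(1, Nm);                          % flat Dirichlet prior over models
alpha  = alpha0;
for it = 1:200
    logu = bsxfun(@plus, logEv, psi(alpha) - psi(sum(alpha)));
    logu = bsxfun(@minus, logu, max(logu, [], 2));   % stabilise before exponentiating
    u    = exp(logu);
    u    = bsxfun(@rdivide, u, sum(u, 2));     % posterior over models per subject
    alpha = alpha0 + sum(u, 1);                % update Dirichlet counts
end
expectedFreq = alpha / sum(alpha);             % expected model frequencies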

96 Group Model selection Integrate out your parameters

97 Questions in psychiatry I: regression Parametric relationship with other variables: do standard second-level analyses; the Hessians can be used to determine weights (from the E step, $q_k(\theta) = \mathcal{N}(m_k, S_k)$ with $m_k$ and $S_k$ as on slide 39). Better: compare two models. Model 1: $\prod_i p(A_i \mid \theta_i)\, p(\theta_i \mid \mu, \Sigma)$, i.e. $\theta_i \sim \mathcal{N}(\mu, \Sigma)$. Model 2: $\prod_i p(A_i \mid \theta_i)\, p(\theta_i \mid \mu, c, \Sigma, r_i)$, i.e. $\theta_i \sim \mathcal{N}(\mu + c\, r_i, \Sigma)$.

98 Regression Standard regression analysis: $m_i = C r_i + \Sigma^{1/2}\xi_i$. Including uncertainty about each subject's inferred parameters: $m_i = C r_i + \big(\Sigma^{1/2} + S_i^{1/2}\big)\,\xi_i$, where $\xi_i$ is standard normal noise. Careful: finite-difference estimates of $S_i$ can be noisy! Regularize...
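A sketch of one precision-weighted version of this regression: each subject's MAP estimate of a parameter (m) is regressed on a covariate (x), down-weighting subjects with a large estimation variance s2 from the inverse Hessian. All data below are placeholders, and the specific weighting scheme is an illustration, not the slides' prescription.

K  = 40;
x  = randn(K,1);                               % placeholder covariate
m  = 0.5*x + randn(K,1);                       % placeholder MAP estimates
s2 = 0.2 + rand(K,1);                          % placeholder per-subject variances
groupVar = 1;                                  % placeholder group-level variance
X  = [ones(K,1) x];                            % design matrix with intercept
w  = 1 ./ (groupVar + s2);                     % precision weights
W  = diag(w);
Chat = (X'*W*X) \ (X'*W*m);                    % weighted least-squares [intercept; slope]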

99 Questions in psychiatry II: group differences Do groups differ in terms of parameter(s)? Parameters cannot be compared across different models: even very similar parameters can account for different effects. [Figure: probability(Go) in the four task conditions.] For models with k parameters there are $2^k$ possible comparisons: multiple comparisons? Use a posterior over models (Stephan et al. 2009).

100 Group differences in parameters Are two groups similar in parameter x? ANOVA: compare the likelihood of two means to the likelihood of one global mean, taking the degrees of freedom into account. But this tries to account for the parameters with one or two groups, not for the data. Better: compare models with separate or joint parameters and priors. Model 1: ε, β1, β2; Model 2: ε, β.

101 Questions in psychiatry III: Classification Who belongs to which of two groups? How many groups are there?

102 Model comparison again What is significant? $BF = \frac{p(A \mid M_1)}{p(A \mid M_2)}$ (cf. $p < .05$). Interpretation as evidence against H0 (Kass and Raftery 1995):
log10(BF) 0 to 1/2 (BF 1 to 3.2): not worth more than a bare mention
log10(BF) 1/2 to 1 (BF 3.2 to 10): substantial
log10(BF) 1 to 2 (BF 10 to 100): strong
log10(BF) > 2 (BF > 100): decisive
Spread of effect in group comparisons: a better model does not mean that a behavioural effect is concentrated in one parameter; obvious raw differences can be spread between parameters.

103 Models are no panacea They are statistics about specific aspects of the decision machinery, and only account for part of the variance. The model needs to match the experiment: ensure subjects actually do the task the way you wrote it in the model (model comparison). Model = quantitative hypothesis: a strong test; you need to compare models, not parameters; it includes all consequences of a hypothesis for choice.

104 Modelling in psychiatry Hypothesis testing: otherwise untestable hypotheses, internal processes. Limited by data quality: look for strong behaviours, not noisy ones. Holistic testing of hypotheses. Marr's levels: computational, algorithmic, physical.
