Introduction to Bayesian Methods


1 Introduction to Bayesian Methods

2 Bayes Rule
p(y|x) p(x) = p(x, y) = p(x|y) p(y)
p(y|x) = p(x|y) p(y) / p(x)
p(x) = Σ_y p(x, y) = Σ_y p(x|y) p(y)
p(y|x) = p(x|y) p(y) / Σ_y p(x|y) p(y)        (y discrete)
p(y|x) = p(x|y) p(y) / ∫ p(x|y) p(y) dy        (y continuous)

3 Bayes Rule
p(y|x) = p(x|y) p(y) / p(x)
p(parameters|data):
p(θ|D) = p(D|θ) p(θ) / p(D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ
       = p(data|parameters) p(parameters) / p(data)

4 p(θ|D) = p(D|θ) p(θ) / p(D)
posterior = likelihood × prior / evidence

5 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ|D): probability of a particular parameter value (or vector of parameter values) given the data
p(D|θ): probability of the data given that parameter value (or vector of parameter values)
p(θ): prior probability of that parameter value (or vector of parameter values)
p(D): probability of the data averaged across all possible parameter values (or vectors)

6 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
just making the model M^(1) explicit in Bayes rule

7 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
just to make clear that the set of parameters might be different across different models; these are the parameters associated with Model 1

8 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(A|B) = p(B|A) p(A) / p(B)        this is just Bayes rule

9 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(x|y) = p(y|x) p(x) / p(y)        this is just Bayes rule again

10 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)        this is just Bayes rule for Model M^(1) given Data (just applying the formula); I haven't included θ^(1) here because I don't care about θ^(1) yet, this is just about M^(1) and D

11 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)
p(D|M^(1)) = ∫ p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) dθ^(1)

12 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)        p(M^(1)) is the prior probability of Model M^(1)

13 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)
p(D) is the probability of the Data given any possible Model under consideration; it will be the same for every possible Model tested
p(D) = Σ_j p(D|M_j) p(M_j)
why did I make this a sum and not an integral?

14 p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) p(M^(1)) / p(D)] / [p(D|M^(2)) p(M^(2)) / p(D)]
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]

15 Bayesian model selection
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]
(Bayes factor × model priors)
p(D|M^(j)) = ∫ p(D|θ^(j), M^(j)) p(θ^(j)|M^(j)) dθ^(j)        (likelihood × prior)

16 Bayesian model selection
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]
(Bayes factor × model priors)
In Bayesian model selection, models can be nested or non-nested.
One advantage of Bayesian model selection is that it is not as sensitive to data sample size as classical significance procedures (e.g., the G² test).

17 Bayesian model selection
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]
(Bayes factor × model priors)
You often just see the Bayes factor expressed in terms of P(D|M), or even P(D) with the model M left implicit. Since this is intended to be used for model selection, it is odd that few authors note that it is really p(M|D) after an application of Bayes rule (which is what model selection is all about), but with no prior p(M) on the Models.

18

19 p(θ|D) = p(D|θ) p(θ) / p(D)
posterior = likelihood × prior / evidence
Where does the prior probability p(θ) come from?
- noninformative priors, akin to a uniform distribution
- or based on theory
- or based on past data

20

21 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the maximum likelihood estimate of p?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
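
For reference, the standard derivation of the maximum likelihood estimate (a short sketch, not spelled out on the slide):

\[
\log L(x \mid p) = \mathrm{const} + x \log p + (N - x)\log(1 - p), \qquad
\frac{d}{dp}\log L = \frac{x}{p} - \frac{N - x}{1 - p} = 0
\;\Rightarrow\;
\hat{p}_{\mathrm{ML}} = \frac{x}{N} = \frac{4}{10} = 0.4
\]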

22 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(θ|D) = P(D|θ) P(θ) / P(D)

23 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
in this example, please don't confuse the big P with the little p: one is the probability in Bayes rule (P), the other is the parameter p in the binomial distribution

24 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)        (this is the likelihood)
P(p|x) = P(x|p) P(p) / P(x)

25 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
what functional form of the prior should be used?

26 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
what functional form of the prior should be used? what are some possibilities?

27

28 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
P(p) ~ Beta(a, b) = p^(a-1) (1-p)^(b-1) / Be(a, b)
Be(a, b) = Γ(a) Γ(b) / Γ(a+b),  Γ(x) = (x-1)!
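
A quick look at what this family can express; a sketch in base MATLAB using the density above (the particular a, b values are made up for illustration; Beta(1,1) is the uniform distribution):

p = linspace(0, 1, 1001);
prior_uniform = p.^(1-1) .* (1-p).^(1-1) ./ beta(1, 1);   % Beta(1,1): flat prior on p
prior_fair    = p.^(5-1) .* (1-p).^(5-1) ./ beta(5, 5);   % Beta(5,5): peaked near p = .5
prior_low     = p.^(2-1) .* (1-p).^(8-1) ./ beta(2, 8);   % Beta(2,8): favors small p
plot(p, prior_uniform, p, prior_fair, p, prior_low)
xlabel('p'); ylabel('P(p)'); legend('Beta(1,1)', 'Beta(5,5)', 'Beta(2,8)')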

29 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = [ C(N,x) p^x (1-p)^(N-x) × p^(a-1) (1-p)^(b-1) / Be(a, b) ] / P(x)

30 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = [ C(N,x) p^x (1-p)^(N-x) × p^(a-1) (1-p)^(b-1) / Be(a, b) ] / P(x)
collect the terms that involve p

31 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1) × [ C(N,x) / (P(x) Be(a, b)) ]
P(x) = ∫ P(x|p) P(p) dp
the bracketed factor is constant with respect to p

32 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) ∝ p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1)
to be a true probability distribution, this needs to sum to 1
you often see things like this, where ∝ indicates a proportionality written without the denominator

33 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) ∝ p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1)
to be a true probability distribution, this needs to sum to 1
you often see things like this, where ∝ indicates a proportionality written without the denominator
with techniques like MCMC, it's often sufficient

34 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1) / Be(x+a, N-x+b)
this can be shown to be true, but for most problems an analytic solution isn't possible

35 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
rearranging some terms:
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)

36 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
rearranging some terms:
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)
what does this look like?

37 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
the posterior is also a Beta distribution:
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)
recall: P(p) ~ Beta(a, b) = p^(a-1) (1-p)^(b-1) / Be(a, b)
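
A minimal sketch of this prior-to-posterior update in base MATLAB (an illustration, not the course code), assuming the uniform Beta(1,1) prior for N=10, x=4:

% Beta prior and Beta posterior for the coin example (10 flips, 4 heads)
N = 10; x = 4;
a = 1;  b = 1;                                                     % assumed uniform prior, Beta(1,1)
p = linspace(0, 1, 1001);
prior     = p.^(a-1)   .* (1-p).^(b-1)     ./ beta(a, b);
posterior = p.^(x+a-1) .* (1-p).^(N-x+b-1) ./ beta(x+a, N-x+b);    % Beta(x+a, N-x+b)
plot(p, prior, p, posterior)
xlabel('p'); ylabel('density'); legend('prior Beta(a,b)', 'posterior Beta(x+a,N-x+b)')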

38 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
why might it be nice to have a posterior that has a distribution of the same form as the prior?
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)
recall: P(p) ~ Beta(a, b) = p^(a-1) (1-p)^(b-1) / Be(a, b)

39 p(θ|D) = p(D|θ) p(θ) / p(D)        (posterior = likelihood × prior / evidence)
so-called conjugate priors have a functional form (and are the only functional form) that gives you posteriors with the same functional form, and hence lets you turn around and plug the posterior back in as the new prior

40 p(θ|D) = p(D|θ) p(θ) / p(D)        (posterior = likelihood × prior / evidence)
the use of conjugate priors is good Bayesian style; it was also necessary before computer simulation techniques like MCMC allowed arbitrary functional forms with no need for analytic solutions

41 p(θ|D) = p(D|θ) p(θ) / p(D)        (posterior = likelihood × prior / evidence)
Beta distribution = conjugate prior for Binomial/Bernoulli
Dirichlet distribution = conjugate prior for Multinomial
Normal distribution = conjugate prior for mean of Normal
Inverse Gamma = conjugate prior for variance of Normal

42 see week13.m

43 Bayesian parameter estimation
1) Maximum a posteriori (MAP): the maximum of the posterior distribution p(θ|D)        (see week12.m)
2) Expected value of the parameters: E[θ] = ∫ θ p(θ|D) dθ
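
For the Beta posterior from the coin example, both estimates have closed forms; a small sketch (base MATLAB, not week12.m, again assuming the uniform Beta(1,1) prior) compares the analytic values with a brute-force numerical integration:

% MAP and expected value for the Beta(x+a, N-x+b) posterior
N = 10; x = 4; a = 1; b = 1;
p_MAP  = (x + a - 1) / (N + a + b - 2)            % posterior mode:  4/10  = 0.400
p_mean = (x + a) / (N + a + b)                    % posterior mean:  5/12 ~= 0.417
% numerical check of E[p|x] = integral of p * P(p|x) dp
p = linspace(0, 1, 10001);
post = p.^(x+a-1) .* (1-p).^(N-x+b-1) ./ beta(x+a, N-x+b);
p_mean_numeric = trapz(p, p .* post)              % should match p_mean closely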

44 Bayesian parameter estimation
3) Highest Density Interval (HDI), aka Highest Density Region (HDR)

45

46 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
the fair model assumes p = .5
the unfair model assumes 0 ≤ p ≤ 1

47 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
the fair model assumes p = .5
the unfair model assumes 0 ≤ p ≤ 1
P(x|p) = C(N,x) p^x (1-p)^(N-x)
P(D|M^(j)) = ∫ P(D|θ^(j), M^(j)) P(θ^(j)|M^(j)) dθ^(j)

48 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
the fair model assumes p = .5
the unfair model assumes 0 ≤ p ≤ 1
P(x|p) = C(N,x) p^x (1-p)^(N-x)
P(x|M^(j)) = ∫ P(x|p^(j), M^(j)) P(p^(j)|M^(j)) dp^(j)
assume a uniform prior for P(p)

49 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
For the fair model:
P(x|p) = C(N,x) .5^x (1-.5)^(N-x) = C(N,x) .5^N
P(x|M_fair) = C(N,x) .5^N × 1 = C(N,x) .5^N
Why is there no integral? And what is the 1?
P(x|M^(j)) = ∫ P(x|p^(j), M^(j)) P(p^(j)|M^(j)) dp^(j)

50 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
For the unfair model:
P(x|p) = C(N,x) p^x (1-p)^(N-x)
P(x|M_unfair) = ∫₀¹ C(N,x) p^x (1-p)^(N-x) × 1 dp = 1/(N+1)
Need to trust me on this.
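
You do not actually have to take it on trust; the integral follows from the Beta function introduced on slide 28 (a short sketch, using Be(a,b) = Γ(a)Γ(b)/Γ(a+b) and Γ(n+1) = n!):

\[
\int_0^1 \binom{N}{x} p^x (1-p)^{N-x}\,dp
= \binom{N}{x}\,\mathrm{Be}(x+1,\,N-x+1)
= \frac{N!}{x!\,(N-x)!}\cdot\frac{x!\,(N-x)!}{(N+1)!}
= \frac{1}{N+1}
\]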

51 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]

52 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M_fair|x) / p(M_unfair|x) = [p(x|M_fair) / p(x|M_unfair)] × [p(M_fair) / p(M_unfair)]

53 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M_fair|x) / p(M_unfair|x) = [C(N,x) .5^N] / [1/(N+1)]

54 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
Imagine N=20 tosses and x=12 heads:
p(M_fair|x) / p(M_unfair|x) = [C(20,12) .5^20] / [1/21] ≈ 2.52
2.52 times more likely that the coin is fair than unfair
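
A quick check of that number (base MATLAB):

% Bayes factor for N = 20 tosses and x = 12 heads
N = 20; x = 12;
pD_fair   = nchoosek(N, x) * 0.5^N;    % P(x | M_fair)
pD_unfair = 1 / (N + 1);               % P(x | M_unfair), uniform prior on p
bayes_factor = pD_fair / pD_unfair     % approx. 2.52 in favor of the fair model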

55 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M_fair|x) / p(M_unfair|x) = [C(20,12) .5^20] / [1/21] ≈ 2.52
this automatically penalizes the unfair model for its number of parameters and complexity, since the unfair model has to fit the data better than the fair model to overcome that penalty

56

57 Another example: Bayesian estimation of Normally distributed data

58 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ, σ²|X) = P(X|μ, σ²) P(μ, σ²) / P(X)
Let's start by looking at just a single parameter at a time; that's straightforward (the joint distribution is trickier)

59 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
This is a conditional posterior; it assumes we know σ² (i.e., it's a constant, not a parameter)
What might be a reasonable prior on μ?

60 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

61 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

62 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

63 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

64 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )
μ₀ and σ₀² are called hyperparameters

65 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) ~ N(μ', σ²')
μ' = [ (1/σ²) Σ_j x_j + (1/σ₀²) μ₀ ] σ²'
σ²' = [ N/σ² + 1/σ₀² ]^(-1)
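
A small sketch of these updating equations in base MATLAB (the data values and the hyperparameters μ₀ and σ₀² are made up for illustration, and σ² is treated as known):

% conditional posterior of mu given sigma^2
X = [4.1 5.3 6.2 4.8 5.9];                                    % made-up data
N = numel(X);
sigma_sq = 4;                                                 % assumed-known data variance
mu0 = 0; sigma0_sq = 100;                                     % prior: mu ~ N(mu0, sigma0_sq)
sigma_sq_post = 1 / (N/sigma_sq + 1/sigma0_sq);               % sigma^2'
mu_post = (sum(X)/sigma_sq + mu0/sigma0_sq) * sigma_sq_post   % mu'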

66 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
This is a conditional posterior
What might be a reasonable prior on σ²?

67 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

68 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

69 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

70 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

71 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) ~ InverseGamma(a', b')
a' = N/2 + a
b' = [ Σ_j (x_j - μ)² ] / 2 + b

72 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(μ|X, σ²) ~ N(μ', σ²'),  μ' = [ (1/σ²) Σ_j x_j + (1/σ₀²) μ₀ ] σ²',  σ²' = [ N/σ² + 1/σ₀² ]^(-1)
P(σ²|X, μ) ~ InverseGamma(a', b'),  a' = N/2 + a,  b' = [ Σ_j (x_j - μ)² ] / 2 + b
In most circumstances, you want the unconditionalized posterior distributions P(μ|X) and P(σ²|X)
unconditionalized posterior distributions are called marginal distributions

73 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(μ|X, σ²) ~ N(μ', σ²'),  μ' = [ (1/σ²) Σ_j x_j + (1/σ₀²) μ₀ ] σ²',  σ²' = [ N/σ² + 1/σ₀² ]^(-1)
P(σ²|X, μ) ~ InverseGamma(a', b'),  a' = N/2 + a,  b' = [ Σ_j (x_j - μ)² ] / 2 + b
p(μ|X) = ∫ p(μ|X, σ²) p(σ²|X) dσ²
p(σ²|X) = ∫ p(σ²|X, μ) p(μ|X) dμ
integrals like this are usually hard to solve! Markov chain Monte Carlo (MCMC) methods
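
One standard way around these integrals is Gibbs sampling, a kind of MCMC: alternately draw from the two conditional posteriors just derived, and the collected draws approximate the marginals P(μ|X) and P(σ²|X). A minimal sketch in MATLAB follows (an illustration, not the course's week13.m; the data and hyperparameter values are made up, and gamrnd requires the Statistics Toolbox):

X = [4.1 5.3 6.2 4.8 5.9]; N = numel(X);   % made-up data
mu0 = 0; sigma0_sq = 100;                  % Normal prior on mu
a = 2;  b = 2;                             % Inverse-Gamma prior on sigma^2
T = 5000;
mu = zeros(T, 1); sigma_sq = zeros(T, 1);
sigma_sq(1) = var(X);                      % starting value
for t = 2:T
    % draw mu from P(mu | X, sigma^2) ~ N(mu', sigma^2')
    s2p = 1 / (N/sigma_sq(t-1) + 1/sigma0_sq);
    mup = (sum(X)/sigma_sq(t-1) + mu0/sigma0_sq) * s2p;
    mu(t) = mup + sqrt(s2p) * randn;
    % draw sigma^2 from P(sigma^2 | X, mu) ~ InverseGamma(a', b')
    ap = N/2 + a;
    bp = sum((X - mu(t)).^2)/2 + b;
    sigma_sq(t) = 1 / gamrnd(ap, 1/bp);    % reciprocal of a Gamma draw is Inverse-Gamma
end
mean(mu(1001:end))                         % approximate posterior mean of mu (after burn-in)
mean(sigma_sq(1001:end))                   % approximate posterior mean of sigma^2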

74 a challenge of doing Bayesian Statistical Analysis is that the solutions require solving complex integrals
p(θ|D) = p(D|θ) p(θ) / p(D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ
p(θ|D) = ∫ p(θ, φ|D) dφ
p(θ|D) = ∫ p(θ|φ, D) p(φ|D) dφ
E[f(θ)] = ∫ f(θ) p(θ|D) dθ

75 a challenge of doing Bayesian Statistical Analysis is that the solutions require solving complex integrals
only in limited cases can these be solved analytically (e.g., univariate models that are binomial, normal, etc.)
and outside of models with only a few parameters, these integrals cannot be solved using standard numerical integration techniques
Monte Carlo methods (including MCMC)

76 consider this integral: ∫ θ p(θ|D) dθ
what is the p(θ|D) term?

77 consider this integral: ∫ θ p(θ|D) dθ
p(θ|D) is the posterior distribution of parameter θ given data D

78 consider this integral: ∫ θ p(θ|D) dθ
what is the integral as a whole?

79 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ

80 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ
recall that p(θ|D) is a function

81 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ
how can we evaluate this?

82 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ
how can we evaluate this?
- analytically: hard and often impossible
- numerical integration techniques: inefficient
- Monte Carlo methods: often preferred, or the only method

83 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
imagine we have an engine that spits out θ's with probability p(θ|D)

84 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1)

85 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2)

86 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3)

87 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4)

88 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)

89 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)
What is E[θ|D]?

90 Simple Monte Carlo Integration
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)
E[θ|D] = ∫ θ p(θ|D) dθ ≈ (1/N) Σ_j θ^(j)
Monte Carlo simulation of an integral

91 Simple Monte Carlo Integration
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)
E[g(θ)|D] = ∫ g(θ) p(θ|D) dθ ≈ (1/N) Σ_j g(θ^(j))
Monte Carlo simulation of an integral
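
A concrete sketch of this approximation (MATLAB; the Beta posterior from the earlier coin example stands in for p(θ|D), and betarnd is from the Statistics Toolbox):

N = 10; x = 4; a = 1; b = 1;                 % coin example with a uniform prior
M = 100000;                                  % number of Monte Carlo samples
theta = betarnd(x + a, N - x + b, M, 1);     % independent draws from the Beta posterior
E_theta  = mean(theta)                       % approximates (x+a)/(N+a+b) = 5/12
E_theta2 = mean(theta.^2)                    % E[g(theta)|D] with g(theta) = theta^2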

92 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ, approximated with samples θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), … drawn from p(θ|D)
Of course, this assumes you can create an engine that spits out independent samples from a distribution
We've talked about some such engines, like rand() or randn() or other matlab random number routines

93 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ, approximated with samples θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), … drawn from p(θ|D)
But independent sampling from a posterior density p(θ|D) is usually not feasible (or simply impossible)
WHY? Keep in mind that in the most general case p(θ|D) can be arbitrarily complex and have many, many parameters

94 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ, approximated with samples θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), … drawn from p(θ|D)
But independent sampling from a posterior density p(θ|D) is usually not feasible (or simply impossible)
but we can do dependent (or autocorrelated) sampling: Markov chain Monte Carlo

95 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
Independent Sampling: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t))
because the samples are independent, smaller sample sizes are needed to approximate distributions or integrals
Sampling from a Markov Chain Process: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t)|θ^(t-1))
because the samples are dependent, far larger sample sizes are needed to approximate distributions or integrals

96 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
in reality, many random number generators are actually Markov Processes
Independent Sampling: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t))
because the samples are independent, smaller sample sizes are needed to approximate distributions or integrals
Sampling from a Markov Chain Process: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t)|θ^(t-1))
because the samples are dependent, far larger sample sizes are needed to approximate distributions or integrals

97 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
Independent Sampling: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t))
First, let's look at independent sampling (see week13.m)

98 Independent Sampling

99 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
Sampling from a Markov Chain Process: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t)|θ^(t-1))
What is Markov chain Monte Carlo? First, let's see it in action (see week13.m)

100 Markov Chain Monte Carlo Sampling
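
To see what such a chain looks like before opening week13.m, here is a minimal random-walk Metropolis sampler in base MATLAB (an illustrative sketch, not the course code), targeting the unnormalized Beta posterior from the coin example; each θ^(t) is proposed from the previous θ^(t-1), so the samples are autocorrelated:

N = 10; x = 4; a = 1; b = 1;
target = @(p) (p > 0 & p < 1) .* p.^(x+a-1) .* (1-p).^(N-x+b-1);  % P(p|x) up to a constant
T = 20000; step = 0.1;
theta = zeros(T, 1);
theta(1) = 0.5;                            % starting value
for t = 2:T
    prop = theta(t-1) + step * randn;      % propose a move near the current value
    if rand < target(prop) / target(theta(t-1))   % Metropolis accept/reject step
        theta(t) = prop;
    else
        theta(t) = theta(t-1);             % rejected: stay at the current value
    end
end
mean(theta(2001:end))                      % approximate posterior mean, about 5/12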

101 Independent: Remember, this is what we are trying to derive
MCMC: this is what we're deriving it from

102 Independent: Remember, this is what we are trying to derive
MCMC: where does this come from?
