Introduction to Bayesian Methods


1 Introduction to Bayesian Methods

2 Bayes Rule
p(y|x) p(x) = p(x, y) = p(x|y) p(y)
p(y|x) = p(x|y) p(y) / p(x)
p(x) = Σ_y p(x, y) = Σ_y p(x|y) p(y)
p(y|x) = p(x|y) p(y) / Σ_y p(x|y) p(y)        (y discrete)
p(y|x) = p(x|y) p(y) / ∫ p(x|y) p(y) dy        (y continuous)

3 Bayes Rule
p(y|x) = p(x|y) p(y) / p(x)
p(parameters|data):
p(θ|D) = p(D|θ) p(θ) / p(D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ
       = p(data|parameters) p(parameters) / p(data)

4 p(θ|D) = p(D|θ) p(θ) / p(D)
posterior = likelihood × prior / evidence

5 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ|D): probability of a particular parameter value (or vector of parameter values) given the data
p(D|θ): probability of the data given that parameter value (or vector of parameter values)
p(θ): prior probability of that parameter value (or vector of parameter values)
p(D): probability of the data averaged across all possible parameter values (or vectors)

6 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
just making the model M^(1) explicit in Bayes rule

7 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
just to make clear that the set of parameters might be different across different models; these are the parameters associated with Model 1

8 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(A|B) = p(B|A) p(A) / p(B)        this is just Bayes rule

9 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(x|y) = p(y|x) p(x) / p(y)        this is just Bayes rule again

10 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)        this is just Bayes rule for Model M^(1) given Data (just applying the formula); I haven't included θ^(1) here because I don't care about θ^(1) yet, this is just about M^(1) and D

11 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)
p(D|M^(1)) = ∫ p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) dθ^(1)

12 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)        p(M^(1)) is the prior probability of Model M^(1)

13 p(θ|D) = p(D|θ) p(θ) / p(D)
p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)
p(D) is the probability of the Data given any possible Model under consideration; it will be the same for every possible Model tested
p(D) = Σ_j p(D|M_j) p(M_j)
why did I make this a sum and not an integral?

14 p(θ^(1)|D, M^(1)) = p(D|θ^(1), M^(1)) p(θ^(1)|M^(1)) / p(D|M^(1))
p(M^(1)|D) = p(D|M^(1)) p(M^(1)) / p(D)
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) p(M^(1)) / p(D)] / [p(D|M^(2)) p(M^(2)) / p(D)]
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]

15 Bayesian model selection
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]
(Bayes factor × model priors)
p(D|M^(j)) = ∫ p(D|θ^(j), M^(j)) p(θ^(j)|M^(j)) dθ^(j)        (likelihood × prior)

16 Bayesian model selection
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]
(Bayes factor × model priors)
In Bayesian model selection, models can be nested or non-nested.
One advantage of Bayesian model selection is that it is not as sensitive to data sample size as classical significance procedures (e.g., the G² test).

17 Bayesian model selection
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]
(Bayes factor × model priors)
You often just see the Bayes factor expressed in terms of P(D|M), or even P(D) with the model M left implicit. Since this is intended to be used for model selection, it is odd that few authors note that it is really p(M|D) after an application of Bayes rule (which is what model selection is all about), but with no prior p(M) on the Models.

18

19 p(θ|D) = p(D|θ) p(θ) / p(D)
posterior = likelihood × prior / evidence
Where does the prior probability p(θ) come from?
- noninformative priors, akin to a uniform distribution
- or based on theory
- or based on past data

20

21 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the maximum likelihood estimate of p?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
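
For reference, the standard derivation of the maximum likelihood estimate (a short sketch, not spelled out on the slide):

\[
\log L(x \mid p) = \mathrm{const} + x \log p + (N - x)\log(1 - p), \qquad
\frac{d}{dp}\log L = \frac{x}{p} - \frac{N - x}{1 - p} = 0
\;\Rightarrow\;
\hat{p}_{\mathrm{ML}} = \frac{x}{N} = \frac{4}{10} = 0.4
\]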

22 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(θ|D) = P(D|θ) P(θ) / P(D)

23 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
in this example, please don't confuse the big P with the little p: one is the probability in Bayes rule (P), the other is the parameter p in the binomial distribution

24 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)        (this is the likelihood)
P(p|x) = P(x|p) P(p) / P(x)

25 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
what functional form of the prior should be used?

26 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
what functional form of the prior should be used? what are some possibilities?

27

28 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
L(x|p) = Prob(x|p) = C(N,x) p^x (1-p)^(N-x) = [N! / (x! (N-x)!)] p^x (1-p)^(N-x)
P(p|x) = P(x|p) P(p) / P(x)
P(p) ~ Beta(a, b) = p^(a-1) (1-p)^(b-1) / Be(a, b)
Be(a, b) = Γ(a) Γ(b) / Γ(a+b),  Γ(x) = (x-1)!
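
A quick look at what this family can express; a sketch in base MATLAB using the density above (the particular a, b values are made up for illustration; Beta(1,1) is the uniform distribution):

p = linspace(0, 1, 1001);
prior_uniform = p.^(1-1) .* (1-p).^(1-1) ./ beta(1, 1);   % Beta(1,1): flat prior on p
prior_fair    = p.^(5-1) .* (1-p).^(5-1) ./ beta(5, 5);   % Beta(5,5): peaked near p = .5
prior_low     = p.^(2-1) .* (1-p).^(8-1) ./ beta(2, 8);   % Beta(2,8): favors small p
plot(p, prior_uniform, p, prior_fair, p, prior_low)
xlabel('p'); ylabel('P(p)'); legend('Beta(1,1)', 'Beta(5,5)', 'Beta(2,8)')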

29 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = [ C(N,x) p^x (1-p)^(N-x) × p^(a-1) (1-p)^(b-1) / Be(a, b) ] / P(x)

30 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = [ C(N,x) p^x (1-p)^(N-x) × p^(a-1) (1-p)^(b-1) / Be(a, b) ] / P(x)
collect the terms that involve p

31 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1) × [ C(N,x) / (P(x) Be(a, b)) ]
P(x) = ∫ P(x|p) P(p) dp
the bracketed factor is constant with respect to p

32 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) ∝ p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1)
to be a true probability distribution, this needs to sum to 1
you often see things like this, where ∝ indicates a proportionality written without the denominator

33 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) ∝ p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1)
to be a true probability distribution, this needs to sum to 1
you often see things like this, where ∝ indicates a proportionality written without the denominator
with techniques like MCMC, it's often sufficient

34 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
P(p|x) = p^x (1-p)^(N-x) p^(a-1) (1-p)^(b-1) / Be(x+a, N-x+b)
this can be shown to be true, but for most problems an analytic solution isn't possible

35 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
rearranging some terms:
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)

36 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
rearranging some terms:
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)
what does this look like?

37 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
the posterior is also a Beta distribution:
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)
recall: P(p) ~ Beta(a, b) = p^(a-1) (1-p)^(b-1) / Be(a, b)
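
A minimal sketch of this prior-to-posterior update in base MATLAB (an illustration, not the course code), assuming the uniform Beta(1,1) prior for N=10, x=4:

% Beta prior and Beta posterior for the coin example (10 flips, 4 heads)
N = 10; x = 4;
a = 1;  b = 1;                                                     % assumed uniform prior, Beta(1,1)
p = linspace(0, 1, 1001);
prior     = p.^(a-1)   .* (1-p).^(b-1)     ./ beta(a, b);
posterior = p.^(x+a-1) .* (1-p).^(N-x+b-1) ./ beta(x+a, N-x+b);    % Beta(x+a, N-x+b)
plot(p, prior, p, posterior)
xlabel('p'); ylabel('density'); legend('prior Beta(a,b)', 'posterior Beta(x+a,N-x+b)')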

38 imagine we flip a coin 10 times and get 4 heads (N=10, x=4)
what is the Bayesian estimate of P(p|x)?
why might it be nice to have a posterior that has a distribution of the same form as the prior?
P(p|x) = p^(x+a-1) (1-p)^(N-x+b-1) / Be(x+a, N-x+b)
recall: P(p) ~ Beta(a, b) = p^(a-1) (1-p)^(b-1) / Be(a, b)

39 p(θ|D) = p(D|θ) p(θ) / p(D)        (posterior = likelihood × prior / evidence)
so-called conjugate priors have a functional form (and are the only functional form) that gives you posteriors with the same functional form, and hence lets you turn around and plug the posterior back in as the new prior

40 p(θ|D) = p(D|θ) p(θ) / p(D)        (posterior = likelihood × prior / evidence)
the use of conjugate priors is good Bayesian style; it was also necessary before computer simulation techniques like MCMC allowed arbitrary functional forms with no need for analytic solutions

41 p(θ|D) = p(D|θ) p(θ) / p(D)        (posterior = likelihood × prior / evidence)
Beta distribution = conjugate prior for Binomial/Bernoulli
Dirichlet distribution = conjugate prior for Multinomial
Normal distribution = conjugate prior for mean of Normal
Inverse Gamma = conjugate prior for variance of Normal

42 see week13.m

43 Bayesian parameter estimation
1) Maximum a posteriori (MAP): the maximum of the posterior distribution p(θ|D)        (see week12.m)
2) Expected value of the parameters: E[θ] = ∫ θ p(θ|D) dθ
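
For the Beta posterior from the coin example, both estimates have closed forms; a small sketch (base MATLAB, not week12.m, again assuming the uniform Beta(1,1) prior) compares the analytic values with a brute-force numerical integration:

% MAP and expected value for the Beta(x+a, N-x+b) posterior
N = 10; x = 4; a = 1; b = 1;
p_MAP  = (x + a - 1) / (N + a + b - 2)            % posterior mode:  4/10  = 0.400
p_mean = (x + a) / (N + a + b)                    % posterior mean:  5/12 ~= 0.417
% numerical check of E[p|x] = integral of p * P(p|x) dp
p = linspace(0, 1, 10001);
post = p.^(x+a-1) .* (1-p).^(N-x+b-1) ./ beta(x+a, N-x+b);
p_mean_numeric = trapz(p, p .* post)              % should match p_mean closely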

44 Bayesian parameter estimation
3) Highest Density Interval (HDI), aka Highest Density Region (HDR)

45

46 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
the fair model assumes p = .5
the unfair model assumes 0 ≤ p ≤ 1

47 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
the fair model assumes p = .5
the unfair model assumes 0 ≤ p ≤ 1
P(x|p) = C(N,x) p^x (1-p)^(N-x)
P(D|M^(j)) = ∫ P(D|θ^(j), M^(j)) P(θ^(j)|M^(j)) dθ^(j)

48 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
the fair model assumes p = .5
the unfair model assumes 0 ≤ p ≤ 1
P(x|p) = C(N,x) p^x (1-p)^(N-x)
P(x|M^(j)) = ∫ P(x|p^(j), M^(j)) P(p^(j)|M^(j)) dp^(j)
assume a uniform prior for P(p)

49 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
For the fair model:
P(x|p) = C(N,x) .5^x (1-.5)^(N-x) = C(N,x) .5^N
P(x|M_fair) = C(N,x) .5^N × 1 = C(N,x) .5^N
Why is there no integral? And what is the 1?
P(x|M^(j)) = ∫ P(x|p^(j), M^(j)) P(p^(j)|M^(j)) dp^(j)

50 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
For the unfair model:
P(x|p) = C(N,x) p^x (1-p)^(N-x)
P(x|M_unfair) = ∫₀¹ C(N,x) p^x (1-p)^(N-x) × 1 dp = 1/(N+1)
Need to trust me on this.
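
You do not actually have to take it on trust; the integral follows from the Beta function introduced on slide 28 (a short sketch, using Be(a,b) = Γ(a)Γ(b)/Γ(a+b) and Γ(n+1) = n!):

\[
\int_0^1 \binom{N}{x} p^x (1-p)^{N-x}\,dp
= \binom{N}{x}\,\mathrm{Be}(x+1,\,N-x+1)
= \frac{N!}{x!\,(N-x)!}\cdot\frac{x!\,(N-x)!}{(N+1)!}
= \frac{1}{N+1}
\]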

51 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M^(1)|D) / p(M^(2)|D) = [p(D|M^(1)) / p(D|M^(2))] × [p(M^(1)) / p(M^(2))]

52 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M_fair|x) / p(M_unfair|x) = [p(x|M_fair) / p(x|M_unfair)] × [p(M_fair) / p(M_unfair)]

53 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M_fair|x) / p(M_unfair|x) = [C(N,x) .5^N] / [1/(N+1)]

54 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
Imagine N=20 tosses and x=12 heads:
p(M_fair|x) / p(M_unfair|x) = [C(20,12) .5^20] / [1/21] ≈ 2.52
2.52 times more likely that the coin is fair than unfair
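
A quick check of that number (base MATLAB):

% Bayes factor for N = 20 tosses and x = 12 heads
N = 20; x = 12;
pD_fair   = nchoosek(N, x) * 0.5^N;    % P(x | M_fair)
pD_unfair = 1 / (N + 1);               % P(x | M_unfair), uniform prior on p
bayes_factor = pD_fair / pD_unfair     % approx. 2.52 in favor of the fair model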

55 Simple example of Bayesian Model Evaluation (from Shiffrin, Lee, Kim, & Wagenmakers, 2008)
Is a coin fair?
M_1, the fair model, assumes p = .5
M_2, the unfair model, assumes 0 ≤ p ≤ 1
p(M_fair|x) / p(M_unfair|x) = [C(20,12) .5^20] / [1/21] ≈ 2.52
this automatically penalizes the unfair model for its number of parameters and complexity, since the unfair model has to fit the data better than the fair model to overcome that penalty

56

57 Another example: Bayesian estimation of Normally distributed data

58 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ, σ²|X) = P(X|μ, σ²) P(μ, σ²) / P(X)
Let's start by looking at just a single parameter at a time; that's straightforward (the joint distribution is trickier)

59 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
This is a conditional posterior; it assumes we know σ² (i.e., it's a constant, not a parameter)
What might be a reasonable prior on μ?

60 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

61 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

62 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

63 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )

64 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) = P(X|μ, σ²) P(μ|σ²) / P(X|σ²)
P(μ|X, σ²) = [1/P(X|σ²)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [1 / (2πσ₀²)^(1/2)] exp( -(μ - μ₀)² / (2σ₀²) )
μ₀ and σ₀² are called hyperparameters

65 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(μ|X, σ²) ~ N(μ', σ²')
μ' = [ (1/σ²) Σ_j x_j + (1/σ₀²) μ₀ ] σ²'
σ²' = [ N/σ² + 1/σ₀² ]^(-1)
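
A small sketch of these updating equations in base MATLAB (the data values and the hyperparameters μ₀ and σ₀² are made up for illustration, and σ² is treated as known):

% conditional posterior of mu given sigma^2
X = [4.1 5.3 6.2 4.8 5.9];                                    % made-up data
N = numel(X);
sigma_sq = 4;                                                 % assumed-known data variance
mu0 = 0; sigma0_sq = 100;                                     % prior: mu ~ N(mu0, sigma0_sq)
sigma_sq_post = 1 / (N/sigma_sq + 1/sigma0_sq);               % sigma^2'
mu_post = (sum(X)/sigma_sq + mu0/sigma0_sq) * sigma_sq_post   % mu'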

66 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
This is a conditional posterior
What might be a reasonable prior on σ²?

67 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

68 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

69 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

70 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) = P(X|μ, σ²) P(σ²|μ) / P(X|μ)
P(σ²|X, μ) = [1/P(X|μ)] × [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) ) × [b^a / Γ(a)] (σ²)^(-(a+1)) exp( -b / σ² )        (Inverse-Gamma Distribution)

71 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(X|μ, σ²) = [1 / (2πσ²)^(N/2)] exp( -Σ_j (x_j - μ)² / (2σ²) )
P(σ²|X, μ) ~ InverseGamma(a', b')
a' = N/2 + a
b' = [ Σ_j (x_j - μ)² ] / 2 + b

72 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(μ|X, σ²) ~ N(μ', σ²'),  μ' = [ (1/σ²) Σ_j x_j + (1/σ₀²) μ₀ ] σ²',  σ²' = [ N/σ² + 1/σ₀² ]^(-1)
P(σ²|X, μ) ~ InverseGamma(a', b'),  a' = N/2 + a,  b' = [ Σ_j (x_j - μ)² ] / 2 + b
In most circumstances, you want the unconditionalized posterior distributions P(μ|X) and P(σ²|X)
unconditionalized posterior distributions are called marginal distributions

73 imagine normally distributed data X = (x_1, x_2, x_3, …, x_N)
what is the Bayesian estimate of μ and σ²?
P(μ|X, σ²) ~ N(μ', σ²'),  μ' = [ (1/σ²) Σ_j x_j + (1/σ₀²) μ₀ ] σ²',  σ²' = [ N/σ² + 1/σ₀² ]^(-1)
P(σ²|X, μ) ~ InverseGamma(a', b'),  a' = N/2 + a,  b' = [ Σ_j (x_j - μ)² ] / 2 + b
p(μ|X) = ∫ p(μ|X, σ²) p(σ²|X) dσ²
p(σ²|X) = ∫ p(σ²|X, μ) p(μ|X) dμ
integrals like this are usually hard to solve! Markov chain Monte Carlo (MCMC) methods
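
One standard way around these integrals is Gibbs sampling, a kind of MCMC: alternately draw from the two conditional posteriors just derived, and the collected draws approximate the marginals P(μ|X) and P(σ²|X). A minimal sketch in MATLAB follows (an illustration, not the course's week13.m; the data and hyperparameter values are made up, and gamrnd requires the Statistics Toolbox):

X = [4.1 5.3 6.2 4.8 5.9]; N = numel(X);   % made-up data
mu0 = 0; sigma0_sq = 100;                  % Normal prior on mu
a = 2;  b = 2;                             % Inverse-Gamma prior on sigma^2
T = 5000;
mu = zeros(T, 1); sigma_sq = zeros(T, 1);
sigma_sq(1) = var(X);                      % starting value
for t = 2:T
    % draw mu from P(mu | X, sigma^2) ~ N(mu', sigma^2')
    s2p = 1 / (N/sigma_sq(t-1) + 1/sigma0_sq);
    mup = (sum(X)/sigma_sq(t-1) + mu0/sigma0_sq) * s2p;
    mu(t) = mup + sqrt(s2p) * randn;
    % draw sigma^2 from P(sigma^2 | X, mu) ~ InverseGamma(a', b')
    ap = N/2 + a;
    bp = sum((X - mu(t)).^2)/2 + b;
    sigma_sq(t) = 1 / gamrnd(ap, 1/bp);    % reciprocal of a Gamma draw is Inverse-Gamma
end
mean(mu(1001:end))                         % approximate posterior mean of mu (after burn-in)
mean(sigma_sq(1001:end))                   % approximate posterior mean of sigma^2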

74 a challenge of doing Bayesian Statistical Analysis is that the solutions require solving complex integrals
p(θ|D) = p(D|θ) p(θ) / p(D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ
p(θ|D) = ∫ p(θ, φ|D) dφ
p(θ|D) = ∫ p(θ|φ, D) p(φ|D) dφ
E[f(θ)] = ∫ f(θ) p(θ|D) dθ

75 a challenge of doing Bayesian Statistical Analysis is that the solutions require solving complex integrals
only in limited cases can these be solved analytically (e.g., univariate models that are binomial, normal, etc.)
and outside of models with only a few parameters, these integrals cannot be solved using standard numerical integration techniques
Monte Carlo methods (including MCMC)

76 consider this integral: ∫ θ p(θ|D) dθ
what is the p(θ|D) term?

77 consider this integral: ∫ θ p(θ|D) dθ
p(θ|D) is the posterior distribution of parameter θ given data D

78 consider this integral: ∫ θ p(θ|D) dθ
what is the integral as a whole?

79 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ

80 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ
recall that p(θ|D) is a function

81 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ
how can we evaluate this?

82 consider this integral: ∫ θ p(θ|D) dθ
E[θ|D] = ∫ θ p(θ|D) dθ
how can we evaluate this?
- analytically: hard and often impossible
- numerical integration techniques: inefficient
- Monte Carlo methods: often preferred, or the only method

83 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
imagine we have an engine that spits out θ's with probability p(θ|D)

84 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1)

85 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2)

86 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3)

87 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4)

88 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)

89 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)
What is E[θ|D]?

90 Simple Monte Carlo Integration
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)
E[θ|D] = ∫ θ p(θ|D) dθ ≈ (1/N) Σ_j θ^(j)
Monte Carlo simulation of an integral

91 Simple Monte Carlo Integration
samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5)
E[g(θ)|D] = ∫ g(θ) p(θ|D) dθ ≈ (1/N) Σ_j g(θ^(j))
Monte Carlo simulation of an integral
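
A concrete sketch of this approximation (MATLAB; the Beta posterior from the earlier coin example stands in for p(θ|D), and betarnd is from the Statistics Toolbox):

N = 10; x = 4; a = 1; b = 1;                 % coin example with a uniform prior
M = 100000;                                  % number of Monte Carlo samples
theta = betarnd(x + a, N - x + b, M, 1);     % independent draws from the Beta posterior
E_theta  = mean(theta)                       % approximates (x+a)/(N+a+b) = 5/12
E_theta2 = mean(theta.^2)                    % E[g(theta)|D] with g(theta) = theta^2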

92 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ, approximated with samples θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), … drawn from p(θ|D)
Of course, this assumes you can create an engine that spits out independent samples from a distribution
We've talked about some such engines, like rand() or randn() or other matlab random number routines

93 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ, approximated with samples θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), … drawn from p(θ|D)
But independent sampling from a posterior density p(θ|D) is usually not feasible (or simply impossible)
WHY? Keep in mind that in the most general case p(θ|D) can be arbitrarily complex and have many, many parameters

94 Simple Monte Carlo Integration
E[θ|D] = ∫ θ p(θ|D) dθ, approximated with samples θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), … drawn from p(θ|D)
But independent sampling from a posterior density p(θ|D) is usually not feasible (or simply impossible)
but we can do dependent (or autocorrelated) sampling: Markov chain Monte Carlo

95 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
Independent Sampling: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t))
because the samples are independent, smaller sample sizes are needed to approximate distributions or integrals
Sampling from a Markov Chain Process: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t)|θ^(t-1))
because the samples are dependent, far larger sample sizes are needed to approximate distributions or integrals

96 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
in reality, many random number generators are actually Markov Processes
Independent Sampling: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t))
because the samples are independent, smaller sample sizes are needed to approximate distributions or integrals
Sampling from a Markov Chain Process: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t)|θ^(t-1))
because the samples are dependent, far larger sample sizes are needed to approximate distributions or integrals

97 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
Independent Sampling: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t))
First, let's look at independent sampling (see week13.m)

98 Independent Sampling

99 samples drawn from p(θ|D): θ^(1), θ^(2), θ^(3), θ^(4), θ^(5), …
Sampling from a Markov Chain Process: P(θ^(t)|θ^(1) … θ^(t-1)) = P(θ^(t)|θ^(t-1))
What is Markov chain Monte Carlo? First, let's see it in action (see week13.m)

100 Markov Chain Monte Carlo Sampling
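
To see what such a chain looks like before opening week13.m, here is a minimal random-walk Metropolis sampler in base MATLAB (an illustrative sketch, not the course code), targeting the unnormalized Beta posterior from the coin example; each θ^(t) is proposed from the previous θ^(t-1), so the samples are autocorrelated:

N = 10; x = 4; a = 1; b = 1;
target = @(p) (p > 0 & p < 1) .* p.^(x+a-1) .* (1-p).^(N-x+b-1);  % P(p|x) up to a constant
T = 20000; step = 0.1;
theta = zeros(T, 1);
theta(1) = 0.5;                            % starting value
for t = 2:T
    prop = theta(t-1) + step * randn;      % propose a move near the current value
    if rand < target(prop) / target(theta(t-1))   % Metropolis accept/reject step
        theta(t) = prop;
    else
        theta(t) = theta(t-1);             % rejected: stay at the current value
    end
end
mean(theta(2001:end))                      % approximate posterior mean, about 5/12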

101 Independent: Remember, this is what we are trying to derive
MCMC: this is what we're deriving it from

102 Independent: Remember, this is what we are trying to derive
MCMC: where does this come from?
