INTRODUCTION TO BAYESIAN ANALYSIS


Arto Luoma
University of Tampere, Finland
Autumn 2014

Who was Thomas Bayes?

Thomas Bayes (c. 1701-1761) was an English philosopher and Presbyterian minister. In his later years he took a deep interest in probability. He suggested a solution to a problem of inverse probability: what do we know about the probability of success if the number of successes is recorded in a binomial experiment? Richard Price discovered Bayes' essay and published it posthumously. He believed that Bayes' theorem helped prove the existence of God.

Bayesian paradigm

Bayesian paradigm: posterior information = prior information + data information

More formally:

p(θ | y) ∝ p(θ) p(y | θ),

where ∝ is the symbol for proportionality, θ is an unknown parameter, y is the data, and p(θ), p(θ | y) and p(y | θ) are the density functions of the prior, posterior and sampling distributions, respectively.

In Bayesian inference the unknown parameter θ is considered stochastic, unlike in classical inference. The distributions p(θ) and p(θ | y) express uncertainty about the exact value of θ. The density of the data, p(y | θ), provides the information from the data. It is called the likelihood function when considered as a function of θ.
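As a small sketch of this proportionality (the values n = 100, y = 20 and the Unif(0.05, 0.15) prior are the ones used in Example 1 below; the grid-approximation code itself is an illustration, not part of the lecture material), the posterior of a binomial success probability can be approximated in R by multiplying a prior density by the likelihood and renormalizing:

# Grid approximation of p(theta | y) ∝ p(theta) p(y | theta)
# for binomial data (values n = 100, y = 20 as in Example 1)
theta <- seq(0, 1, length.out = 1001)          # grid of parameter values
prior <- dunif(theta, 0.05, 0.15)              # prior density p(theta)
lik   <- dbinom(20, size = 100, prob = theta)  # likelihood p(y | theta)
post  <- prior * lik                           # unnormalized posterior
post  <- post / (sum(post) * 0.001)            # renormalize (grid spacing 0.001)
plot(theta, post, type = "l", xlab = expression(theta), ylab = "posterior density")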

Software for Bayesian Statistics

In this course we use the R and BUGS programming languages. BUGS stands for Bayesian inference Using Gibbs Sampling. Gibbs sampling was the computational technique first adopted for Bayesian analysis. The goal of the BUGS project is to separate the knowledge base from the inference machine used to draw conclusions. The BUGS language is able to describe complex models using a very limited syntax.

There are three widely used BUGS implementations: WinBUGS, OpenBUGS and JAGS. Both WinBUGS and OpenBUGS have a Windows GUI. Further, each engine can be controlled from R. In this course we introduce rjags, the R interface to JAGS.

Contents of the course

Bayes' theorem
Prior and posterior distributions (Examples 1 and 2)
Decision theory
Bayes estimators (Examples 1 and 2)
Conjugate priors
Noninformative priors
Intervals
Prediction

Bayes' theorem

Let A_1, A_2, ..., A_k be events that partition the sample space Ω (i.e. Ω = A_1 ∪ A_2 ∪ ... ∪ A_k and A_i ∩ A_j = ∅ when i ≠ j), and let B be an event on that space for which Pr(B) > 0. Then Bayes' theorem is

Pr(A_j | B) = Pr(A_j) Pr(B | A_j) / Σ_{i=1}^k Pr(A_i) Pr(B | A_i).

This formula can be used to reverse conditional probabilities. If one knows the probabilities of the events A_j and the conditional probabilities Pr(B | A_j), j = 1, ..., k, the formula can be used to compute the conditional probabilities Pr(A_j | B).
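As a small numeric illustration (the three-event partition and the probabilities below are hypothetical, chosen only for the sketch), Bayes' theorem can be evaluated directly in R:

# Reversing conditional probabilities with Bayes' theorem
# (hypothetical partition A1, A2, A3 and event B)
prior     <- c(0.5, 0.3, 0.2)    # Pr(A_j), must sum to 1
likeB     <- c(0.1, 0.4, 0.8)    # Pr(B | A_j)
posterior <- prior * likeB / sum(prior * likeB)
posterior                        # Pr(A_j | B), sums to 1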

Example (Diagnostic tests)

A disease occurs with prevalence γ in the population, and θ indicates whether an individual has the disease. Hence Pr(θ = 1) = γ, Pr(θ = 0) = 1 - γ. A diagnostic test gives a result Y, whose distribution function is F_1(y) for a diseased individual and F_0(y) otherwise. The most common type of test declares that a person is diseased if Y > y_0, where y_0 is fixed on the basis of past data.

The probability that a person is diseased, given a positive test result, is

Pr(θ = 1 | Y > y_0) = γ[1 - F_1(y_0)] / {γ[1 - F_1(y_0)] + (1 - γ)[1 - F_0(y_0)]}.

This is sometimes called the positive predictive value of the test. Its sensitivity and specificity are 1 - F_1(y_0) and F_0(y_0). (Example from Davison, 2003.)
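The positive predictive value can be computed directly from the prevalence, sensitivity and specificity. A sketch with hypothetical numbers, chosen to show how a rare disease gives a low predictive value even for an accurate test:

# Positive predictive value Pr(diseased | positive test)
# (hypothetical prevalence, sensitivity and specificity)
ppv <- function(prev, sens, spec) {
  prev * sens / (prev * sens + (1 - prev) * (1 - spec))
}
ppv(prev = 0.01, sens = 0.95, spec = 0.90)   # about 0.088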

Prior and posterior distributions

In a more general case, θ can take a finite number of values, labelled 1, ..., k. We can assign to these values probabilities p_1, ..., p_k which express our beliefs about θ before we have access to the data. The data y are assumed to be the observed value of a (multidimensional) random variable Y, and p(y | θ) is the density of y given θ (the likelihood function).

Then the conditional probabilities

Pr(θ = j | Y = y) = p_j p(y | θ = j) / Σ_{i=1}^k p_i p(y | θ = i),   j = 1, ..., k,

summarize our beliefs about θ after we have observed Y. The unconditional probabilities p_1, ..., p_k are called prior probabilities, and Pr(θ = 1 | Y = y), ..., Pr(θ = k | Y = y) are called posterior probabilities of θ.

Prior and posterior distributions (2)

When θ can take values continuously on some interval, we can express our beliefs about it with a prior density p(θ). After we have obtained the data y, our beliefs about θ are contained in the conditional density

p(θ | y) = p(θ) p(y | θ) / ∫ p(θ) p(y | θ) dθ,     (1)

called the posterior density. Since θ is integrated out in the denominator, the denominator can be treated as a constant with respect to θ. Therefore, the Bayes formula in (1) is often written as

p(θ | y) ∝ p(θ) p(y | θ),     (2)

which denotes that p(θ | y) is proportional to p(θ) p(y | θ).

Example 1 (Introducing a New Drug in the Market)

A drug company would like to introduce a drug to reduce acid indigestion. It is desirable to estimate θ, the proportion of the market share that this drug will capture. The company interviews n people, and Y of them say that they will buy the drug. In the non-Bayesian analysis θ ∈ [0, 1] and Y ~ Bin(n, θ).

We know that θ^ = Y/n is a very good estimator of θ. It is unbiased, consistent and minimum variance unbiased. Moreover, it is also the maximum likelihood estimator (MLE), and thus asymptotically normal.

A Bayesian may look at the past performance of new drugs of this type. If in the past new drugs have tended to capture a proportion between, say, 0.05 and 0.15 of the market, and if all values in between are assumed equally likely, then θ ~ Unif(0.05, 0.15). (Example from Rohatgi, 2003.)

Example 1 (continued)

Thus, the prior is given by

p(θ) = 1/(0.15 - 0.05) = 10,  when 0.05 ≤ θ ≤ 0.15,
p(θ) = 0,                     otherwise,

and the likelihood function by

p(y | θ) = (n choose y) θ^y (1 - θ)^(n-y).

The posterior is

p(θ | y) = p(θ) p(y | θ) / ∫ p(θ) p(y | θ) dθ
         = θ^y (1 - θ)^(n-y) / ∫_{0.05}^{0.15} θ^y (1 - θ)^(n-y) dθ,  when 0.05 ≤ θ ≤ 0.15,

and 0 otherwise.

Example 1 (continued)

Suppose that the sample size is n = 100 and y = 20 say that they will use the drug. Then the following BUGS code can be used to simulate the posterior:

model{
  theta ~ dunif(0.05,0.15)
  y ~ dbin(theta,n)
}

Suppose that this is the contents of the file Acid.txt in the home directory. Then JAGS can be called from R as follows:

library(rjags)
acid <- list(n=100,y=20)
acid.jag <- jags.model("Acid.txt",acid)
acid.coda <- coda.samples(acid.jag,"theta",10000)
hist(acid.coda[[1]][,"theta"],main="",xlab=expression(theta))

Example 1 (continued)

Figure 1: Market share of a new drug: simulations from the posterior distribution of θ (histogram of the simulated values of θ).

Example 2 (Diseased White Pine Trees)

White pine is one of the best known species of pines in the northeastern United States and Canada. White pine is susceptible to blister rust, which develops cankers on the bark. These cankers swell, resulting in the death of twigs and small trees. A forester wishes to estimate the average number of diseased pine trees per acre in a forest.

The number of diseased trees per acre can be modeled by a Poisson distribution with mean θ. Since θ changes from area to area, the forester believes that θ ~ Exp(λ). Thus,

p(θ) = (1/λ) e^(-θ/λ), if θ > 0, and 0 elsewhere.

The forester takes a random sample of size n from n different one-acre plots. (Example from Rohatgi, 2003.)

Example 2 (continued)

The likelihood function is

p(y | θ) = Π_{i=1}^n [θ^{y_i} e^{-θ} / y_i!] ∝ θ^{Σ_{i=1}^n y_i} e^{-nθ}.

Consequently, the posterior is

p(θ | y) = θ^{Σ y_i} e^{-θ(n + 1/λ)} / ∫_0^∞ θ^{Σ y_i} e^{-θ(n + 1/λ)} dθ.

We see that this is a Gamma distribution with parameters α = Σ_{i=1}^n y_i + 1 and β = n + 1/λ. Thus,

p(θ | y) = [(n + 1/λ)^{Σ y_i + 1} / Γ(Σ y_i + 1)] θ^{Σ y_i} e^{-θ(n + 1/λ)}.
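With concrete numbers the posterior can be evaluated in closed form. The counts below and the prior mean λ = 5 are hypothetical, used only as a sketch of the Gamma(Σ y_i + 1, n + 1/λ) result:

# Poisson likelihood with Exp(lambda) prior: posterior is
# Gamma(shape = sum(y) + 1, rate = n + 1/lambda)
# (hypothetical counts of diseased trees on n = 8 one-acre plots)
y      <- c(3, 0, 2, 5, 1, 4, 2, 3)
lambda <- 5                          # prior mean of theta
shape  <- sum(y) + 1                 # = 21
rate   <- length(y) + 1/lambda       # = 8.2
shape / rate                         # posterior mean of theta
curve(dgamma(x, shape, rate), from = 0, to = 8,
      xlab = expression(theta), ylab = "posterior density")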

Statistical decision theory

The outcome of a Bayesian analysis is the posterior distribution, which combines the prior information and the information from the data. However, sometimes we may want to summarize the posterior information with a scalar, for example the mean, median or mode of the posterior. In the following, we show how the use of a scalar estimator can be justified using statistical decision theory.

Let L(θ, θ^) denote the loss function which gives the cost of using θ^ = θ^(y) as an estimate for θ. We say that θ^ is a Bayes estimate of θ if it minimizes the posterior expected loss

E[L(θ, θ^) | y] = ∫ L(θ, θ^) p(θ | y) dθ.

Statistical decision theory (continued)

On the other hand, the expectation of the loss function over the sampling distribution of y is called the risk function:

R_θ^(θ) = E[L(θ, θ^) | θ] = ∫ L(θ, θ^) p(y | θ) dy.

Further, the expectation of the risk function over the prior distribution of θ,

E[R_θ^(θ)] = ∫ R_θ^(θ) p(θ) dθ,

is called the Bayes risk.

Statistical decision theory (continued)

By changing the order of integration one can see that the Bayes risk

∫ R_θ^(θ) p(θ) dθ = ∫ p(θ) ∫ L(θ, θ^) p(y | θ) dy dθ = ∫ p(y) ∫ L(θ, θ^) p(θ | y) dθ dy     (3)

is minimized when the inner integral in (3) is minimized for each y, that is, when a Bayes estimator is used.

In the following, we introduce the Bayes estimators for three simple loss functions.

Bayes estimators: zero-one loss function

Zero-one loss:

L(θ, θ^) = 0 when |θ^ - θ| < a,
L(θ, θ^) = 1 when |θ^ - θ| ≥ a.

We should minimize

∫ L(θ, θ^) p(θ | y) dθ = ∫_{-∞}^{θ^-a} p(θ | y) dθ + ∫_{θ^+a}^{∞} p(θ | y) dθ = 1 - ∫_{θ^-a}^{θ^+a} p(θ | y) dθ,

or, equivalently, maximize

∫_{θ^-a}^{θ^+a} p(θ | y) dθ.

Bayes estimators: absolute error loss and quadratic loss function

If p(θ | y) is unimodal, the maximization is achieved by choosing θ^ to be the midpoint of the interval of length 2a for which p(θ | y) has the same value at both ends. If we let a → 0, then θ^ tends to the mode of the posterior. This equals the MLE if p(θ) is flat.

Absolute error loss: L(θ, θ^) = |θ^ - θ|. In general, if X is a random variable, then the expectation E(|X - d|) is minimized by choosing d to be the median of the distribution of X. Thus, the Bayes estimate of θ is the posterior median.

Quadratic loss function: L(θ, θ^) = (θ^ - θ)^2. In general, if X is a random variable, then the expectation E[(X - d)^2] is minimized by choosing d to be the mean of the distribution of X. Thus, the Bayes estimate of θ is the posterior mean.
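These facts are easy to check by Monte Carlo. The sketch below (my own illustration, not part of the lecture code) draws from the Beta(21, 81) posterior that arises in Example 1 with a flat prior and compares the expected losses at the posterior median, mean and mode; because this posterior is nearly symmetric, the differences are small but the ranking holds:

# Monte Carlo check: the posterior median minimizes expected absolute loss,
# the posterior mean minimizes expected quadratic loss
# (Beta(21, 81) posterior, as in Example 1 with a flat Beta(1, 1) prior)
set.seed(1)
theta <- rbeta(100000, 21, 81)
cand  <- c(median = median(theta), mean = mean(theta), mode = 20/100)
sapply(cand, function(d) mean(abs(theta - d)))   # smallest at the median
sapply(cand, function(d) mean((theta - d)^2))    # smallest at the mean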

Bayes estimators: Example 1 (continued)

We continue our example of the market share of a new drug. Using R, we can compute the posterior mean and median estimates, and various posterior intervals:

summary(acid.coda)

The output lists (1) the empirical mean and standard deviation of each variable, together with the naive and time-series standard errors of the mean, and (2) the 2.5%, 25%, 50%, 75% and 97.5% quantiles of each variable.

Bayes estimators: Example 1 (continued)

From Figure 1 we see that the posterior mode is approximately 0.15, the upper bound of the prior support.

If we use Beta(α, β), whose density is

p(θ) = θ^(α-1) (1 - θ)^(β-1) / B(α, β),  when 0 < θ < 1,

as a prior, then the posterior is

p(θ | y) ∝ p(θ) p(y | θ) ∝ θ^(α+y-1) (1 - θ)^(β+n-y-1).

We see immediately that the posterior is Beta(α + y, β + n - y). The posterior mean (the Bayes estimator under quadratic loss) is (α + y)/(α + β + n). The mode (the Bayes estimator under zero-one loss when a → 0) is (α + y - 1)/(α + β + n - 2), provided that the distribution is unimodal.
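These closed-form estimates are easy to evaluate. A sketch with the data of Example 1 (n = 100, y = 20) and an assumed flat Beta(1, 1) prior:

# Posterior mean, mode and median for a Beta(alpha + y, beta + n - y) posterior
# (data from Example 1; the flat Beta(1, 1) prior is an assumption)
alpha <- 1; beta <- 1; n <- 100; y <- 20
a <- alpha + y; b <- beta + n - y   # posterior parameters: Beta(21, 81)
a / (a + b)                         # posterior mean   = 21/102 ≈ 0.206
(a - 1) / (a + b - 2)               # posterior mode   = 20/100 = 0.200
qbeta(0.5, a, b)                    # posterior median ≈ 0.204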

Bayes estimators: Example 2 (continued)

We now continue our example of estimating the average number of diseased trees per acre. We derived that the posterior is Gamma(Σ_{i=1}^n y_i + 1, n + 1/λ). Thus, the Bayes estimator with a quadratic loss function is the mean of this distribution, (Σ_{i=1}^n y_i + 1)/(n + 1/λ). The posterior median (the Bayes estimator under absolute error loss), by contrast, is not available in closed form.

Note that the classical estimate for θ is the sample mean ȳ.

Conjugate priors

Computations can often be facilitated by using conjugate prior distributions. We say that a prior is conjugate for the likelihood if the prior and posterior distributions belong to the same family. There are conjugate distributions for the exponential family of sampling distributions.

Conjugate priors can be formed with the following simple steps:
1. Write the likelihood function.
2. Remove the factors that do not depend on θ.
3. Replace the expressions which depend on the data with parameters. Also the sample size n should be replaced.
4. Now you have the kernel of the conjugate prior. You can complement it with the normalizing constant.
5. In order to obtain the standard parametrization it may be necessary to reparametrize.

Example: Poisson likelihood

Let y = (y_1, ..., y_n) be a sample from Poi(θ). Then the likelihood is

p(y | θ) = Π_{i=1}^n [θ^{y_i} e^{-θ} / y_i!] ∝ θ^{Σ_{i=1}^n y_i} e^{-nθ}.

By replacing Σ y_i and n, which depend on the data, with the parameters α_1 and α_2, we obtain the conjugate prior

p(θ) ∝ θ^{α_1} e^{-α_2 θ},

which is Gamma(α_1 + 1, α_2). If we reparametrize this so that α = α_1 + 1 and β = α_2, we obtain the prior Gamma(α, β).

Example: Uniform likelihood

Assume that y = (y_1, ..., y_n) is a random sample from Unif(0, θ). Then the density of a single observation y_i is

p(y_i | θ) = 1/θ,  when 0 ≤ y_i ≤ θ,  and 0 otherwise,

and the likelihood of θ is

p(y | θ) = 1/θ^n,  when 0 ≤ y_(1) ≤ ... ≤ y_(n) ≤ θ,  and 0 otherwise,
         = (1/θ^n) I_{y_(n) ≤ θ}(y) I_{y_(1) ≥ 0}(y),

where I_A(y) denotes an indicator function taking the value 1 when y ∈ A and 0 otherwise.

Example: Uniform likelihood (continued)

Now, by removing the factor I_{y_(1) ≥ 0}(y), which does not depend on θ, and replacing n and y_(n) with parameters, we obtain

p(θ) ∝ (1/θ^α) I_{θ ≥ β}(θ) = 1/θ^α,  when θ ≥ β,  and 0 otherwise.

This is the kernel of the Pareto distribution. The posterior

p(θ | y) ∝ p(θ) p(y | θ) ∝ 1/θ^{n+α},  when θ ≥ max(β, y_(n)),  and 0 otherwise,

is also a Pareto distribution.
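A short sketch of how this posterior can be used in practice (the data and the prior parameters α = 3, β = 1 below are hypothetical). With the kernel 1/θ^(n+α) on [m, ∞), where m = max(β, y_(n)), the posterior is a Pareto distribution with shape k = n + α - 1 and scale m, so it can be sampled by inverting its distribution function F(θ) = 1 - (m/θ)^k:

# Pareto posterior for Unif(0, theta) data (hypothetical data and prior)
set.seed(1)
y     <- c(2.1, 0.7, 3.4, 1.9, 2.8)   # observed sample
alpha <- 3; beta <- 1                  # prior p(theta) ∝ 1/theta^alpha, theta >= beta
m <- max(beta, max(y))                 # posterior scale: max(beta, y_(n)) = 3.4
k <- length(y) + alpha - 1             # posterior shape: n + alpha - 1 = 7
u <- runif(10000)
theta <- m * (1 - u)^(-1/k)            # inverse-CDF draws from the Pareto posterior
mean(theta)                            # compare with the analytic mean below
k * m / (k - 1)                        # analytic posterior mean ≈ 3.97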

Noninformative priors

When there is no prior information available on the estimated parameters, noninformative priors can be used. They can also be used to find out how an informative prior affects the outcome of the inference.

The uniform distribution p(θ) ∝ 1 is often used as a noninformative prior. However, this is not fully unproblematic. If the uniform distribution is restricted to an interval, it is not, in fact, noninformative. For example, the prior Unif(0, 1) contains the information that θ is in the interval [0.2, 0.4] with probability 0.2. This information content becomes obvious when a parametric transformation is made: the distribution of the transformed parameter is no longer uniform.

Noninformative priors (continued)

Another problem arises if the parameter can take values in an infinite interval. In such a case there is no proper uniform distribution. However, one can use an improper uniform prior. Then the posterior is proportional to the likelihood.

Some parameters, for example scale parameters and variances, can take only positive values. Such parameters are often given the improper prior p(θ) ∝ 1/θ, which implies that log(θ) has a uniform prior.

Jeffreys suggested giving a uniform prior to a transformation of θ whose Fisher information is constant. Jeffreys' prior is defined as

p(θ) ∝ I(θ)^(1/2),

where I(θ) is the Fisher information of θ. That this definition is invariant to parametrization can be seen as follows.

Noninformative priors (continued)

Let φ = h(θ) be a regular, monotonic transformation of θ, with inverse transformation θ = h^(-1)(φ). Then the Fisher information of φ is

I(φ) = E[ (d log p(y | φ)/dφ)^2 | φ ]
     = E[ (d log p(y | θ = h^(-1)(φ))/dθ)^2 | φ ] (dθ/dφ)^2
     = I(θ) (dθ/dφ)^2.

Thus,

I(φ)^(1/2) = I(θ)^(1/2) |dθ/dφ|.

On the other hand, by the change-of-variables rule, p(φ) = p(θ) |dθ/dφ| ∝ I(θ)^(1/2) |dθ/dφ|, as required.

Jeffreys' prior: examples

Binomial distribution. The Fisher information of the binomial parameter θ is I(θ) = n/[θ(1 - θ)]. Thus, the Jeffreys prior is p(θ) ∝ [θ(1 - θ)]^(-1/2), which is the Beta(1/2, 1/2) distribution.

The mean of the normal distribution. The Fisher information for the mean θ of the normal distribution is I(θ) = n/σ^2. This is independent of θ, so the Jeffreys prior is constant, p(θ) ∝ 1.

The variance of the normal distribution. Assume that the variance θ of the normal distribution N(µ, θ) is unknown. Then its Fisher information is I(θ) = n/(2θ^2), and the Jeffreys prior is p(θ) ∝ 1/θ.

Posterior intervals

We have seen that it is possible to summarize posterior information using point estimators. However, posterior regions and intervals are usually more useful. We say that a set C is a posterior region of level 1 - α for θ if the posterior probability of θ belonging to C is 1 - α:

Pr(θ ∈ C | y) = ∫_C p(θ | y) dθ = 1 - α.

In the case of scalar parameters one can use posterior intervals (credible intervals). An equi-tailed posterior interval is defined using quantiles of the posterior distribution. Thus, (θ_L, θ_U) is a 100(1 - α)% interval if

Pr(θ < θ_L | y) = Pr(θ > θ_U | y) = α/2.

An advantage of this type of interval is that it is invariant with respect to one-to-one parameter transformations. Further, it is easy to compute.
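For a posterior available in closed form, an equi-tailed interval is just a pair of quantiles. A sketch for the Beta(21, 81) posterior of Example 1 (which assumes a flat Beta(1, 1) prior):

# 95% equi-tailed posterior (credible) interval for a Beta(21, 81) posterior
qbeta(c(0.025, 0.975), 21, 81)   # roughly (0.13, 0.29)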

Posterior intervals (continued)

A posterior region is said to be a highest posterior density region (HPD region) if the posterior density is larger at all points of the region than at any point outside the region. This type of region has the smallest possible volume. In the scalar case, an HPD interval has the smallest length. On the other hand, the bounds of the interval are not invariant with respect to parameter transformations, and it is not always easy to determine them.

Example: Cardiac surgery data. Table 1 shows mortality rates for cardiac surgery on babies at 12 hospitals. If one wishes to estimate the mortality rate in hospital A, denoted θ_A, the simplest approach is to assume that the number of deaths y is binomially distributed with parameters n and θ_A, where n is the number of operations in A. Then the MLE is θ^_A = 0, which sounds too optimistic.
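For a unimodal posterior with a closed-form quantile function, an HPD interval can be found numerically by searching for the shortest interval with the required probability content. The helper below is a sketch (the function name hpd_beta and the use of the Beta(21, 81) posterior from Example 1 are my own choices, not from the lecture):

# Shortest (HPD) interval of a unimodal Beta posterior by minimizing its length
hpd_beta <- function(a, b, level = 0.95) {
  width <- function(p) qbeta(p + level, a, b) - qbeta(p, a, b)
  p <- optimize(width, c(0, 1 - level))$minimum   # optimal lower tail probability
  c(lower = qbeta(p, a, b), upper = qbeta(p + level, a, b))
}
hpd_beta(21, 81)   # slightly shorter than the equi-tailed interval above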

Posterior intervals (continued)

If we give a uniform prior to θ_A, then the posterior is Beta(1, 48), with posterior mean 1/49. The 95% HPD interval is (0, 6.05)% and the equi-tailed interval (0.05, 7.30)%. Figure 2 shows the posterior density. Another approach would use the total numbers of deaths and operations in all hospitals.

Table 1: Mortality rates y/n from cardiac surgery in 12 hospitals (Spiegelhalter et al., BUGS 0.5 Examples Volume 1, Cambridge: MRC Biostatistics Unit, 1996). The numbers of deaths y out of n operations.

A 0/47    B 18/148   C 8/119    D 46/810
E 8/211   F 13/196   G 9/148    H 31/215
I 14/207  J 8/97     K 29/256   L 24/360

Posterior intervals (continued)

Figure 2: Posterior density p(θ_A | y) of θ_A when the prior is uniform. The 95% HPD interval is indicated with vertical lines and the 95% equi-tailed interval with red colour.

Posterior intervals (continued)

The following BUGS and R code can be used to compute the equi-tailed and HPD intervals:

model{
  theta ~ dbeta(1,1)
  y ~ dbin(theta,n)
}

library(rjags)
hospital <- list(n=47,y=0)
hospital.jag <- jags.model("hospital.txt",hospital)
hospital.coda <- coda.samples(hospital.jag,"theta",10000)
summary(hospital.coda)
HPDinterval(hospital.coda)

# Compare with the exact upper limit of the HPD interval:
qbeta(0.95,1,48)   # approximately 0.0605, i.e. 6.05%

Posterior predictive distribution

If we wish to predict a new observation ỹ on the basis of the sample y = (y_1, ..., y_n), we may use its posterior predictive distribution. This is defined to be the conditional distribution of ỹ given y:

p(ỹ | y) = ∫ p(ỹ, θ | y) dθ = ∫ p(ỹ | y, θ) p(θ | y) dθ,

where p(ỹ | y, θ) is the density of the predictive distribution.

It is easy to simulate the posterior predictive distribution. First, draw simulations θ_1, ..., θ_L from the posterior p(θ | y); then, for each i, draw ỹ_i from the predictive distribution p(ỹ | y, θ_i).
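A sketch of this two-step simulation in plain R for the beta-binomial setting of the next example (the prior parameters and data are those used there; the code itself is an illustration, not part of the lecture material):

# Simulating the posterior predictive distribution of the next coin toss
# (n = 10 tosses, y = 4 heads, Jeffreys prior Beta(0.5, 0.5), as in the example below)
set.seed(1)
n <- 10; y <- 4; alpha <- 0.5; beta <- 0.5
L     <- 10000
theta <- rbeta(L, alpha + y, beta + n - y)   # step 1: draws from the posterior
ynew  <- rbinom(L, size = 1, prob = theta)   # step 2: draws from p(ynew | theta)
mean(ynew)                                   # ≈ Pr(ynew = 1 | y) = 4.5/11 ≈ 0.409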

Posterior predictive distribution: example

Assume that we have a coin with unknown probability θ of a head. If there occur y heads among the first n tosses, what is the probability of a head on the next toss?

Let ỹ = 1 (ỹ = 0) indicate the event that the next toss is a head (tail). If the prior of θ is Beta(α, β), then

p(ỹ | y) = ∫ p(ỹ | y, θ) p(θ | y) dθ
         = ∫_0^1 θ^ỹ (1 - θ)^(1-ỹ) θ^(α+y-1) (1 - θ)^(β+n-y-1) / B(α + y, β + n - y) dθ
         = B(α + y + ỹ, β + n - y - ỹ + 1) / B(α + y, β + n - y)
         = (α + y)^ỹ (β + n - y)^(1-ỹ) / (α + β + n).

Posterior predictive distribution: example (continued)

Thus, Pr(ỹ = 1 | y) = (α + y)/(α + β + n). This tends to the sample proportion y/n as n → ∞, so that the role of the prior information vanishes.

If n = 10 and y = 4 and the prior parameters are α = β = 0.5 (Jeffreys' prior), the posterior predictive distribution can be simulated with BUGS as follows:

model{
  theta ~ dbeta(alpha,beta)
  y ~ dbin(theta,n)
  ynew ~ dbern(theta)
}

library(rjags)
coin <- list(n=10,y=4,alpha=0.5,beta=0.5)
coin.jag <- jags.model("coin.txt",coin)
coin.coda <- coda.samples(coin.jag,c("theta","ynew"),10000)
summary(coin.coda)
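For comparison, the analytic predictive probability from the formula above can be checked directly with the same values:

# Analytic check of Pr(ynew = 1 | y) = (alpha + y) / (alpha + beta + n)
alpha <- 0.5; beta <- 0.5; n <- 10; y <- 4
(alpha + y) / (alpha + beta + n)   # = 4.5/11 ≈ 0.409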

Normal distribution
Poisson distribution
Exponential distribution

Normal distribution with known variance

Next we will consider some simple single-parameter models. Let us first assume that y = (y_1, ..., y_n) is a sample from a normal distribution with unknown mean θ and known variance σ^2. The likelihood is then

p(y | θ) = Π_{i=1}^n (2πσ^2)^(-1/2) exp{-(y_i - θ)^2/(2σ^2)}
         ∝ exp{-Σ_{i=1}^n (y_i - θ)^2/(2σ^2)}
         ∝ exp{-n(θ - ȳ)^2/(2σ^2)}.

By replacing σ^2/n with τ_0^2 and ȳ with µ_0, we find a conjugate prior

p(θ) ∝ exp{-(θ - µ_0)^2/(2τ_0^2)},

which is N(µ_0, τ_0^2).

Normal distribution with known variance (continued)

With this prior the posterior becomes

p(θ | y) ∝ p(θ) p(y | θ)
         ∝ exp{-(θ - µ_0)^2/(2τ_0^2)} exp{-n(θ - ȳ)^2/(2σ^2)}
         ∝ exp{ -(1/2) [ (1/τ_0^2 + n/σ^2) θ^2 - 2 (µ_0/τ_0^2 + n ȳ/σ^2) θ ] }
         ∝ exp{-(θ - µ_n)^2/(2τ_n^2)},

where

1/τ_n^2 = 1/τ_0^2 + n/σ^2   and   µ_n = (µ_0/τ_0^2 + n ȳ/σ^2) / (1/τ_0^2 + n/σ^2).

Thus the posterior is N(µ_n, τ_n^2): the posterior precision is the sum of the prior and data precisions, and the posterior mean is a precision-weighted average of µ_0 and ȳ.
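A sketch of this conjugate update in R, with hypothetical data and prior values, computing µ_n and τ_n^2 from the formulas above:

# Normal-normal conjugate update with known variance (hypothetical numbers)
sigma2 <- 4                     # known data variance
mu0    <- 0;  tau02 <- 100      # prior N(mu0, tau02), here a vague prior
set.seed(1)
y <- rnorm(25, mean = 2, sd = sqrt(sigma2))        # simulated data, n = 25
n <- length(y)
tau_n2 <- 1 / (1/tau02 + n/sigma2)                 # posterior variance
mu_n   <- tau_n2 * (mu0/tau02 + n*mean(y)/sigma2)  # posterior mean
c(mu_n = mu_n, tau_n2 = tau_n2)                    # posterior is N(mu_n, tau_n2)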


More information

Introduction to Bayesian Inference

Introduction to Bayesian Inference University of Pennsylvania EABCN Training School May 10, 2016 Bayesian Inference Ingredients of Bayesian Analysis: Likelihood function p(y φ) Prior density p(φ) Marginal data density p(y ) = p(y φ)p(φ)dφ

More information

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018 Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian

More information

Data Analysis and Uncertainty Part 2: Estimation

Data Analysis and Uncertainty Part 2: Estimation Data Analysis and Uncertainty Part 2: Estimation Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu 1 Topics in Estimation 1. Estimation 2. Desirable

More information

Bayesian RL Seminar. Chris Mansley September 9, 2008

Bayesian RL Seminar. Chris Mansley September 9, 2008 Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017 Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries

More information

Introduction to Applied Bayesian Modeling. ICPSR Day 4

Introduction to Applied Bayesian Modeling. ICPSR Day 4 Introduction to Applied Bayesian Modeling ICPSR Day 4 Simple Priors Remember Bayes Law: Where P(A) is the prior probability of A Simple prior Recall the test for disease example where we specified the

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Bayesian Inference for Normal Mean

Bayesian Inference for Normal Mean Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density

More information

Bayesian Inference: Posterior Intervals

Bayesian Inference: Posterior Intervals Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)

More information

A Discussion of the Bayesian Approach

A Discussion of the Bayesian Approach A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

Bayesian Inference for Regression Parameters

Bayesian Inference for Regression Parameters Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown

More information

Bayesian Inference. Chapter 2: Conjugate models

Bayesian Inference. Chapter 2: Conjugate models Bayesian Inference Chapter 2: Conjugate models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in

More information

A primer on Bayesian statistics, with an application to mortality rate estimation

A primer on Bayesian statistics, with an application to mortality rate estimation A primer on Bayesian statistics, with an application to mortality rate estimation Peter off University of Washington Outline Subjective probability Practical aspects Application to mortality rate estimation

More information

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

Introduction to Bayesian Statistics

Introduction to Bayesian Statistics Introduction to Bayesian Statistics Dimitris Fouskakis Dept. of Applied Mathematics National Technical University of Athens Greece fouskakis@math.ntua.gr M.Sc. Applied Mathematics, NTUA, 2014 p.1/104 Thomas

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Bayesian Analysis (Optional)

Bayesian Analysis (Optional) Bayesian Analysis (Optional) 1 2 Big Picture There are two ways to conduct statistical inference 1. Classical method (frequentist), which postulates (a) Probability refers to limiting relative frequencies

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

MTH709U/MTHM042 Bayesian Statistics. Lawrence Pettit Queen Mary, Spring 2012

MTH709U/MTHM042 Bayesian Statistics. Lawrence Pettit Queen Mary, Spring 2012 1 MTH709U/MTHM042 Bayesian Statistics Lawrence Pettit Queen Mary, Spring 2012 2 Contents 1 Introduction 7 1.1 Bayes theorem........................... 7 1.2 The Likelihood Principle......................

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

1 Introduction. P (n = 1 red ball drawn) =

1 Introduction. P (n = 1 red ball drawn) = Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability

More information

Bayesian SAE using Complex Survey Data Lecture 1: Bayesian Statistics

Bayesian SAE using Complex Survey Data Lecture 1: Bayesian Statistics Bayesian SAE using Complex Survey Data Lecture 1: Bayesian Statistics Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 101 Outline Motivation Bayesian Learning Probability

More information

Likelihood and Bayesian Inference for Proportions

Likelihood and Bayesian Inference for Proportions Likelihood and Bayesian Inference for Proportions September 18, 2007 Readings Chapter 5 HH Likelihood and Bayesian Inferencefor Proportions p. 1/24 Giardia In a New Zealand research program on human health

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bayesian Inference. STA 121: Regression Analysis Artin Armagan

Bayesian Inference. STA 121: Regression Analysis Artin Armagan Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y

More information

ST 740: Model Selection

ST 740: Model Selection ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Integrated Objective Bayesian Estimation and Hypothesis Testing

Integrated Objective Bayesian Estimation and Hypothesis Testing Integrated Objective Bayesian Estimation and Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es 9th Valencia International Meeting on Bayesian Statistics Benidorm

More information

2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling

2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling 2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction and Motivating

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

More information

EXERCISES FOR SECTION 1 AND 2

EXERCISES FOR SECTION 1 AND 2 EXERCISES FOR SECTION AND Exercise. (Conditional probability). Suppose that if θ, then y has a normal distribution with mean and standard deviation σ, and if θ, then y has a normal distribution with mean

More information

Likelihood and Bayesian Inference for Proportions

Likelihood and Bayesian Inference for Proportions Likelihood and Bayesian Inference for Proportions September 9, 2009 Readings Hoff Chapter 3 Likelihood and Bayesian Inferencefor Proportions p.1/21 Giardia In a New Zealand research program on human health

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Lecture 13 Fundamentals of Bayesian Inference

Lecture 13 Fundamentals of Bayesian Inference Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Chapter 4. Bayesian inference. 4.1 Estimation. Point estimates. Interval estimates

Chapter 4. Bayesian inference. 4.1 Estimation. Point estimates. Interval estimates Chapter 4 Bayesian inference The posterior distribution π(θ x) summarises all our information about θ to date. However, sometimes it is helpful to reduce this distribution to a few key summary measures.

More information

Bayesian inference: an introduction

Bayesian inference: an introduction Bayesian inference: an introduction Peter Green School of Mathematics University of Bristol 8/9 September 2011 / MLSS 2011, Bordeaux Green (Bristol) Bayesian inference MLSS, September 2011 1 / 74 Outline

More information

9 Bayesian inference. 9.1 Subjective probability

9 Bayesian inference. 9.1 Subjective probability 9 Bayesian inference 1702-1761 9.1 Subjective probability This is probability regarded as degree of belief. A subjective probability of an event A is assessed as p if you are prepared to stake pm to win

More information

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Presented August 8-10, 2012 Daniel L. Gillen Department of Statistics University of California, Irvine

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Using Probability to do Statistics.

Using Probability to do Statistics. Al Nosedal. University of Toronto. November 5, 2015 Milk and honey and hemoglobin Animal experiments suggested that honey in a diet might raise hemoglobin level. A researcher designed a study involving

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

2018 SISG Module 20: Bayesian Statistics for Genetics Lecture 2: Review of Probability and Bayes Theorem

2018 SISG Module 20: Bayesian Statistics for Genetics Lecture 2: Review of Probability and Bayes Theorem 2018 SISG Module 20: Bayesian Statistics for Genetics Lecture 2: Review of Probability and Bayes Theorem Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December

More information

Classical and Bayesian inference

Classical and Bayesian inference Classical and Bayesian inference AMS 132 January 18, 2018 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 18, 2018 1 / 9 Sampling from a Bernoulli Distribution Theorem (Beta-Bernoulli

More information

Introduction to Bayesian Inference

Introduction to Bayesian Inference Congreso Latinoamericano de Estadística Bayesiana July 1-4, 2015 Probability vs. Statistics What this course is Probability: Statistics: Unknown p(x θ) Known What is the number of heads in 6 tosses of

More information

The comparative studies on reliability for Rayleigh models

The comparative studies on reliability for Rayleigh models Journal of the Korean Data & Information Science Society 018, 9, 533 545 http://dx.doi.org/10.7465/jkdi.018.9..533 한국데이터정보과학회지 The comparative studies on reliability for Rayleigh models Ji Eun Oh 1 Joong

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28

More information