Bernoulli and Poisson models


1 Bernoulli and Poisson models

2 Bernoulli/binomial models

Return to iid $Y_1,\dots,Y_n \sim \mathrm{Bin}(1,\theta)$. The sampling model/likelihood is
$$p(y_1,\dots,y_n \mid \theta) = \theta^{\sum y_i}(1-\theta)^{n-\sum y_i}$$
When combined with a prior $p(\theta)$, Bayes' rule gives the posterior
$$p(\theta \mid y_1,\dots,y_n) = \frac{\theta^{\sum y_i}(1-\theta)^{n-\sum y_i}\,p(\theta)}{p(y_1,\dots,y_n)}$$

3 Posterior odds ratio

If we compare the relative probabilities of any two $\theta$-values, say $\theta_a$ and $\theta_b$, we see that
$$\frac{p(\theta_a \mid y_1,\dots,y_n)}{p(\theta_b \mid y_1,\dots,y_n)}
= \frac{\theta_a^{\sum y_i}(1-\theta_a)^{n-\sum y_i}\,p(\theta_a)/p(y_1,\dots,y_n)}
       {\theta_b^{\sum y_i}(1-\theta_b)^{n-\sum y_i}\,p(\theta_b)/p(y_1,\dots,y_n)}
= \left(\frac{\theta_a}{\theta_b}\right)^{\sum y_i}\left(\frac{1-\theta_a}{1-\theta_b}\right)^{n-\sum y_i}\frac{p(\theta_a)}{p(\theta_b)}$$
This shows that the probability density at $\theta_a$ relative to that at $\theta_b$ depends on $y_1,\dots,y_n$ only through $\sum y_i$

4 Sufficient statistics

From this it is possible to show that
$$P\big(\theta \in A \mid Y_1=y_1,\dots,Y_n=y_n\big) = P\Big(\theta \in A \,\Big|\, \sum_{i=1}^n Y_i = \sum_{i=1}^n y_i\Big)$$
We interpret this as meaning that $\sum y_i$ contains all the information about $\theta$ available from the data: we say that $\sum y_i$ is a sufficient statistic for $\theta$ and $p(y_1,\dots,y_n \mid \theta)$. Sufficient because it is sufficient to know $\sum y_i$ in order to make inference about $\theta$

5 The Binomial model

When $Y_1,\dots,Y_n \mid \theta$ are iid $\mathrm{Bin}(1,\theta)$ we saw that the sufficient statistic $Y = \sum_{i=1}^n Y_i$ has a $\mathrm{Bin}(n,\theta)$ distribution. Having observed $\{Y=y\}$ our task is to obtain the posterior distribution of $\theta$ (the likelihood principle at work here):
$$p(\theta \mid y) = \frac{p(y\mid\theta)\,p(\theta)}{p(y)} = \frac{\binom{n}{y}\theta^y(1-\theta)^{n-y}\,p(\theta)}{p(y)} = c(y)\,\theta^y(1-\theta)^{n-y}\,p(\theta)$$
where $c(y)$ is a function of $y$ and not $\theta$

6 A uniform prior

The parameter $\theta$ is some unknown number between 0 and 1; we shall see that this is not the same as having no prior information. Suppose our prior information for $\theta$ is such that all subintervals of $[0,1]$ having the same length also have the same probability:
$$P(a \le \theta \le b) = P(a+c \le \theta \le b+c), \quad \text{for } 0 \le a < b < b+c \le 1$$
This condition implies a uniform density for $\theta$: $p(\theta) = 1$ for all $\theta \in [0,1]$

7 Normalizing constant

Using the following result from calculus
$$\int_0^1 \theta^{a-1}(1-\theta)^{b-1}\,d\theta = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$
where $\Gamma(n) = (n-1)!$ is evaluated in R as gamma(n), we can find out what $c(y)$ is under the uniform prior, using $\int_0^1 p(\theta\mid y)\,d\theta = 1$ [do on board BEFORE pressing enter]:
$$c(y) = \frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)}$$

8 Beta posterior

So the posterior distribution is
$$p(\theta\mid y) = \frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)}\,\theta^{y}(1-\theta)^{n-y}
= \frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)}\,\theta^{(y+1)-1}(1-\theta)^{(n-y+1)-1}
= \mathrm{Beta}(y+1,\,n-y+1)$$
To wrap up: a uniform prior, plus a Bernoulli/Binomial likelihood (sampling model), gives a Beta posterior

9 Example: Happiness

Each female of age 65 or over in the 1998 General Social Survey was asked whether or not they were generally happy. Let $Y_i = 1$ if respondent $i$ reported being generally happy, and $Y_i = 0$ otherwise, for $i = 1,\dots,n = 129$ individuals. Since $129 \ll N$, the total size of the female senior citizen population, our joint beliefs about $Y_1,\dots,Y_{129}$ are well approximated by (the sampling model):
our beliefs about $\theta = \sum_{i=1}^N Y_i/N$;
the model that, conditional on $\theta$, the $Y_i$'s are IID Bernoulli RVs with expectation $\theta$

10 Example: Happiness IID Bernoulli likelihood

This sampling model says that the probability for any potential outcome $\{y_1,\dots,y_{129}\}$, conditional on $\theta$, is given by
$$p(y_1,\dots,y_{129}\mid\theta) = \theta^{\sum_{i=1}^{129} y_i}(1-\theta)^{129-\sum_{i=1}^{129} y_i}$$
The survey revealed that 118 individuals report being generally happy (91%) and 11 individuals do not (9%). So the probability of these data for a given value of $\theta$ is
$$p(y_1,\dots,y_{129}\mid\theta) = \theta^{118}(1-\theta)^{11}$$

11 Example: Happiness Binomial likelihood

We could work with this IID Bernoulli likelihood directly. But for a sense of the bigger picture, we'll use that we know that
$$Y = \sum_{i=1}^n Y_i \sim \mathrm{Bin}(n,\theta)$$
In this case, our observed data is $y = 118$. So (now) the probability of these data for a given value of $\theta$ is
$$p(y\mid\theta) = \binom{129}{118}\theta^{118}(1-\theta)^{11}$$
Notice that this is within a constant factor of the previous Bernoulli likelihood, so the likelihood principle applies

12 Example: Happiness Posterior

If we take a uniform prior for $\theta$, expressing our ignorance, then we know the posterior is
$$p(\theta\mid y) = \mathrm{Beta}(y+1,\,n-y+1)$$
For $n = 129$ and $y = 118$, this gives
$$\theta \mid \{Y = 118\} \sim \mathrm{Beta}(119, 12)$$
[Figure: prior and posterior densities over $\theta$; go to R for plotting code BEFORE pressing enter]
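A minimal R sketch of the plot referenced above (not necessarily the course's exact script): it overlays the Beta(119, 12) posterior from the happiness example on the uniform Beta(1, 1) prior.

```r
n <- 129; y <- 118                       # happiness survey data
theta <- seq(0, 1, length.out = 500)     # grid of theta values
post  <- dbeta(theta, y + 1, n - y + 1)  # posterior density: Beta(119, 12)
prior <- dbeta(theta, 1, 1)              # uniform prior density
plot(theta, post, type = "l", xlab = "theta", ylab = "density")
lines(theta, prior, lty = 2)
legend("topleft", legend = c("posterior", "prior"), lty = c(1, 2))
```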

13 Uniform is Beta

The uniform prior has $p(\theta) = 1$ for all $\theta \in [0,1]$. This can be thought of as a Beta distribution with parameters $a = 1$, $b = 1$:
$$p(\theta) = \frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\,\theta^{1-1}(1-\theta)^{1-1} = 1$$
(under the convention that $\Gamma(1) = 1$)
We saw that if $\theta \sim \mathrm{Beta}(1,1)$ and $Y\mid\theta \sim \mathrm{Bin}(n,\theta)$ then $\theta\mid\{Y=y\} \sim \mathrm{Beta}(1+y,\,1+n-y)$: add the # of 1s to $a$ and the # of 0s to $b$

14 General Beta prior

Does this result hold for arbitrary Beta priors? Suppose $\theta \sim \mathrm{Beta}(a,b)$ and $Y\mid\theta \sim \mathrm{Bin}(n,\theta)$ [work out on board BEFORE pressing enter]
$$p(\theta\mid y) \propto \theta^{a+y-1}(1-\theta)^{b+n-y-1} \;\Rightarrow\; \theta\mid y \sim \mathrm{Beta}(a+y,\,b+n-y)$$
In other words, the constant of proportionality must be:
$$c(n,y,a,b) = \frac{\Gamma(a+b+n)}{\Gamma(a+y)\,\Gamma(b+n-y)}$$
How do I know? $p(\theta\mid y)$ (and the beta density) must integrate to 1; they have the same shape (kernel), therefore they must have the same scale (normalizing constant)

15 Posterior proportionality

We will use this trick over and over again! I.e., if we recognize that the posterior distribution is proportional to a known probability density, then it must be identical to that density. But careful! The constant of proportionality must be constant with respect to $\theta$

16 Conjugacy

We have shown that a beta prior distribution and a binomial sampling model lead to a beta posterior distribution. To reflect this, we say that the class of beta priors is conjugate for the binomial sampling model. Formally, we say that a class $\mathcal{P}$ of prior distributions for $\theta$ is conjugate for a sampling model $p(y\mid\theta)$ if
$$p(\theta) \in \mathcal{P} \;\Rightarrow\; p(\theta\mid y) \in \mathcal{P}$$
Conjugate priors make posterior calculations easy, but might not actually represent our prior information

17 Posterior summaries

If we are interested in point-summaries of our posterior inference, then the (full) distribution gives us many points to choose from. E.g., if $\theta\mid\{Y=y\} \sim \mathrm{Beta}(a+y,\,b+n-y)$ then
$$E\{\theta\mid y\} = \frac{a+y}{a+b+n}, \qquad \mathrm{mode}(\theta\mid y) = \frac{a+y-1}{a+b+n-2}, \qquad \mathrm{Var}[\theta\mid y] = \frac{E\{\theta\mid y\}\,E\{1-\theta\mid y\}}{a+b+n+1}$$
not including quantiles, etc.
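As a quick illustration (my own example, using the happiness data with a uniform prior, i.e. a = b = 1; not from the original slides), these summaries can be computed directly in R:

```r
a <- 1; b <- 1; n <- 129; y <- 118                          # assumed values, for illustration
post.mean <- (a + y) / (a + b + n)                          # posterior expectation
post.mode <- (a + y - 1) / (a + b + n - 2)                  # posterior mode
post.var  <- post.mean * (1 - post.mean) / (a + b + n + 1)  # posterior variance
c(mean = post.mean, mode = post.mode, var = post.var)
```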

18 Combining information

The posterior expectation $E\{\theta\mid y\}$ is easily recognized as a combination of prior and data information [work out on board BEFORE pressing enter]:
$$E\{\theta\mid y\} = \frac{a+b}{a+b+n}\times\underbrace{\frac{a}{a+b}}_{\text{prior expectation}} \;+\; \frac{n}{a+b+n}\times\underbrace{\frac{y}{n}}_{\text{data average}}$$
I.e., for this model and prior distribution, the posterior mean is a weighted average of the prior mean and the sample average, with weights proportional to $a+b$ and $n$ respectively

19 Prior sensitivity

This leads to the interpretation of $a$ and $b$ as prior data:
$a \approx$ prior number of 1's, $b \approx$ prior number of 0's, $a+b \approx$ prior sample size
So when we take a uniform prior we are actually using 2 data points! Not noninformative!
If $n$ [data size] $\gg a+b$ [prior size], then it seems reasonable that most of our information should be coming from the data as opposed to the prior; clearly, in the limit the prior has no influence on the posterior. Indeed:
$$\frac{a+b}{a+b+n} \approx 0 \;\Rightarrow\; E\{\theta\mid y\} \approx \frac{y}{n}, \qquad \mathrm{Var}[\theta\mid y] \approx \frac{1}{n}\,\frac{y}{n}\left(1-\frac{y}{n}\right)$$

20 Prediction

An important feature of Bayesian inference is the existence of a predictive distribution for new observations. Revert, for the moment, to our notation for Bernoulli data. Let $y_1,\dots,y_n$ be the outcomes of a sample of $n$ binary RVs. Let $\tilde Y \in \{0,1\}$ be an additional outcome from the same population that has yet to be observed. The predictive distribution of $\tilde Y$ is the conditional distribution of $\tilde Y$ given $\{Y_1=y_1,\dots,Y_n=y_n\}$

21 Predictive distribution

For conditionally IID binary variables the predictive distribution can be derived from the distribution of $\tilde Y$ given $\theta$ and the posterior distribution of $\theta$ [work out on board BEFORE pressing enter]
$$p(\tilde y = 1\mid y_1,\dots,y_n) = E\{\theta\mid y_1,\dots,y_n\} = \frac{a+\sum_{i=1}^n y_i}{a+b+n}$$
$$p(\tilde y = 0\mid y_1,\dots,y_n) = 1 - p(\tilde y = 1\mid y_1,\dots,y_n) = \frac{b+n-\sum_{i=1}^n y_i}{a+b+n}$$

22 Two points about prediction

1. The predictive distribution does not depend upon any unknown quantities; if it did, we would not be able to use it to make predictions
2. The predictive distribution depends on our observed data, i.e. $\tilde Y$ is not independent of $Y_1,\dots,Y_n$; this is because observing $Y_1,\dots,Y_n$ gives information about $\theta$, which in turn gives information about $\tilde Y$
It would be bad if $\tilde Y$ were independent of $Y_1,\dots,Y_n$: we would never learn anything!

23 Example: Posterior predictive

Consider a uniform prior, $\theta \sim \mathrm{Beta}(1,1)$, equivalent to one prior 1 and one prior 0. Using $Y = \sum_{i=1}^n y_i$,
$$P(\tilde Y = 1\mid Y = y) = E\{\theta\mid Y=y\} = \frac{2}{2+n}\cdot\frac{1}{2} + \frac{n}{2+n}\cdot\frac{y}{n} = \frac{1+y}{2+n}, \quad\text{but}\quad \mathrm{mode}(\theta\mid Y=y) = \frac{y}{n}$$
Does this discrepancy between these two posterior summaries of our information make sense? Consider the case in which $Y = 0$, for which $\mathrm{mode}(\theta\mid Y=0) = 0$ but $P(\tilde Y = 1\mid Y=0) = 1/(2+n)$

24 Confidence regions

An interval $[l(y), u(y)]$, based on the observed data $Y=y$, has 95% Bayesian coverage for $\theta$ if
$$P\big(l(y) < \theta < u(y) \mid Y=y\big) = 0.95$$
The interpretation of this interval is that it describes your information about the true value of $\theta$ after you have observed $Y=y$. Such intervals are typically called credible intervals, to distinguish them from frequentist confidence intervals: intervals constructed (from $\hat\theta$) so that they contain the true $\theta_0$ 95% of the time under repeated sampling. Both, confusingly, use the acronym CI [dwell on this distinction, and interpretation]

25 Quantile-based (Bayesian) CI

Perhaps the easiest way to obtain a credible interval is to use the posterior quantiles. To make a $100\times(1-\alpha)\%$ quantile-based CI, find numbers $\theta_{\alpha/2} < \theta_{1-\alpha/2}$ such that
(1) $P(\theta < \theta_{\alpha/2}\mid Y=y) = \alpha/2$
(2) $P(\theta > \theta_{1-\alpha/2}\mid Y=y) = \alpha/2$
The numbers $\theta_{\alpha/2}$, $\theta_{1-\alpha/2}$ are the $\alpha/2$ and $1-\alpha/2$ posterior quantiles of $\theta$ [justify on board BEFORE next slide]

26 Example: Binomial sampling and uniform prior

Suppose out of $n = 10$ conditionally independent draws of a binary random variable we observe $Y = 2$ ones. Using a uniform prior distribution for $\theta$, the posterior distribution is $\theta\mid\{Y=2\} \sim \mathrm{Beta}(1+2,\,1+8)$. A 95% CI can be obtained from the 0.025 and 0.975 quantiles of this beta distribution [calculate in R BEFORE pressing return]. These quantiles are 0.06 and 0.52, respectively, and so the posterior probability that $\theta \in [0.06, 0.52]$ is 95% [make plot in R BEFORE pressing return]
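A sketch of the in-class R calculation (the quantile values quoted above are rounded):

```r
n <- 10; y <- 2
qbeta(c(0.025, 0.975), y + 1, n - y + 1)   # 95% quantile-based CI, roughly (0.06, 0.52)
```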

27 Example: a quantile-based CI drawback

Notice that there are $\theta$-values outside the quantile-based CI that have higher probability [density] than some points inside the interval.
[Figure: posterior density over $\theta$ with the 95% CI marked]
There are many valid CIs based on quantiles; perhaps this should be called a central quantile-based CI

28 HPD region

The quantile-based drawback suggests a more restrictive type of interval may be warranted. A $100\times(1-\alpha)\%$ high posterior density (HPD) region consists of a subset $s(y)$ of the parameter space such that
1. $P(\theta \in s(y)\mid Y=y) = 1-\alpha$
2. If $\theta_a \in s(y)$ and $\theta_b \notin s(y)$, then $p(\theta_a\mid Y=y) > p(\theta_b\mid Y=y)$
All points in an HPD region have a higher posterior density than points outside the region

29 Constructing a HPD region

However, an HPD region might not be an interval if the posterior density is multimodal (having multiple peaks); another drawback is that it is not invariant to 1-1 transformations of the estimand. When it is an interval, the basic idea behind its construction is as follows:
1. Gradually move a horizontal line down across the density, including in the HPD region all of the $\theta$-values having a density above the horizontal line
2. Stop when the posterior probability of the $\theta$-values in the region reaches $1-\alpha$

30 Example: HPD region

Numerically, we may collect the highest density points whose cumulative probability is greater than $1-\alpha$.
[Figure: the 95% quantile-based CI and the 50%, 75% and 95% HPD regions over $\theta$; go to R code BEFORE pressing enter]
More precise: the 95% HPD region is [0.04, 0.48], which is narrower than the quantile-based CI, yet both contain 95% probability
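One way to implement the "collect the highest-density points" idea is sketched below; this is an illustrative grid approximation of my own, not necessarily the course's code.

```r
# Grid-based HPD sketch: sort grid points by posterior density and keep the
# highest-density points until they account for 1 - alpha of the probability.
hpd.grid <- function(a, b, alpha = 0.05, m = 10000) {
  theta <- seq(0, 1, length.out = m)
  dens  <- dbeta(theta, a, b)
  ord   <- order(dens, decreasing = TRUE)
  keep  <- ord[cumsum(dens[ord]) / sum(dens) <= 1 - alpha]
  range(theta[keep])   # an interval when the posterior is unimodal
}
hpd.grid(3, 9)         # approximate 95% HPD for the Beta(3, 9) posterior
```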

31 Example: Estimating the probability of a female birth

The proportion of births that are female has long been a topic of interest both scientifically and to the lay public. E.g., Laplace, perhaps the first Bayesian, analyzed Parisian birth data from 1745 with a uniform prior and a binomial sampling model: 241,945 girls and 251,527 boys [Bayes' calculus wasn't so good; Laplace re-discovered Bayesian stats]

32 Example: female birth posterior

The posterior distribution for $\theta$ is (with $n = 241{,}945 + 251{,}527 = 493{,}472$)
$$\theta\mid\{Y = 241{,}945\} \sim \mathrm{Beta}(241946,\ 251528)$$
Now, we can use the posterior CDF, via pbeta with lower.tail = FALSE (see the sketch below), to calculate
$$P(\theta > 0.5 \mid Y = 241{,}945) \approx 1.15\times 10^{-42}$$
[Laplace did this without R] so we can confidently say that the proportion of female births in Paris in 1745 was < 50%
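A minimal sketch of the pbeta calculation, with the Beta parameters following from the uniform prior and the counts above:

```r
girls <- 241945; boys <- 251527
pbeta(0.5, girls + 1, boys + 1, lower.tail = FALSE)   # ~1e-42, effectively zero
```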

33 Example: female birth with placenta previa

Placenta previa is an unusual condition of pregnancy in which the placenta is implanted very low in the uterus, obstructing the fetus from a normal vaginal delivery. An early study concerning the sex of placenta previa births in Germany found that of a total of 980 births, 437 were female. How much evidence does this provide for the claim that the proportion of female births in the population of placenta previa births is less than 0.485, the proportion of female births in the general population?

34 Example: female birth with placenta previa

The posterior distribution is $\theta\mid\{Y = 437\} \sim \mathrm{Beta}(438, 544)$. The summary statistics are
$$E\{\theta\mid Y=437\} = 0.446, \qquad \sqrt{\mathrm{Var}[\theta\mid Y=437]} = 0.016 \text{ (sd)}, \qquad \mathrm{med}(\theta\mid Y=437) = 0.446$$
[normal approximation? olden days!]
The central 95% CI is [0.415, 0.476] [go to R BEFORE pressing enter]
$$P(\theta > 0.485 \mid Y = 437) = 0.007$$
[don't forget full posterior plot in R]
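A sketch of the R calculations behind these summaries (uniform prior, so the posterior is Beta(438, 544)):

```r
y <- 437; n <- 980
a <- 1 + y; b <- 1 + n - y                           # Beta(438, 544) posterior
c(mean = a / (a + b),
  sd   = sqrt(a * b / ((a + b)^2 * (a + b + 1))))    # about 0.446 and 0.016
qbeta(c(0.025, 0.975), a, b)                         # central 95% CI, about [0.415, 0.476]
pbeta(0.485, a, b, lower.tail = FALSE)               # P(theta > 0.485 | y), about 0.007
```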

35 The Poisson model

Some measurements, such as a person's number of children or number of friends, have values that are whole numbers. In these cases the sample space is $\mathcal{Y} = \{0,1,2,\dots\}$. Perhaps the simplest probability model on $\mathcal{Y}$ is the Poisson model. A RV $Y$ has a Poisson distribution with mean $\theta$ if
$$P(Y=y\mid\theta) = \frac{\theta^y e^{-\theta}}{y!} \quad\text{for } y \in \{0,1,2,\dots\}$$

36 Mean variance relationship

Poisson RVs have the property that $E\{Y\mid\theta\} = \mathrm{Var}[Y\mid\theta] = \theta$. People sometimes say that the Poisson family of distributions has a mean-variance relationship because if one Poisson distribution has a larger mean than another, it will have a larger variance as well

37 IID Sampling model

If we take $Y_1,\dots,Y_n\mid\theta \stackrel{iid}{\sim} \mathrm{Pois}(\theta)$ then the joint PDF is:
$$p(y_1,\dots,y_n\mid\theta) = \prod_{i=1}^n \frac{\theta^{y_i} e^{-\theta}}{y_i!} = c(y_1,\dots,y_n)\,\theta^{\sum y_i} e^{-n\theta}, \qquad c(y_1,\dots,y_n) = \Big(\prod_{i=1}^n y_i!\Big)^{-1}$$

38 Posterior odds ratio

Comparing two values of $\theta$ a posteriori, we have
$$\frac{p(\theta_a\mid y_1,\dots,y_n)}{p(\theta_b\mid y_1,\dots,y_n)}
= \frac{c(y_1,\dots,y_n)\,e^{-n\theta_a}\,\theta_a^{\sum y_i}\,p(\theta_a)}{c(y_1,\dots,y_n)\,e^{-n\theta_b}\,\theta_b^{\sum y_i}\,p(\theta_b)}
= \frac{e^{-n\theta_a}}{e^{-n\theta_b}}\left(\frac{\theta_a}{\theta_b}\right)^{\sum y_i}\frac{p(\theta_a)}{p(\theta_b)}$$
As in the case of the IID Bernoulli model, $\sum_{i=1}^n Y_i$ is a sufficient statistic. Furthermore $\sum_{i=1}^n Y_i \mid\theta \sim \mathrm{Pois}(n\theta)$

39 Conjugate prior

Recall that a class of priors is conjugate for a sampling model if the posterior is also in that class. For the Poisson model, our posterior distribution for $\theta$ has the following form
$$p(\theta\mid y_1,\dots,y_n) \propto p(y_1,\dots,y_n\mid\theta)\,p(\theta) \propto \theta^{\sum y_i} e^{-n\theta}\,p(\theta)$$
This means that whatever our conjugate class of densities is, it will have to include terms like $\theta^{c_1} e^{-c_2\theta}$ for numbers $c_1$ and $c_2$

40 Conjugate gamma prior

The simplest class of probability distributions matching this description is the gamma family
$$p(\theta) = \frac{b^a}{\Gamma(a)}\,\theta^{a-1} e^{-b\theta} \quad\text{for } \theta, a, b > 0$$
Some properties include
$$E\{\theta\} = \frac{a}{b}, \qquad \mathrm{Var}[\theta] = \frac{a}{b^2}, \qquad \mathrm{mode}(\theta) = \begin{cases} (a-1)/b & \text{if } a > 1 \\ 0 & \text{if } a \le 1 \end{cases}$$
We shall denote this family as $G(a,b)$

41 Posterior distribution

Suppose $Y_1,\dots,Y_n\mid\theta \stackrel{iid}{\sim} \mathrm{Pois}(\theta)$ and $\theta \sim G(a,b)$. Then [work out on board BEFORE pressing return]
$$p(\theta\mid y_1,\dots,y_n) \propto \theta^{a+\sum y_i - 1}\,e^{-(b+n)\theta}$$
This is evidently a gamma distribution, and we have confirmed the conjugacy of the gamma family for the Poisson sampling model:
$$\theta \sim G(a,b),\ \ Y_1,\dots,Y_n\mid\theta \sim \mathrm{Pois}(\theta) \;\Rightarrow\; \{\theta\mid Y_1,\dots,Y_n\} \sim G\Big(a + \sum_{i=1}^n Y_i,\ b+n\Big)$$
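A small sketch of the gamma-Poisson update in R, with made-up prior parameters and counts (purely illustrative, not from the slides):

```r
a <- 2; b <- 1                    # hypothetical prior: theta ~ G(2, 1)
y <- c(3, 1, 4, 0, 2)             # hypothetical Poisson counts
a.post <- a + sum(y); b.post <- b + length(y)      # conjugate update
c(post.mean = a.post / b.post, post.var = a.post / b.post^2)
curve(dgamma(x, shape = a.post, rate = b.post), 0, 6,
      xlab = "theta", ylab = "density")            # posterior density
```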

42 Combining information

Estimation and prediction proceed in a manner similar to that in the binomial model. The posterior expectation of $\theta$ is a convex combination of the prior expectation and the sample average [work out on board BEFORE pressing return]
$$E\{\theta\mid y_1,\dots,y_n\} = \frac{b}{b+n}\times\frac{a}{b} + \frac{n}{b+n}\times\frac{\sum y_i}{n}$$
$b$ is the number of prior observations; $a$ is the sum of the counts from $b$ prior observations. For large $n$, the information in the data dominates:
$$n \gg b \;\Rightarrow\; E\{\theta\mid y_1,\dots,y_n\} \approx \bar y, \qquad \mathrm{Var}[\theta\mid y_1,\dots,y_n] \approx \frac{\bar y}{n}$$

43 Posterior predictive

Predictions about additional data can be obtained with the posterior predictive distribution [work out on board BEFORE pressing return]
$$p(\tilde y\mid y_1,\dots,y_n) = \int_0^\infty p(\tilde y\mid\theta)\,p(\theta\mid y_1,\dots,y_n)\,d\theta
= \frac{\Gamma(a+\sum y_i+\tilde y)}{\Gamma(a+\sum y_i)\,\tilde y!}\left(\frac{b+n}{b+n+1}\right)^{a+\sum y_i}\left(\frac{1}{b+n+1}\right)^{\tilde y}$$
for $\tilde y \in \{0,1,2,\dots\}$

44 Posterior predictive

This is a negative binomial distribution with parameters $(a+\sum y_i,\ b+n)$, for which
$$E\{\tilde Y\mid y_1,\dots,y_n\} = \frac{a+\sum y_i}{b+n} = E\{\theta\mid y_1,\dots,y_n\}$$
$$\mathrm{Var}[\tilde Y\mid y_1,\dots,y_n] = \frac{a+\sum y_i}{b+n}\,\frac{b+n+1}{b+n} = \mathrm{Var}[\theta\mid y_1,\dots,y_n]\,(b+n+1) = E\{\theta\mid y_1,\dots,y_n\}\,\frac{b+n+1}{b+n}$$
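As a check on the negative-binomial form of the posterior predictive, one can compare the exact pmf with Monte Carlo draws obtained by first drawing theta from its posterior and then a new count from Pois(theta). The prior parameters and counts below are the same illustrative ones used in the sketch above (my own values, not the course's).

```r
a <- 2; b <- 1; y <- c(3, 1, 4, 0, 2); n <- length(y)     # illustrative values
size <- a + sum(y); prob <- (b + n) / (b + n + 1)         # negative binomial parameters
theta  <- rgamma(1e5, shape = size, rate = b + n)         # posterior draws of theta
ytilde <- rpois(1e5, theta)                               # posterior predictive draws
rbind(exact = dnbinom(0:5, size = size, prob = prob),
      sim   = table(factor(ytilde, levels = 0:5)) / length(ytilde))
```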

45 Example: birth rates and education

Over the course of the 1990s the General Social Survey gathered data on the educational attainment and number of children of 155 women who were 40 years of age at the time of their participation in the survey. These women were in their 20s in the 1970s, a period of historically low fertility rates in the United States. In this example we will compare the women with college degrees to those without in terms of their numbers of children

46 Example: sampling model(s)

Let $Y_{1,1},\dots,Y_{n_1,1}$ denote the numbers of children for the $n_1$ women without college degrees and $Y_{1,2},\dots,Y_{n_2,2}$ be the data for the $n_2$ women with degrees. For this example, we will use the following sampling models [we shall examine the appropriateness of this later]:
$$Y_{1,1},\dots,Y_{n_1,1}\mid\theta_1 \stackrel{iid}{\sim} \mathrm{Pois}(\theta_1), \qquad Y_{1,2},\dots,Y_{n_2,2}\mid\theta_2 \stackrel{iid}{\sim} \mathrm{Pois}(\theta_2)$$
The (sufficient) data are
no bachelor's: $n_1 = 111$, $\sum_{i=1}^{n_1} y_{i,1} = 217$, $\bar y_1 = 1.95$
bachelor's: $n_2 = 44$, $\sum_{i=1}^{n_2} y_{i,2} = 66$, $\bar y_2 = 1.50$

47 Example: prior(s) and posterior(s)

In the case where $\theta_1, \theta_2 \stackrel{iid}{\sim} G(2, 1)$ we have the following posterior distributions:
$$\theta_1 \mid \{n_1 = 111, \textstyle\sum Y_{i,1} = 217\} \sim G(2+217,\ 1+111)$$
$$\theta_2 \mid \{n_2 = 44, \textstyle\sum Y_{i,2} = 66\} \sim G(2+66,\ 1+44)$$
Posterior means, modes and 95% quantile-based CIs for $\theta_1$ and $\theta_2$ can be obtained from their gamma posterior distributions [go to R code BEFORE next slide]
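A sketch of the R calculations for this example, assuming the G(2, 1) prior stated above:

```r
a <- 2; b <- 1
n1 <- 111; s1 <- 217    # no bachelor's degree
n2 <- 44;  s2 <- 66     # bachelor's degree
(a + s1) / (b + n1); (a + s2) / (b + n2)              # posterior means
qgamma(c(0.025, 0.5, 0.975), a + s1, rate = b + n1)   # posterior quantiles for theta1
qgamma(c(0.025, 0.5, 0.975), a + s2, rate = b + n2)   # posterior quantiles for theta2
```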

48 Example: prior(s) and posterior(s)

[Figure: prior and posterior densities for the "none" and "bachelor's" groups over $\theta$]
The posterior(s) provide substantial evidence that $\theta_1 > \theta_2$. E.g.,
$$P(\theta_1 > \theta_2 \mid \textstyle\sum Y_{i,1} = 217, \sum Y_{i,2} = 66) = 0.97$$
[see R; via Monte Carlo (later)]

49 Example: posterior predictive(s)

Consider a randomly sampled individual from each population. To what extent do we expect the uneducated one to have more children than the other?
[Figure: posterior predictive distributions for the two groups over numbers of children; notice that there is much more overlap than for $\theta$]
$$P(\tilde Y_1 > \tilde Y_2 \mid \textstyle\sum Y_{i,1} = 217, \sum Y_{i,2} = 66) = 0.48, \qquad P(\tilde Y_1 = \tilde Y_2 \mid \textstyle\sum Y_{i,1} = 217, \sum Y_{i,2} = 66) \approx 0.22$$
The distinction between $\{\theta_1 > \theta_2\}$ and $\{\tilde Y_1 > \tilde Y_2\}$ is important: strong evidence of a difference between two populations does not mean the difference is large
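Both the parameter and the predictive comparisons can be approximated by Monte Carlo, drawing each theta from its gamma posterior and then a new count from the corresponding Poisson; a sketch, again assuming the G(2, 1) prior:

```r
set.seed(1)
th1 <- rgamma(1e5, 2 + 217, rate = 1 + 111)   # posterior draws, no bachelor's
th2 <- rgamma(1e5, 2 + 66,  rate = 1 + 44)    # posterior draws, bachelor's
y1 <- rpois(1e5, th1); y2 <- rpois(1e5, th2)  # posterior predictive draws
mean(th1 > th2)    # about 0.97
mean(y1 > y2)      # about 0.48
mean(y1 == y2)     # about 0.22 in my runs
```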

50 Non-informative priors

Priors can be difficult to construct. There has long been a desire for prior distributions that can be guaranteed to play a minimal role in the posterior distribution. Such distributions are sometimes called reference priors, and described as vague, flat, diffuse, or non-informative. The rationale for using non-informative priors is often said to be to let the data speak for themselves, so that inferences are unaffected by information external to the current data

51 Jeffreys invariance principle

One approach that is sometimes used to define a non-informative prior distribution was introduced by Jeffreys (1946), based on considering one-to-one transformations of the parameter. Jeffreys' general principle is that any rule for determining the prior density should yield an equivalent posterior if applied to the transformed parameter. Naturally, one must take the sampling model into account in order to study this invariance

52 The Jeffreys prior

Jeffreys' principle leads to defining the non-informative prior as $p(\theta) \propto [i(\theta)]^{1/2}$, where $i(\theta)$ is the Fisher information for $\theta$. Recall that
$$i(\theta) = E\left\{\left(\frac{d}{d\theta}\log p(Y\mid\theta)\right)^2\right\} = -E\left\{\frac{d^2}{d\theta^2}\log p(Y\mid\theta)\right\}$$
A prior so constructed is called the Jeffreys prior for the sampling model $p(y\mid\theta)$ [what a funny mix of Bayes and classical stats]

53 Example: Jeffreys prior for the binomial sampling model

Consider the binomial distribution $Y \sim \mathrm{Bin}(n,\theta)$, which has log-likelihood
$$\log p(y\mid\theta) = c + y\log\theta + (n-y)\log(1-\theta)$$
Therefore [work out on board]
$$i(\theta) = -E\left\{\frac{d^2}{d\theta^2}\log p(Y\mid\theta)\right\} = \frac{n}{\theta(1-\theta)}$$
So the Jeffreys prior density is then $p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, which is the same as $\mathrm{Beta}(\tfrac{1}{2}, \tfrac{1}{2})$

54 Example: non-informative?

This $\mathrm{Beta}(\tfrac{1}{2},\tfrac{1}{2})$ Jeffreys prior is non-informative in some sense, but not in all senses. The posterior expectation would give $a+b = 1$ sample's worth of weight to the prior mean. This Jeffreys prior is less informative than the uniform prior, which gives a weight of 2 samples to the prior mean (also 1/2). In this sense, the least informative prior is $\mathrm{Beta}(0,0)$. But is this a valid prior distribution?

55 Propriety of priors

We call a prior density $p(\theta)$ proper if it does not depend on the data [the DATA used in the likelihood] and is integrable, i.e.,
$$\int p(\theta)\,d\theta < \infty$$
Proper priors lead to valid joint probability models $p(\theta, y)$ and proper posteriors ($\int p(\theta\mid y)\,d\theta < \infty$). Improper priors do not provide valid joint probability models, but they can lead to proper posteriors by proceeding with the Bayesian algebra $p(\theta\mid y) \propto p(y\mid\theta)\,p(\theta)$

56 Example: continued

The $\mathrm{Beta}(0,0)$ prior is improper since
$$\int_0^1 \theta^{-1}(1-\theta)^{-1}\,d\theta = \infty$$
However, the $\mathrm{Beta}(0+y,\,0+n-y)$ posterior that results is proper as long as $y \ne 0, n$. A $\mathrm{Beta}(\epsilon,\epsilon)$ prior, for $0 < \epsilon \le 1$, is always proper, and (in the limit $\epsilon \to 0$) non-informative in some sense. In practice, the difference between these alternatives ($\mathrm{Beta}(0,0)$, $\mathrm{Beta}(\tfrac{1}{2},\tfrac{1}{2})$, $\mathrm{Beta}(1,1)$) is small; all three allow the data (likelihood) to dominate in the posterior

57 Example: Jeffreys prior for the Poisson sampling model

Consider the Poisson distribution $Y \sim \mathrm{Pois}(\theta)$, which has log-likelihood
$$\log p(y\mid\theta) = c + y\log\theta - \theta$$
Therefore [work out on board BEFORE pressing return]
$$i(\theta) = -E\left\{\frac{d^2}{d\theta^2}\log p(Y\mid\theta)\right\} = \frac{1}{\theta}$$
So the Jeffreys prior density is then $p(\theta) \propto \theta^{-1/2}$

58 Example: Improper Jeffreys prior

Since $\int_0^\infty \theta^{-1/2}\,d\theta = \infty$, this prior is improper. However, we can interpret it as a $G(\tfrac{1}{2}, 0)$, i.e., within the conjugate gamma family, to see that the posterior is
$$G\Big(\tfrac{1}{2} + \sum_{i=1}^n y_i,\ 0+n\Big)$$
This is proper as long as we have observed at least one data point, i.e., $n \ge 1$, since then both parameters of the gamma are positive

59 Example: non-informative?

Again, this $G(\tfrac{1}{2}, 0)$ Jeffreys prior is non-informative in some sense, but not in all senses. The (improper) prior $p(\theta) \propto 1/\theta$, or $G(0,0)$, is in some sense equivalent since it gives the same weight ($b = 0$) to the prior mean (although the prior mean is different). It can be shown that, in the limit as the prior parameters $(a,b) \to (0,0)$, the posterior expectation approaches the MLE; $p(\theta) \propto 1/\theta$ is a typical default for this reason. As before, the practical difference between these choices is small

60 Notes of caution

The search for non-informative priors has several problems, including:
it can be misguided: if the likelihood (data) is truly dominant, the choice among relatively flat priors cannot matter
for many problems there is no clear choice
difficulties arise when averaging over competing models (Lindley's paradox)
Bottom line: if so few data are available that the choice of non-informative prior matters, then one should put relevant information into the prior

61 Example: Heart transplant mortality rate

For a particular hospital, we observe
$n$: the number of transplant surgeries
$y$: the number of deaths within 30 days of surgery
In addition, one can predict the probability of death for each patient based upon a model that uses information such as the patient's medical condition before the surgery, gender, and race. From this, one can obtain
$e$: the expected number of deaths (exposure)
A standard model assumes that the number of deaths follows a Poisson distribution with mean $e\lambda$, and the objective is to estimate the mortality rate per unit exposure, $\lambda$

62 Example: problems with the MLE

The standard estimate of $\lambda$ is the MLE $\hat\lambda = y/e$. Unfortunately, this estimate can be poor when the number of deaths $y$ is close to zero. In this situation, when small death counts are possible, it is desirable to use a Bayesian estimate that uses prior knowledge about the size of the mortality rate

63 Example: prior information

A convenient source of prior information is heart transplant data from a small group of hospitals that we believe have the same rate of mortality as the hospital of interest. Suppose we observe
$z_j$: the number of deaths
$o_j$: the exposure
where $Z_j \sim \mathrm{Pois}(o_j\lambda)$, independently, for ten hospitals ($j = 1,\dots,10$). If we assign the non-informative prior $p(\lambda) \propto 1/\lambda$ then the posterior is $G\big(\sum_{j=1}^{10} z_j,\ \sum_{j=1}^{10} o_j\big)$ (check this)

64 Example: the prior from data

Suppose that, in this example, we have the prior data
$$\sum_{j=1}^{10} z_j = 16, \qquad \sum_{j=1}^{10} o_j = 15174$$
So for our prior (to analyze the data for our hospital) we shall use $G(16, 15174)$. Now, we consider inference about the heart transplant death rate for two hospitals:
A. small exposure ($e_A = 66$) and $y_A = 1$ death
B. larger exposure ($e_B = 1767$) and $y_B = 4$ deaths

65 Example: posterior

The posterior distributions are as follows:
$$\text{A.}\quad \lambda_A \mid y_A, e_A \sim G(16+1,\ 15174+66)$$
$$\text{B.}\quad \lambda_B \mid y_B, e_B \sim G(16+4,\ 15174+1767)$$
[Figure: prior and the two posteriors (A and B) over $\lambda$]
$$P(\lambda_A < \lambda_B \mid y_A, e_A, y_B, e_B) = 0.57$$
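A sketch of the corresponding R calculation, using Monte Carlo draws from the two gamma posteriors to approximate the comparison probability:

```r
a <- 16; b <- 15174                            # prior built from the ten hospitals
la <- rgamma(1e5, a + 1, rate = b + 66)        # hospital A: y = 1, e = 66
lb <- rgamma(1e5, a + 4, rate = b + 1767)      # hospital B: y = 4, e = 1767
c(mean.A = mean(la), mean.B = mean(lb))
mean(la < lb)                                  # about 0.57
```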
