Part 4: Multi-parameter and normal models
- Joshua Ford
1 Part 4: Multi-parameter and normal models
2 The normal model

Perhaps the most useful (or most utilized) probability model for data analysis is the normal distribution. There are several reasons for this, e.g.:
- the CLT
- it is a simple model with separate parameters for the population mean and variance, two quantities that are often of primary interest
3 The normal PDF

A RV Y is said to be normally distributed with mean µ and variance σ² if the density of Y is given by

f(y | µ, σ²) = (1/√(2πσ²)) exp{−(y − µ)² / (2σ²)},  for −∞ < y < ∞

We shall write Y ~ N(µ, σ²) and call the pair (µ, σ²) the parameters
4 Normal properties

Some important things to remember about the normal distribution:
- the distribution is symmetric about µ: the mode, the median and the mean are all equal to µ
- about 95% of the population lies within two (more precisely 1.96) standard deviations of the mean
- if X ~ N(µ_x, σ_x²) and Y ~ N(µ_y, σ_y²) and X and Y are independent, then aX + bY ~ N(aµ_x + bµ_y, a²σ_x² + b²σ_y²)
5 Joint sampling density

Suppose our sampling model is {Y₁, …, Yₙ | µ, σ²} ~ iid N(µ, σ²). Then the joint sampling density is given by

p(y₁, …, yₙ | µ, σ²) = ∏_{i=1}^n p(yᵢ | µ, σ²)
                     = ∏_{i=1}^n (2πσ²)^(−1/2) exp{−(yᵢ − µ)² / (2σ²)}
                     = (2πσ²)^(−n/2) exp{−(1/(2σ²)) Σ_{i=1}^n (yᵢ − µ)²}
6 Joint sampling density

By expanding the quadratic term in the exponent,

Σ_{i=1}^n (yᵢ − µ)² = Σ yᵢ² − 2µ Σ yᵢ + nµ²,

we see that p(y₁, …, yₙ | µ, σ²) depends upon y₁, …, yₙ only through {Σ yᵢ, Σ yᵢ²}, a two-dimensional sufficient statistic. Knowing these quantities is equivalent to knowing

ȳ = (1/n) Σ_{i=1}^n yᵢ  and  s² = (1/(n−1)) Σ_{i=1}^n (yᵢ − ȳ)²,

and so {ȳ, s²} is also a sufficient statistic
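As a quick numerical sketch (with a made-up data vector, not the midge data used later), the pair {Σ yᵢ, Σ yᵢ²} really does determine (ȳ, s²):

```python
import numpy as np

# Hypothetical data vector, for illustration only.
y = np.array([2.1, 1.9, 2.4, 2.0, 2.6])
n = len(y)

# The two-dimensional sufficient statistic {sum y_i, sum y_i^2}.
t1, t2 = y.sum(), (y ** 2).sum()

# Recover (ybar, s^2) from the sufficient statistic alone,
# using s^2 = (sum y_i^2 - n*ybar^2) / (n - 1).
ybar = t1 / n
s2 = (t2 - n * ybar ** 2) / (n - 1)
```

The recovered values agree with computing the mean and sample variance directly from the raw data.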
7 Inference by conditioning

Inference for this two-parameter model can be broken down into two one-parameter problems. We begin with the problem of making inference for µ when σ² is known, and use a conjugate prior for µ. For any (conditional) prior distribution p(µ | σ²), the posterior distribution will satisfy

p(µ | y₁, …, yₙ, σ²) ∝ exp{−(1/(2σ²)) Σ_{i=1}^n (yᵢ − µ)²} p(µ | σ²)
                     ∝ exp{−c₁(µ − c₂)²} p(µ | σ²)
8 Conjugate prior

So we see that if p(µ | σ²) is to be conjugate, then it must include quadratic terms like exp{−c₁(µ − c₂)²}. The simplest such class of probability densities is the normal family, suggesting that if p(µ | σ²) is normal and {y₁, …, yₙ | µ, σ²} ~ iid N(µ, σ²), then p(µ | y₁, …, yₙ, σ²) is also a normal density
9 Posterior derivation

If µ | σ² ~ N(µ₀, τ₀²) then

p(µ | y₁, …, yₙ, σ²) ∝ exp{−a(µ − b/a)² / 2},

where

a = 1/τ₀² + n/σ²  and  b = µ₀/τ₀² + Σ yᵢ/σ²

This function has exactly the same form as a normal density curve, with 1/a playing the role of the variance and b/a playing the role of the mean
10 Normal posterior

Since probability distributions are determined by their shape, this means that p(µ | y₁, …, yₙ, σ²) is indeed a normal density, i.e., {µ | y₁, …, yₙ, σ²} ~ N(µₙ, τₙ²), where

τₙ² = 1/a = 1 / (1/τ₀² + n/σ²)  and  µₙ = b/a = (µ₀/τ₀² + nȳ/σ²) / (1/τ₀² + n/σ²)

So: normal prior + normal sampling model = normal posterior
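A minimal sketch of this update in code (the function and variable names are my own, not from the slides):

```python
def normal_posterior(ybar, n, sigma2, mu0, tau02):
    """Posterior parameters (mu_n, tau_n^2) for mu given known sigma^2,
    under the conjugate prior mu ~ N(mu0, tau0^2)."""
    precision = 1 / tau02 + n / sigma2   # 1/tau_n^2 = 1/tau_0^2 + n/sigma^2
    tau_n2 = 1 / precision
    mu_n = tau_n2 * (mu0 / tau02 + n * ybar / sigma2)
    return mu_n, tau_n2

# Example with arbitrary numbers: a vague prior (large tau_0^2)
# leaves the posterior mean close to the data mean.
mu_n, tau_n2 = normal_posterior(ybar=5.0, n=20, sigma2=4.0, mu0=0.0, tau02=100.0)
```

Note how the posterior precision is just the sum of prior and sampling precisions, exactly as on the next slide.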
11 Combining information

The (conditional) posterior parameters τₙ² and µₙ combine the prior parameters τ₀² and µ₀ with terms from the data. First consider the posterior inverse variance, a.k.a. the precision:

1/τₙ² = 1/τ₀² + n/σ²

So the posterior precision combines the prior precision and the sampling precision, and 1/τₙ² → ∞ as n → ∞
12 Combining means

Notice that the posterior mean decomposes as

µₙ = [(1/τ₀²) / (1/τ₀² + n/σ²)] µ₀ + [(n/σ²) / (1/τ₀² + n/σ²)] ȳ,

so it is a weighted average of the prior mean and the data mean. The weight on the prior mean is 1/τ₀², the prior precision
13 Prior sample size

If the prior mean were based on κ₀ prior observations from the same (or similar) population as Y₁, …, Yₙ, then we might want to set τ₀² = σ²/κ₀, which may be interpreted as the variance of the mean of the prior observations. In this case the formula for the posterior mean reduces to

µₙ = (κ₀/(κ₀ + n)) µ₀ + (n/(κ₀ + n)) ȳ

Finally, µₙ → ȳ as n → ∞
14 Prediction

Consider predicting a new observation Ỹ from the population after having observed {Y₁ = y₁, …, Yₙ = yₙ}. To find the predictive distribution we may take advantage of the following fact:

{Ỹ | µ, σ²} ~ N(µ, σ²)  ⟺  Ỹ = µ + ε,  ε ~ N(0, σ²)

In other words, saying that Ỹ is normal with mean µ is the same as saying that Ỹ is equal to µ plus some mean-zero normally distributed noise
15 Predictive moments

Using this result, let's first compute the posterior mean and variance of Ỹ:

E{Ỹ | y₁, …, yₙ, σ²} = E{µ + ε | y₁, …, yₙ, σ²}
                      = E{µ | y₁, …, yₙ, σ²} + E{ε | y₁, …, yₙ, σ²}
                      = µₙ + 0 = µₙ

Var[Ỹ | y₁, …, yₙ, σ²] = Var[µ + ε | y₁, …, yₙ, σ²]
                        = Var[µ | y₁, …, yₙ, σ²] + Var[ε | y₁, …, yₙ, σ²]
                        = τₙ² + σ²
16 Moments to distribution

Recall that the sum of independent normal random variables is normal. Therefore, since µ and ε, conditional on y₁, …, yₙ and σ², are normally distributed, so is Ỹ = µ + ε. So the predictive distribution is

Ỹ | y₁, …, yₙ, σ² ~ N(µₙ, τₙ² + σ²)

Observe that, as n → ∞, Var[Ỹ | y₁, …, yₙ, σ²] → σ² > 0, i.e., certainty in µ does not translate into certainty about Ỹ
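The decomposition Ỹ = µ + ε also gives a direct recipe for simulating predictive draws. A sketch with illustrative (not derived-here) parameter values, checking the variance formula by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_n, tau_n2, sigma2 = 1.8, 0.002, 0.017   # illustrative values only
S = 200_000

mu = rng.normal(mu_n, np.sqrt(tau_n2), size=S)   # draws of mu | y, sigma^2
eps = rng.normal(0.0, np.sqrt(sigma2), size=S)   # mean-zero noise
ytilde = mu + eps                                # predictive draws

# The sample variance of ytilde should approach tau_n^2 + sigma^2 = 0.019,
# which stays above sigma^2 no matter how concentrated mu becomes.
```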
17 Example: midge wing length

Grogan and Wirth (1981) provide data on the wing length in millimeters of nine members of a species of midge (small, two-winged flies). From these measurements we wish to make inference about the population mean µ
18 Example: prior information

Studies from other populations suggest that wing lengths are usually around 1.9mm, so we set µ₀ = 1.9. We also know that lengths must be positive (µ > 0). We can approximate this restriction with a normal prior distribution for µ as follows: since most of the normal density is within two standard deviations of the mean, we choose τ₀ so that

µ₀ − 2τ₀ > 0  ⇒  τ₀ < 1.9/2 = 0.95

For now we take τ₀ = 0.95, which somewhat overstates our prior uncertainty
19 Example: data and posterior

The observations, in order of increasing magnitude, are

(1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08)

Therefore the posterior is {µ | y₁, …, y₉, σ²} ~ N(µₙ, τₙ²), where

µₙ = (µ₀/τ₀² + nȳ/σ²) / (1/τ₀² + n/σ²)  and  τₙ² = 1 / (1/τ₀² + n/σ²)
20 Example: a choice of σ²

If we take σ² = s² = 0.017, then

{µ | y₁, …, y₉, σ² = 0.017} ~ N(1.805, 0.002)

[Figure: prior and posterior densities for µ]

A 95% CI for µ is (1.72, 1.89)
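The slide's numbers can be reproduced directly from the formulas of slide 10 (a sketch; values agree with the slide up to rounding):

```python
import math

ybar, n = 1.804, 9
sigma2 = 0.017              # plugging in sigma^2 = s^2
mu0, tau02 = 1.9, 0.95 ** 2

precision = 1 / tau02 + n / sigma2
tau_n2 = 1 / precision
mu_n = tau_n2 * (mu0 / tau02 + n * ybar / sigma2)

# 95% credible interval for mu
lo = mu_n - 1.96 * math.sqrt(tau_n2)
hi = mu_n + 1.96 * math.sqrt(tau_n2)
```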
21 Example: full accounting of uncertainty

However, these results assume that we are certain that σ² = s², when in fact s² is only a rough estimate of σ², based upon only nine observations. To get a more accurate representation of our information (and in particular of our posterior uncertainty) we need to account for the fact that σ² is not known
22 Joint Bayesian inference

Bayesian inference for two or more parameters is not conceptually different from the one-parameter case. For any joint prior distribution p(µ, σ²) for µ and σ², posterior inference proceeds using Bayes' rule:

p(µ, σ² | y₁, …, yₙ) = p(y₁, …, yₙ | µ, σ²) p(µ, σ²) / p(y₁, …, yₙ)

As before, we will begin by developing a simple conjugate class of prior distributions which will make posterior calculations easy
23 Prior decomposition

A joint distribution can always be expressed as the product of a conditional probability and a marginal probability:

p(µ, σ²) = p(µ | σ²) p(σ²)

We just saw that if σ² were known, then a conjugate prior distribution for µ was N(µ₀, τ₀²)
24 Prior decomposition

Let's consider the particular case where τ₀² = σ²/κ₀:

p(µ, σ²) = p(µ | σ²) p(σ²) = N(µ; µ₀, τ₀² = σ²/κ₀) p(σ²)

The parameters µ₀ and κ₀ can be interpreted as the mean and sample size from a set of prior observations. For σ² we need a family of prior distributions that has support on (0, ∞)
25 Conjugate prior for σ²

One such family of distributions is the gamma family, as we used for the Poisson sampling model. Unfortunately, this family is not conjugate for the normal variance. However, the gamma family does turn out to be a conjugate class of densities for the precision 1/σ²:

1/σ² ~ G(a, b)  ⇒  σ² ~ IG(a, b)

When using such a prior we say that σ² has an inverse-gamma distribution
26 Inverse-gamma prior

For interpretability, instead of using a and b we will use σ² ~ IG(ν₀/2, ν₀σ₀²/2). Under this parameterization:
- E{σ²} = (ν₀σ₀²/2) / (ν₀/2 − 1)
- mode(σ²) = (ν₀σ₀²/2) / (ν₀/2 + 1), so mode(σ²) < σ₀² < E{σ²}
- Var[σ²] is decreasing in ν₀

We can interpret the prior parameters (σ₀², ν₀) as the prior sample variance and the prior sample size
27 Posterior inference

Suppose our prior model and sampling model are

σ² ~ IG(ν₀/2, ν₀σ₀²/2)
µ | σ² ~ N(µ₀, σ²/κ₀)
Y₁, …, Yₙ | µ, σ² ~ iid N(µ, σ²)

Just as the prior distribution for µ and σ² can be decomposed as p(µ, σ²) = p(µ | σ²) p(σ²), the posterior distribution can be similarly decomposed as

p(µ, σ² | y₁, …, yₙ) = p(µ | σ², y₁, …, yₙ) p(σ² | y₁, …, yₙ)
28 Posterior conditional(s)

We have already derived the posterior conditional distribution of µ given σ². Plugging in τ₀² = σ²/κ₀, we have {µ | y₁, …, yₙ, σ²} ~ N(µₙ, σ²/κₙ), where

κₙ = κ₀ + n  (posterior sample size)

µₙ = [(κ₀/σ²) µ₀ + (n/σ²) ȳ] / (κ₀/σ² + n/σ²) = (κ₀µ₀ + nȳ)/κₙ  (posterior mean)
29 Posterior conditional(s)

The posterior distribution of σ² can be obtained by integrating over the unknown value of µ:

p(σ² | y₁, …, yₙ) ∝ p(y₁, …, yₙ | σ²) p(σ²) = p(σ²) ∫ p(y₁, …, yₙ | µ, σ²) p(µ | σ²) dµ

This integral can be done without much knowledge of calculus, but it is somewhat tedious
30 Posterior conditional(s)

Check that this leads to {σ² | y₁, …, yₙ} ~ IG(νₙ/2, νₙσₙ²/2), where

νₙ = ν₀ + n
σₙ² = (1/νₙ) [ν₀σ₀² + (n − 1)s² + (κ₀n/κₙ)(ȳ − µ₀)²]

These formulae suggest that the interpretation of ν₀ as a prior sample size, from which a prior sample variance σ₀² has been obtained, is reasonable. Similarly, we may think of ν₀σ₀² and νₙσₙ² as the prior and posterior sums of squares: the posterior sum of squares combines the prior and data sums of squares
31 Posterior conditional(s)

σₙ² = (1/νₙ) [ν₀σ₀² + (n − 1)s² + (κ₀n/κₙ)(ȳ − µ₀)²]

However, the third term is a bit harder to understand. It says that a large value of (ȳ − µ₀)² increases the posterior probability of a large σ². This makes sense for our joint prior p(µ, σ²): if we want to think of µ₀ as the sample mean of κ₀ prior observations with variance σ², then this term is also an estimate of σ²
32 Example: the midge data

The studies of other populations suggest that the true mean and standard deviation of our population under study should not be far from 1.9mm and 0.1mm respectively, suggesting µ₀ = 1.9 and σ₀² = 0.01. However, this population may be different from the others in terms of wing length, and so we choose κ₀ = ν₀ = 1, so that our prior distributions are only weakly centered around the estimates from the other populations
33 Example: the midge data

The sample mean and variance of our observed data are ȳ = 1.804 and s² = 0.017. From these values and the prior parameters, we compute

µₙ = (κ₀µ₀ + nȳ)/κₙ = (1 × 1.9 + 9 × 1.804)/10 = 1.814

σₙ² = (1/νₙ) [ν₀σ₀² + (n − 1)s² + (κ₀n/κₙ)(ȳ − µ₀)²]
    ≈ (0.010 + 0.136 + 0.008)/10 = 0.015
34 Example: the full posterior

Our joint posterior distribution is completely determined by these values:

µₙ = 1.814,  κₙ = 10,  σₙ² = 0.015,  νₙ = 10

and can be expressed as

{µ | y₁, …, yₙ, σ²} ~ N(1.814, σ²/10)
{σ² | y₁, …, yₙ} ~ IG(10/2, 10 × 0.015/2)
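These posterior parameters follow mechanically from the update formulas for (κₙ, µₙ, νₙ, σₙ²); a sketch reproducing them:

```python
# Midge data summaries and prior parameters from the example.
ybar, s2, n = 1.804, 0.017, 9
mu0, k0, s02, nu0 = 1.9, 1, 0.01, 1

# Posterior updates: kappa_n, mu_n, nu_n, sigma_n^2.
k_n = k0 + n
mu_n = (k0 * mu0 + n * ybar) / k_n
nu_n = nu0 + n
s_n2 = (nu0 * s02 + (n - 1) * s2 + (k0 * n / k_n) * (ybar - mu0) ** 2) / nu_n
```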
35 Example: the full posterior

[Figure: contour plot of the joint posterior density of (µ, σ²)]

Observe that the contours are more peaked as a function of µ for low values of σ² than for high values
36 Marginal interests

For many data analyses, interest primarily lies in estimating the population mean µ, and so we would like to calculate quantities like

E{µ | y₁, …, yₙ},  Var[µ | y₁, …, yₙ],  P(µ₁ < µ < µ₂ | y₁, …, yₙ)

These quantities are all determined by the marginal posterior distribution of µ given the data. But all we know so far is that the conditional distribution of µ given the data and σ² is normal, and that σ² given the data is inverse-gamma
37 Joint sampling

If we could generate marginal samples of µ, then we could use the MC method to approximate the marginal quantities of interest. It turns out that this is easy to do by generating samples of µ and σ² from their joint posterior by MC:

σ²(1) ~ IG(νₙ/2, νₙσₙ²/2),  µ(1) ~ N(µₙ, σ²(1)/κₙ)
σ²(2) ~ IG(νₙ/2, νₙσₙ²/2),  µ(2) ~ N(µₙ, σ²(2)/κₙ)
⋮
σ²(S) ~ IG(νₙ/2, νₙσₙ²/2),  µ(S) ~ N(µₙ, σ²(S)/κₙ)
38 Monte Carlo marginals

The sequence of pairs {(σ²(1), µ(1)), …, (σ²(S), µ(S))} simulated with this procedure are independent samples from the joint posterior p(µ, σ² | y₁, …, yₙ). Additionally, the simulated sequence {µ(1), …, µ(S)} can be seen as independent samples from the marginal posterior distribution p(µ | y₁, …, yₙ). So these may be used to make MC approximations to posterior expectations of (functions of) µ
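A sketch of this Monte Carlo scheme in code, using the midge posterior parameters. NumPy has no inverse-gamma sampler, so we draw the precision 1/σ² from a gamma distribution and invert it:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_n, k_n, s_n2, nu_n = 1.814, 10, 0.015, 10   # midge posterior parameters
S = 100_000

# sigma^2(s) ~ IG(nu_n/2, nu_n*s_n2/2): equivalently 1/sigma^2 ~ Gamma(shape, rate);
# numpy's gamma takes scale = 1/rate.
precision = rng.gamma(shape=nu_n / 2, scale=2 / (nu_n * s_n2), size=S)
sigma2 = 1 / precision

# mu(s) ~ N(mu_n, sigma^2(s)/k_n), one draw per sigma^2 draw
mu = rng.normal(mu_n, np.sqrt(sigma2 / k_n))

# {mu(s)} are now draws from the marginal posterior p(mu | y_1,...,y_n)
```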
39 Example: joint & marginal samples

[Figure: scatterplot of the joint samples with histograms of the marginals]

The samples µ(s) are obtained conditional on the samples σ²(s), leading to joint samples (µ(s), σ²(s)). We may extract the marginals from the joint
40 Example: credible v. confidence intervals

[Figure: credible and confidence interval densities for µ]

The Bayesian CI is very close to the frequentist CI based on t-statistics, since

p(µ | y₁, …, yₙ) ~ St_{ν₀+n}(µₙ, σₙ²/κₙ)

If κ₀ and ν₀ are small, this will be very close to the sampling distribution of the MLE
41 Objective priors?

How small can κ₀ and ν₀, the prior sample sizes, be? Consider

µₙ = (κ₀µ₀ + nȳ)/κₙ
σₙ² = (1/νₙ) [ν₀σ₀² + (n − 1)s² + (κ₀n/κₙ)(ȳ − µ₀)²]

So as κ₀, ν₀ → 0:

µₙ → ȳ  and  σₙ² → ((n − 1)/n) s² = (1/n) Σ (yᵢ − ȳ)²
42 Improper prior

This leads to the following posterior:

{σ² | y₁, …, yₙ} ~ IG(n/2, ½ Σ (yᵢ − ȳ)²)
{µ | y₁, …, yₙ, σ²} ~ N(ȳ, σ²/n)

More formally, if we take the improper prior p(µ, σ²) ∝ 1/σ², the posterior is proper, and we get the same posterior conditional for µ as above, but

{σ² | y₁, …, yₙ} ~ IG((n − 1)/2, ½ Σ (yᵢ − ȳ)²)
43 Classical comparison

The marginal posterior for µ may then be obtained as

p(µ | y₁, …, yₙ) = ∫ p(µ | σ², y₁, …, yₙ) p(σ² | y₁, …, yₙ) dσ²   (cond. prob., LTP)
                ∝ St_{n−1}(µ; ȳ, s²/n)   (normal conditional, IG marginal)

It is interesting to compare this result to the sampling distribution of the t-statistic, i.e., of the MLE:

t = (Ȳ − µ)/(s/√n) | µ ~ t_{n−1},  i.e.,  µ̂ = Ȳ | µ ~ St_{n−1}(µ, s²/n)
44 Point estimator

A point estimator of an unknown parameter is a function that converts your data into a single element of the parameter space. In the case of the normal sampling model and conjugate prior distribution, the posterior mean estimator of µ is

µ̂_b(y₁, …, yₙ) = E{µ | y₁, …, yₙ} = (n/(κ₀ + n)) ȳ + (κ₀/(κ₀ + n)) µ₀ = wȳ + (1 − w)µ₀
45 Sampling properties

The sampling properties of an estimator such as µ̂_b refer to its behavior under hypothetically repeatable surveys or experiments. Let's compare the sampling properties of µ̂_b to those of µ̂_e(y₁, …, yₙ) = ȳ, the sample mean, when the true population mean is µ_true:

E{µ̂_e | µ = µ_true} = µ_true
E{µ̂_b | µ = µ_true} = wµ_true + (1 − w)µ₀

So we say that µ̂_e is unbiased and, unless µ₀ = µ_true, we say that µ̂_b is biased
46 Bias

Bias refers to how close the center of mass of the sampling distribution of the estimator is to the true value. An unbiased estimator is an estimator with zero bias, which sounds desirable. However, bias does not tell us how far away an estimate might be from the true value. For example, µ̂ = y₁ is an unbiased estimator of the population mean µ_true, but will generally be farther away from µ_true than ȳ
47 Mean squared error

To evaluate how close an estimator θ̂ is likely to be to the true value θ_true, we might use the mean squared error (MSE). Letting m = E{θ̂ | θ_true}, the MSE is

MSE[θ̂ | θ_true] = E{(θ̂ − θ_true)² | θ_true}
                = E{(θ̂ − m)² | θ_true} + (m − θ_true)²
                = Var[θ̂ | θ_true] + Bias²[θ̂ | θ_true]
48 Mean squared error

This means that, before the data are gathered, the expected distance from the estimator to the true value depends on how close the center of the distribution of θ̂ is to θ_true (the bias), and how spread out the distribution is (the variance)
49 Comparing estimators

Getting back to our comparison of µ̂_b to µ̂_e: the bias of µ̂_e is zero, but

Var[µ̂_e | µ = µ_true, σ²] = σ²/n,

whereas

Var[µ̂_b | µ = µ_true, σ²] = w² σ²/n < σ²/n,

and so µ̂_b has lower variability
50 Comparing estimators

Which one is better in terms of MSE?

MSE[µ̂_e | µ = µ_true] = Var[µ̂_e | µ = µ_true] = σ²/n
MSE[µ̂_b | µ = µ_true] = Var[µ̂_b | µ = µ_true] + Bias²[µ̂_b | µ = µ_true]
                      = w² σ²/n + (1 − w)² (µ₀ − µ_true)²

Therefore MSE[µ̂_b | µ = µ_true] < MSE[µ̂_e | µ = µ_true] if

(µ₀ − µ_true)² < (σ²/n) (1 + w)/(1 − w) = σ² (1/n + 2/κ₀)
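A sketch checking this bias-variance trade-off numerically (the numbers are arbitrary illustrations, not from the slides):

```python
def mse_mle(sigma2, n):
    """MSE of the sample mean: just its variance."""
    return sigma2 / n

def mse_bayes(mu0, k0, mu_true, sigma2, n):
    """MSE of the posterior-mean estimator: variance plus squared bias."""
    w = n / (k0 + n)
    return w ** 2 * sigma2 / n + (1 - w) ** 2 * (mu0 - mu_true) ** 2

# The Bayes estimator wins iff (mu0 - mu_true)^2 < sigma2 * (1/n + 2/kappa0).
sigma2, n, k0, mu_true = 1.0, 10, 1, 0.0
threshold = sigma2 * (1 / n + 2 / k0)
```

With these numbers the threshold is 2.1, so a prior guess with squared error 1 beats the MLE while one with squared error 4 does not.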
51 Low MSE Bayesian estimators

So if you know just a little about the population you are about to sample from, you should be able to find values of µ₀ and κ₀ such that the Bayesian estimator has lower average distance to the truth (MSE) than the MLE. E.g., if you are pretty sure that your prior guess µ₀ is within two standard deviations of the true population mean, then if you pick κ₀ = 1/2 you can be pretty sure that the Bayes estimator has a lower MSE, since (µ₀ − µ_true)² < 4σ² < σ²(1/n + 4)
52 Example: IQ scores

Scoring on IQ tests is designed to produce a normal distribution with a mean of 100 and a variance of 225 when applied to the general population. Suppose that we were to sample n individuals from a particular town in the USA and then use ȳ to estimate µ, the town-specific mean IQ score. For Bayesian estimation, if we lack much information about the town in question, a natural choice would be µ₀ = 100
53 Example: IQ scores, MSE

Suppose that, unknown to us, the people in this town are extremely exceptional, and the true mean and variance of IQ scores in the town are µ_true = 112 and σ² = 169. The MSEs of the estimators µ̂_e and µ̂_b are

MSE[µ̂_e | µ = 112, σ² = 169] = Var[µ̂_e | µ = 112, σ² = 169] = 169/n
MSE[µ̂_b | µ = 112, σ² = 169] = Var[µ̂_b | µ = 112, σ² = 169] + Bias²[µ̂_b | µ = 112, σ² = 169]
                             = w² (169/n) + (1 − w)² × 144
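These MSEs can be compared directly; a sketch for n = 10 and a few prior sample sizes κ₀, computing the relative MSE of the Bayes estimator against the MLE:

```python
sigma2, mu0, mu_true, n = 169.0, 100.0, 112.0, 10

ratios = {}
for k0 in (1, 2, 3):
    w = n / (k0 + n)
    # MSE of the Bayes estimator: variance term plus squared-bias term.
    mse_b = w ** 2 * sigma2 / n + (1 - w) ** 2 * (mu0 - mu_true) ** 2
    ratios[k0] = mse_b / (sigma2 / n)   # relative MSE vs. the MLE
```

Even though the prior guess of 100 is 12 points off, the shrinkage estimator still has lower MSE than the MLE for small κ₀; the ratio exceeds 1 only as κ₀ grows.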
54 Example: relative MSE

One way to compare MSEs is through their ratio. Consider MSE[µ̂_b | µ = 112] / MSE[µ̂_e | µ = 112], plotted as a function of n for κ₀ ∈ {1, 2, 3}:

[Figure: relative MSE versus sample size, for κ₀ = 0, 1, 2, 3]

- when κ₀ ∈ {1, 2}, the Bayes estimate has lower MSE than the MLE
- the µ₀ = 100 setting is only bad for κ₀ ≥ 3
- as n increases, the bias of the estimators shrinks to zero
55 Example: sampling densities

Consider the sampling densities of the estimators when n = 10, which highlight the relative contributions of the bias and the variance to the MSE.

[Figure: sampling densities for κ₀ = 0, 1, 2, 3, with the true mean IQ marked]
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationEco517 Fall 2014 C. Sims FINAL EXAM
Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,
More informationGeneralized Bayesian Inference with Sets of Conjugate Priors for Dealing with Prior-Data Conflict
Generalized Bayesian Inference with Sets of Conjugate Priors for Dealing with Prior-Data Conflict Gero Walter Lund University, 15.12.2015 This document is a step-by-step guide on how sets of priors can
More informationSTAT 830 Bayesian Estimation
STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationSimple Linear Regression
Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.
More informationIntroduction to Bayesian Inference
University of Pennsylvania EABCN Training School May 10, 2016 Bayesian Inference Ingredients of Bayesian Analysis: Likelihood function p(y φ) Prior density p(φ) Marginal data density p(y ) = p(y φ)p(φ)dφ
More informationBrandon C. Kelly (Harvard Smithsonian Center for Astrophysics)
Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming
More informationConsider the joint probability, P(x,y), shown as the contours in the figure above. P(x) is given by the integral of P(x,y) over all values of y.
ATMO/OPTI 656b Spring 009 Bayesian Retrievals Note: This follows the discussion in Chapter of Rogers (000) As we have seen, the problem with the nadir viewing emission measurements is they do not contain
More informationUnobservable Parameter. Observed Random Sample. Calculate Posterior. Choosing Prior. Conjugate prior. population proportion, p prior:
Pi Priors Unobservable Parameter population proportion, p prior: π ( p) Conjugate prior π ( p) ~ Beta( a, b) same PDF family exponential family only Posterior π ( p y) ~ Beta( a + y, b + n y) Observed
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationBayesian Inference for Regression Parameters
Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationBayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017
Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationBasic Concepts of Inference
Basic Concepts of Inference Corresponds to Chapter 6 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT) with some slides by Jacqueline Telford (Johns Hopkins University) and Roy Welsch (MIT).
More informationSpecial Topic: Bayesian Finite Population Survey Sampling
Special Topic: Bayesian Finite Population Survey Sampling Sudipto Banerjee Division of Biostatistics School of Public Health University of Minnesota April 2, 2008 1 Special Topic Overview Scientific survey
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationClassical and Bayesian inference
Classical and Bayesian inference AMS 132 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 8 1 / 11 The Prior Distribution Definition Suppose that one has a statistical model with parameter
More informationRemarks on Improper Ignorance Priors
As a limit of proper priors Remarks on Improper Ignorance Priors Two caveats relating to computations with improper priors, based on their relationship with finitely-additive, but not countably-additive
More informationDS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling
DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including
More informationBayesian Inference. Chapter 2: Conjugate models
Bayesian Inference Chapter 2: Conjugate models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in
More informationExample continued. Math 425 Intro to Probability Lecture 37. Example continued. Example
continued : Coin tossing Math 425 Intro to Probability Lecture 37 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan April 8, 2009 Consider a Bernoulli trials process with
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationLecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1
Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationInferring from data. Theory of estimators
Inferring from data Theory of estimators 1 Estimators Estimator is any function of the data e(x) used to provide an estimate ( a measurement ) of an unknown parameter. Because estimators are functions
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative
More informationContents 1. Contents
Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................
More information