Bayesian Inference: Concept and Practice

1 Bayesian Inference: Concept and Practice. Bayesian fundamentals. Johan A. Elkink, School of Politics & International Relations, University College Dublin. 5 June 2017

2 Outline: 1. Bayesian fundamentals; 2. Bayesian linear regression; 3. Bayesian hypothesis testing.

3 Bayes' theorem. In order to estimate the parameters of a model, it is natural to ask, given the data: what probability distribution can we assign to the parameters, p(θ | X)? From Bayes' theorem we have:

$$p(\theta \mid X)\, p(X) = p(X \mid \theta)\, p(\theta)$$

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)} = \frac{p(X \mid \theta)\, p(\theta)}{\int p(X \mid \theta)\, p(\theta)\, d\theta}$$

We refer to p(θ | X) as the posterior and p(θ) as the prior distribution of θ. Bayesian inference is therefore updating our prior beliefs given some new data.
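
To make the updating rule concrete, here is a minimal grid-approximation sketch in Python (not from the lecture; the prior, data value, and grid are illustrative): it multiplies likelihood by prior and normalises by a numerical approximation of p(x).

```python
import numpy as np
from scipy.stats import norm

# Grid approximation of Bayes' theorem: posterior ∝ likelihood × prior.
# Illustrative setup: one observation x = 2 with known sigma = 1,
# and a normal prior on theta with mean 0 and standard deviation 2.
theta = np.linspace(-6, 6, 1201)                # grid of parameter values
prior = norm.pdf(theta, loc=0, scale=2)         # p(theta)
likelihood = norm.pdf(2.0, loc=theta, scale=1)  # p(x | theta) evaluated at x = 2
unnormalised = likelihood * prior
p_x = np.trapz(unnormalised, theta)             # normalising constant p(x)
posterior = unnormalised / p_x                  # p(theta | x), integrates to 1
print(theta[np.argmax(posterior)])              # posterior mode, ~1.6 here
```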

4 Prior beliefs. Where do these prior beliefs come from?
- A purely subjective starting point.
- Earlier scientific findings.
- Earlier data in the same endeavour.
- Sceptical or optimistic views about what should be found.

But also:
- Priors can be vague, low on information, so as not to drive results.
- Priors can be improper, such that they are not real probability densities, but very low on information.
- Priors can be data-driven.

5 Likelihood function.

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$$

While p(X | θ) is a probability distribution over possible values of X for a given θ, it can also be seen as a function of θ for the observed values of X. In that case it is not a proper probability density function, as it does not necessarily integrate to 1, and it is called a likelihood function: L(θ | X) = p(X | θ). It is often easier to work with the log-likelihood l(θ | X) = log L(θ | X). (dice example) (Lee, 2012, 37)
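
The dice example is not spelled out in the transcript; the sketch below assumes a simple version of it, with θ the probability of rolling a six and 3 sixes observed in 12 rolls, to show that L(θ | x) is maximised at the intuitive value but is not a density in θ.

```python
import numpy as np
from scipy.stats import binom

# Likelihood of theta (probability of a six) after 3 sixes in 12 rolls.
theta = np.linspace(0.001, 0.999, 999)
L = binom.pmf(3, 12, theta)     # L(theta | x) = p(x | theta) as a function of theta
log_L = np.log(L)               # log-likelihood l(theta | x)
print(theta[np.argmax(log_L)])  # maximised at 3/12 = 0.25
print(np.trapz(L, theta))       # ~0.077, not 1: not a proper density in theta
```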

6 Frequentist vs. Bayesian. Note that in Bayesian inference, we take θ to be a random variable with an associated probability distribution, and we try to describe this distribution. In frequentist inference, we instead take X as a random variable, with an associated probability distribution parameterised by θ, and find the θ that maximises this distribution. Since the sampling density p(X | θ), viewed as a function of θ, is exactly the likelihood L(θ | X), this amounts to finding the θ that maximises the (log-)likelihood function. We return to the comparison with frequentist statistics when we turn to hypothesis testing.

7 Predictive distribution.

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$$

The normalising constant p(X) can be taken as the marginal distribution

$$p(X) = \int p(X \mid \theta)\, p(\theta)\, d\theta,$$

which is also called the predictive distribution, as it is our prediction of X taking into account our uncertainty around θ as well as around X given θ. In practice, there are many circumstances where we do not need to concern ourselves with this constant, focusing instead on p(θ | X) ∝ p(X | θ) p(θ). (Lee, 2012, 39)
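
A quick way to see what the predictive distribution captures is to simulate it: draw θ from the prior and then X given θ. This sketch (an illustrative normal prior and likelihood, not from the slides) shows that both sources of uncertainty add up.

```python
import numpy as np

# Monte Carlo draws from the predictive distribution p(x) = ∫ p(x|θ)p(θ)dθ.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 2.0, size=100_000)  # θ drawn from a N(0, sd = 2) prior
x = rng.normal(theta, 1.0)                  # x | θ ~ N(θ, 1)
print(x.mean(), x.var())                    # ≈ 0 and ≈ 2² + 1² = 5:
                                            # prior and sampling variance combine
```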

8 posterior ∝ likelihood × prior. [Figure: illustration of the posterior as the product of likelihood and prior.]

9 Parametric inference. Note that Bayesian inference as presented here is parametric: every parameter is assigned a particular type of probability distribution. If we have no idea about the type of distribution that describes p(X | θ), this can be problematic. There is such a thing as semiparametric or non-parametric Bayesian inference, which is beyond the scope of today.

10 Normal distribution (1 observation, unknown µ). In some circumstances, Bayesian inference has an analytic solution. For example, assume we have one observation x from a normally distributed random variable with mean µ and variance σ², X ~ N(µ, σ²), where σ² is known:

$$p(x) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

Suppose we have a prior belief regarding the mean that follows a normal distribution, µ ~ N(µ0, σ²_µ0):

$$p(\mu) = (2\pi\sigma_{\mu_0}^2)^{-\frac{1}{2}} \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_{\mu_0}^2} \right)$$

(Lee, 2012, 40)

11 Normal distribution (1 observation, unknown µ). We can now update our prior belief as follows:

$$p(\mu \mid x) \propto p(x \mid \mu)\, p(\mu) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) (2\pi\sigma_{\mu_0}^2)^{-\frac{1}{2}} \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_{\mu_0}^2} \right),$$

which can be worked out as:

$$p(\mu \mid x) = (2\pi\sigma_{\mu_1}^2)^{-\frac{1}{2}} \exp\left( -\frac{(\mu - \mu_1)^2}{2\sigma_{\mu_1}^2} \right),$$

which implies µ | x ~ N(µ1, σ²_µ1), where

$$\mu_1 = \sigma_{\mu_1}^2 \left( \frac{\mu_0}{\sigma_{\mu_0}^2} + \frac{x}{\sigma^2} \right) \quad \text{and} \quad \sigma_{\mu_1}^2 = \left( \sigma_{\mu_0}^{-2} + \sigma^{-2} \right)^{-1}.$$

(Lee, 2012, 40-41)

12 Normal distribution (1 observation, unknown µ). Observation x = 2, while σ² = 1 is known. Prior distribution µ ~ N(0, 2). [Figure: prior and posterior densities p(µ) plotted against µ.]

13 Normal distribution (1 observation, unknown µ). Observation x = 2, while σ² = 1 is known. Prior distribution µ ~ N(0, 5). [Figure: prior and posterior densities p(µ) plotted against µ.]

14 Normal distribution (1 observation, unknown µ). Bayesian statisticians typically work with the precision instead of the variance, τ = (σ²)⁻¹, so that σ²_µ1 = (σ_µ0⁻² + σ⁻²)⁻¹ simplifies to τ1 = τ0 + τ: posterior precision equals prior precision plus likelihood precision (in the case of a normal prior and normal likelihood). Similarly,

$$\mu_1 = \sigma_{\mu_1}^2 \left( \frac{\mu_0}{\sigma_{\mu_0}^2} + \frac{x}{\sigma^2} \right) = \mu_0 \frac{\tau_0}{\tau_0 + \tau} + x \frac{\tau}{\tau_0 + \tau},$$

which is therefore a weighted mean of the prior mean and the data, weighted by their precisions. Note that this still assumes σ², or τ, is known. (Lee, 2012, 41)
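
In code, the precision form of the update is one line each for µ1 and τ1. A minimal sketch, reproducing the setting of slide 12 (x = 2, σ² = 1, prior N(0, 2), so τ0 = 1/2):

```python
def update_normal_mean(mu0, tau0, x, tau):
    """Conjugate update for a normal mean with known likelihood precision tau."""
    tau1 = tau0 + tau                    # posterior precision
    mu1 = (mu0 * tau0 + x * tau) / tau1  # precision-weighted mean
    return mu1, tau1

# x = 2, sigma^2 = 1 (tau = 1), prior mu ~ N(0, 2) (tau0 = 0.5):
print(update_normal_mean(0.0, 0.5, 2.0, 1.0))  # (1.333..., 1.5)
```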

15 Normal distribution (n observations, unknown µ). When we have not one but n observations from the same random variable, we obtain:

$$\tau_1 = \tau_0 + n\tau$$

$$\mu_1 = \tau_1^{-1} \left( \mu_0 \tau_0 + \sum_i x_i \tau \right) = \tau_1^{-1} \left( \mu_0 \tau_0 + n \bar{x} \tau \right)$$

Note that we would obtain the same result if we updated the prior using the first observation, then took the posterior as the prior when updating with the second observation, and so forth, which is called online updating. (Lee, 2012, 45)
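
The equivalence of batch and online updating is easy to verify numerically; a sketch with made-up data (known τ = 1, vague prior):

```python
import numpy as np

x = np.array([1.8, 2.3, 1.9, 2.1, 1.9])  # illustrative observations
mu0, tau0, tau = 0.0, 0.1, 1.0           # prior mean/precision, known data precision

# Batch update with all n observations at once
tau1 = tau0 + len(x) * tau
mu1 = (mu0 * tau0 + x.sum() * tau) / tau1

# Online updating: each posterior becomes the prior for the next observation
mu, t = mu0, tau0
for xi in x:
    mu = (mu * t + xi * tau) / (t + tau)
    t = t + tau

print(mu1, tau1)  # identical to the sequential result below
print(mu, t)
```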

16 Normal distribution (n observations, unknown µ). 5 observations with x̄ = 2, while σ² = 1 is known. Prior distribution µ ~ N(0, 10). [Figure: prior and posterior densities p(µ) plotted against µ.]

17 Normal distribution (n observations, unknown µ). 100 observations with x̄ = 2, while σ² = 1 is known. Prior distribution µ ~ N(0, 10). [Figure: prior and posterior densities p(µ) plotted against µ.]

18 Normal distribution (n observations, unknown µ and τ). Similar analytical solutions exist when both µ and τ are unknown, given normal priors for both, or other (improper) priors such as uniform ones. The basic principles are more important: the posterior is a combination of prior information and information from the data, and the more data, and the less variable these data, the more the posterior is driven by the data instead of the prior.

19 Outline: 1. Bayesian fundamentals; 2. Bayesian linear regression; 3. Bayesian hypothesis testing.

20 The linear model is familiar from ordinary least squares (OLS) estimation:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$$

or y = Xβ + ε, whereby ε ~ N(0, σ²_ε). In Bayesian inference, the quantity of interest is then

$$p(\beta \mid y, X) \propto p(y, X \mid \beta)\, p(\beta),$$

whereby we assume that the errors and X are independent, p(ε, X) = p(ε) p(X). (Lancaster, 2004)

21 We assume a (hierarchical) prior for the errors, taking these to be normally and independently distributed with mean zero and precision τ (τ = 1/σ²_ε):

$$p(\varepsilon \mid \tau) \propto \tau^{\frac{n}{2}} \exp\left( -\frac{\tau}{2} \varepsilon'\varepsilon \right),$$

from which follows:

$$p(y, X \mid \beta, \tau) \propto \tau^{\frac{n}{2}} \exp\left( -\frac{\tau}{2} (y - X\beta)'(y - X\beta) \right) p(X \mid \beta).$$

If we assume X is strictly exogenous and does not depend on β or τ, we obtain:

$$p(y, X \mid \beta, \tau) \propto p(y \mid X, \beta, \tau) \propto \tau^{\frac{n}{2}} \exp\left( -\frac{\tau}{2} (y - X\beta)'(y - X\beta) \right).$$

(Lancaster, 2004, 117)

22 Bayesian linear model: improper prior. There is a conventional, vague, improper uniform prior for β and τ, namely

$$p(\beta, \tau) \propto \frac{1}{\tau},$$

whereby −∞ < β < ∞ and τ > 0. Note that this prior is improper, as it is not an actual probability distribution: it does not integrate to one. To obtain the posterior, we multiply the likelihood by the prior:

$$p(\beta, \tau \mid y, X) \propto \tau^{\frac{n}{2} - 1} \exp\left( -\frac{\tau}{2} (y - X\beta)'(y - X\beta) \right).$$

(Lancaster, 2004, 120)

23 Bayesian linear model: improper prior. The marginal posterior distribution of β in this case is

$$\beta \sim t(b, s^2 (X'X)^{-1}, \nu),$$

a t-distribution with ν = n − k degrees of freedom, where b = (X'X)⁻¹X'y is the OLS estimate of β, s²(X'X)⁻¹ is the OLS estimate of the variance-covariance matrix, and s² = e'e/ν is the residual variance. The use of the improper prior makes this analytically tractable and shows the importance of least squares in Bayesian statistics as well. (Lancaster, 2004, 124, 133)
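
Since the posterior under the improper prior is built entirely from OLS quantities, it can be computed directly. A minimal sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix with intercept
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)      # data from a known model

b = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimate = posterior location of beta
e = y - X @ b                          # residuals
nu = n - k                             # degrees of freedom of the posterior t
s2 = e @ e / nu                        # residual variance
scale = s2 * np.linalg.inv(X.T @ X)    # posterior scale matrix of the t-distribution
print(b, nu)
print(scale)
```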

24 Bayesian linear model: improper prior. The marginal posterior distribution of τ in this case is

$$p(\tau \mid y, X) \propto \tau^{\frac{\nu}{2} - 1} \exp\left( -\frac{\tau \nu s^2}{2} \right),$$

a gamma distribution. The expected value then works out to be E(τ) = 1/s², which is not surprising given that the precision is the reciprocal of the variance. (Lancaster, 2004)

25 Bayesian linear model: conjugate prior. A conjugate prior is a prior from the same family of probability distributions as the posterior, such that a posterior based on one set of observations can be used immediately as the prior for a subsequent set of observations. The natural conjugate priors in this case are β | τ ~ N(β0, τ⁻¹A⁻¹) and τ ~ G(α, c), with density

$$p(\beta, \tau) \propto \tau^{\frac{\alpha}{2} - 1} \exp\left( -\frac{\tau}{2} (\beta - \beta_0)' A (\beta - \beta_0) \right) \exp\left( -\frac{\tau c}{2} \right).$$

Compare this to the likelihood:

$$p(y, X \mid \beta, \tau) \propto \tau^{\frac{n}{2}} \exp\left( -\frac{\tau}{2} (\beta - b)' X'X (\beta - b) \right) \exp\left( -\frac{\tau e'e}{2} \right),$$

where b is the least squares estimate and e the associated residuals. (Lancaster 2004, 133; Gamerman and Lopes 2006, 55-56)

26 Bayesian linear model: conjugate prior. Using this prior, we obtain an expected value of the slope parameters of

$$E(\beta \mid y, X) = (X'X + A)^{-1} (X'X b + A \beta_0),$$

which can be seen as a weighted average of the prior mean β0 and the least squares estimate b, weighted by their respective precisions. This prior is therefore much more informative: it affects the results more strongly than the flat improper prior. (Lancaster, 2004)
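
The shrinkage implied by the conjugate prior is visible in a direct computation of E(β | y, X); a sketch with simulated data and an illustrative prior precision matrix A:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimate
beta0 = np.zeros(2)                    # prior mean
A = 10.0 * np.eye(2)                   # prior precision (illustrative)

# E(beta | y, X) = (X'X + A)^(-1) (X'X b + A beta0)
post_mean = np.linalg.solve(X.T @ X + A, X.T @ X @ b + A @ beta0)
print(b)          # unshrunk OLS estimate
print(post_mean)  # pulled towards beta0 in proportion to A
```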

27 Jeffreys priors. One well-known proposal for obtaining non-informative, improper priors is to use the prior

$$p(\theta) \propto \left| I(\theta \mid X) \right|^{\frac{1}{2}}, \quad \text{where} \quad I(\theta \mid X) = -E\left[ \frac{\partial^2 \log f(X \mid \theta)}{\partial\theta\, \partial\theta'} \right]$$

is the expected Fisher information matrix of θ. In particular, this leads to priors that remain non-informative under reparameterisation of the model, which is not necessarily the case otherwise. Typically, this leads to priors of p(θ) ∝ k for location parameters and p(τ) ∝ τ⁻¹ for scale parameters. (Gamerman and Lopes, 2006, 45-46)
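
The two standard results quoted on this slide can be checked symbolically. This sketch computes the Fisher information for the mean and for the precision of a single normal observation, assuming the parameterisation X ~ N(µ, τ⁻¹):

```python
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
tau = sp.Symbol('tau', positive=True)

# Log-density of N(mu, 1/tau), dropping constants
logf = sp.Rational(1, 2) * sp.log(tau) - tau * (x - mu)**2 / 2

# Location parameter: -d^2 logf / d mu^2 = tau, free of mu and x,
# so the Jeffreys prior for mu is constant
print(-sp.diff(logf, mu, 2))

# Scale parameter: -d^2 logf / d tau^2 = 1/(2 tau^2), free of x,
# so p(tau) ∝ sqrt(1/(2 tau^2)) ∝ 1/tau
print(sp.sqrt(-sp.diff(logf, tau, 2)))
```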

28 Power priors. Another special type of prior is the power prior, where the idea is to put a relative weight on prior data versus newly obtained data, a weight which itself can be configured. Say we collect data X, but an earlier study had already collected data X0 and provided posterior estimates. We can then use the prior

$$p(\theta \mid X_0, a_0) \propto p(\theta)\, [L(\theta \mid X_0)]^{a_0}, \quad 0 \le a_0 \le 1,$$

so that our posterior becomes

$$p(\theta \mid X, X_0, a_0) \propto p(X \mid \theta)\, p(\theta \mid X_0, a_0).$$

In typical Bayesian style, we can also put a prior on this weight parameter:

$$p(\theta \mid X_0) = \int_0^1 p(\theta)\, [L(\theta \mid X_0)]^{a_0}\, p(a_0 \mid \psi)\, da_0,$$

with ψ a hyperparameter of some distribution of a0, e.g. gamma. (Gill, 2015)
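
For a normal mean with known variance, raising the historical likelihood to the power a0 simply downweights the n0 historical observations to an effective sample size of a0·n0. A minimal sketch, with illustrative data and a fixed a0:

```python
import numpy as np

rng = np.random.default_rng(2)
x0 = rng.normal(2.0, 1.0, size=50)  # historical data X_0
x = rng.normal(2.0, 1.0, size=20)   # new data X
a0 = 0.5                            # weight on the historical likelihood
mu0, tau0 = 0.0, 0.01               # vague initial prior (sigma^2 = 1, so tau = 1)

# Power prior p(theta | X0, a0) ∝ p(theta) L(theta | X0)^a0 is normal:
tau_pp = tau0 + a0 * len(x0)
mu_pp = (mu0 * tau0 + a0 * x0.sum()) / tau_pp

# Updating with the new data then gives the posterior:
tau_post = tau_pp + len(x)
mu_post = (mu_pp * tau_pp + x.sum()) / tau_post
print(mu_post, 1 / tau_post)        # posterior mean and variance
```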

29 Outline: 1. Bayesian fundamentals; 2. Bayesian linear regression; 3. Bayesian hypothesis testing.

30 Hypothesis testing. Assume we have an unknown parameter θ, which we know to be from a set Θ. In frequentist statistics we usually have a null hypothesis H0: θ ∈ Θ0 and an alternative H1: θ ∈ Θ1, where Θ0 ∪ Θ1 = Θ and Θ0 ∩ Θ1 = ∅. (Lee, 2012, 138) For example, we might have a parameter β, with H0: β = 0 and H1: β ≠ 0.

31 Frequentist testing. The probability of observing any given exact value of β is infinitely small, so in frequentist statistics we reject H0 if, under the sampling distribution given H0, the probability of observing x or greater is less than a threshold value α. Note that:
- the p-value is not the probability that the null or the alternative is true;
- as n increases, p decreases;
- the α-value of 0.05 is arbitrary;
- the null and alternative hypotheses are often unrealistic, and typically strongly biased in favour of one over the other;
- the p-value is based on the probability of observations we have not made.
(Friel 2015, lecture 8; Lee 2012, 139)

32 Frequentist testing. Harold Jeffreys remarks: "What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." (Jeffreys 1967, as cited in Lee 2012, 139) Andrew Gelman remarks: "The relevant goal is not to ask the question 'Do the data come from the assumed model?' (to which the answer is almost always no), but to quantify the discrepancies between data and model, and assess whether they could have arisen by chance, under the model's own assumptions." (Friel, 2015, lecture 8)

33 Bayesian testing: Bayes factor. As usual, in Bayesian statistics we attempt to calculate full probability distributions more directly, so we are interested in the posterior probabilities of the two hypotheses,

$$p_0 = p(\theta \in \Theta_0 \mid X) \quad \text{and} \quad p_1 = p(\theta \in \Theta_1 \mid X).$$

And of course, we need priors:

$$\pi_0 = p(\theta \in \Theta_0) \quad \text{and} \quad \pi_1 = p(\theta \in \Theta_1).$$

We then focus on the odds of H0 against H1, the ratio p0/p1, and consider the Bayes factor

$$B = \frac{p_0 / p_1}{\pi_0 / \pi_1} = \frac{p_0 \pi_1}{p_1 \pi_0},$$

i.e. the odds in favour of H0 against H1 given the data. (Lee, 2012)

34 Bayesian testing: likelihood ratio. If θ could only have two possible values, θ0 and θ1, then p0 ∝ π0 p(X | θ0) and p1 ∝ π1 p(X | θ1), and therefore

$$\frac{p_0}{p_1} = \frac{\pi_0}{\pi_1} \cdot \frac{p(X \mid \theta_0)}{p(X \mid \theta_1)}.$$

The Bayes factor is then

$$B = \frac{p(X \mid \theta_0)}{p(X \mid \theta_1)},$$

which is the likelihood ratio of H0 against H1. When θ can take more values, we need to integrate over all possible values:

$$B = \frac{\pi_0^{-1} \int_{\theta \in \Theta_0} p(X \mid \theta)\, p(\theta)\, d\theta}{\pi_1^{-1} \int_{\theta \in \Theta_1} p(X \mid \theta)\, p(\theta)\, d\theta}.$$

(Lee, 2012, 141)
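
For two simple hypotheses the Bayes factor is just a ratio of two density evaluations. A sketch for a single normal observation (all values illustrative):

```python
from scipy.stats import norm

x = 1.2                    # observed value, sigma = 1
theta0, theta1 = 0.0, 2.0  # the two simple hypotheses

B = norm.pdf(x, theta0, 1) / norm.pdf(x, theta1, 1)  # Bayes factor = likelihood ratio
pi0, pi1 = 0.5, 0.5                                  # prior probabilities
posterior_odds = B * pi0 / pi1                       # p0 / p1
print(B, posterior_odds)
```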

35 Bayesian testing: critical values. Much like the α-value for judging p-values, using thresholds is generally inappropriate. Nevertheless, Jeffreys suggests the following:
- B > 1: support for H0
- 1 > B ≥ 10^{-1/2}: minimal evidence against H0
- 10^{-1/2} > B ≥ 10^{-1}: substantial evidence against H0
- 10^{-1} > B ≥ 10^{-2}: strong evidence against H0
- B < 10^{-2}: decisive evidence against H0
And one can translate "against H0" into "in favour of H1" in a way that you cannot in frequentist conclusions. Note that calculating B requires the normalising constant p(X), which is often difficult to obtain. B is also more sensitive to the priors than most other Bayesian results. (Gill, 2015, 217)

36 p-values. Assume a one-sided test. In frequentist statistics, the p-value, or exact significance level, is then the probability that the random variable X is at least as high as the observed value x, given the null: p(X ≥ x | θ = θ0). In Bayesian statistics we would instead focus on the posterior probability of the null. For a normal observation with known precision τ and a uniform (improper) prior on θ, this is

$$p_0 = p(\theta \le \theta_0 \mid X = x) = \Phi\left( (\theta_0 - x)\sqrt{\tau} \right).$$

As it turns out,

$$p(X \ge x \mid \theta = \theta_0) = 1 - \Phi\left( (x - \theta_0)\sqrt{\tau} \right) = \Phi\left( (\theta_0 - x)\sqrt{\tau} \right) = p_0.$$

Note that this implies that the p-value is (1 + B⁻¹)⁻¹. (Lee, 2012, 144)
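
The identity is easy to confirm numerically; a sketch assuming the same normal setting (θ0 = 0, one observation x, precision τ = 1):

```python
import numpy as np
from scipy.stats import norm

theta0, x, tau = 0.0, 1.645, 1.0
p_value = 1 - norm.cdf((x - theta0) * np.sqrt(tau))  # frequentist one-sided p-value
p0 = norm.cdf((theta0 - x) * np.sqrt(tau))           # posterior p(theta <= theta0 | x)
print(p_value, p0)                                   # both ≈ 0.05
```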

37 Likelihood principle. The likelihood principle states that all the information about the parameters obtainable from the data is contained in the likelihood function. This principle generally holds in Bayesian inference, but is violated in, among other situations: frequentist statistical significance tests and confidence intervals; and the use of Jeffreys priors.

38 References.
Friel, Nial. 2015. Bayesian Analysis. Lecture slides.
Gamerman, Dani and Hedibert F. Lopes. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. 2nd ed. Boca Raton, FL: Chapman & Hall.
Gill, Jeff. 2015. Bayesian Methods: A Social and Behavioral Sciences Approach. 3rd ed. Boca Raton: CRC Press.
Jeffreys, Sir Harold. 1967. Theory of Probability. 3rd ed. Oxford: Clarendon Press.
Lancaster, Tony. 2004. An Introduction to Modern Bayesian Econometrics. Malden, MA: Blackwell.
Lee, Peter M. 2012. Bayesian Statistics: An Introduction. 4th ed. Chichester: Wiley.
