Bayesian Inference: Concept and Practice
1 Bayesian Inference: Concept and Practice. Fundamentals. Johan A. Elkink, School of Politics & International Relations, University College Dublin. 5 June 2017
2 Outline: 1. Fundamentals of Bayesian inference. 2. The Bayesian linear model. 3. Bayesian hypothesis testing.
3 Bayes' theorem. In order to estimate the parameters of a model, it is natural to ask, given the data: what probability distribution can we assign to the parameters? p(θ|X) = ? From Bayes' theorem we have:

p(θ|X)p(X) = p(X|θ)p(θ)

p(θ|X) = p(X|θ)p(θ) / p(X) = p(X|θ)p(θ) / ∫ p(X|θ)p(θ)dθ

We refer to p(θ|X) as the posterior and p(θ) as the prior distribution of θ. Bayesian inference is therefore the updating of our prior beliefs given some new data.
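The update above can be sketched numerically for a discrete parameter. The coin example below is illustrative and not from the slides: θ is either 0.5 (fair) or 0.8 (loaded), and we update after observing heads.

```python
# Bayes' theorem for a discrete parameter: a coin whose bias theta is
# either 0.5 (fair) or 0.8 (loaded), updated after observing heads.
prior = {0.5: 0.5, 0.8: 0.5}          # p(theta)
likelihood = {0.5: 0.5, 0.8: 0.8}     # p(heads | theta)

# p(x) = sum over theta of p(x | theta) p(theta): the normalising constant
evidence = sum(likelihood[t] * prior[t] for t in prior)

# p(theta | x) = p(x | theta) p(theta) / p(x)
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}

print(posterior)  # the loaded coin becomes more probable
```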
4 Prior beliefs. Where do these prior beliefs come from? A purely subjective starting point. Earlier scientific findings. Earlier data in the same endeavour. Skeptical or optimistic views about what should be found. But also: priors can be vague, low on information, so as not to drive the results. Priors can be improper, so that they are not real probability densities, but carry very little information. Priors can be data-driven.
5 Likelihood function. p(θ|X) = p(X|θ)p(θ) / p(X). While p(X|θ) is a probability distribution of X for a given value of θ, it can also be seen as a function of θ for the observed value of X. In this case it is not a proper probability density function, as it does not necessarily integrate to 1, and it is called a likelihood function: L(θ|X) = p(X|θ). It is often easier to work with the log-likelihood l(θ|X) = log L(θ|X). (dice example) (Lee, 2012, 37)
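The slide's dice example might be sketched as follows; the counts here are illustrative, not from the slides. We treat θ, the probability of rolling a six, as the argument of the (log-)likelihood for fixed observed data, dropping the binomial coefficient since it does not depend on θ.

```python
import math

# Log-likelihood of theta = P(six) after observing k sixes in n rolls,
# up to an additive constant (the binomial coefficient is free of theta).
def log_likelihood(theta, k=30, n=120):
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

# Compare the fair-die value 1/6 with the observed frequency 30/120 = 1/4:
# as a function of theta, the likelihood is maximised at the latter.
print(log_likelihood(1/6), log_likelihood(0.25))
```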
6 Frequentist vs. Bayesian. Note that in Bayesian statistics, we take θ to be a random variable with an associated probability distribution, and we try to describe this distribution. In frequentist statistics, we instead take X as the random variable, with an associated probability distribution parameterised by θ, and find the θ that maximises this distribution. Since the likelihood L(θ|X) is exactly p(X|θ) viewed as a function of θ, this amounts to finding the θ that maximises the (log-)likelihood function. We return to the comparison with frequentist statistics when we turn to hypothesis testing.
7 Predictive distribution. p(θ|X) = p(X|θ)p(θ) / p(X). The normalising constant p(X) can be taken as the marginal distribution p(X) = ∫ p(X|θ)p(θ)dθ, which is also called the predictive distribution, as it is our prediction of X taking into account our uncertainty around θ as well as around X given θ. In practice, there are many circumstances where we do not need to concern ourselves with this constant, focusing instead on p(θ|X) ∝ p(X|θ)p(θ). (Lee, 2012, 39)
8 posterior ∝ likelihood × prior
9 Parametric. Note that Bayesian inference as presented here is parametric: every parameter is given a particular type of probability distribution. If we have no idea about the type of distribution that describes p(X|θ), this can be problematic. There is such a thing as semiparametric or nonparametric Bayesian inference, but it is beyond the scope of today.
10 Normal distribution (1 observation, unknown µ). In some circumstances, Bayesian inference has an analytic solution. For example, assume we have one observation x from a normally distributed random variable with mean µ and variance σ², X ∼ N(µ, σ²), where σ² is known:

p(x) = (2πσ²)^(−1/2) exp(−(x − µ)² / 2σ²)

Suppose we have a prior belief regarding the mean that follows a normal distribution, µ ∼ N(µ₀, σ²_µ0):

p(µ) = (2πσ²_µ0)^(−1/2) exp(−(µ − µ₀)² / 2σ²_µ0)

(Lee, 2012, 40)
11 Normal distribution (1 observation, unknown µ). We can now update our prior belief as follows:

p(µ|x) ∝ p(x|µ)p(µ) = (2πσ²)^(−1/2) exp(−(x − µ)² / 2σ²) × (2πσ²_µ0)^(−1/2) exp(−(µ − µ₀)² / 2σ²_µ0),

which can be worked out as:

p(µ|x) = (2πσ²_µ1)^(−1/2) exp(−(µ − µ₁)² / 2σ²_µ1),

which implies µ|x ∼ N(µ₁, σ²_µ1), where µ₁ = σ²_µ1 (µ₀/σ²_µ0 + x/σ²) and σ²_µ1 = (σ⁻²_µ0 + σ⁻²)⁻¹. (Lee, 2012, 40–41)
12 Normal distribution (1 observation, unknown µ). Observation x = 2, while σ² = 1 is known. Prior distribution µ ∼ N(0, 2). [Figure: prior and posterior densities p(µ).]
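The slide's numbers (x = 2, σ² = 1, prior N(0, 2)) can be plugged into the update formulas directly; a minimal sketch:

```python
def normal_update(mu0, var0, x, var):
    """Posterior of mu after one observation x ~ N(mu, var), with
    prior mu ~ N(mu0, var0); the data variance var is assumed known."""
    var1 = 1.0 / (1.0 / var0 + 1.0 / var)   # posterior variance
    mu1 = var1 * (mu0 / var0 + x / var)     # posterior mean
    return mu1, var1

# The slide's example: x = 2, sigma^2 = 1, prior N(0, 2).
print(normal_update(0.0, 2.0, 2.0, 1.0))  # posterior mean 4/3, variance 2/3
```

The posterior mean 4/3 sits between the prior mean 0 and the observation 2, closer to the observation because the data are more precise than the prior.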
13 Normal distribution (1 observation, unknown µ). Observation x = 2, while σ² = 1 is known. Prior distribution µ ∼ N(0, 5). [Figure: prior and posterior densities p(µ).]
14 Normal distribution (1 observation, unknown µ). Bayesian statisticians typically work with the precision instead of the variance, τ = (σ²)⁻¹, so that σ²_µ1 = (σ⁻²_µ0 + σ⁻²)⁻¹ simplifies to τ₁ = τ₀ + τ: posterior precision equals prior precision plus likelihood precision (in the case of a normal prior and normal likelihood). Likewise,

µ₁ = σ²_µ1 (µ₀/σ²_µ0 + x/σ²) = µ₀ τ₀/(τ₀ + τ) + x τ/(τ₀ + τ),

which is therefore a weighted mean of the prior mean and the data, weighted by their precisions. Note that this still assumes σ², or τ, is known. (Lee, 2012, 41)
15 Normal distribution (n observations, unknown µ). When we have not one but n observations from the same random variable, we obtain:

τ₁ = τ₀ + nτ
µ₁ = τ₁⁻¹ (µ₀τ₀ + Σᵢ xᵢ τ) = τ₁⁻¹ (µ₀τ₀ + n x̄ τ)

Note that we would obtain the same result if we updated the prior using the first observation, then took that posterior as the prior when using the second observation, and so forth; this is called online updating. (Lee, 2012, 45)
16 Normal distribution (n observations, unknown µ). 5 observations with x̄ = 2, while σ² = 1 is known. Prior distribution µ ∼ N(0, 10). [Figure: prior and posterior densities p(µ).]
17 Normal distribution (n observations, unknown µ). 100 observations with x̄ = 2, while σ² = 1 is known. Prior distribution µ ∼ N(0, 10). [Figure: prior and posterior densities p(µ).]
18 Normal distribution (n observations, unknown µ and τ). Similar analytical solutions exist when both µ and τ are unknown, given either normal priors or other (possibly improper) priors, such as uniform ones. The basic principles are more important: the posterior is a combination of prior information and information from the data, and the more data, and the less variable the data, the more the posterior is driven by the data rather than the prior.
20 The linear model is familiar from ordinary least squares (OLS) estimation:

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βₖxᵢₖ + εᵢ

or y = Xβ + ε, whereby ε ∼ N(0, σ²ε). In Bayesian inference, the quantity of interest is then p(β|y, X) ∝ p(y, X|β)p(β), whereby we assume that the errors and X are independent: p(ε, X) = p(ε)p(X). (Lancaster, 2004)
21 We assume a (hierarchical) prior for the errors, assuming that they are normally and independently distributed with mean zero and precision τ (τ = 1/σ²ε):

p(ε|τ) ∝ τ^(n/2) exp(−(τ/2) ε′ε),

from which follows:

p(y, X|β, τ) ∝ τ^(n/2) exp(−(τ/2)(y − Xβ)′(y − Xβ)) p(X|β).

If we assume X is strictly exogenous and does not depend on β or τ, we obtain:

p(y, X|β, τ) ∝ p(y|X, β, τ) ∝ τ^(n/2) exp(−(τ/2)(y − Xβ)′(y − Xβ)). (Lancaster, 2004, 117)
22 Bayesian linear model: improper prior. There is a conventional, vague, improper uniform prior for β and τ, namely

p(β, τ) ∝ 1/τ,

whereby −∞ < β < ∞ and τ > 0. Note that this prior is improper, as it is not an actual probability distribution: it does not integrate to one. To obtain the posterior, we multiply the likelihood by the prior:

p(β, τ|y, X) ∝ τ^(n/2 − 1) exp(−(τ/2)(y − Xβ)′(y − Xβ)). (Lancaster, 2004, 120)
23 Bayesian linear model: improper prior. The marginal posterior distribution of β in this case is

β ∼ t(b, s²(X′X)⁻¹, ν),

a t-distribution with ν = n − k degrees of freedom, where b = (X′X)⁻¹X′y is the OLS estimate of β, s²(X′X)⁻¹ is the OLS estimate of the variance-covariance matrix, and s² = e′e/ν is the residual variance. The use of the improper prior makes this analytically tractable and shows the importance of least squares in Bayesian statistics as well. (Lancaster, 2004, 124, 133)
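Under this improper prior, the posterior of β is centred on the OLS estimate, so the familiar least-squares computations deliver the Bayesian answer. A minimal sketch with simulated, purely illustrative data:

```python
import numpy as np

# Illustrative regression data: intercept 1, slope 2, unit-variance noise.
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimate = posterior centre
e = y - X @ b                            # residuals
nu = n - k                               # degrees of freedom
s2 = e @ e / nu                          # residual variance
cov_b = s2 * np.linalg.inv(X.T @ X)      # scale matrix of the posterior t

print(b, s2)
```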
24 Bayesian linear model: improper prior. The marginal posterior distribution of τ in this case is

p(τ|y, X) ∝ τ^(ν/2 − 1) exp(−τνs²/2),

a gamma distribution. The expected value then works out to be E(τ) = 1/s², which is not surprising given that the precision is the reciprocal of the variance. (Lancaster, 2004)
25 Bayesian linear model: conjugate prior. A conjugate prior is a prior from the same family of probability distributions as the posterior, such that a posterior based on one set of observations can be used immediately as the prior for a subsequent set of observations. The natural conjugate priors in this case are β|τ ∼ N(β₀, τ⁻¹A⁻¹) and τ ∼ G(α, c), with density

p(β, τ) ∝ τ^(α/2 − 1) exp(−(τ/2)(β − β₀)′A(β − β₀)) exp(−τc/2).

Compare this to the likelihood:

p(y, X|β, τ) ∝ τ^(n/2) exp(−(τ/2)(β − b)′X′X(β − b)) exp(−τe′e/2),

where b is the least squares estimate and e the associated residuals. (Lancaster 2004, 133; Gamerman and Lopes 2006, 55–56)
26 Bayesian linear model: conjugate prior. Using this prior, we obtain an expected value of the slope parameters of

E(β|y, X) = (X′X + A)⁻¹(X′Xb + Aβ₀),

which can be seen as a weighted average of the prior parameter β₀ and the least squares estimate b, weighted by their respective precisions. This prior is therefore much more informative: it affects the results more strongly than the flat improper prior. (Lancaster, 2004)
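A numerical sketch of this posterior mean, showing the shrinkage towards the prior as the prior precision grows. The data, the prior precision matrix A, and β₀ are all illustrative choices, not from the slides.

```python
import numpy as np

# Illustrative regression data.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=40)

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate
beta0 = np.zeros(2)                     # prior mean

# E(beta | y, X) = (X'X + A)^(-1) (X'X b + A beta0) for weak vs strong priors.
post_means = {}
for a in (0.1, 1000.0):
    A = a * np.eye(2)
    post_means[a] = np.linalg.solve(X.T @ X + A, X.T @ X @ b + A @ beta0)

# A weak prior (small A) leaves the posterior mean near b; a strong
# prior (large A) pulls it towards beta0.
print(post_means)
```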
27 Jeffreys priors. One well-known proposal for obtaining non-informative, improper priors is to use the prior

p(θ) ∝ |I(θ|X)|^(1/2), where I(θ|X) = −E[∂² log f(X|θ) / ∂θ∂θ′]

is the expected Fisher information matrix of θ. In particular, this leads to priors that are invariant under reparameterisation of the model, which is not necessarily the case otherwise. Typically, this leads to priors of p(θ) ∝ k for location parameters and p(τ) ∝ τ⁻¹ for scale parameters. (Gamerman and Lopes, 2006, 45–46)
28 Power priors. Another special type of prior is the power prior, where the idea is to put a relative weight, which itself can be configured, on prior data versus newly obtained data. Say we collect data X, but an earlier study had already collected data X₀ and provided posterior estimates. We can then use the prior

p(θ|X₀, a₀) ∝ p(θ)[L(θ|X₀)]^a₀, where 0 ≤ a₀ ≤ 1,

so that our posterior becomes

p(θ|X, X₀, a₀) ∝ p(X|θ)p(θ|X₀, a₀).

In typical Bayesian style, we can also put a prior on this weight parameter:

p(θ|X₀) = ∫₀¹ p(θ)[L(θ|X₀)]^a₀ p(a₀|ψ) da₀,

with ψ a hyperparameter of some distribution of a₀, e.g. a gamma distribution. (Gill, 2015)
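For a normal mean with known variance, raising the historical likelihood to the power a₀ behaves like keeping only a₀·n₀ of the n₀ historical observations. The sketch below (all numbers illustrative, assuming a near-flat initial prior) shows the posterior mean sliding between ignoring and fully pooling the earlier data as a₀ moves from 0 to 1.

```python
def posterior(mu0, tau0, n, xbar, tau):
    """Normal-mean update: prior N(mu0, 1/tau0), n observations with
    mean xbar, each with known precision tau."""
    tau1 = tau0 + n * tau
    return (mu0 * tau0 + n * xbar * tau) / tau1, tau1

n0, xbar0 = 20, 1.5    # historical data X0 (illustrative)
n, xbar = 10, 2.2      # new data X (illustrative)

results = {}
for a0 in (0.0, 0.5, 1.0):
    # Power prior: a near-flat prior updated with a0-weighted history.
    mu_p, tau_p = posterior(0.0, 1e-8, a0 * n0, xbar0, 1.0)
    mu1, tau1 = posterior(mu_p, tau_p, n, xbar, 1.0)
    results[a0] = mu1

print(results)  # a0 = 0 ignores the history; a0 = 1 pools it fully
```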
30 Hypothesis testing. Assume we have an unknown parameter θ, which we know to be from a set Θ. In frequentist statistics we usually have a null hypothesis H₀: θ ∈ Θ₀ and an alternative H₁: θ ∈ Θ₁, where Θ₀ ∪ Θ₁ = Θ and Θ₀ ∩ Θ₁ = ∅. (Lee, 2012, 138) For example, we might have a parameter β, with H₀: β = 0 and H₁: β ≠ 0.
31 Frequentist hypothesis testing. The probability of observing any given value of β is infinitely small, so in frequentist statistics we reject H₀ if, under the sampling distribution given H₀, the probability of observing x or greater is less than a threshold value α. Note that the p-value is not the probability that the null or the alternative is true. As n increases, p decreases. The α-value of 0.05 is arbitrary. The null and alternative hypotheses are often unrealistic, and typically strongly biased in favour of one over the other. The p-value is based on the probability of observations we have not made. (Friel 2015, lecture 8; Lee 2012, 139)
32 Frequentist hypothesis testing. Harold Jeffreys remarks: "What the use of p implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." (Jeffreys 1967, as cited in Lee 2012, 139) Andrew Gelman remarks: "The relevant goal is not to ask the question 'Do the data come from the assumed model?' (to which the answer is almost always no), but to quantify the discrepancies between data and model, and assess whether they could have arisen by chance, under the model's own assumptions." (Friel, 2015, lecture 8)
33 Bayesian testing: Bayes factor. As usual, in Bayesian statistics we attempt to calculate full probability distributions more directly. So we are interested in p₀ = p(θ ∈ Θ₀|X) and p₁ = p(θ ∈ Θ₁|X), the posterior probabilities of the two hypotheses. And of course, we need priors: π₀ = p(θ ∈ Θ₀) and π₁ = p(θ ∈ Θ₁). We then focus on the odds of H₀ against H₁, the ratio p₀/p₁, and consider the Bayes factor

B = (p₀/p₁) / (π₀/π₁) = p₀π₁ / (p₁π₀),

i.e. the factor by which the data shift the odds in favour of H₀ against H₁. (Lee, 2012)
34 Bayesian testing: likelihood ratio. If θ could only have two possible values, θ₀ and θ₁, then p₀ ∝ π₀ p(X|θ₀) and p₁ ∝ π₁ p(X|θ₁), and therefore

p₀/p₁ = (π₀/π₁) × p(X|θ₀)/p(X|θ₁).

The Bayes factor is then

B = p(X|θ₀) / p(X|θ₁),

which is the likelihood ratio of H₀ against H₁. When θ can take more values, we need to integrate over all possible values:

B = [π₀⁻¹ ∫_{θ∈Θ₀} p(X|θ)p(θ)dθ] / [π₁⁻¹ ∫_{θ∈Θ₁} p(X|θ)p(θ)dθ]. (Lee, 2012, 141)
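In the simple-vs-simple case the Bayes factor is just a ratio of two density evaluations; a tiny sketch with a normal likelihood, where the observation and the two hypothesised means are illustrative:

```python
import math

# Normal density with mean mu and variance var.
def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# H0: theta = 0 vs H1: theta = 2, one observation x with variance 1.
x = 1.2
B = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 2.0, 1.0)
print(B)  # B < 1: the data favour H1 over H0
```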
35 Bayesian testing: critical values. Much like the α-value for judging p-values, using thresholds is generally inappropriate. Nevertheless, Jeffreys suggests the following:

B > 1: support for H₀
1 > B > 10^(−1/2): minimal evidence against H₀
10^(−1/2) > B > 10^(−1): substantial evidence against H₀
10^(−1) > B > 10^(−2): strong evidence against H₀
B < 10^(−2): decisive evidence against H₀

And one can translate "against H₀" into "in favour of H₁" in a way that one cannot with frequentist conclusions. Note that calculating B requires the normalising constant p(X), which is often difficult to obtain. B is also more sensitive to the priors than most other Bayesian quantities. (Gill, 2015, 217)
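Jeffreys' conventional cut-offs at half-powers of ten can be wrapped in a small helper; the labels below follow that scale, with the usual caveat that any such thresholds are arbitrary.

```python
# Map a Bayes factor B for H0 against H1 to a Jeffreys-style label.
def jeffreys_label(B):
    if B > 1:
        return "support for H0"
    if B > 10 ** -0.5:
        return "minimal evidence against H0"
    if B > 10 ** -1:
        return "substantial evidence against H0"
    if B > 10 ** -2:
        return "strong evidence against H0"
    return "decisive evidence against H0"

print(jeffreys_label(0.5))  # 0.5 lies between 10^(-1/2) and 1
```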
36 Bayesian testing: p-values. Assume a one-sided test. In frequentist statistics, the p-value, or exact significance level, is then the probability that the random variable X takes a value at least as high as the observed value x, given the null: p(X ≥ x|θ = θ₀). In Bayesian statistics we would instead focus on the posterior probability p₀ = p(θ ≤ θ₀|X = x) = Φ((θ₀ − x)√τ). As it turns out,

p(X ≥ x|θ = θ₀) = 1 − Φ((x − θ₀)√τ) = Φ((θ₀ − x)√τ) = p₀.

Note that this implies that the p-value equals (1 + B⁻¹)⁻¹. (Lee, 2012, 144)
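The coincidence between the one-sided p-value and the posterior probability can be checked numerically. A sketch for the normal case with known precision τ, using the standard normal CDF built from `math.erf`; the observation and hypothesised mean are illustrative.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

theta0, x, tau = 0.0, 1.7, 1.0       # illustrative values

# Frequentist one-sided p-value: P(X >= x | theta = theta0).
p_value = 1.0 - Phi((x - theta0) * math.sqrt(tau))

# Posterior probability P(theta <= theta0 | x) under a flat prior.
p0 = Phi((theta0 - x) * math.sqrt(tau))

print(p_value, p0)  # the two agree
```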
37 Likelihood principle. The likelihood principle states that all the information about the parameters that the data provide is contained in the likelihood function. This principle generally holds in Bayesian inference, but is violated by, among other things: frequentist statistical significance tests and confidence intervals; and the use of Jeffreys priors.
38 References.
Friel, Nial. 2015. Bayesian Analysis. Lecture slides.
Gamerman, Dani and Hedibert F. Lopes. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. 2nd ed. Boca Raton, FL: Chapman & Hall.
Gill, Jeff. 2015. Bayesian Methods: A Social and Behavioral Sciences Approach. 3rd ed. Boca Raton: CRC Press.
Jeffreys, Sir Harold. 1967. Theory of Probability. 3rd ed. Oxford: Clarendon Press.
Lancaster, Tony. 2004. An Introduction to Modern Bayesian Econometrics. Malden, MA: Blackwell.
Lee, Peter M. 2012. Bayesian Statistics: An Introduction. 4th ed. Chichester: Wiley.
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationg-priors for Linear Regression
Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationA Discussion of the Bayesian Approach
A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns
More informationLECTURE 5. Introduction to Econometrics. Hypothesis testing
LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will
More informationCOMP 551 Applied Machine Learning Lecture 19: Bayesian Inference
COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted
More informationStat 451 Lecture Notes Monte Carlo Integration
Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:
More informationBayesian Inference. STA 121: Regression Analysis Artin Armagan
Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationFundamental Probability and Statistics
Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationThe Bayesian Approach to Multi-equation Econometric Model Estimation
Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationAST 418/518 Instrumentation and Statistics
AST 418/518 Instrumentation and Statistics Class Website: http://ircamera.as.arizona.edu/astr_518 Class Texts: Practical Statistics for Astronomers, J.V. Wall, and C.R. Jenkins Measuring the Universe,
More informationVector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.
Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series
More informationA BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain
A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationBayesian analysis in nuclear physics
Bayesian analysis in nuclear physics Ken Hanson T-16, Nuclear Physics; Theoretical Division Los Alamos National Laboratory Tutorials presented at LANSCE Los Alamos Neutron Scattering Center July 25 August
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationBayesian Inference for the Multivariate Normal
Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate
More information