The Normal Linear Regression Model with Natural Conjugate Prior. March 7, 2016
1 The Normal Linear Regression Model with Natural Conjugate Prior March 7, 2016
2 The Normal Linear Regression Model with Natural Conjugate Prior. The plan:
- Estimate a simple regression model using Bayesian methods
- Formulate a prior
- Combine prior and likelihood to compute the posterior
- Model comparison
Main reading: Ch. 2 in Gary Koop's Bayesian Econometrics.
3 Recap
4 Bayes Rule. Deriving Bayes rule: symmetrically,
$$p(A, B) = p(A \mid B)\,p(B)$$
$$p(A, B) = p(B \mid A)\,p(A)$$
implying
$$p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$$
or
$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$$
which is known as Bayes rule.
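As a quick illustration, here is a minimal numeric check of Bayes rule on a made-up 2×2 discrete joint distribution (the table values and the use of NumPy are assumptions for illustration, not part of the lecture):

```python
import numpy as np

# Made-up joint distribution p(A, B): rows index A in {0, 1},
# columns index B in {0, 1}; entries sum to 1.
p_AB = np.array([[0.10, 0.30],
                 [0.20, 0.40]])

p_B = p_AB.sum(axis=0)             # marginal p(B), summing over A
p_A = p_AB.sum(axis=1)             # marginal p(A), summing over B
p_A_given_B = p_AB / p_B           # p(A|B) = p(A, B) / p(B)
p_B_given_A = p_AB / p_A[:, None]  # p(B|A) = p(A, B) / p(A)

# Bayes rule: p(A|B) = p(B|A) p(A) / p(B)
bayes = p_B_given_A * p_A[:, None] / p_B
print(np.allclose(p_A_given_B, bayes))  # True
```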
5 Data and parameters. The purpose of Bayesian analysis is to use the data y to learn about the parameters θ: the parameters of a statistical model, or anything not directly observed. Replace A and B in Bayes rule with θ and y to get
$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)}$$
The probability density p(θ | y) then describes what we know about θ, given the data.
6 The posterior density. The posterior density is often the object of fundamental interest in Bayesian estimation. The posterior is the result. We are interested in the entire conditional distribution of the parameters θ:
$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)}$$
Today we will start filling in the blanks and specify a prior and a data generating process (i.e. a likelihood function) in order to find the posterior.
7 The linear regression model
8 The linear regression model. We will start with something very basic: the linear regression model
$$y_i = \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, 2, \ldots, N$$
The variable $x_i$ is fixed (i.e. not random) or independent of $\varepsilon_i$ and has a probability density function $p(x_i \mid \lambda)$, where the parameter(s) $\lambda$ do not include $\beta$ or $\sigma^2$. Walk before we run, etc.
9 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
10 The likelihood function
11 Defining the likelihood function. The assumptions made about the linear regression model imply that $p(y_i \mid \beta, \sigma^2)$ is normal:
$$p(y_i \mid \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - \beta x_i)^2}{2\sigma^2}\right]$$
and that
$$p(y \mid \beta, \sigma^2) = \frac{1}{(2\pi)^{N/2}\,\sigma^N} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - \beta x_i)^2\right]$$
where $y \equiv [\,y_1\; y_2\; \cdots\; y_N\,]'$.
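A minimal sketch of evaluating this log-likelihood in Python (NumPy and the function name are assumptions for illustration):

```python
import numpy as np

def log_likelihood(y, x, beta, sigma2):
    """Gaussian log-likelihood of y_i = beta*x_i + eps_i, eps_i ~ N(0, sigma2)."""
    N = len(y)
    resid = y - beta * x
    return -0.5 * N * np.log(2 * np.pi * sigma2) - np.sum(resid**2) / (2 * sigma2)
```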
12 Reformulating the likelihood function. It will be convenient to use that
$$\sum_{i=1}^N (y_i - \beta x_i)^2 = \nu s^2 + \left(\beta - \hat{\beta}\right)^2 \sum_{i=1}^N x_i^2$$
where
$$\nu = N - 1, \qquad \hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad s^2 = \frac{\sum (y_i - \hat{\beta} x_i)^2}{\nu}$$
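The three quantities $\nu$, $\hat{\beta}$ and $s^2$ are just the familiar OLS outputs for the no-intercept model; a sketch (the helper name is an assumption for illustration):

```python
import numpy as np

def ols_stats(y, x):
    """Return (nu, beta_hat, s2) for the no-intercept model y_i = beta*x_i + eps_i."""
    N = len(y)
    nu = N - 1                                # degrees of freedom
    beta_hat = np.sum(x * y) / np.sum(x**2)   # OLS estimate of beta
    s2 = np.sum((y - beta_hat * x)**2) / nu   # residual variance estimate
    return nu, beta_hat, s2
```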
13 Reformulating the likelihood function. The likelihood function can then be written as
$$p(y \mid \beta, \sigma^2) = \frac{1}{(2\pi)^{N/2}} \left\{ h^{1/2} \exp\left[-\frac{h}{2}\left(\beta - \hat{\beta}\right)^2 \sum_{i=1}^N x_i^2\right] \right\} \left\{ h^{\nu/2} \exp\left[-\frac{h\nu}{2\,s^{-2}}\right] \right\}$$
where $h \equiv \sigma^{-2}$.
14 Formulating a prior
15 The prior density. The prior density should reflect information about θ that we have prior to observing y. Priors should preferably be easy to interpret and be of a form that facilitates computing the posterior. Natural conjugate priors have both of these advantages.
16 Conjugate priors. Conjugate priors, when combined with the likelihood function, result in posteriors that are of the same family of distributions. Natural conjugate priors have the same functional form as the likelihood function. The posterior $p(\theta \mid y)$ can then be thought of as being the result of two sub-samples:
$$p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$$
The sufficient statistics about θ of the first sub-sample are described by the hyper-parameters of the prior; the information about θ in the second sub-sample is described by the sufficient statistics of the actual observed sample (i.e. y). This feature makes natural conjugate priors easy to interpret.
17 Specifying a prior density: it's your choice. For the normal regression model we need to elicit, i.e. formulate, a prior p(θ) for β and h, which we denote p(β, h). It will be convenient to use that
$$p(\beta, h) = p(\beta \mid h)\,p(h)$$
and let $\beta \mid h \sim N(\underline{\beta}, h^{-1}\underline{V})$ and $h \sim G(\underline{s}^{-2}, \underline{\nu})$. The natural conjugate prior for β and h is then denoted
$$\beta, h \sim NG(\underline{\beta}, \underline{V}, \underline{s}^{-2}, \underline{\nu})$$
Notation: Koop uses bars below parameters to denote hyper-parameters of a prior, and bars above to denote hyper-parameters of a posterior.
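To draw from this prior one can sample h from the Gamma and then β given h; a sketch, assuming Koop's G(μ, ν) is a Gamma with mean μ and ν degrees of freedom (shape ν/2, scale 2μ/ν), with arbitrary seed and illustrative hyper-parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)               # arbitrary seed
beta0, V0, s2_0, nu0 = 1.5, 0.25, 1.0, 10    # example hyper-parameters

# h ~ G(1/s2_0, nu0): shape nu0/2, scale 2*(1/s2_0)/nu0 = 2/(nu0*s2_0)
h = rng.gamma(shape=nu0 / 2, scale=2.0 / (nu0 * s2_0))
# beta | h ~ N(beta0, V0 / h)
beta = beta0 + np.sqrt(V0 / h) * rng.standard_normal()
```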
18 What makes a prior conjugate? Short answer: convenient functional forms. Long answer: remember the likelihood function
$$p(y \mid \beta, \sigma^2) = \frac{1}{(2\pi)^{N/2}} \left\{ h^{1/2} \exp\left[-\frac{h}{2}\left(\beta - \hat{\beta}\right)^2 \sum_{i=1}^N x_i^2\right] \right\} \left\{ h^{\nu/2} \exp\left[-\frac{h\nu}{2\,s^{-2}}\right] \right\}$$
and consider the Normal-Gamma prior distribution $p(\beta \mid h)\,p(h)$, where $\beta \mid h \sim N(\underline{\beta}, h^{-1}\underline{V})$ and $h \sim G(\underline{s}^{-2}, \underline{\nu})$. We then have
$$p(\beta \mid h)\,p(h) = \left\{ \frac{1}{\sqrt{2\pi h^{-1}\underline{V}}} \exp\left[-\frac{(\beta - \underline{\beta})^2}{2 h^{-1}\underline{V}}\right] \right\} \left\{ c_G^{-1}\, h^{(\underline{\nu}-2)/2} \exp\left[-\frac{h\underline{\nu}}{2\,\underline{s}^{-2}}\right] \right\}$$
Note the similar functional forms.
19 The posterior
20 The Normal-Gamma posterior. It is possible to use Bayes rule
$$p(\beta, h \mid y) = \frac{p(y \mid \beta, h)\,p(\beta \mid h)\,p(h)}{p(y)}$$
to find an analytical expression for the posterior, which is Normal-Gamma:
$$p(\beta, h \mid y) = \left\{ \frac{1}{\sqrt{2\pi h^{-1}\overline{V}}} \exp\left[-\frac{(\beta - \overline{\beta})^2}{2 h^{-1}\overline{V}}\right] \right\} \left\{ c_G^{-1}\, h^{(\overline{\nu}-2)/2} \exp\left[-\frac{h\overline{\nu}}{2\,\overline{s}^{-2}}\right] \right\}$$
i.e. the same form as the prior:
$$\beta, h \mid y \sim NG(\overline{\beta}, \overline{V}, \overline{s}^{-2}, \overline{\nu}) \quad \text{vs.} \quad \beta, h \sim NG(\underline{\beta}, \underline{V}, \underline{s}^{-2}, \underline{\nu})$$
21 The posterior. The (hyper) parameters of the posterior $\beta, h \mid y \sim NG(\overline{\beta}, \overline{V}, \overline{s}^{-2}, \overline{\nu})$ are a combination of the (hyper) parameters of the prior and sample information:
$$\overline{V} = \frac{1}{\underline{V}^{-1} + \sum x_i^2}$$
$$\overline{\beta} = \overline{V}\left(\underline{V}^{-1}\underline{\beta} + \hat{\beta}\sum x_i^2\right)$$
$$\overline{\nu} = \underline{\nu} + N$$
$$\overline{\nu}\,\overline{s}^2 = \underline{\nu}\,\underline{s}^2 + \nu s^2 + \frac{(\hat{\beta} - \underline{\beta})^2}{\underline{V} + \left(\sum x_i^2\right)^{-1}}$$
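These updating formulas translate directly into code; a sketch (the function and argument names are assumptions; arguments carry the prior's underbar quantities, the return values are the posterior's overbar quantities):

```python
import numpy as np

def ng_posterior(y, x, beta_prior, V_prior, s2_prior, nu_prior):
    """Return (beta_bar, V_bar, s2_bar, nu_bar) of the Normal-Gamma posterior."""
    N = len(y)
    sum_x2 = np.sum(x**2)
    beta_hat = np.sum(x * y) / sum_x2            # OLS estimate
    nu = N - 1
    s2 = np.sum((y - beta_hat * x)**2) / nu      # OLS residual variance

    V_post = 1.0 / (1.0 / V_prior + sum_x2)
    beta_post = V_post * (beta_prior / V_prior + beta_hat * sum_x2)
    nu_post = nu_prior + N
    nu_s2_post = (nu_prior * s2_prior + nu * s2
                  + (beta_hat - beta_prior)**2 / (V_prior + 1.0 / sum_x2))
    return beta_post, V_post, nu_s2_post / nu_post, nu_post
```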
22 Posterior marginal distribution of β. In regression analysis, we are often interested in the mean and the variance of the coefficient β. The mean can be found by integrating the posterior w.r.t. h and β:
$$E(\beta \mid y) = \int\!\!\int \beta\, p(\beta, h \mid y)\, dh\, d\beta = \int \beta\, p(\beta \mid y)\, d\beta$$
The density $p(\beta \mid y)$ is called the marginal distribution of β. It can be shown that
$$\beta \mid y \sim t(\overline{\beta}, \overline{s}^2\overline{V}, \overline{\nu})$$
so that
$$E(\beta \mid y) = \overline{\beta}, \qquad \operatorname{var}(\beta \mid y) = \frac{\overline{\nu}\,\overline{s}^2}{\overline{\nu} - 2}\,\overline{V}$$
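The posterior mean and variance of β then follow directly from the t marginal; a minimal sketch assuming the ng_posterior helper above:

```python
def beta_posterior_moments(beta_post, V_post, s2_post, nu_post):
    """Mean and variance of the t(beta_bar, s2_bar*V_bar, nu_bar) marginal."""
    mean = beta_post
    var = nu_post * s2_post / (nu_post - 2) * V_post  # requires nu_bar > 2
    return mean, var
```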
23 Combining prior and sample information. The posterior mean of β is given by
$$\overline{\beta} = \overline{V}\left(\underline{V}^{-1}\underline{\beta} + \hat{\beta}\sum x_i^2\right), \qquad \overline{V} = \frac{1}{\underline{V}^{-1} + \sum x_i^2}$$
Do these expressions make sense? $\underline{V}^{-1}$ is the precision of the prior; $\sum x_i^2$ is the precision of the data. What happens when $\underline{V}^{-1} = 0$? When $\underline{V} \to 0$?
24 Example
25 Simple empirical example. Consider a sample of 50 observations generated from the model
$$y_i = \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$
with $\beta = 2$, $\sigma^2 = 1$ and $x_i \sim N(0, 1)$. Set the prior hyper-parameters in $\beta, h \sim NG(\underline{\beta}, \underline{V}, \underline{s}^{-2}, \underline{\nu})$ to
$$\underline{\beta} = 1.5, \quad \underline{V} = 0.25, \quad \underline{s}^2 = 1, \quad \underline{\nu} = 10$$
A simulation sketch follows below.
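A sketch of the simulation and the posterior update, reusing the ng_posterior helper from the earlier sketch (the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
N, beta_true, sigma2 = 50, 2.0, 1.0
x = rng.standard_normal(N)                     # x_i ~ N(0, 1)
y = beta_true * x + np.sqrt(sigma2) * rng.standard_normal(N)

beta_bar, V_bar, s2_bar, nu_bar = ng_posterior(
    y, x, beta_prior=1.5, V_prior=0.25, s2_prior=1.0, nu_prior=10)
print(beta_bar, V_bar, s2_bar, nu_bar)
```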
26 The data. [Scatter plot of the simulated data: y plotted against x.]
27 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
28 Change the prior mean: $\underline{\beta} = 2.5$, $\underline{V} = 0.25$, $\underline{s}^2 = 1$, $\underline{\nu} = 10$.
29 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
30 Change the prior precision: $\underline{\beta} = 1.5$, $\underline{V} = 0.5$, $\underline{s}^2 = 1$, $\underline{\nu} = 10$.
31 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
32 Change the prior mean of the variance: $\underline{\beta} = 1.5$, $\underline{V} = 0.25$, $\underline{s}^2 = 10$, $\underline{\nu} = 10$.
33 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
34 Change the prior degrees of freedom: $\underline{\beta} = 1.5$, $\underline{V} = 0.25$, $\underline{s}^2 = 1$, $\underline{\nu} = 100$.
35 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
36 More data: N=100
37 [Scatter plot of the simulated data with N = 100: y plotted against x.]
38 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
39 Even more data: N = 500.
40 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
41 Prior precise but with $\underline{\beta}$ far from $\hat{\beta}$.
42 [Figure 2.1: Marginal prior, posterior, and likelihood for β; probability density plotted against β.]
43 Noninformative prior. The non-informative prior $\underline{V}^{-1} = 0$, $\underline{\nu} = 0$ gives
$$\overline{V} = \frac{1}{\sum x_i^2}, \qquad \overline{\beta} = \hat{\beta}, \qquad \overline{\nu} = N, \qquad \overline{\nu}\,\overline{s}^2 = \nu s^2$$
which are the same as the OLS estimates. But this prior is so-called improper, i.e. the density p(β, h) does not integrate to 1:
$$\int\!\!\int p(\beta, h)\, d\beta\, dh \neq 1$$
A numerical check of this OLS limit follows below.
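A quick check using the ng_posterior sketch from earlier, approximating $\underline{V}^{-1} = 0$ with a very large $\underline{V}$ and setting $\underline{\nu} = 0$ (with $\underline{\nu} = 0$ the prior terms drop out of the updating formulas; y and x as in the simulation sketch):

```python
# With V_prior huge and nu_prior = 0 the posterior collapses to OLS:
# beta_bar ~ beta_hat, V_bar ~ 1/sum(x**2), nu_bar = N, nu_bar*s2_bar ~ nu*s2.
beta_bar, V_bar, s2_bar, nu_bar = ng_posterior(
    y, x, beta_prior=0.0, V_prior=1e12, s2_prior=1.0, nu_prior=0)
```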
44 Model comparison. We may have two models that could both potentially explain y:
$$y_i = \beta_j x_{ji} + \varepsilon_{ji}, \quad \varepsilon_{ji} \sim N(0, h_j^{-1}), \quad j = 1, 2$$
For model comparison, it is important that the models explain the same data. Improper priors can be used only for parameters shared by both models.
45 Model comparison. We can formulate priors and compute posteriors just as before:
Prior: $\beta_j, h_j \sim NG(\underline{\beta}_j, \underline{V}_j, \underline{s}_j^{-2}, \underline{\nu}_j)$
Posterior: $\beta_j, h_j \mid y \sim NG(\overline{\beta}_j, \overline{V}_j, \overline{s}_j^{-2}, \overline{\nu}_j)$
Again, the (hyper) parameters of the posterior are a combination of the (hyper) parameters of the prior and sample information:
$$\overline{V}_j = \frac{1}{\underline{V}_j^{-1} + \sum x_{ji}^2}$$
$$\overline{\beta}_j = \overline{V}_j\left(\underline{V}_j^{-1}\underline{\beta}_j + \hat{\beta}_j \sum x_{ji}^2\right)$$
$$\overline{\nu}_j = \underline{\nu}_j + N$$
$$\overline{\nu}_j\,\overline{s}_j^2 = \underline{\nu}_j\,\underline{s}_j^2 + \nu_j s_j^2 + \frac{(\hat{\beta}_j - \underline{\beta}_j)^2}{\underline{V}_j + \left(\sum x_{ji}^2\right)^{-1}}$$
46 Posterior odds ratio. The posterior odds ratio is the ratio of the probabilities of two models conditional on the data:
$$PO_{12} \equiv \frac{p(M_1 \mid y)}{p(M_2 \mid y)} = \frac{p(y \mid M_1)\,p(M_1)}{p(y \mid M_2)\,p(M_2)}$$
The prior model probabilities $p(M_j)$ are formulated before seeing the data. The marginal likelihood $p(y \mid M_j)$ is calculated as
$$p(y \mid M_j) = \int\!\!\int p(y \mid \beta_j, h_j)\,p(\beta_j, h_j)\, dh_j\, d\beta_j$$
47 The marginal likelihood for the Normal-Gamma model. The marginal likelihood for the Normal-Gamma model is given by
$$p(y \mid M_j) = c_j \left(\frac{\overline{V}_j}{\underline{V}_j}\right)^{1/2} \left(\overline{\nu}_j\,\overline{s}_j^2\right)^{-\overline{\nu}_j/2}$$
where
$$c_j = \frac{\Gamma\!\left(\frac{\overline{\nu}_j}{2}\right) \left(\underline{\nu}_j\,\underline{s}_j^2\right)^{\underline{\nu}_j/2}}{\Gamma\!\left(\frac{\underline{\nu}_j}{2}\right) \pi^{N/2}}$$
and Γ(·) is the Gamma function.
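A sketch of computing this in logs for numerical stability, assuming SciPy's gammaln and the ng_posterior helper from earlier (a proper prior with $\underline{\nu} > 0$ is required):

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(y, x, beta_prior, V_prior, s2_prior, nu_prior):
    """log p(y | M_j) for the Normal-Gamma model (proper prior required)."""
    N = len(y)
    beta_post, V_post, s2_post, nu_post = ng_posterior(
        y, x, beta_prior, V_prior, s2_prior, nu_prior)
    # log c_j = log Gamma(nu_bar/2) + (nu/2) log(nu*s2) - log Gamma(nu/2) - (N/2) log(pi)
    log_c = (gammaln(nu_post / 2) + (nu_prior / 2) * np.log(nu_prior * s2_prior)
             - gammaln(nu_prior / 2) - (N / 2) * np.log(np.pi))
    return (log_c + 0.5 * (np.log(V_post) - np.log(V_prior))
            - (nu_post / 2) * np.log(nu_post * s2_post))
```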
48 Posterior odds ratio. The posterior odds ratio is then given by
$$PO_{12} = \frac{c_1 \left(\frac{\overline{V}_1}{\underline{V}_1}\right)^{1/2} \left(\overline{\nu}_1\,\overline{s}_1^2\right)^{-\overline{\nu}_1/2} p(M_1)}{c_2 \left(\frac{\overline{V}_2}{\underline{V}_2}\right)^{1/2} \left(\overline{\nu}_2\,\overline{s}_2^2\right)^{-\overline{\nu}_2/2} p(M_2)}$$
The posterior odds ratio rewards:
- fitting the data, via the term $\overline{\nu}_j\,\overline{s}_j^2$, which contains the sum of squared errors $\nu_j s_j^2$;
- coherence between prior and OLS estimate, via the term $\overline{\nu}_j\,\overline{s}_j^2$, which contains $(\hat{\beta}_j - \underline{\beta}_j)^2$.
When models have different numbers of parameters, the posterior odds ratio also rewards parsimony, i.e. the model with fewer parameters. A computational sketch follows below.
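A sketch of the posterior odds computation, assuming the log_marginal_likelihood helper above (the default equal prior model probabilities and all names are illustrative assumptions):

```python
import numpy as np

def posterior_odds(y, x1, x2, prior1, prior2, pM1=0.5, pM2=0.5):
    """PO_12 for two single-regressor models; priorj = (beta, V, s2, nu)."""
    log_ml1 = log_marginal_likelihood(y, x1, *prior1)
    log_ml2 = log_marginal_likelihood(y, x2, *prior2)
    return np.exp(log_ml1 - log_ml2) * pM1 / pM2
```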
49 That's it for today.