Introduction to Bayesian Inference


University of Pennsylvania EABCN Training School May 10, 2016

Bayesian Inference

Ingredients of Bayesian analysis:
- Likelihood function $p(Y \mid \phi)$
- Prior density $p(\phi)$
- Marginal data density $p(Y) = \int p(Y \mid \phi)\, p(\phi)\, d\phi$

Bayes Theorem:
$$p(\phi \mid Y) = \frac{p(Y \mid \phi)\, p(\phi)}{p(Y)}.$$

Linear Regression / AR Models

Consider the AR(1) model:
$$y_t = y_{t-1}\phi + u_t, \quad u_t \sim iid\, N(0,1).$$
Let $x_t = y_{t-1}$ and write
$$y_t = x_t'\phi + u_t, \quad u_t \sim iid\, N(0,1), \qquad \text{or} \qquad Y = X\phi + U.$$
We can easily allow for multiple regressors; assume $\phi$ is $k \times 1$. Notice that we treat the variance of the errors as known. The generalization to unknown variance is straightforward but tedious.

Likelihood function:
$$p(Y \mid \phi) = (2\pi)^{-T/2} \exp\left\{ -\frac{1}{2}(Y - X\phi)'(Y - X\phi) \right\}.$$
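
As a quick numerical illustration of this likelihood, here is a minimal Python sketch, assuming numpy; the helper names simulate_ar1 and log_likelihood are illustrative, not part of the lecture:

```python
import numpy as np

def simulate_ar1(phi, T, rng):
    """Simulate y_t = phi * y_{t-1} + u_t with u_t ~ iid N(0, 1) and y_0 = 0."""
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y

def log_likelihood(phi, Y, X):
    """Gaussian log-likelihood of Y = X phi + U with unit error variance."""
    resid = Y - X @ phi
    return -0.5 * len(Y) * np.log(2 * np.pi) - 0.5 * resid @ resid

rng = np.random.default_rng(0)
y = simulate_ar1(0.9, T=200, rng=rng)
Y, X = y[1:], y[:-1].reshape(-1, 1)     # regressand and lagged regressor x_t = y_{t-1}
print(log_likelihood(np.array([0.9]), Y, X))
```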

A Convenient Prior

Prior: $\phi \sim N(0_{k \times 1}, \tau^2 I_{k \times k})$, with density
$$p(\phi) = (2\pi\tau^2)^{-k/2} \exp\left\{ -\frac{1}{2\tau^2}\, \phi'\phi \right\}.$$
Large $\tau$ means diffuse prior; small $\tau$ means tight prior.

Deriving the Posterior

Bayes Theorem:
$$p(\phi \mid Y) \propto p(Y \mid \phi)\, p(\phi) \propto \exp\left\{ -\frac{1}{2}\left[ (Y - X\phi)'(Y - X\phi) + \tau^{-2}\phi'\phi \right] \right\}.$$
Guess: what if $\phi \mid Y \sim N(\bar{\phi}_T, \bar{V}_T)$? Then
$$p(\phi \mid Y) \propto \exp\left\{ -\frac{1}{2}(\phi - \bar{\phi}_T)'\, \bar{V}_T^{-1} (\phi - \bar{\phi}_T) \right\}.$$
Rewrite the exponential term:
$$\begin{aligned}
& Y'Y - \phi'X'Y - Y'X\phi + \phi'X'X\phi + \tau^{-2}\phi'\phi \\
&= Y'Y - \phi'X'Y - Y'X\phi + \phi'(X'X + \tau^{-2}I)\phi \\
&= \left(\phi - (X'X + \tau^{-2}I)^{-1}X'Y\right)'\left(X'X + \tau^{-2}I\right)\left(\phi - (X'X + \tau^{-2}I)^{-1}X'Y\right) \\
&\quad + Y'Y - Y'X(X'X + \tau^{-2}I)^{-1}X'Y.
\end{aligned}$$

Deriving the Posterior

The exponential term is a quadratic function of $\phi$. Deduce: the posterior distribution of $\phi$ must be a multivariate normal distribution, $\phi \mid Y \sim N(\bar{\phi}_T, \bar{V}_T)$, with
$$\bar{\phi}_T = (X'X + \tau^{-2}I)^{-1}X'Y, \qquad \bar{V}_T = (X'X + \tau^{-2}I)^{-1}.$$
Limiting cases:
- $\tau \to \infty$: $\phi \mid Y$ is approximately $N\big(\hat{\phi}_{mle}, (X'X)^{-1}\big)$.
- $\tau \to 0$: $\phi \mid Y$ is approximately a point mass at $0$.
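
A minimal sketch of this conjugate update, assuming numpy; posterior_normal is an illustrative name and the regression data are simulated only for demonstration:

```python
import numpy as np

def posterior_normal(Y, X, tau):
    """Posterior phi | Y ~ N(phi_bar, V_bar) under the prior phi ~ N(0, tau^2 I)."""
    k = X.shape[1]
    precision = X.T @ X + np.eye(k) / tau**2     # X'X + tau^{-2} I
    V_bar = np.linalg.inv(precision)
    phi_bar = V_bar @ (X.T @ Y)                  # (X'X + tau^{-2} I)^{-1} X'Y
    return phi_bar, V_bar

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
Y = X @ np.array([0.9]) + rng.standard_normal(200)

for tau in (100.0, 0.01):                        # diffuse prior -> near MLE; tight prior -> near 0
    phi_bar, V_bar = posterior_normal(Y, X, tau)
    print(tau, phi_bar, V_bar)
```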

Marginal Data Density

The marginal data density plays an important role in Bayesian model selection and averaging. Write
$$p(Y) = \frac{p(Y \mid \phi)\, p(\phi)}{p(\phi \mid Y)} = \exp\left\{ -\frac{1}{2}\left[ Y'Y - Y'X(X'X + \tau^{-2}I)^{-1}X'Y \right] \right\} (2\pi)^{-T/2}\, \left| I + \tau^2 X'X \right|^{-1/2}.$$
The exponential term measures the goodness of fit; $\left| I + \tau^2 X'X \right|$ is a penalty for model complexity.
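
A sketch of the log marginal data density implied by the formula above, assuming numpy; log_marginal_data_density is an illustrative name and the data are simulated only for demonstration:

```python
import numpy as np

def log_marginal_data_density(Y, X, tau):
    """ln p(Y) for Y = X phi + U, U ~ iid N(0, 1), prior phi ~ N(0, tau^2 I)."""
    T, k = X.shape
    XtX, XtY = X.T @ X, X.T @ Y
    fit = Y @ Y - XtY @ np.linalg.solve(XtX + np.eye(k) / tau**2, XtY)  # goodness-of-fit term
    _, logdet = np.linalg.slogdet(np.eye(k) + tau**2 * XtX)             # complexity penalty
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * fit

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
Y = X @ np.array([0.9]) + rng.standard_normal(200)
for tau in (0.1, 1.0, 10.0):       # a more diffuse prior raises the complexity penalty
    print(tau, log_marginal_data_density(Y, X, tau))
```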

Posterior

We will often abbreviate posterior distributions $p(\phi \mid Y)$ by $\pi(\phi)$ and posterior expectations of $h(\phi)$ by
$$E_\pi[h] = E_\pi[h(\phi)] = \int h(\phi)\pi(\phi)\,d\phi = \int h(\phi)p(\phi \mid Y)\,d\phi.$$
We will focus on algorithms that generate draws $\{\phi^i\}_{i=1}^N$ from posterior distributions of parameters in time series models. These draws can then be transformed into objects of interest, $h(\phi^i)$, and under suitable conditions a Monte Carlo average of the form
$$\bar{h}_N = \frac{1}{N}\sum_{i=1}^N h(\phi^i) \approx E_\pi[h].$$
Strong law of large numbers (SLLN), central limit theorem (CLT)...

Direct Sampling

In the simple linear regression model with Gaussian posterior it is possible to sample directly. For $i = 1$ to $N$, draw $\phi^i$ from $N(\bar{\phi}_T, \bar{V}_T)$. Provided that $V_\pi[h(\phi)] < \infty$, we can deduce from Kolmogorov's SLLN and the Lindeberg-Levy CLT that
$$\bar{h}_N \xrightarrow{a.s.} E_\pi[h] \qquad \text{and} \qquad \sqrt{N}\left(\bar{h}_N - E_\pi[h]\right) \Rightarrow N\big(0, V_\pi[h(\phi)]\big).$$
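
A minimal direct-sampling sketch, assuming numpy; the posterior mean and variance used below are placeholders for $\bar{\phi}_T$ and $\bar{V}_T$, and $h(\phi) = \phi^2$ is just an example object of interest:

```python
import numpy as np

def direct_sampling(phi_bar, V_bar, h, N, rng):
    """Draw phi^i ~ N(phi_bar, V_bar); return the MC average of h(phi) and its numerical std error."""
    draws = rng.multivariate_normal(phi_bar, V_bar, size=N)
    h_draws = np.array([h(phi) for phi in draws])
    return h_draws.mean(), h_draws.std(ddof=1) / np.sqrt(N)   # CLT-based numerical standard error

rng = np.random.default_rng(0)
# Placeholder posterior for a scalar phi; h(phi) = phi^2 as the object of interest.
h_bar, nse = direct_sampling(np.array([0.9]), np.array([[0.01]]), lambda p: p[0]**2, 10_000, rng)
print(h_bar, nse)
```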

Decision Making

The posterior expected loss associated with a decision $\delta(\cdot)$ is given by
$$\rho\big(\delta(\cdot) \mid Y\big) = \int_\Theta L\big(\theta, \delta(Y)\big)\, p(\theta \mid Y)\,d\theta.$$
A Bayes decision is a decision that minimizes the posterior expected loss:
$$\delta^*(Y) = \operatorname{argmin}_{d}\; \rho\big(\delta(\cdot) \mid Y\big).$$
Since in most applications it is not feasible to derive the posterior expected loss analytically, we replace $\rho(\delta(\cdot) \mid Y)$ by a Monte Carlo approximation of the form
$$\rho_N\big(\delta(\cdot) \mid Y\big) = \frac{1}{N}\sum_{i=1}^N L\big(\theta^i, \delta(\cdot)\big).$$
A numerical approximation to the Bayes decision $\delta^*(\cdot)$ is then given by
$$\delta^*_N(Y) = \operatorname{argmin}_{d}\; \rho_N\big(\delta(\cdot) \mid Y\big).$$
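
A grid-based Monte Carlo sketch of the approximate Bayes decision, assuming numpy; the posterior draws below are simulated placeholders. Under quadratic loss the minimizer should be close to the posterior mean, under absolute error loss close to the posterior median:

```python
import numpy as np

def bayes_decision(theta_draws, loss, candidates):
    """Pick the candidate decision that minimizes the Monte Carlo posterior expected loss."""
    risks = [np.mean([loss(theta, d) for theta in theta_draws]) for d in candidates]
    return candidates[int(np.argmin(risks))]

rng = np.random.default_rng(0)
theta_draws = rng.normal(0.9, 0.1, size=5_000)   # placeholder for posterior draws theta^i
candidates = np.linspace(0.5, 1.3, 81)           # grid of candidate decisions d

print(bayes_decision(theta_draws, lambda t, d: (t - d)**2, candidates))   # ~ posterior mean
print(bayes_decision(theta_draws, lambda t, d: abs(t - d), candidates))   # ~ posterior median
```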

Inference

Point estimation:
- Quadratic loss: posterior mean
- Absolute error loss: posterior median

Interval/set estimation, $P_\pi\{\theta \in C(Y)\} = 1 - \alpha$:
- highest posterior density sets
- equal-tail-probability intervals
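
A sketch of both interval types from posterior draws, assuming numpy; the HPD routine uses the shortest-interval construction, which is appropriate for unimodal posteriors:

```python
import numpy as np

def equal_tail_interval(draws, alpha=0.05):
    """Equal-tail-probability credible interval from posterior draws."""
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

def hpd_interval(draws, alpha=0.05):
    """Shortest interval containing 1 - alpha posterior probability (unimodal posteriors)."""
    s = np.sort(draws)
    n_keep = int(np.ceil((1 - alpha) * len(s)))
    widths = s[n_keep - 1:] - s[:len(s) - n_keep + 1]
    start = int(np.argmin(widths))
    return s[start], s[start + n_keep - 1]

rng = np.random.default_rng(0)
draws = rng.normal(0.9, 0.1, size=10_000)        # placeholder posterior draws
print(equal_tail_interval(draws), hpd_interval(draws))
```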

Forecasting

Example (AR(1)):
$$y_{T+h} = \theta^h y_T + \sum_{s=0}^{h-1} \theta^s u_{T+h-s}.$$
The $h$-step-ahead conditional distribution is
$$y_{T+h} \mid (Y_{1:T}, \theta) \sim N\left( \theta^h y_T,\; \frac{1 - \theta^{2h}}{1 - \theta^2} \right).$$
Posterior predictive distribution:
$$p(y_{T+h} \mid Y_{1:T}) = \int p(y_{T+h} \mid y_T, \theta)\, p(\theta \mid Y_{1:T})\,d\theta.$$
For each draw $\theta^i$ from the posterior distribution $p(\theta \mid Y_{1:T})$, sample a sequence of innovations $u^i_{T+1}, \ldots, u^i_{T+h}$ and compute $y^i_{T+h}$ as a function of $\theta^i$, $u^i_{T+1}, \ldots, u^i_{T+h}$, and $Y_{1:T}$.
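
A sketch of the simulation approach just described for the AR(1) example, assuming numpy; the posterior draws for $\theta$ are placeholders:

```python
import numpy as np

def posterior_predictive_ar1(theta_draws, y_T, h, rng):
    """For each posterior draw theta^i, propagate h innovations u ~ N(0, 1) to get y^i_{T+h}."""
    sims = np.empty(len(theta_draws))
    for i, theta in enumerate(theta_draws):
        y = y_T
        for _ in range(h):
            y = theta * y + rng.standard_normal()   # one-step AR(1) transition
        sims[i] = y
    return sims

rng = np.random.default_rng(0)
theta_draws = rng.normal(0.9, 0.05, size=5_000)     # placeholder for draws from p(theta | Y_{1:T})
sims = posterior_predictive_ar1(theta_draws, y_T=1.0, h=4, rng=rng)
print(sims.mean(), np.quantile(sims, [0.05, 0.95]))  # predictive mean and 90% band
```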

Model Uncertainty

Assign prior probabilities $\gamma_{j,0}$ to models $M_j$, $j = 1, \ldots, J$. Posterior model probabilities are given by
$$\gamma_{j,T} = \frac{\gamma_{j,0}\, p(Y \mid M_j)}{\sum_{j=1}^J \gamma_{j,0}\, p(Y \mid M_j)}, \quad \text{where} \quad p(Y \mid M_j) = \int p(Y \mid \theta^{(j)}, M_j)\, p(\theta^{(j)} \mid M_j)\,d\theta^{(j)}.$$
Log marginal data densities are sums of one-step-ahead predictive scores:
$$\ln p(Y \mid M_j) = \sum_{t=1}^T \ln \int p(y_t \mid \theta^{(j)}, Y_{1:t-1}, M_j)\, p(\theta^{(j)} \mid Y_{1:t-1}, M_j)\,d\theta^{(j)}.$$
Model averaging:
$$p(h \mid Y) = \sum_{j=1}^J \gamma_{j,T}\; p\big(h_j(\theta^{(j)}) \mid Y, M_j\big).$$
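
A sketch of the posterior model probability calculation from log marginal data densities, assuming numpy; the numerical values are hypothetical:

```python
import numpy as np

def posterior_model_probs(log_mdd, prior_probs):
    """Posterior model probabilities gamma_{j,T} from log marginal data densities ln p(Y | M_j)."""
    log_w = np.log(prior_probs) + log_mdd
    log_w -= log_w.max()                 # log-sum-exp shift for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical log marginal data densities for J = 3 models, equal prior probabilities.
print(posterior_model_probs(np.array([-340.2, -338.7, -345.1]), np.ones(3) / 3))
```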