
BAYESIAN ANALYSIS: FOREWORDS

Notation

1. "System" means the real thing, and a "model" is an assumed mathematical form for the system.
2. The probability model class M contains the set of all admissible models and the corresponding prior probability distribution over these models. The symbol M will be dropped whenever there is no confusion; one should always keep in mind that everything is conditioned on M.
3. X contains the uncertain model parameters in the assumed model class as well as other uncertain variables of interest; x is a possible value of X; X^(i) is a sample of X.
4. u is the known input to the system; uncertainties in the input should be modeled and parameterized by X; u after the conditioning symbol will be dropped whenever there is no confusion.
5. Y is the uncertain response of the system and Ŷ is the observed response from the system, i.e. the data.
6. f(.) denotes a probability density function; F(.) denotes a cumulative distribution function; P(.) denotes probability.
7. All variables are vectors, unless otherwise specified in the text.
8. Cov(.) denotes the covariance matrix of uncertain variables; c.o.v. denotes the coefficient of variation.
9. T denotes a known duration and Σ denotes a covariance matrix.

Probability model class and Bayesian computation

A probability model class M specifies the prior PDF f(x) and the prior predictive PDF f(y|x).

Example 1: Y = X + E, where f(x) = N(x; 0, 1); E ~ N(0, 1) is the uncertain modeling error; X ⊥ E. It is clear that the prior predictive PDF is f(y|x) = N(y; x, 1). Suppose we observe a datum Y = Ŷ; the posterior PDF is then

f(x | Ŷ) = f(Ŷ | x) f(x) / f(Ŷ) ∝ e^(-(Ŷ-x)²/2) e^(-x²/2) ∝ e^(-(x - Ŷ/2)²), i.e. f(x | Ŷ) = N(x; Ŷ/2, 1/2).

Example 2: A_k = g(X, u_k) = X, k = 1, ..., K, with X ~ N(0, 1); Y_k = A_k + E_k, where the E_k ~ i.i.d. N(0, 1) and E ⊥ X. Suppose we observe data Ŷ = {Ŷ_1, ..., Ŷ_K}. Then

f(x | Ŷ) = f(Ŷ | x) f(x) / f(Ŷ) ∝ [∏_{k=1}^K f(Ŷ_k | x)] f(x) ∝ [∏_{k=1}^K e^(-(Ŷ_k - g(x, u_k))²/2)] e^(-x²/2)

Example 3: X_k = g(X_{k-1}, u_k) + W_k, k = 1, ..., K, with X_0 ~ N(0, 1) and W_k ~ N(0, 1); Y_k = h(X_k, u_k) + V_k, where the V_k ~ i.i.d. N(0, 1) and V ⊥ W ⊥ X_0. Suppose we observe data Ŷ = {Ŷ_1, ..., Ŷ_K}. The posterior over the state trajectory (x_0, ..., x_K) is

f(x_0, ..., x_K | Ŷ) = f(Ŷ | x_0, ..., x_K) f(x_0, ..., x_K) / f(Ŷ)
∝ [∏_{k=1}^K f(Ŷ_k | x_k)] [∏_{k=1}^K f(x_k | x_{k-1})] f(x_0)
∝ [∏_{k=1}^K e^(-(Ŷ_k - h(x_k, u_k))²/2)] [∏_{k=1}^K e^(-(x_k - g(x_{k-1}, u_k))²/2)] e^(-x_0²/2)
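Before moving on, the closed-form posterior in Example 1 (Gaussian with mean Ŷ/2 and variance 1/2 under the unit-variance setup) is easy to check numerically: discretize x on a grid, multiply prior by likelihood, and normalize. A minimal sketch in Python; the observed value y_hat = 1.0 is an arbitrary illustrative choice, not a value from the notes.

```python
import numpy as np

y_hat = 1.0                                    # illustrative datum (hypothetical value)
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2)                    # f(x) = N(0, 1), up to a constant
likelihood = np.exp(-0.5 * (y_hat - x)**2)     # f(y_hat | x) = N(x, 1), up to a constant
post = prior * likelihood
post /= post.sum() * dx                        # normalization stands in for dividing by f(y_hat)

post_mean = (x * post).sum() * dx
post_var = ((x - post_mean)**2 * post).sum() * dx
print(post_mean, post_var)                     # analytic answer: y_hat/2 = 0.5 and 1/2
```

The explicit normalization step plays the role of dividing by f(Ŷ); it is available here only because the problem is one-dimensional and cheap to grid.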

Example 4: X_k = g(X_{k-1}, u_k, Θ) + W_k, k = 1, ..., K, with X_0 ~ N(μ_0, Σ_0) and W_k ~ N(0, Σ_W(Θ)); Y_k = h(X_k, u_k, Θ) + V_k, where the V_k ~ i.i.d. N(0, Σ_V(Θ)), Θ ~ N(μ_Θ, Σ_Θ) and V ⊥ W ⊥ X_0 ⊥ Θ. Suppose we observe data Ŷ = {Ŷ_1, ..., Ŷ_K}. Then

f(x_0, ..., x_K, θ | Ŷ) = f(Ŷ | x_0, ..., x_K, θ) f(x_0, ..., x_K | θ) f(θ) / f(Ŷ)
∝ [∏_{k=1}^K f(Ŷ_k | x_k, θ)] [∏_{k=1}^K f(x_k | x_{k-1}, θ)] f(x_0) f(θ)
∝ [∏_{k=1}^K e^(-(Ŷ_k - h(x_k, u_k, θ))ᵀ Σ_V(θ)⁻¹ (Ŷ_k - h(x_k, u_k, θ))/2)]
  [∏_{k=1}^K e^(-(x_k - g(x_{k-1}, u_k, θ))ᵀ Σ_W(θ)⁻¹ (x_k - g(x_{k-1}, u_k, θ))/2)]
  e^(-(x_0 - μ_0)ᵀ Σ_0⁻¹ (x_0 - μ_0)/2) e^(-(θ - μ_Θ)ᵀ Σ_Θ⁻¹ (θ - μ_Θ)/2)

Remarks:

1. We always know how to evaluate the prior PDF f(x) and the prior predictive PDF f(y|x) (the likelihood, once the data are given), because both are chosen by us when we set up the probability model class M. However, we usually do not know how to evaluate the posterior PDF f(x | Ŷ), because the normalizing constant f(Ŷ) is unknown.

2. The posterior PDF f(x | Ŷ) is usually not the end of the story in Bayesian analysis. Instead, we are usually interested in estimating some function g(X) conditioned on the data, i.e. estimating

E[g(X) | Ŷ] = ∫ g(x) f(x | Ŷ) dx

One way of estimating this quantity is to draw N samples from f(x | Ŷ) and, according to the Law of Large Numbers,

E[g(X) | Ŷ] = ∫ g(x) f(x | Ŷ) dx ≈ (1/N) Σ_{i=1}^N g(X^(i)), X^(i) ~ f(x | Ŷ)

Unfortunately, we usually cannot draw samples directly (i.e., using plain Monte Carlo) from f(x | Ŷ). Most of the time, we need stochastic simulation techniques.

3. When the uncertain variable X does not change with time (Examples 1 and 2), we solve the Bayesian problem in block mode; when X changes with time (Example 3), we solve it in sequential mode. When part of X changes with time and the other part does not (Example 4), we combine the block and sequential modes.

Block mode: Monte Carlo, rejection sampling, importance sampling, Markov chain Monte Carlo (Metropolis-Hastings, Gibbs sampler, hybrid Monte Carlo, transitional Markov chain Monte Carlo).

Sequential mode: Kalman filter, RTS smoother, backward sampler, particle filter, particle smoother.

Prior PDF selection

1. Non-informative prior and Jeffreys prior

When we have no prior information about X, it is desirable to choose a prior PDF that provides no prior information; such a prior is called non-informative. Intuitively, the uniform PDF seems non-informative, and many people choose it for mathematical simplicity. But if X is uniformly distributed, exp(X) is not. This contradicts the intuition that if the prior PDF of X is non-informative, that of exp(X) should also be non-informative. A better choice is the Jeffreys prior, which is proportional to the square root of the determinant of the Fisher information:

f(x) ∝ √det I(x), where I(x) = E_Y[(∂ log f(Y | x)/∂x)(∂ log f(Y | x)/∂x)ᵀ]

2. Conjugate prior

In some rare cases, the posterior PDF will be of the same type as the prior PDF. When this happens, we say that the prior PDF is conjugate to the corresponding setting.

Example 1: Y = aX + E, where E is Gaussian with known mean and covariance matrix and a is a known matrix. It can be shown that if X is also Gaussian with known mean and covariance matrix (with X and E jointly Gaussian), the posterior PDF

f(x | Ŷ) is also Gaussian. In fact, a Gaussian prior PDF f(x) is conjugate to linear models with Gaussian uncertainties.

Example 2: Y_k = E_k, where E_k ~ N(0, X I) and X is the uncertain variance parameter. It can be shown that if X is inverse Gamma, the posterior PDF f(x | Ŷ) is also inverse Gamma. In fact, an inverse Gamma prior PDF f(x) is conjugate to problems with uncertain variance.

The choice of a conjugate prior is usually motivated by computational simplicity, because the resulting posterior PDF can be easily sampled.

3. Maximum entropy prior

Given moment information about X, we can choose the least-informative prior, namely the one maximizing the differential entropy subject to the moment constraints. This prior PDF is found by solving the optimization problem

max_f -∫ f(x) log f(x) dx
subject to ∫ f(x) dx = 1, ∫ g_1(x) f(x) dx = μ_1, ∫ g_2(x) f(x) dx = μ_2, ...

whose solution has the exponential-family form

f*(x) = e^(λ_0 + λ_1 g_1(x) + λ_2 g_2(x) + ...)
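As a concrete instance of the maximum entropy construction: on the support x ≥ 0 with the single constraint E[X] = μ, the maximizing density f*(x) = e^(λ_0 + λ_1 x) is the exponential density, with λ_1 = -1/μ. A numerical sketch in Python recovering λ_1 by bisection on the mean constraint; the value μ = 2 and the grid bounds are illustrative choices.

```python
import numpy as np

mu = 2.0                               # target mean (illustrative choice)
x = np.linspace(0.0, 100.0, 200_001)   # grid standing in for the support x >= 0

def mean_for(lam1):
    w = np.exp(lam1 * x)               # unnormalized f*(x) = exp(lam0 + lam1 x); lam0 is absorbed by normalization
    w /= w.sum()
    return (x * w).sum()               # E[X] under the normalized density

# Bisection on lam1 < 0: mean_for is increasing in lam1, so home in on E[X] = mu.
lo, hi = -10.0, -1e-3
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mean_for(mid) < mu:
        lo = mid
    else:
        hi = mid
lam1 = 0.5 * (lo + hi)
print(lam1)                            # analytic answer: -1/mu = -0.5
```

Only the normalization constraint fixes λ_0, so the code never solves for it explicitly; dividing by the grid sum enforces it implicitly.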
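Finally, of the block-mode methods listed earlier, Metropolis-Hastings directly addresses the difficulty in Remark 1: it needs only the unnormalized product f(Ŷ | x) f(x), never the constant f(Ŷ). A minimal random-walk sketch in Python, targeting the Example 1 posterior N(Ŷ/2, 1/2); the datum y_hat = 1.0, proposal scale, and chain length are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
y_hat = 1.0                                     # illustrative datum (hypothetical value)

def log_unnorm_post(x):
    # log f(y_hat | x) + log f(x), constants dropped; f(y_hat) is never needed
    return -0.5 * (y_hat - x) ** 2 - 0.5 * x ** 2

x = 0.0
chain = []
for _ in range(100_000):
    x_prop = x + rng.normal(scale=1.0)          # symmetric random-walk proposal
    log_alpha = log_unnorm_post(x_prop) - log_unnorm_post(x)
    if np.log(rng.uniform()) < log_alpha:       # accept with probability min(1, exp(log_alpha))
        x = x_prop
    chain.append(x)                             # on rejection, the current state is repeated

samples = np.array(chain[10_000:])              # discard burn-in
print(samples.mean(), samples.var())            # exact posterior: mean 0.5, variance 0.5
```

Because the proposal is symmetric, the Hastings correction q(x | x') / q(x' | x) cancels and only the ratio of unnormalized posterior values remains in the acceptance probability.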