
Marginal density

If the unknown is of the form $x = (x_1, x_2)$, in which the target of investigation is $x_1$, a marginal posterior density

$\pi(x_1 \mid y) = \int \pi(x_1, x_2 \mid y)\, dx_2 = \int \pi(x_2)\, \pi(x_1 \mid y, x_2)\, dx_2$

needs to be formed. In other words, all variables other than those of primary interest are integrated out of the posterior density.
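When the integral above has no closed form, the marginalization can be carried out numerically. The following is a minimal Matlab sketch of integrating out a one-dimensional nuisance parameter $x_2$ on a grid; the toy joint density and the grid limits are illustrative assumptions, not part of the lecture material.

% Numerical marginalization of a nuisance parameter x2 (illustrative sketch).
% post(X1, X2) is an assumed, unnormalized joint posterior pi(x1, x2 | y).
post = @(X1, X2) exp(-0.5*(X1 - 1).^2 - 0.5*((X2 - 0.5)/0.2).^2);

x1 = linspace(-2, 4, 200);           % grid for the parameter of interest
x2 = linspace(-2, 3, 400);           % grid for the nuisance parameter
[X1, X2] = meshgrid(x1, x2);

joint = post(X1, X2);                % joint density on the grid
marg  = trapz(x2, joint, 1);         % integrate x2 out (trapezoidal rule)
marg  = marg / trapz(x1, marg);      % normalize the marginal density

plot(x1, marg); xlabel('x_1'); ylabel('\pi(x_1 | y)');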

Marginal density: Example 2 (Matlab)

The goal is, as in Example 1, to locate an electric point source within a unit disk D centered at the origin using sensors lying on the boundary. In this case, the charge q of the source is modeled as a Gaussian random variable with mean 1 and standard deviation $\nu$, and the voltage experienced by the $i$-th sensor is of the form $y_i = q/d_i$. Find and visualize the marginal posterior $\pi(x \mid y)$ of the location $x$. Use the formula

$\int \exp\!\left( c x - \tfrac{1}{2} b x^2 \right) dx = \sqrt{\frac{2\pi}{b}}\, \exp\!\left( \frac{c^2}{2b} \right), \qquad b > 0.$

Marginal density: Example 2 (Matlab), continued

Solution. The likelihood follows from the likelihood of Example 1 by substituting $1/d_i$ with $q/d_i$, yielding

$\pi(y \mid x, q) \propto \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - q/d_i)^2 \right).$

The marginal density is given by

$\pi(x \mid y) = \int \pi(q)\, \pi(x \mid y, q)\, dq \propto \int \exp\!\left( -\frac{1}{2\nu^2}(q-1)^2 \right) \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - q/d_i)^2 \right) dq$

Marginal density: Example 2, continued

$= \int \exp\!\left( -\frac{1}{2\nu^2}(q-1)^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - q/d_i)^2 \right) dq$

$= \int \exp\!\left( -\frac{1}{2}\Big( \frac{1}{\nu^2} + \sum_{i=1}^{n} \frac{1}{\sigma^2 d_i^2} \Big) q^2 + \Big( \frac{1}{\nu^2} + \sum_{i=1}^{n} \frac{y_i}{\sigma^2 d_i} \Big) q + C \right) dq,$

where C collects the terms that do not depend on q (and hence not on the location x). If $b = 1/\nu^2 + \sum_{i=1}^{n} 1/(\sigma^2 d_i^2)$ and $c = 1/\nu^2 + \sum_{i=1}^{n} y_i/(\sigma^2 d_i)$, it follows from $\int \exp\!\left( c x - \tfrac{1}{2} b x^2 \right) dx = \sqrt{2\pi/b}\, \exp\!\left( c^2/(2b) \right)$ that the marginal density is of the form

Marginal density: Example 2, continued

$\pi(x \mid y) \propto \sqrt{\frac{2\pi}{1/\nu^2 + \sum_{i=1}^{n} 1/(\sigma^2 d_i^2)}}\; \exp\!\left( \frac{\big( 1/\nu^2 + \sum_{i=1}^{n} y_i/(\sigma^2 d_i) \big)^2}{2\big( 1/\nu^2 + \sum_{i=1}^{n} 1/(\sigma^2 d_i^2) \big)} \right)$

In the following visualizations, the exact particle location was $x = (r, \varphi) = (0.5, 0.5)$ and the charge was chosen to be $q = 0.5$. The prior and likelihood standard deviations were given the values $\nu = 0.1, 1$ and $\sigma = 0.1, 0.2$. The results show that $\nu = 0.1$ is a rather low prior standard deviation, since with that value the marginal posterior density is not peaked where the particle is located. The difference between the prior mean $q = 1$ and the exact value $q = 0.5$ is also large compared to the choice $\nu = 0.1$. The value $\nu = 1$, on the other hand, leads to more spread-out results. Likewise, the likelihood standard deviation $\sigma = 0.2$ leads to more spread-out densities than $\sigma = 0.1$.
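A Matlab sketch of how this marginal posterior could be evaluated on a polar grid over the unit disk is given below. The sensor placement, the simulated data, and the parameter values are assumptions chosen to mimic the setting of the example, not the actual code of Examples 1 and 2.

% Marginal posterior of the source location x, with the charge q integrated out
% analytically as derived above (illustrative sketch, assumed setup).
n  = 3;  sg = 0.1;  nu = 1;                    % sensor count and std. deviations (assumed)
th = 2*pi*(0:n-1)'/n;                          % sensor angles on the unit circle
s  = [cos(th), sin(th)];                       % sensor locations on the boundary

x_true = 0.5*[cos(0.5), sin(0.5)];             % true location (r, phi) = (0.5, 0.5)
q_true = 0.5;                                  % true charge
d_true = sqrt(sum((s - x_true).^2, 2));        % distances from source to sensors
y = q_true./d_true + sg*randn(n, 1);           % simulated noisy voltages

[r, phi] = meshgrid(linspace(0, 0.95, 80), linspace(0, 2*pi, 120));
logp = zeros(size(r));
for k = 1:numel(r)
    x = [r(k)*cos(phi(k)), r(k)*sin(phi(k))];
    d = sqrt(sum((s - x).^2, 2));              % distances d_i(x)
    b = 1/nu^2 + sum(1./(sg^2*d.^2));
    c = 1/nu^2 + sum(y./(sg^2*d));
    logp(k) = 0.5*log(2*pi/b) + c^2/(2*b);     % log of the derived expression
end
post = exp(logp - max(logp(:)));               % unnormalized, numerically stable
surf(r.*cos(phi), r.*sin(phi), post); shading interp; view(2); axis equal;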

Marginal density: Example 2 (Matlab), continued

[Figure: marginal posterior densities for n = 3, 4, 5 sensors with $\sigma = 0.1, 0.2$ and $\nu = 0.1$. Particle and sensor locations are indicated by the purple and red circles, respectively.]

Marginal density: Example 2 (Matlab), continued

[Figure: marginal posterior densities for n = 3, 4, 5 sensors with $\sigma = 0.1, 0.2$ and $\nu = 1$. Particle and sensor locations are indicated by the purple and red circles, respectively.]

Estimates

Estimates are often necessary in order to get an idea of the possible realizations of X. One of the most popular statistical estimates is the maximum a posteriori (MAP) estimate, which maximizes the posterior density, i.e.

$x_{\mathrm{MAP}} = \arg\max_{x \in \mathbb{R}^n} \pi(x \mid y).$

Another common point estimate is the conditional mean (CM) of the unknown X, defined as

$x_{\mathrm{CM}} = \mathrm{E}\{x \mid y\} = \int_{\mathbb{R}^n} x\, \pi(x \mid y)\, dx.$

Finding the MAP or CM estimate constitutes an optimization or an integration problem, respectively.
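As a concrete, purely illustrative comparison of the two point estimates, the Matlab sketch below computes MAP and CM on a grid for an assumed one-dimensional bimodal posterior; the density is not taken from the lecture material.

% MAP and CM estimates for an assumed 1D unnormalized posterior (sketch).
post = @(x) exp(-0.5*((x - 1)/0.3).^2) + 0.5*exp(-0.5*((x + 1)/0.5).^2);

x = linspace(-4, 4, 2000);
p = post(x);
p = p / trapz(x, p);                  % normalize on the grid

[~, imax] = max(p);
x_map = x(imax);                      % MAP: maximizer of the posterior
x_cm  = trapz(x, x.*p);               % CM: posterior mean by quadrature

fprintf('MAP = %.3f, CM = %.3f\n', x_map, x_cm);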

MAP vs. CM estimates

MAP is the (global) maximizer of the posterior, and CM is the center of the posterior probability mass. CM is generally considered more robust than MAP, as the maximizer (a point estimate) of a posterior density can be, for example, more sensitive to noise (small changes) in the data than the center of probability mass (an integral estimate).

Estimates

If X is a Gaussian random variable, then MAP coincides with CM. A typical spread estimator is the conditional covariance $\mathrm{cov}(x \mid y) \in \mathbb{R}^{n \times n}$, defined as

$\mathrm{cov}(x \mid y) = \int_{\mathbb{R}^n} (x - x_{\mathrm{CM}})(x - x_{\mathrm{CM}})^T\, \pi(x \mid y)\, dx.$

A Bayesian credibility set $D_p$ containing $p\%$ of the posterior probability mass can be estimated through the integral

$\mu(D_p \mid y) = \int_{D_p} \pi(x \mid y)\, dx = \frac{p}{100}, \qquad \pi(x \mid y)\big|_{x \in \partial D_p} = \text{constant},$

i.e. the posterior density is constant on the boundary of $D_p$.
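A minimal Matlab sketch of estimating a $p\%$ credibility set as a highest-posterior-density region on a grid follows, using the same toy density as in the previous sketch; the density and the grid are assumptions for illustration.

% Estimate a p% credibility set D_p as a highest-posterior-density region (sketch).
post = @(x) exp(-0.5*((x - 1)/0.3).^2) + 0.5*exp(-0.5*((x + 1)/0.5).^2);
x = linspace(-4, 4, 2000);  dx = x(2) - x(1);
p = post(x);  p = p / trapz(x, p);         % normalized posterior on the grid

p_level = 95;                              % target probability mass in percent
ps   = sort(p, 'descend');                 % density values, highest first
mass = cumsum(ps)*dx;                      % accumulated probability mass
thr  = ps(find(mass >= p_level/100, 1));   % level at which pi(x|y) is constant
in_D = (p >= thr);                         % indicator of D_p on the grid
fprintf('Mass inside D_p: %.3f\n', trapz(x, p.*in_D));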

Estimates: Example 3

Given a forward model $Y = AX + N$, where $A \in \mathbb{R}^{m \times n}$ is a constant matrix and N is a Gaussian, zero-mean ($\mathrm{E}(N) = 0$) noise vector with the diagonal covariance matrix $C = \sigma^2 I$, find

a) the likelihood $\pi(y \mid x)$,
b) the posterior density $\pi(x \mid y)$ corresponding to the Gaussian prior $\pi(x) \propto \exp\!\left( -\frac{1}{2\alpha^2} x^T x \right)$,
c) the maximizer of the posterior (MAP).

Estimates: Example 3, continued

Solution. a) The distribution of $N = Y - AX$ is zero-mean Gaussian with the diagonal covariance matrix $C = \sigma^2 I$, meaning that

$\pi(y \mid x) = \pi(n) \propto \exp\!\left( -\frac{1}{2\sigma^2} (y - Ax)^T (y - Ax) \right).$

b) The posterior density is given by

$\pi(x \mid y) \propto \pi(x)\, \pi(y \mid x) \propto \exp\!\left( -\frac{1}{2\alpha^2} x^T x \right) \exp\!\left( -\frac{1}{2\sigma^2} (y - Ax)^T (y - Ax) \right) = \exp\!\left( -\frac{1}{2\alpha^2} x^T x - \frac{1}{2\sigma^2} (y - Ax)^T (y - Ax) \right).$

Estimates: Example 3, continued

Solution. c) The maximizer of the posterior density, i.e. $x_{\mathrm{MAP}}$, minimizes the negative of the exponent, meaning that

$x_{\mathrm{MAP}} = \arg\min_x \left( \frac{1}{2\alpha^2} x^T x + \frac{1}{2\sigma^2} (y - Ax)^T (y - Ax) \right).$

The derivative of this quadratic form must vanish, that is,

$\frac{1}{\sigma^2} A^T A\, x_{\mathrm{MAP}} + \frac{1}{\alpha^2} x_{\mathrm{MAP}} - \frac{1}{\sigma^2} A^T y = 0.$

This is equivalent to

$x_{\mathrm{MAP}} = \left[ A^T A + (\sigma^2/\alpha^2) I \right]^{-1} A^T y,$

which is the Tikhonov-regularized solution of $Ax = y$ with regularization parameter $\sigma^2/\alpha^2$, i.e. the likelihood variance $\sigma^2$ divided by the prior variance $\alpha^2$.
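A short Matlab sketch of computing this MAP/Tikhonov estimate; the forward matrix, the true unknown, and the parameter values below are arbitrary assumptions for illustration.

% MAP estimate for the linear Gaussian model Y = AX + N (illustrative sketch).
m = 50;  n = 30;
A = randn(m, n);                       % assumed forward matrix
x_true = randn(n, 1);                  % assumed true unknown
sigma = 0.1;  alpha = 1.0;             % noise and prior std. deviations (assumed)
y = A*x_true + sigma*randn(m, 1);      % simulated data

% x_MAP = [A'A + (sigma^2/alpha^2) I]^{-1} A' y, the Tikhonov-regularized solution
x_map = (A'*A + (sigma^2/alpha^2)*eye(n)) \ (A'*y);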

Gaussian priors

A Gaussian n-variate random variable X with mean $\overline{x} \in \mathbb{R}^n$ and (symmetric and positive definite) covariance matrix $\Gamma \in \mathbb{R}^{n \times n}$ is denoted by $X \sim N(\overline{x}, \Gamma)$. The probability density of X is given by

$\pi(x) = \frac{1}{\sqrt{(2\pi)^n \det(\Gamma)}} \exp\!\left( -\frac{1}{2} (x - \overline{x})^T \Gamma^{-1} (x - \overline{x}) \right).$

When a Gaussian density is used as a prior, structural prior information about the unknown x can be encoded into the covariance matrix $\Gamma$. Due to the positive definiteness, there exists a factorization of the form $\Gamma^{-1} = W^T W$, in which W is invertible and can be, for example, an (upper) triangular Cholesky factor ($W = U = L^T$).

Gaussian priors

The matrix W is called a whitening matrix, since $Z = W(X - \overline{x})$ is Gaussian white noise: it has zero mean and identity covariance matrix, $Z \sim N(0, I)$. (A random vector whose components are mutually independent and identically distributed is called white noise.) This can be verified through a straightforward calculation as follows:

$\pi(x) \propto \exp\!\left( -\frac{1}{2} (x - \overline{x})^T \Gamma^{-1} (x - \overline{x}) \right) = \exp\!\left( -\frac{1}{2} (x - \overline{x})^T W^T W (x - \overline{x}) \right) = \exp\!\left( -\frac{1}{2} z^T z \right) \propto \pi(z).$

Hence, a realization x can be obtained by first drawing a realization z and then applying the formula $x = W^{-1} z + \overline{x}$.
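This sampling recipe can be sketched in a few lines of Matlab; the covariance matrix below (an exponential covariance on a 1D grid) and the jitter term are assumptions made only to have a concrete, positive definite $\Gamma$.

% Draw a realization of X ~ N(xbar, Gamma) via whitening, Gamma^{-1} = W'W (sketch).
n = 100;
t = (1:n)'/n;
Gamma = exp(-abs(t - t')/0.1) + 1e-8*eye(n);   % assumed exponential covariance + jitter
xbar  = zeros(n, 1);                           % assumed prior mean

W = chol(inv(Gamma));                 % upper-triangular factor with W'W = Gamma^{-1}
z = randn(n, 1);                      % white noise realization, Z ~ N(0, I)
x = W \ z + xbar;                     % realization x = W^{-1} z + xbar
plot(t, x);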

Gaussian priors: Example 4 (Matlab)

Assume that Z is a white-noise ($Z \sim N(0, I)$) random vector corresponding to a 64 × 64 pixel image. Visualize a realization of $X \sim N(0, \Gamma)$ with $\Gamma^{-1} = W^T W$ using the formula $x = W^{-1} z$ in the following four cases:

a) $W = I$, i.e. x is white noise,
b) W is proportional to a discrete approximation of the Laplace operator $\Delta = \partial_1^2 + \partial_2^2$,
c) W is otherwise as in b), but the correlation between pixels close to the center of the image is higher,
d) W is proportional to a discrete approximation of the directional differential operator $\partial_d = d_1 \partial_1 + d_2 \partial_2$ with $d = (d_1, d_2) = (1, 1)$.

Gaussian priors: Example 4 (Matlab), continued

Solution. a) White noise can be generated with a standard Gaussian random number generator (randn in Matlab).

b) W was formed as the standard finite-difference approximation of the Laplace operator, i.e.

$w_{k_{i,j},\, k_{i,j}} = -4, \qquad w_{k_{i,j},\, k_{i+1,j}} = w_{k_{i,j},\, k_{i-1,j}} = w_{k_{i,j},\, k_{i,j+1}} = w_{k_{i,j},\, k_{i,j-1}} = 1,$

$w_{k_{i,j},\, k_{l,n}} = 0 \quad \text{if } |i - l| > 1 \text{ or } |j - n| > 1.$

Here, $k_{i,j}$ is the vector index corresponding to pixel $(i, j)$.

Gaussian priors: Example 4 (Matlab), continued

c) W was otherwise the same as in b), but 3 was added to all elements $w_{k_{i,j},\, k_{l,n}}$ for which the centers of pixels $(i, j)$ and $(l, n)$ were both closer than 10 pixel side-lengths to the center of the image.

d) The matrix W corresponding to the differential operator in a given direction d was defined as $W = W^{(1)} \cos(\varphi) + W^{(2)} \sin(\varphi)$, where $\varphi$ is the angle between d and the positive x-axis,

$w^{(1)}_{k_{i,j},\, k_{i,j}} = -w^{(1)}_{k_{i,j},\, k_{i,j+1}} = 1, \qquad w^{(2)}_{k_{i,j},\, k_{i,j}} = -w^{(2)}_{k_{i,j},\, k_{i-1,j}} = 1,$

and otherwise $w^{(1)} = w^{(2)} = 0$. The direction d corresponded to a line with slope one, meaning that $\varphi = \pi/4$.
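A Matlab sketch of case b) is given below: W is assembled as a sparse five-point finite-difference Laplacian on the 64 × 64 pixel grid and a realization is drawn via $x = W^{-1} z$. The (Dirichlet-type) boundary handling, which keeps W invertible, and the unit scaling are simplifying assumptions, not the course's exact construction.

% Case b): realization of a Gaussian prior whose whitening matrix W is a
% discrete Laplacian (illustrative sketch).
N  = 64;                                % image is N x N pixels
e  = ones(N, 1);
D2 = spdiags([e, -2*e, e], -1:1, N, N); % 1D second-difference matrix
I  = speye(N);
W  = kron(I, D2) + kron(D2, I);         % 2D five-point Laplacian (Dirichlet boundary)

z = randn(N^2, 1);                      % white noise realization, Z ~ N(0, I)
x = W \ z;                              % realization x = W^{-1} z
imagesc(reshape(x, N, N)); axis image; colormap gray; colorbar;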

Gaussian priors: Example 4 (Matlab), continued

[Figure: sample realizations of X for cases a), b), c), and d).]