Figure 1: Examples of univariate Gaussian pdfs N(x; µ, σ²), for (µ, σ) = (0, 1), (1, 1/2), and (0, 2).

The Gaussian distribution

Probably the most important distribution in all of statistics is the Gaussian distribution, also called the normal distribution. The Gaussian distribution arises in many contexts and is widely used for modeling continuous random variables.

The probability density function of the univariate (one-dimensional) Gaussian distribution is
$$p(x \mid \mu, \sigma^2) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{Z} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).$$
The normalization constant Z is
$$Z = \sqrt{2\pi\sigma^2}.$$

The parameters µ and σ² specify the mean and variance of the distribution, respectively:
$$\mu = \mathbb{E}[x]; \qquad \sigma^2 = \operatorname{var}[x].$$

Figure 1 plots the probability density function for several sets of parameters (µ, σ²). The distribution is symmetric around the mean, and most of the density (≈ 99.7%) is contained within ±3σ of the mean.

We may extend the univariate Gaussian distribution to a distribution over d-dimensional vectors, producing a multivariate analog. The probability density function of the multivariate Gaussian distribution is
$$p(x \mid \mu, \Sigma) = \mathcal{N}(x; \mu, \Sigma) = \frac{1}{Z} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right).$$
The normalization constant Z is
$$Z = \sqrt{\det(2\pi\Sigma)} = (2\pi)^{d/2} (\det \Sigma)^{1/2}.$$
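These definitions translate directly into code. The sketch below is a minimal illustration, assuming NumPy and SciPy are available (the function names are mine); it evaluates both densities from the formulas above and cross-checks them against scipy.stats.

```python
import numpy as np
from scipy import stats

def univariate_gaussian_pdf(x, mu, sigma2):
    """Evaluate N(x; mu, sigma2), where sigma2 is the variance."""
    Z = np.sqrt(2 * np.pi * sigma2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / Z

def multivariate_gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x; mu, Sigma) for a d-dimensional vector x."""
    diff = x - mu
    Z = np.sqrt(np.linalg.det(2 * np.pi * Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / Z

# Univariate check against scipy (scipy's scale parameter is the std. dev.).
print(univariate_gaussian_pdf(0.5, 0.0, 1.0))
print(stats.norm.pdf(0.5, loc=0.0, scale=1.0))

# Multivariate check.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, -0.2])
print(multivariate_gaussian_pdf(x, mu, Sigma))
print(stats.multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```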

Figure 2: Contour plots for example bivariate Gaussian distributions. Here µ = 0 for all examples: (a) Σ = [1 0; 0 1], (b) Σ = [1 1/2; 1/2 1], (c) Σ = [1 1; 1 3].

Examining these equations, we can see that the multivariate density coincides with the univariate density in the special case when Σ is the scalar σ². Again, the vector µ specifies the mean of the multivariate Gaussian distribution. The matrix Σ specifies the covariance between each pair of variables in x:
$$\Sigma = \operatorname{cov}(x, x) = \mathbb{E}\left[ (x - \mu)(x - \mu)^\top \right].$$

Covariance matrices are necessarily symmetric and positive semidefinite, which means their eigenvalues are nonnegative. Note that the density function above requires that Σ be positive definite, that is, have strictly positive eigenvalues. A zero eigenvalue would result in a determinant of zero, making the normalization impossible.

The dependence of the multivariate Gaussian density on x is entirely through the value of the quadratic form
$$\Delta^2 = (x - \mu)^\top \Sigma^{-1} (x - \mu).$$
The value Δ (obtained via a square root) is called the Mahalanobis distance, and can be seen as a generalization of the Z score (x − µ)/σ often encountered in statistics.

To understand the behavior of the density geometrically, we can set the Mahalanobis distance to a constant. The set of points in ℝ^d satisfying Δ = c for any given value c > 0 is an ellipsoid, with the eigenvectors of Σ defining the directions of the principal axes.

Figure 2 shows contour plots of the density of three bivariate (two-dimensional) Gaussian distributions. The elliptical shape of the contours is clear.

The Gaussian distribution has a number of convenient analytic properties, some of which we describe below.
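The quadratic form is also easy to compute in practice. Here is a small sketch (the function name is mine) that evaluates the Mahalanobis distance Δ via a Cholesky factorization, which avoids forming Σ⁻¹ explicitly:

```python
import numpy as np
from scipy import linalg

def mahalanobis(x, mu, Sigma):
    """Return the Mahalanobis distance sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    diff = x - mu
    # With Sigma = L L^T, solving L z = diff gives z^T z = diff^T Sigma^{-1} diff.
    L = linalg.cholesky(Sigma, lower=True)
    z = linalg.solve_triangular(L, diff, lower=True)
    return np.sqrt(z @ z)

mu = np.zeros(2)
Sigma = np.array([[1.0, 1.0], [1.0, 3.0]])   # the covariance from Figure 2(c)

# Points with equal Mahalanobis distance lie on the same elliptical contour;
# since mu = 0, x and -x are always equidistant from the mean.
print(mahalanobis(np.array([1.0, 1.0]), mu, Sigma))
print(mahalanobis(np.array([-1.0, -1.0]), mu, Sigma))
```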

Marginalization

Often we will have a set of variables x with a joint multivariate Gaussian distribution, but only be interested in reasoning about a subset of these variables. Suppose x has a multivariate Gaussian distribution:
$$p(x \mid \mu, \Sigma) = \mathcal{N}(x; \mu, \Sigma).$$
Let us partition the vector into two components:
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
We partition the mean vector and covariance matrix in the same way:
$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}; \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Now the marginal distribution of the subvector x₁ has a simple form:
$$p(x_1 \mid \mu, \Sigma) = \mathcal{N}(x_1; \mu_1, \Sigma_{11}),$$
so we simply pick out the entries of µ and Σ corresponding to x₁.

Figure 3 illustrates the marginal distribution of x₁ for the joint distribution shown in Figure 2(c).

Figure 3: Marginalization example. (a) shows the joint density over x = [x₁, x₂]ᵀ; this is the same density as in Figure 2(c). (b) shows the marginal density of x₁: p(x₁ | µ₁, Σ₁₁) = N(x₁; 0, 1).
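In code, marginalization really is just indexing: we keep the entries of µ and the block of Σ for the variables of interest. A minimal sketch (assuming NumPy; names are mine) that reproduces the marginal in Figure 3(b):

```python
import numpy as np

def marginal(mu, Sigma, idx):
    """Marginal over the variables in idx: the corresponding entries of mu
    and the corresponding block of Sigma."""
    idx = np.asarray(idx)
    return mu[idx], Sigma[np.ix_(idx, idx)]

# Joint from Figure 2(c): mu = 0, Sigma = [[1, 1], [1, 3]].
mu = np.zeros(2)
Sigma = np.array([[1.0, 1.0], [1.0, 3.0]])

mu1, Sigma11 = marginal(mu, Sigma, [0])
print(mu1, Sigma11)   # [0.] [[1.]], i.e., x1 ~ N(0, 1) as in Figure 3(b)
```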

Conditioning

Another common scenario arises when we have a set of variables x with a joint multivariate Gaussian prior distribution, and are then told the value of a subset of these variables. We may then condition our prior distribution on this observation, giving a posterior distribution over the remaining variables.

Suppose again that x has a multivariate Gaussian distribution:
$$p(x \mid \mu, \Sigma) = \mathcal{N}(x; \mu, \Sigma),$$
and that we have partitioned it as before: x = [x₁, x₂]ᵀ. Suppose now that we learn the exact value of the subvector x₂. Remarkably, the posterior distribution p(x₁ | x₂, µ, Σ) is a Gaussian distribution! The formula is
$$p(x_1 \mid x_2, \mu, \Sigma) = \mathcal{N}(x_1; \mu_{1|2}, \Sigma_{11|2}),$$
with
$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2); \qquad \Sigma_{11|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$$
So we adjust the mean by an amount dependent on: (1) the covariance between x₁ and x₂, Σ₁₂; (2) the prior uncertainty in x₂, Σ₂₂; and (3) the deviation of the observation from the prior mean, (x₂ − µ₂). Similarly, we reduce the uncertainty in x₁, Σ₁₁, by an amount dependent on (1) and (2). Notably, the reduction of the covariance matrix does not depend on the values we observe.

Notice that if x₁ and x₂ are independent, then Σ₁₂ = 0, and the conditioning operation does not change the distribution of x₁, as expected.

Figure 4 illustrates the conditional distribution of x₁ for the joint distribution shown in Figure 2(c), after observing x₂ = 2.

Figure 4: Conditioning example. (a) shows the joint density over x = [x₁, x₂]ᵀ, along with the observation value x₂ = 2; this is the same density as in Figure 2(c). (b) shows the conditional density of x₁ given x₂ = 2: p(x₁ | x₂, µ, Σ) = N(x₁; 2/3, 2/3).
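The conditioning formulas above are straightforward to implement for an arbitrary two-block partition. The sketch below (function and variable names are mine) reproduces the numbers in Figure 4(b):

```python
import numpy as np

def condition(mu, Sigma, idx_keep, idx_obs, x_obs):
    """Posterior over x[idx_keep] given the observation x[idx_obs] = x_obs."""
    mu1, mu2 = mu[idx_keep], mu[idx_obs]
    S11 = Sigma[np.ix_(idx_keep, idx_keep)]
    S12 = Sigma[np.ix_(idx_keep, idx_obs)]
    S22 = Sigma[np.ix_(idx_obs, idx_obs)]
    # Solve against S22 rather than inverting it explicitly.
    mu_post = mu1 + S12 @ np.linalg.solve(S22, x_obs - mu2)
    Sigma_post = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return mu_post, Sigma_post

# Joint from Figure 2(c), observing x2 = 2:
mu = np.zeros(2)
Sigma = np.array([[1.0, 1.0], [1.0, 3.0]])
mu_post, Sigma_post = condition(mu, Sigma, [0], [1], np.array([2.0]))
print(mu_post, Sigma_post)   # approximately [0.667] [[0.667]], i.e., N(2/3, 2/3)
```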

Pointwise multiplication

Another remarkable fact about multivariate Gaussian density functions is that pointwise multiplication gives another (unnormalized) Gaussian pdf:
$$\mathcal{N}(x; \mu, \Sigma)\, \mathcal{N}(x; \nu, P) = \frac{1}{Z} \mathcal{N}(x; \omega, T),$$
where
$$T = (\Sigma^{-1} + P^{-1})^{-1}; \qquad \omega = T (\Sigma^{-1} \mu + P^{-1} \nu); \qquad Z^{-1} = \mathcal{N}(\mu; \nu, \Sigma + P) = \mathcal{N}(\nu; \mu, \Sigma + P).$$

Convolutions

Gaussian probability density functions are closed under convolutions. Let x and y be d-dimensional vectors, with distributions
$$p(x \mid \mu, \Sigma) = \mathcal{N}(x; \mu, \Sigma); \qquad p(y \mid \nu, P) = \mathcal{N}(y; \nu, P).$$
Then the convolution of their density functions is another Gaussian pdf:
$$f(y) = \int \mathcal{N}(y - x; \nu, P)\, \mathcal{N}(x; \mu, \Sigma)\, \mathrm{d}x = \mathcal{N}(y; \mu + \nu, \Sigma + P),$$
where the means and covariances add in the result.

If we assume that x and y are independent, then the distribution of their sum z = x + y will also be multivariate Gaussian, with a density that is precisely the convolution of the individual densities:
$$p(z \mid \mu, \nu, \Sigma, P) = \mathcal{N}(z; \mu + \nu, \Sigma + P).$$
These results will often come in handy.
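Both identities are easy to check numerically. A short sketch (assuming NumPy and SciPy; variable names are mine) verifies the pointwise-product identity at a test point and checks the sum rule by sampling:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

mu, Sigma = np.array([0.0, 0.0]), np.array([[1.0, 1.0], [1.0, 3.0]])
nu, P = np.array([1.0, -1.0]), np.array([[2.0, 0.0], [0.0, 1.0]])

# Pointwise multiplication: N(x; mu, Sigma) N(x; nu, P) = (1/Z) N(x; omega, T).
T = np.linalg.inv(np.linalg.inv(Sigma) + np.linalg.inv(P))
omega = T @ (np.linalg.solve(Sigma, mu) + np.linalg.solve(P, nu))
Z_inv = mvn.pdf(mu, mean=nu, cov=Sigma + P)

x = np.array([0.5, 0.25])
lhs = mvn.pdf(x, mean=mu, cov=Sigma) * mvn.pdf(x, mean=nu, cov=P)
rhs = Z_inv * mvn.pdf(x, mean=omega, cov=T)
print(lhs, rhs)   # agree to floating-point precision

# Sums of independent Gaussians: z = x + y ~ N(mu + nu, Sigma + P).
rng = np.random.default_rng(0)
z = (rng.multivariate_normal(mu, Sigma, 200_000)
     + rng.multivariate_normal(nu, P, 200_000))
print(z.mean(axis=0))           # approximately mu + nu = [1, -1]
print(np.cov(z, rowvar=False))  # approximately Sigma + P = [[3, 1], [1, 4]]
```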

Affine transformations

Consider a d-dimensional vector x with a multivariate Gaussian distribution:
$$p(x \mid \mu, \Sigma) = \mathcal{N}(x; \mu, \Sigma).$$
Suppose we wish to reason about an affine transformation of x into ℝ^D, y = Ax + b, where A ∈ ℝ^(D×d) and b ∈ ℝ^D. Then y has a D-dimensional Gaussian distribution:
$$p(y \mid \mu, \Sigma, A, b) = \mathcal{N}(y; A\mu + b, A \Sigma A^\top).$$

Figure 5 illustrates an affine transformation of the vector x with the joint distribution shown in Figure 2(c), for a particular choice of A and b. The density has been rotated and translated, but remains a Gaussian.

Figure 5: Affine transformation example. (a) shows the joint density over x = [x₁, x₂]ᵀ; this is the same density as in Figure 2(c). (b) shows the density of y = Ax + b; the density of the transformed vector is another Gaussian.

Selecting parameters

The d-dimensional multivariate Gaussian distribution is specified by the parameters µ and Σ. Without any further restrictions, specifying µ requires d parameters, and specifying Σ requires a further $\binom{d+1}{2} = d(d+1)/2$ (one per free entry of a symmetric matrix). The number of parameters therefore grows quadratically in the dimension, which can sometimes cause difficulty. For this reason, we sometimes restrict the covariance matrix Σ in some way to reduce the number of parameters.

Common choices are to set Σ = diag(τ), where τ is a vector of marginal variances, and Σ = σ²I, a constant diagonal matrix. Both of these options assume independence between the variables in x. The former case is more flexible, allowing a different scale parameter for each entry, whereas the latter assumes an equal marginal variance of σ² for each variable. Geometrically, the densities are axis-aligned, as in Figure 2(a), and in the latter case, the isoprobability contours are spherical (also as in Figure 2(a)).
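As a quick illustration of these parameter counts and the two restricted forms (a sketch assuming NumPy; the variable names are mine):

```python
import numpy as np

d = 10
full = d + d * (d + 1) // 2          # mean + free entries of a symmetric Sigma
diagonal = d + d                     # mean + one marginal variance per entry
isotropic = d + 1                    # mean + a single shared variance
print(full, diagonal, isotropic)     # 65 20 11

tau = np.array([1.0, 4.0, 0.25])     # per-entry marginal variances
Sigma_diag = np.diag(tau)            # Sigma = diag(tau): axis-aligned contours
Sigma_iso = 2.0**2 * np.eye(3)       # Sigma = sigma^2 I: spherical contours
```

The savings are substantial even in moderate dimension, at the cost of modeling no correlations between the variables.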