The Gaussian distribution


Gabriel Long

Probability density function: A continuous probability density function $p(x)$ satisfies the following properties:

1. The probability that $x$ lies between two points $a$ and $b$ is $P(a < x < b) = \int_a^b p(x)\,dx$.
2. It is non-negative for all real $x$.
3. The integral of the probability function is one, that is, $\int_{-\infty}^{\infty} p(x)\,dx = 1$.

Extending to the case of a vector $\mathbf{x}$, we have a non-negative $p(\mathbf{x})$ with the following properties:

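1. The probability that $\mathbf{x}$ is inside a region $R$ is $P = \int_R p(\mathbf{x})\,d\mathbf{x}$.
2. The integral of the probability function is one, that is, $\int p(\mathbf{x})\,d\mathbf{x} = 1$.

The Gaussian distribution: The most commonly used probability function is the Gaussian function (also known as the normal distribution)

$$p(x) = N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean, $\sigma^2$ is the variance and $\sigma$ is the standard deviation.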
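The density above can be checked numerically. The following is a minimal sketch, assuming illustrative parameters $\mu = 0$ and $\sigma = 1$ (not taken from the text); the helper names `gaussian_pdf` and `integrate` are hypothetical and only serve this example. It approximates the integral of the density with a midpoint rule to illustrate the properties listed above.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(x | mu, sigma^2) for a scalar x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Illustrative parameters (an assumption, not values from the text).
mu, sigma = 0.0, 1.0

# Integral over a wide interval: should be very close to one.
total = integrate(lambda x: gaussian_pdf(x, mu, sigma),
                  mu - 10 * sigma, mu + 10 * sigma)

# P(mu - sigma < x < mu + sigma): probability of lying within one
# standard deviation of the mean.
inside = integrate(lambda x: gaussian_pdf(x, mu, sigma),
                   mu - sigma, mu + sigma)

print(total, inside)
```

The interval of plus or minus ten standard deviations stands in for the whole real line; the tails beyond it contribute a negligible amount.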

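Figure: Gaussian pdf for fixed values of $\mu$ and $\sigma$.

We are also interested in the Gaussian function defined over a $D$-dimensional vector $\mathbf{x}$:

$$N(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

where $\boldsymbol{\mu}$ is called the mean vector, $\Sigma$ is called the covariance matrix (positive definite) and $|\Sigma|$ is the determinant of $\Sigma$.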
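As a small illustration of the $D$-dimensional formula, here is a sketch restricted to $D = 2$ so that the inverse and determinant of the covariance matrix can be written out by hand. The function name `gaussian_pdf_2d` and the test matrices are assumptions made for this example only.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of N(x | mu, cov) for D = 2; cov is a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Minimal sanity check only: a positive definite 2x2 matrix must have
    # a positive determinant and a positive leading entry.
    if det <= 0 or a <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu).
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    D = 2
    norm = (2.0 * math.pi) ** (D / 2) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm

# Identity covariance: the peak value at the mean is 1 / (2 * pi).
peak_identity = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# Diagonal covariance: the density factorizes into two 1-D Gaussians.
value_diagonal = gaussian_pdf_2d([1.0, 0.5], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])

print(peak_identity, value_diagonal)
```

For general $D$ a linear algebra library would normally replace the hand-written inverse; the 2-D case is used here only because it needs nothing beyond the standard library.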

Figure: 2-D Gaussian pdf. Contours of a Gaussian are shown, where $\Sigma$ is (a) an identity matrix; (b) of diagonal form; and (c) of general form.

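Maximum likelihood for the Gaussian: Given a data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ in which the $\mathbf{x}_n$ are assumed to be drawn independently from a multivariate Gaussian distribution, we can estimate the parameters of the density by maximum likelihood. The log likelihood function is given by

$$\log p(X \mid \boldsymbol{\mu}, \Sigma) = -\frac{ND}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{n=1}^{N} (\mathbf{x}_n-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_n-\boldsymbol{\mu}) \qquad (1)$$

Setting the derivative of $\log p(X \mid \boldsymbol{\mu}, \Sigma)$ with respect to the mean $\boldsymbol{\mu}$ to zero, we have

$$\sum_{n=1}^{N} \Sigma^{-1} (\mathbf{x}_n-\boldsymbol{\mu}) = 0 \qquad (2)$$

so that

$$\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n$$

Setting the derivative of $\log p(X \mid \boldsymbol{\mu}, \Sigma)$ with respect to the covariance $\Sigma$ to zero, we have

$$\Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n-\boldsymbol{\mu}_{ML})(\mathbf{x}_n-\boldsymbol{\mu}_{ML})^T \qquad (3)$$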
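The two maximum likelihood estimates can be computed directly from their definitions. The sketch below assumes a tiny made-up 2-D data set and a hypothetical helper `ml_estimates`; neither comes from the original text. Note that the covariance estimate divides by $N$, as in the formula for $\Sigma_{ML}$, not by $N - 1$.

```python
def ml_estimates(data):
    """Return (mu_ML, Sigma_ML) for a list of equal-length vectors."""
    n = len(data)
    d = len(data[0])
    # Sample mean: average of each coordinate over the data set.
    mu = [sum(x[k] for x in data) / n for k in range(d)]
    # Average outer product of the centered samples (divisor N).
    sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in data) / n
              for j in range(d)]
             for i in range(d)]
    return mu, sigma

# Made-up data, purely for illustration.
data = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [3.0, 2.0]]
mu_ml, sigma_ml = ml_estimates(data)

print(mu_ml)     # [3.0, 2.0]
print(sigma_ml)  # [[2.0, 1.0], [1.0, 2.0]]
```

The resulting matrix is symmetric by construction, which is a quick sanity check on any implementation of the covariance estimate.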

Parzen windows

Density estimation: Given a set of $n$ data samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$, we can estimate the density function $p(\mathbf{x})$, so that we can output $p(\mathbf{x})$ for any new sample $\mathbf{x}$. This is called density estimation.

The basic ideas behind many of the methods of estimating an unknown probability density function are very simple. The most fundamental techniques rely on the fact that the probability $P$ that a vector falls in a region $R$ is given by

$$P = \int_R p(\mathbf{x}')\,d\mathbf{x}'$$

If we now assume that $R$ is so small that $p(\mathbf{x})$ does not vary much within it, we can write

$$P = \int_R p(\mathbf{x}')\,d\mathbf{x}' \approx p(\mathbf{x}) \int_R d\mathbf{x}' = p(\mathbf{x})\,V$$

where $V$ is the volume of $R$.

On the other hand, suppose that $n$ samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are independently drawn according to the probability density function $p(\mathbf{x})$, and that $k$ out of the $n$ samples fall within the region $R$. The fraction of samples in $R$ then estimates $P$:

$$P \approx k/n$$

Combining this with $P \approx p(\mathbf{x})\,V$, we arrive at the following obvious estimate for $p(\mathbf{x})$:

$$p(\mathbf{x}) = \frac{k/n}{V}$$

Parzen window density estimation: Consider that $R$ is a hypercube centered at $\mathbf{x}$ (think about a 2-D square). Let $h$ be the length of the edge of the hypercube; then $V = h^2$ for a 2-D square, and $V = h^3$ for a 3-D cube.

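Figure: a 2-D square of edge length $h$ centered at $\mathbf{x} = (x_1, x_2)$, with corners $(x_1 - h/2,\, x_2 + h/2)$, $(x_1 + h/2,\, x_2 + h/2)$, $(x_1 - h/2,\, x_2 - h/2)$ and $(x_1 + h/2,\, x_2 - h/2)$.

Introduce

$$\phi\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right) = \begin{cases} 1 & \text{if } \dfrac{|x_{ik} - x_k|}{h} \le \dfrac{1}{2},\ k = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$

which indicates whether $\mathbf{x}_i$ is inside the square (centered at $\mathbf{x}$, width $h$) or not. The total number $k$ of samples falling within the region $R$, out of $n$, is given by

$$k = \sum_{i=1}^{n} \phi\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right)$$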
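The counting step can be sketched in a few lines. The example below assumes a made-up set of five 2-D samples and hypothetical function names (`hypercube_window`, `parzen_hypercube`); it counts the samples inside the hypercube and turns the count into a density estimate using $p(\mathbf{x}) = (k/n)/V$ with $V = h^D$.

```python
def hypercube_window(u):
    """phi(u): 1 if every component of u lies in [-1/2, 1/2], else 0."""
    return 1 if all(abs(uk) <= 0.5 for uk in u) else 0

def parzen_hypercube(x, samples, h):
    """Estimate p(x) as (k / n) / V with a hypercube of edge length h."""
    n = len(samples)
    d = len(x)
    volume = h ** d
    # k: number of samples inside the hypercube centered at x.
    k = sum(hypercube_window([(xi[j] - x[j]) / h for j in range(d)])
            for xi in samples)
    return k / (n * volume)

# Made-up samples, purely for illustration.
samples = [[0.0, 0.0], [0.4, 0.2], [1.0, 1.0], [-0.3, 0.4], [2.0, -1.0]]

# h = 1: three samples fall inside, V = 1, so the estimate is 3 / 5.
density_h1 = parzen_hypercube([0.0, 0.0], samples, h=1.0)

# h = 2: four samples fall inside, V = 4, so the estimate is 4 / 20.
density_h2 = parzen_hypercube([0.0, 0.0], samples, h=2.0)

print(density_h1, density_h2)
```

Changing $h$ changes both the count $k$ and the volume $V$, which is why the choice of window width has such a strong effect on the estimate.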

The Parzen probability density estimation formula (for 2-D) is given by

$$p(\mathbf{x}) = \frac{k/n}{V} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h^2}\,\phi\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right)$$

$\phi\left(\frac{\mathbf{x}_i - \mathbf{x}}{h}\right)$ is called a window function. We can generalize the idea and allow the use of other window functions so as to yield other Parzen window density estimation methods. For example, if the Gaussian function is used, then (for 1-D) we have

$$p(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - x)^2}{2\sigma^2}\right)$$

This is simply the average of $n$ Gaussian functions with each data point as a center. $\sigma$ needs to be predetermined.

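Example: Given a set of five data points $x_1 = 2$, $x_2 = 2.5$, $x_3 = 3$, $x_4 = 1$ and $x_5 = 6$, find the Parzen probability density function (pdf) estimate at $x = 3$, using the Gaussian function with $\sigma = 1$ as the window function.

Solution:

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_1 - x)^2}{2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(2 - 3)^2}{2}\right) = 0.2420$$

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_2 - x)^2}{2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(2.5 - 3)^2}{2}\right) = 0.3521$$

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_3 - x)^2}{2}\right) = 0.3989$$

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_4 - x)^2}{2}\right) = 0.0540$$

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_5 - x)^2}{2}\right) = 0.0044$$

so

$$p(x = 3) = (0.2420 + 0.3521 + 0.3989 + 0.0540 + 0.0044)/5 = 0.2103$$

The Parzen window can be graphically illustrated next. Each data point makes an equal contribution to the final pdf, denoted by the solid line.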
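The arithmetic of this example can be reproduced with a short script. This is a minimal sketch, assuming the five data points are 2, 2.5, 3, 1 and 6, the query point is $x = 3$ and $\sigma = 1$; the function name `parzen_gaussian` is hypothetical.

```python
import math

def parzen_gaussian(x, samples, sigma):
    """Gaussian-window Parzen estimate at x, plus each sample's contribution."""
    contributions = [
        math.exp(-((xi - x) ** 2) / (2.0 * sigma ** 2))
        / (math.sqrt(2.0 * math.pi) * sigma)
        for xi in samples
    ]
    # The estimate is the plain average of the per-sample Gaussians.
    return sum(contributions) / len(samples), contributions

samples = [2.0, 2.5, 3.0, 1.0, 6.0]
estimate, contributions = parzen_gaussian(3.0, samples, sigma=1.0)

for xi, c in zip(samples, contributions):
    print(f"x_i = {xi}: {c:.4f}")
print(f"p(3) = {estimate:.4f}")
```

The sample at 3 coincides with the query point and contributes the most, while the sample at 6 is three standard deviations away and contributes almost nothing.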

Figure ($p(x)$ against $x$): The dotted lines are the Gaussian functions centered at the 5 data points.

Figure ($p(x)$ against $x$): The Parzen window pdf sums up the 5 dotted lines.
