Gaussian vectors. Lecture 5: Gaussian random variables in R^n

One-dimensional case

The one-dimensional Gaussian density with mean μ and standard deviation σ (denoted N(μ, σ²)) is

    f(x) = (1 / (√(2π) σ)) exp( -(x - μ)² / (2σ²) ).

Proposition. If X ~ N(μ, σ²), then aX + b is again Gaussian. Precisely,

    aX + b ~ N(aμ + b, a²σ²).

Proof. For a test function g,

    E[g(aX + b)] = ∫ g(ax + b) (1 / (√(2π) σ)) exp( -(x - μ)² / (2σ²) ) dx.

With the change of variable t = ax + b, x = (t - b)/a, dx = dt/a, and since

    ((t - b)/a - μ)² / σ² = (t - (aμ + b))² / (a²σ²),

the integral equals

    ∫ g(t) (1 / (√(2π) σ|a|)) exp( -(t - (aμ + b))² / (2a²σ²) ) dt.

The function (1 / (√(2π) σ|a|)) exp( -(t - (aμ + b))² / (2a²σ²) ) is the density of a Gaussian N(aμ + b, a²σ²). The proof is complete.

Corollary. If X ~ N(μ, σ²), then Z := (X - μ)/σ ~ N(0, 1). In other words, any Gaussian r.v. X ~ N(μ, σ²) can be represented in the form

    X = μ + σZ

where Z is canonical (standard).

Remark. For any random variable X (not necessarily Gaussian), the transformation

    Z := (X - μ)/σ

is called standardization. The r.v. Z always has mean zero and standard deviation one. However, if X belongs to some class (e.g. Weibull), Z does not necessarily belong to the same class, unless X is Gaussian. This is one of the reasons why the Gaussian part of probability theory is called the linear theory (invariance under linear, or even affine, transformations).

Exercise (theoretical). Show that the Weibull class is not invariant under standardization.
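As a quick numerical illustration of the proposition and of standardization, here is a minimal R sketch; the values of μ, σ, a and b are arbitrary illustrative choices, not taken from the notes:

    # Check by simulation that a*X + b is N(a*mu + b, a^2*sigma^2),
    # and that standardization gives mean 0 and standard deviation 1.
    set.seed(1)
    mu <- 3; sigma <- 2; a <- -1.5; b <- 4      # illustrative values
    x <- rnorm(1e5, mean = mu, sd = sigma)
    y <- a * x + b
    c(mean(y), a * mu + b)                      # empirical vs theoretical mean
    c(sd(y), abs(a) * sigma)                    # empirical vs theoretical st. dev.
    z <- (x - mu) / sigma                       # standardization
    c(mean(z), sd(z))                           # approximately 0 and 1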

The general property of the previous remark is based on the linearity of the expectation,

    E[aX + bY + c] = a E[X] + b E[Y] + c,

and on the quadratic property of the variance,

    Var(aX + b) = a² Var(X),

which hold true for all random variables and constants (see Appunti teorici, terza parte).

Multidimensional case

We give the definition of a multidimensional Gaussian variable by reversing the previous procedure.

Definition. The canonical (standard) Gaussian density in R^n is

    f(x_1, ..., x_n) = (1 / (2π)^{n/2}) e^{-x_1²/2} ··· e^{-x_n²/2} = (1 / (2π)^{n/2}) e^{-(x_1² + ... + x_n²)/2},

or, in vector notation,

    f(x) = (1 / (2π)^{n/2}) exp( -‖x‖² / 2 ),

where ‖x‖ is the Euclidean norm of x = (x_1, ..., x_n). A random vector Z = (Z_1, ..., Z_n) with density f(x) will be called a canonical Gaussian vector.

A picture of the canonical Gaussian density in dimension n = 2 was given in the first lecture.

[Figure: scatter plot of a sample of points from a 2-D canonical Gaussian.]

Definition. A general Gaussian random vector X = (X_1, ..., X_n) is any random vector of the form

    X = AZ + μ,

where Z = (Z_1, ..., Z_k) is a canonical Gaussian vector in R^k for some k, A is a matrix with k inputs (columns) and n outputs (rows), and μ is an n-vector.

In plain words, Gaussian vectors are linear (affine) transformations of canonical Gaussian vectors. The vector μ is a translation. The matrix A admits several possibilities: rotation, stretching in some direction, and so on. It plays the role of σ (a "large" A means large dispersion), but it is multidimensional.

Let us see a few 2-D examples: a translation by a vector μ; multiplication by a diagonal matrix (a stretching along the coordinate axes); multiplication by the diagonal matrix

    ( 2  0 )
    ( 0  1 )

followed by a 45° rotation, namely multiplication by

    A = ( 1/√2  -1/√2 ) ( 2  0 )
        ( 1/√2   1/√2 ) ( 0  1 ).

[Figure: scatter plot of the transformed sample.]
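A minimal R sketch of these examples follows; the sample size and the translation vector are illustrative choices, while A is the stretched-and-rotated matrix above:

    # Canonical Gaussian sample in R^2 and its image under X = A Z + mu.
    set.seed(1)
    n_pts <- 200
    Z <- matrix(rnorm(2 * n_pts), nrow = 2)       # columns are canonical Gaussian vectors
    R45 <- matrix(c(cos(pi/4), sin(pi/4), -sin(pi/4), cos(pi/4)), 2, 2)
    A  <- R45 %*% diag(c(2, 1))                   # stretch by diag(2, 1), then rotate by 45 degrees
    mu <- c(1, 2)                                 # illustrative translation
    X  <- A %*% Z + mu                            # columns are samples of the Gaussian vector X
    plot(t(Z), asp = 1, pch = 20, col = "grey", xlab = "z1", ylab = "z2")
    points(t(X), pch = 20)                        # transformed cloud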

Proposition. Let Q = AA^T (an n × n square, symmetric matrix). If det Q ≠ 0, then the density of X is

    f(x) = (1 / ( (2π)^{n/2} √(det Q) )) exp( -(x - μ)^T Q^{-1} (x - μ) / 2 ).

The level curves of the density, f(x) = C, are the ellipsoids

    (x - μ)^T Q^{-1} (x - μ) = R².

Covariance matrix

More on independence. Recall that two events A and B are called independent if P(A ∩ B) = P(A) P(B) (more or less equivalently, if P(A | B) = P(A) and P(B | A) = P(B)). Two random variables X, Y are called independent if

    P(X ∈ I, Y ∈ J) = P(X ∈ I) P(Y ∈ J)

for every pair of intervals I, J. If they have densities f_X(x), f_Y(y) (called marginals) and joint density f(x, y), then the identity

    f(x, y) = f_X(x) f_Y(y)

is equivalent to independence of X and Y.

Remark. Z = (Z_1, ..., Z_n) is a canonical Gaussian vector if and only if Z_1, ..., Z_n are independent one-dimensional standard Gaussians.

Proposition. If X, Y are independent, then E[XY] = E[X] E[Y].
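As a small sketch, the density in the proposition can be evaluated directly in R; the matrix Q below is the covariance of the running 2-D example, the mean is an illustrative choice, and solve(Q) computes Q^{-1}:

    # Multivariate Gaussian density f(x) = exp(-(x-mu)' Q^{-1} (x-mu)/2) / sqrt((2 pi)^n det Q)
    dgauss <- function(x, mu, Q) {
      n <- length(mu)
      d <- x - mu
      drop(exp(-0.5 * t(d) %*% solve(Q) %*% d) / sqrt((2 * pi)^n * det(Q)))
    }
    Q  <- matrix(c(2.5, 1.5, 1.5, 2.5), 2, 2)   # covariance of the running example
    mu <- c(0, 0)                               # illustrative mean
    dgauss(c(1, 1), mu, Q)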

This is not a characterization of independence: it may happen that E[XY] = E[X] E[Y] while X, Y are not independent (the average is only a summary of the density, so a property of products of averages does not imply a product of densities). However, such examples must be cooked up with intention; they do not happen "at random". Moreover:

Proposition. If X, Y are jointly Gaussian and E[XY] = E[X] E[Y], then they are independent. (With the tools of this lecture we could prove this claim.)

Definition. Given two random variables X, Y, we call covariance the number

    Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] E[Y].

It is a generalization of the variance: Cov(X, X) = Var(X). We see that Cov(X, Y) = 0 if and only if E[XY] = E[X] E[Y].

Definition. We say that X and Y are uncorrelated if Cov(X, Y) = 0, or equivalently if E[XY] = E[X] E[Y].

Corollary. Independent implies uncorrelated. Uncorrelated and jointly Gaussian implies independent.

The number Cov(X, Y) gives a measure of the relation between two random variables. More precisely, it describes the degree of linear relation (regression theory): a large Cov(X, Y) corresponds to a high degree of linear correlation. A drawback of Cov(X, Y) is that it depends on the units of measure of X and Y: "large" is relative to the order of magnitude of the other quantities of the problem. The correlation coefficient

    ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)

is independent of the units of measure (it is "absolute"), and |ρ(X, Y)| ≤ 1. Again, ρ(X, Y) = 0 means uncorrelated. A high degree of correlation corresponds to ρ(X, Y) close to 1 or -1 (positive or negative linear correlation).

Proposition. In general,

    Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

Hence, if X, Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y). This is not linearity of the variance. The first identity comes simply from the property (a + b)² = a² + b² + 2ab.

Proposition. Cov is linear in both arguments:

    Cov(aX_1 + bX_2 + c, Y) = a Cov(X_1, Y) + b Cov(X_2, Y),

and similarly in the second argument (it is symmetric). Notice that additive constants c disappear (as in the variance). (Proof: elementary.)
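A minimal R check of these quantities on simulated data; the construction of y below is just an arbitrary way to obtain a variable correlated with x:

    # Empirical covariance, correlation coefficient, and the Var(X+Y) identity.
    set.seed(1)
    x <- rnorm(1e5)
    y <- 0.8 * x + rnorm(1e5, sd = 0.6)             # correlated with x by construction
    cov(x, y)                                       # covariance
    cor(x, y)                                       # correlation coefficient, between -1 and 1
    c(var(x + y), var(x) + var(y) + 2 * cov(x, y))  # the two sides of the identity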

What is Q = AA^T?

Let us understand Q = AA^T better. Write X = AZ + μ in components,

    X_1 = A_{11} Z_1 + A_{12} Z_2 + ... + μ_1
    X_2 = A_{21} Z_1 + A_{22} Z_2 + ... + μ_2
    ...

and compute

    Cov(X_1, X_2) = Cov( Σ_i A_{1i} Z_i , Σ_j A_{2j} Z_j ) = Σ_{i,j} A_{1i} A_{2j} Cov(Z_i, Z_j) = Σ_i A_{1i} A_{2i} = (AA^T)_{12} = Q_{12}.

In general,

    Cov(X_h, X_k) = (AA^T)_{hk} = Q_{hk}.

Proposition. Q = AA^T is the covariance matrix (the matrix of covariances).

Covariance is a generalization of variance; Q is the generalization of σ² from the one-dimensional to the multi-dimensional case.

Example. For the running example we have

    A = ( √2  -1/√2 )        Q = AA^T = ( 5/2  3/2 )
        ( √2   1/√2 ),                  ( 3/2  5/2 ).

The covariance between X_1 and X_2 is 3/2.
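A short R check of this computation for the running example (with A as reconstructed above, the 45° rotation of diag(2, 1)):

    # Verify Q = A A^T and compare with the empirical covariance of simulated X = A Z.
    R45 <- matrix(c(cos(pi/4), sin(pi/4), -sin(pi/4), cos(pi/4)), 2, 2)
    A <- R45 %*% diag(c(2, 1))
    A %*% t(A)                         # equals [[2.5, 1.5], [1.5, 2.5]]
    set.seed(1)
    Z <- matrix(rnorm(2 * 1e5), nrow = 2)
    cov(t(A %*% Z))                    # empirical covariance, close to Q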

Spectral theorem

Any symmetric matrix, hence Q in particular, can be diagonalized: there exists a new orthonormal basis of R^n in which Q is diagonal. The elements of such a basis are eigenvectors of Q, and the diagonal elements of Q in that basis are the corresponding eigenvalues:

    Q v_i = λ_i v_i,

    Q = ( λ_1  0    ...  0   )
        ( 0    λ_2  ...  0   )
        ( ...                )
        ( 0    0    ...  λ_n )

in the basis v_1, ..., v_n. The usual convention is to order the eigenvalues in decreasing order.

Example. For the example A above, the eigenvectors are v_1 = (1/√2, 1/√2) and v_2 = (1/√2, -1/√2), and

    Q = ( 5/2  3/2 )
        ( 3/2  5/2 )

has eigenvalues λ_1 = 4, λ_2 = 1.

The covariance matrix Q = AA^T is also positive semi-definite:

    x^T Q x ≥ 0 for all vectors x ∈ R^n.

This is equivalent to λ_i ≥ 0 for i = 1, ..., n. Moreover, det Q ≥ 0, and

    det Q > 0  if and only if  λ_i > 0 for all i = 1, ..., n.

In such a case, the level curves have the form

    y_1²/λ_1 + ... + y_n²/λ_n = R²,

where y_1, ..., y_n are the coordinates in the new basis v_1, ..., v_n. They are ellipses with axes v_1, ..., v_n and amplitudes along these axes equal to √λ_1, ..., √λ_n. The method of Principal Component Analysis (PCA) is based on these remarks.

Example. For our usual example, since v_1 = (1/√2, 1/√2), v_2 = (1/√2, -1/√2), λ_1 = 4, λ_2 = 1, the ellipses have the form

    y_1²/4 + y_2² = R²

in the coordinates of the eigenbasis.

[Figure: level curves (ellipses) of the density, with axes along v_1 and v_2.]

The same curves can also be obtained from the equation x^T Q^{-1} x = R², x = (x, y)^T, with

    Q^{-1} = (  0.625  -0.375 )
             ( -0.375   0.625 ),

namely

    0.625 x² + 0.625 y² - 2 · 0.375 xy = R².
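In R, the spectral decomposition is given by eigen(); a quick check on the running example:

    # Eigenvalues and eigenvectors of Q, and reconstruction Q = U diag(lambda) U^T.
    Q <- matrix(c(2.5, 1.5, 1.5, 2.5), 2, 2)
    e <- eigen(Q)
    e$values                                        # 4 and 1, in decreasing order
    e$vectors                                       # columns proportional to (1,1)/sqrt(2) and (1,-1)/sqrt(2)
    e$vectors %*% diag(e$values) %*% t(e$vectors)   # gives back Q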

Generation of multivariate samples

How can we generate Gaussian samples with a given covariance? In many applications Q is known, but A is not. We want to generate a sample of X = AZ + μ. Problem: given Q, find A. We have to solve the equation (A is the unknown)

    AA^T = Q.

The software R gives us the following solution:

    require(mgcv)
    A <- mroot(Q)

We may choose the dimension k of Z. The simplest choice is k = n, the dimension of X; then A is a square matrix. We may also choose A symmetric. In that case the equation becomes

    A² = Q

and the solution is

    A = Q^{1/2}.

In practice?

Exercise. Assume we know the spectral decomposition of Q, namely the eigenvectors v_i and the eigenvalues λ_i. Let U be the orthogonal matrix (U^T U = I) whose i-th column is v_i. Check that Λ := U^T Q U is diagonal, with diagonal elements λ_i. The matrix √Λ is simply the diagonal matrix with elements √λ_i. Then set

    A := U √Λ U^T.

Check that A is symmetric and A² = Q.
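A minimal R sketch of this exercise (the symmetric square root via the spectral decomposition), checked on the running example; mgcv::mroot would give another, generally non-symmetric, solution of AA^T = Q:

    # Symmetric square root A = U sqrt(Lambda) U^T, so that A %*% A equals Q.
    Q <- matrix(c(2.5, 1.5, 1.5, 2.5), 2, 2)
    e <- eigen(Q)
    U <- e$vectors
    A <- U %*% diag(sqrt(e$values)) %*% t(U)
    A %*% A                                     # equals Q (A is symmetric, so A A^T = A^2)
    # library(mgcv); mroot(Q)                   # alternative root of Q, if mgcv is installed
    set.seed(1)
    Z <- matrix(rnorm(2 * 1e4), nrow = 2)
    cov(t(A %*% Z))                             # empirical covariance of A Z, close to Q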

Generation of non-Gaussian samples

Recall the following theorem.

Theorem. i) If Y is a random variable with cdf F (continuous case), then the random variable U := F(Y) is uniformly distributed on (0, 1). ii) If U is a uniform random variable, then F^{-1}(U) is a random variable with cdf F.

Application of both (i) and (ii) gives us:

Corollary. If Y is a random variable with cdf F and Φ denotes the cdf of a standard normal, then Z := Φ^{-1}(F(Y)) is a standard normal variable. And vice versa, Y = F^{-1}(Φ(Z)).

Algorithm to generate a sample from Y: generate a sample from a standard normal variable Z and compute F^{-1}(Φ(Z)). Is this nothing more than the old algorithm based on uniforms? No: this one extends to the multidimensional, correlated case.

Let Y_1, Y_2 be two r.v. with cdfs F_1, F_2 (continuous case). Compute X_1 := Φ^{-1}(F_1(Y_1)), X_2 := Φ^{-1}(F_2(Y_2)). They are standard normal, but not necessarily independent. (Theoretical gap: there is no reason why (X_1, X_2) should be jointly Gaussian, i.e. a Gaussian vector.) Assume (X_1, X_2) is jointly Gaussian. Compute the covariance matrix Q of (X_1, X_2) and the mean (μ_1, μ_2). Compute A = Q^{1/2} as above (or any other solution of AA^T = Q). Simulate a standard Gaussian vector (Z_1, Z_2). Compute (X_1, X_2) from (Z_1, Z_2) by means of A and μ. Anti-transform: Y_i = F_i^{-1}(Φ(X_i)), i = 1, 2. This is a way to generate samples from non-Gaussian correlated r.v. Y_1, Y_2.

Multidimensional data fit

We describe only simple rules. Assume a sample (x_1, y_1), ..., (x_n, y_n) is given. We cannot plot a joint histogram or cdf, so we cannot get a feeling for Gaussianity or not, but we can plot the 1-D marginals: a Gaussian vector has Gaussian marginals. If we want to model our data by a 2-D Gaussian (either because we see good agreement of the marginals with Gaussianity, or for simplicity), we estimate Q and μ simply by cov and mean in R. Otherwise, if we want to describe the marginals by non-Gaussian distributions, we proceed as follows:

- fit the marginals and find F_1, F_2;
- transform the data by x'_i = Φ^{-1}(F_1(x_i)), y'_i = Φ^{-1}(F_2(y_i)), i = 1, ..., n, into a new sample (x'_1, y'_1), ..., (x'_n, y'_n);
- assume this new sample is jointly Gaussian (we only know that x'_1, ..., x'_n and y'_1, ..., y'_n are Gaussian);
- compute cov and mean of (x'_1, y'_1), ..., (x'_n, y'_n).

This is the model, which we may then use for simulation, computation of probabilities and other purposes.
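A sketch of this fit-transform-simulate-antitransform pipeline in R. Purely for illustration, the data below are generated artificially and the marginals are assumed (hypothetically) to be exponential, fitted crudely by matching the mean:

    # Correlated non-Gaussian data via a Gaussian model on normal scores.
    set.seed(1)
    y1 <- rexp(200, rate = 0.5)
    y2 <- 0.3 * y1 + rexp(200, rate = 1)            # correlated, non-Gaussian data (illustrative)
    r1 <- 1 / mean(y1); r2 <- 1 / mean(y2)          # crude exponential fits of the marginals
    x1 <- qnorm(pexp(y1, r1)); x2 <- qnorm(pexp(y2, r2))   # transform to (roughly) standard normals
    Q <- cov(cbind(x1, x2)); m <- colMeans(cbind(x1, x2))  # Gaussian model for the transformed sample
    e <- eigen(Q); A <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
    Z <- matrix(rnorm(2 * 1000), nrow = 2)
    X <- A %*% Z + m                                 # simulated (X1, X2) pairs
    Ynew <- cbind(qexp(pnorm(X[1, ]), r1), qexp(pnorm(X[2, ]), r2))  # anti-transformed pairs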

Example. Consider the following 20 points in the plane.

[Figure: scatter plot of the 20 data points.]

They have been produced artificially from two independent Gaussian components. Let us ignore this fact: as an exercise, let us think of them as the values of two physical quantities measured in 20 experiments. We want to solve the following problem: compute the probability that both components are positive.

A simple answer is: we count the number of points with both components positive, 13 in this example, and answer 13/20 = 0.65. But we clearly see that a number of points are close to the boundary of the positive quadrant, so the result suffers very much from the peculiarity of the sample: we can be sure that, if we repeat the experiments, this number may change considerably. Thus let us extract a model, a 2-D density, from the data and compute the theoretical probability from it, hoping for a more stable result. For simplicity, let us choose a Gaussian fit from the beginning. Compute cov and mean of the data, which in our case are

    Q = ( 1.00   0.058 )        μ = ( 1.146 )
        ( 0.058  0.798 ),            ( 0.746 ).

We see that in this example the first component is fitted quite well with respect to the true distribution which generated the sample; the second is not: the second sample is poor. The correlation between the two samples is very small, a good indication of independence. The (Gaussian) model has been found.

How do we compute the required probability? By Monte Carlo. Using require(mgcv) and A <- mroot(Q), get A. Then produce N standard Gaussian points z = (z_1, z_2), transform them by Az + μ, and compute the fraction with both components positive.

[Figure: scatter plot of the simulated points.]

This is a Monte Carlo approximation of the required probability. At the end we find p ≈ 0.69.
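A sketch of this Monte Carlo computation in R. Since the original 20 data points are not reproduced here, an artificial sample from two independent N(1, 1) components (a hypothetical stand-in) plays the role of the measured data:

    # Fit a 2-D Gaussian to the data and estimate P(both components > 0) by Monte Carlo.
    set.seed(1)
    data <- cbind(rnorm(20, mean = 1, sd = 1), rnorm(20, mean = 1, sd = 1))  # stand-in for the data
    Q <- cov(data); m <- colMeans(data)
    e <- eigen(Q); A <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # a root of Q (mroot(Q) also works)
    N <- 1e5
    Z <- matrix(rnorm(2 * N), nrow = 2)
    X <- A %*% Z + m                                 # simulated points from the fitted model
    mean(X[1, ] > 0 & X[2, ] > 0)                    # Monte Carlo estimate of the probability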

It is not very different from 13/20 = 0.65. But if we repeat the whole procedure a few times, we see that the second estimate is more stable than the first one (not by much, however; only modestly better).