Uncorrelatedness and Independence


Uncorrelatedness: Two random vectors x and y are uncorrelated if

    C_{xy} = E[(x - m_x)(y - m_y)^T] = 0,

or equivalently

    R_{xy} = E[x y^T] = E[x] E[y^T] = m_x m_y^T.

White random vector: This is defined to be a random vector with zero mean and unit covariance (correlation) matrix:

    m_x = 0,   R_x = C_x = I.

Example: What are the mean and covariance of a white random vector under an orthogonal transform? Let T denote an orthogonal matrix (i.e. T^T T = T T^T = I). Such a matrix defines an orthogonal transform (a rotation of the coordinate system, which preserves distances in the space). Thus, define y = Tx.

Hence m_y = ... = 0 and C_y = ... = I. Therefore an orthogonal transform preserves whiteness.

Example: Calculate R_x for x = As + n, where s is a random signal with correlation matrix R_s and the noise vector n has zero mean and is uncorrelated with the signal.
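A minimal numerical sketch of both examples (not part of the original notes; the rotation angle, the mixing matrix A and the covariances R_s, R_n are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200_000

    # White random vector: zero mean, unit covariance.
    x = rng.standard_normal((N, 2))

    # Orthogonal transform y = Tx (rotation by 30 degrees): whiteness is preserved.
    th = np.pi / 6
    T = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    y = x @ T.T
    print(np.cov(y, rowvar=False))            # approximately the identity matrix

    # x = As + n with n zero mean and uncorrelated with s.
    A   = np.array([[1.0, 0.5],
                    [0.2, 1.0]])
    R_s = np.diag([1.0, 4.0])
    R_n = 0.09 * np.eye(2)
    s   = rng.standard_normal((N, 2)) * np.sqrt(np.diag(R_s))
    n   = rng.standard_normal((N, 2)) * np.sqrt(np.diag(R_n))
    xv  = s @ A.T + n
    print(xv.T @ xv / N)                      # empirical R_x
    print(A @ R_s @ A.T + R_n)                # A R_s A^T + R_n

The last two printed matrices agree up to sampling error, since the cross terms E[s n^T] vanish when n is zero mean and uncorrelated with s, giving R_x = A R_s A^T + R_n.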

Independence: Two random variables x, y are statistically independent if

    p_{x,y}(x, y) = p_x(x) p_y(y),

i.e. if the joint pdf of (x, y) factors into the product of the marginal probability densities p_x and p_y.

From the definition of statistical independence it follows that

    E[g(x) h(y)] = E[g(x)] E[h(y)],

where g, h are any absolutely integrable functions. Similarly, for random vectors the definition of statistical independence reads

    p_{x,y}(x, y) = p_x(x) p_y(y)

and the property reads

    E[g(x) h(y)] = E[g(x)] E[h(y)].

Properties:
- Statistical independence of two r.v.'s implies their uncorrelatedness (take g(x) = x - m_x and h(y) = y - m_y above).
- Independence is a stronger property than uncorrelatedness; only for Gaussian variables do uncorrelatedness and independence coincide.
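A standard illustration of uncorrelated but dependent variables (not from the notes): take x ~ N(0, 1) and y = x^2, so y is a deterministic function of x yet has zero covariance with it:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)
    y = x**2

    print(np.mean(x * y) - np.mean(x) * np.mean(y))          # ~ 0: x, y uncorrelated
    print(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y))    # ~ 2: factorization fails

With g(x) = x^2 and h(y) = y, E[g(x)h(y)] differs from E[g(x)]E[h(y)], so x and y cannot be independent even though they are uncorrelated.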

Example: Consider the discrete random vector from our example, with joint PMF and marginals

    X \ Y      0       1       2     |  p_X(x)
      0       1/18    1/9     1/6    |  6/18
      1       1/9     1/18    1/9    |  5/18
      2       1/6     1/6     1/18   |  7/18
    p_Y(y)    6/18    6/18    6/18   |

Are X, Y independent? To check, let us construct a table whose entries are the products of the corresponding marginal probabilities of X and Y:

    X \ Y      0             1             2
      0     6/18 * 6/18   6/18 * 6/18   6/18 * 6/18
      1     5/18 * 6/18   5/18 * 6/18   5/18 * 6/18
      2     7/18 * 6/18   7/18 * 6/18   7/18 * 6/18

Since, for instance, p(0, 0) = 1/18 while p_X(0) p_Y(0) = 6/18 * 6/18 = 1/9, the two tables differ. Hence X, Y are not independent. Are they uncorrelated?
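Both questions can be checked quickly with numpy (a sketch; P is the joint PMF from the table, and the computation by hand follows below):

    import numpy as np

    P = np.array([[1/18, 1/9, 1/6],
                  [1/9, 1/18, 1/9],
                  [1/6, 1/6, 1/18]])           # rows: X = 0, 1, 2; columns: Y = 0, 1, 2
    x = np.arange(3)
    y = np.arange(3)
    p_X = P.sum(axis=1)                         # 6/18, 5/18, 7/18
    p_Y = P.sum(axis=0)                         # 6/18, 6/18, 6/18

    print(np.allclose(P, np.outer(p_X, p_Y)))   # False -> X, Y not independent
    print(np.sum(np.outer(x, y) * P))           # E[XY]   = 15/18 ~ 0.833
    print((x @ p_X) * (y @ p_Y))                # E[X]E[Y] = 19/18 ~ 1.056 -> correlated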

E(XY) = Σ_{x,y} x · y · p(x, y)
      = 0·0·(1/18) + 0·1·(1/9) + 0·2·(1/6)
      + 1·0·(1/9) + 1·1·(1/18) + 1·2·(1/9)
      + 2·0·(1/6) + 2·1·(1/6) + 2·2·(1/18)
      = 15/18.

However, E(X) = 19/18 and E(Y) = 1, so E(X)E(Y) = 19/18 ≠ 15/18; hence X and Y are correlated.

Central limit theorem (CLT)

Classical probability is concerned with random variables and with sequences of independent, identically distributed (iid) r.v.'s. A very important case is the sequence of partial sums of iid r.v.'s z_i:

    x_k = Σ_{i=1}^{k} z_i.

Consider the normalised variables

    y_k = (x_k - m_{x_k}) / σ_{x_k},

where m_{x_k} and σ_{x_k} are the mean and standard deviation of x_k. The central limit theorem asserts that the distribution of y_k converges to the standard normal distribution as k → ∞. An analogous formulation of the CLT holds for random vectors.

The CLT justifies the use of Gaussian variables for modelling random phenomena: in practice, sums of even a relatively small number of r.v.'s already show approximate Gaussianity, even if the individual terms are not identically distributed.
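A small numerical illustration of the CLT (not from the notes; k = 12 uniform terms is an arbitrary choice): the standardised partial sum already has nearly Gaussian moments:

    import numpy as np

    rng = np.random.default_rng(2)
    k, N = 12, 100_000
    z = rng.uniform(-0.5, 0.5, size=(N, k))    # iid terms with mean 0 and variance 1/12
    x = z.sum(axis=1)                          # partial sums x_k, variance k/12
    y = x / np.sqrt(k / 12)                    # normalised variables y_k
    print(y.mean(), y.var(), np.mean(y**4))    # close to 0, 1 and 3 (the N(0,1) values)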

Conditional probability Conditional density:consider random vectors x,y with marginal pdf s p x (x) and p y (y), respectively and a joint pdf p x,y (x,y). Conditional density of x given y is defined as p x y (x y) = p x,y(x,y) p y (y) Similarly, conditional density of y given x is defined as p y x (y x) = p x,y(x,y) p x (x) The conditional probability distributions allow to address questions like, what is the probability density of a r.v. x given that a random vector y has a fixed value y 0. For statistically independent r.v. s the conditional densities equal the respective marginal densities. 7

Example: Consider the bivariate discrete random vector with joint PMF

    X \ Y      0       1       2     |  p_X(x)
      0       1/18    1/9     1/6    |  6/18
      1       1/9     1/18    1/9    |  5/18
      2       1/6     1/6     1/18   |  7/18
    p_Y(y)    6/18    6/18    6/18   |

The conditional probability function of Y given X = 1 is

    Y | X = 1          0               1               2
    P(Y | X = 1)   (1/9)/(5/18)    (1/18)/(5/18)   (1/9)/(5/18)
                   = 2/5           = 1/5           = 2/5

Bayes' rule: From the definitions of the conditional densities we obtain the following alternative formulas for calculating the joint pdf:

    p_{x,y}(x, y) = p_{y|x}(y|x) p_x(x) = p_{x|y}(x|y) p_y(y).

From the above follows the so-called Bayes' rule for calculating the conditional density of y given x:

    p_{y|x}(y|x) = p_{x|y}(x|y) p_y(y) / p_x(x),

where the denominator can be calculated by integration:

    p_x(x) = ∫ p_{x|y}(x|η) p_y(η) dη.

Bayes' rule allows us to compute the posterior density p_{y|x}(y|x) given the observed vector x, provided the prior distribution p_y(y) is known or assumed.

Conditional expectations

    E[g(x, y) | y] = ∫ g(ξ, y) p_{x|y}(ξ|y) dξ.

The conditional expectation is itself a random variable, since it depends on the r.v. y. The following relationship holds:

    E[g(x, y)] = E[ E[g(x, y) | y] ].
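Using the discrete table again, the last identity can be verified numerically (a sketch, conditioning on X rather than y, with g(x, y) = Y):

    import numpy as np

    P = np.array([[1/18, 1/9, 1/6],
                  [1/9, 1/18, 1/9],
                  [1/6, 1/6, 1/18]])
    y = np.arange(3)
    p_X = P.sum(axis=1)
    P_Y_given_X = P / p_X[:, None]        # conditional PMFs p(y | X = x), one row per x
    print(P_Y_given_X[1])                 # Y | X = 1: 2/5, 1/5, 2/5, as in the table above
    E_Y_given_X = P_Y_given_X @ y         # E[Y | X = x] for x = 0, 1, 2
    print(p_X @ E_Y_given_X)              # E[ E[Y|X] ] = 1
    print(P.sum(axis=0) @ y)              # E[Y] = 1, equal as claimed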

The family of multivariate Gaussian densities

    p_x(x) = 1 / ( (2π)^{n/2} (det C_x)^{1/2} ) · exp( -(1/2) (x - m_x)^T C_x^{-1} (x - m_x) ),

where n is the dimension of x, m_x is the mean and C_x is the covariance matrix of x, assumed to be strictly positive definite.

Properties:
- m_x and C_x uniquely define the Gaussian pdf.
- Closed under linear transforms: if x is a Gaussian random vector, then y = Ax is also Gaussian, with m_y = A m_x and C_y = A C_x A^T.
- The marginal and conditional densities are Gaussian.
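A direct numpy implementation of the density above (a sketch; the mean m and covariance C are arbitrary illustrative values):

    import numpy as np

    def gaussian_pdf(x, m, C):
        # Multivariate Gaussian density evaluated at the point x.
        n = len(m)
        d = x - m
        norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(C))
        return np.exp(-0.5 * d @ np.linalg.solve(C, d)) / norm

    m = np.array([1.0, -1.0])
    C = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    print(gaussian_pdf(m, m, C))   # peak value 1 / (2*pi*sqrt(det C))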

Uncorrelatedness and geometric structure: If the covariance matrix C_x of a multidimensional Gaussian density is not diagonal, then the components of x are not independent. C_x is a symmetric, positive definite matrix, hence it can be represented as

    C_x = E D E^T = Σ_{i=1}^{n} λ_i e_i e_i^T,

where E is an orthogonal matrix containing the eigenvectors e_i of C_x as its columns and D = diag(λ_1, λ_2, ..., λ_n) is a diagonal matrix containing the corresponding eigenvalues of C_x. The transform

    u = E^T (x - m_x)

rotates the data so that the components of u are uncorrelated and hence, in the Gaussian case, independent.
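A minimal numerical sketch of the decorrelating transform u = E^T (x - m_x), with an arbitrary illustrative mean and covariance:

    import numpy as np

    rng = np.random.default_rng(3)
    m = np.array([1.0, -1.0])
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
    x = rng.multivariate_normal(m, C, size=100_000)

    lam, E = np.linalg.eigh(C)          # C = E diag(lam) E^T
    u = (x - m) @ E                     # each row is u = E^T (x - m_x)
    print(np.cov(u, rowvar=False))      # approximately diag(lam); off-diagonals ~ 0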

The cross-section of a Gaussian pdf at a constant value of the density is a hyper-ellipsoid

    (x - m_x)^T C_x^{-1} (x - m_x) = c,

centered at the mean, with axes parallel to the eigenvectors of C_x and the corresponding eigenvalues giving the variances along those axes.

Higher-order statistics

Consider a scalar r.v. x with probability density function p_x(x). The j-th moment of x is

    α_j = E[x^j] = ∫ ξ^j p_x(ξ) dξ,

and the j-th central moment of x is

    μ_j = E[(x - α_1)^j] = ∫ (ξ - m_x)^j p_x(ξ) dξ.

Skewness and kurtosis: The third central moment, called skewness, provides a measure of the asymmetry of the pdf. The fourth-order statistic, called kurtosis, indicates the non-Gaussianity of a r.v. For a zero-mean r.v. it is defined as

    kurt(x) = E[x^4] - 3 (E[x^2])^2.

Distributions with negative kurtosis are called subgaussian (usually flatter than the Gaussian, or multimodal). Distributions with positive kurtosis are called supergaussian (usually more sharply peaked than the Gaussian, with longer tails).

Properties of kurtosis:
- For two statistically independent r.v.'s x, y: kurt(x + y) = kurt(x) + kurt(y).
- For any scalar a: kurt(ax) = a^4 kurt(x).
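A quick sample check of these two properties (a sketch; the Laplacian scales and the constant a are arbitrary illustrative values):

    import numpy as np

    rng = np.random.default_rng(4)
    N = 2_000_000
    kurt = lambda v: np.mean(v**4) - 3 * np.mean(v**2) ** 2

    x = rng.laplace(0.0, 1.0, N)            # zero-mean, supergaussian (positive kurtosis)
    y = rng.laplace(0.0, 2.0, N)            # independent of x
    a = 1.7

    print(kurt(x + y), kurt(x) + kurt(y))   # additivity for independent r.v.'s
    print(kurt(a * x), a**4 * kurt(x))      # scaling: kurt(ax) = a^4 kurt(x)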

Example: The Laplacian density has pdf

    p_x(x) = (λ/2) exp( -λ |x| ).

Example: The exponential power family of pdfs (with zero mean) contains the Gaussian, Laplacian and uniform pdfs as special cases:

    p_x(x) = C exp( -|x|^ν / (ν E[|x|^ν]) ),

i.e. for ν = 2 the above pdf reduces to the Gaussian pdf

    p_x(x) = C exp( -|x|^2 / (2 E[|x|^2]) ) = C exp( -x^2 / (2 σ_x^2) ),

ν = 1 gives the Laplacian pdf, and ν → ∞ yields the uniform pdf.
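Assuming SciPy is available, scipy.stats.gennorm implements this exponential power family (parametrised by the exponent ν, with a fixed scale rather than the E[|x|^ν] normalisation used above; the sign of the kurtosis is unaffected by scale). A quick sweep shows the transition from supergaussian to subgaussian as ν grows:

    import numpy as np
    from scipy.stats import gennorm

    kurt = lambda v: np.mean(v**4) - 3 * np.mean(v**2) ** 2

    for nu in (1.0, 2.0, 8.0):                     # Laplacian, Gaussian, close to uniform
        x = gennorm.rvs(nu, size=500_000, random_state=5)
        print(nu, kurt(x))                         # positive, ~ 0, negative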