
Multivariate Statistics
Chapter 2: Multivariate distributions and inference

Pedro Galeano
Departamento de Estadística, Universidad Carlos III de Madrid
pedro.galeano@uc3m.es

Course 2016/2017, Master in Mathematical Engineering

Chapter outline

1. Introduction
2. Basic concepts
3. Multivariate distributions
4. Statistical inference
5. Hypothesis testing

Introduction

Multivariate statistical analysis is concerned with analysing and understanding data in high dimensions. We assume that we are given a set of $n$ observations of a multivariate random variable $x$ in $\mathbb{R}^p$. Thus, each observation has $p$ dimensions and is an observed value of the multivariate random variable $x = (x_1, \ldots, x_p)'$, where each $x_j$, for $j = 1, \ldots, p$, is a univariate random variable. In this chapter we give an introduction to the basic probability tools useful in multivariate statistical analysis.

Introduction

In particular, we present:

- the basic probability tools used to describe a multivariate random variable, including marginal and conditional distributions and the concept of independence;
- the mean vector, the covariance matrix and the correlation matrix of a multivariate random variable, and their counterparts for marginal and conditional distributions;
- the basic techniques needed to derive the distribution of transformations, with special emphasis on linear transformations;
- several multivariate distributions, including the multivariate Gaussian distribution, along with most of its companion distributions and other interesting alternatives; and
- statistical inference for multivariate samples, including parameter estimation and hypothesis testing.

Basic concepts

We have the joint distribution of a multivariate random variable when the following are specified:

1. The sample space of the possible values, which, in general, is a subset of $\mathbb{R}^p$.
2. The probabilities of each possible result of the sample space.

We say that a $p$-dimensional random variable is discrete when each of the $p$ scalar variables that comprise it is discrete. Analogously, we say that the variable is continuous if each of its components is continuous.

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ be a multivariate random variable. The cumulative distribution function (cdf) of $x$ at a point $x^0 = (x_1^0, \ldots, x_p^0)'$ is denoted by $F_x(x^0)$ and is given by:

$$F_x(x^0) = \Pr(x \le x^0) = \Pr(x_1 \le x_1^0, \ldots, x_p \le x_p^0)$$

Basic concepts

For continuous multivariate random variables, a nonnegative probability density function (pdf) $f_x$ exists, such that:

$$F_x(x^0) = \int_{-\infty}^{x_1^0} \cdots \int_{-\infty}^{x_p^0} f_x(x_1, \ldots, x_p)\, dx_1 \cdots dx_p$$

Note that:

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\, dx_1 \cdots dx_p = 1$$

Note also that the cdf $F_x$ is differentiable, with:

$$f_x(x) = \frac{\partial^p F_x(x)}{\partial x_1 \cdots \partial x_p}$$

Basic concepts

For discrete multivariate random variables, the values of the random variable are concentrated on a countable or finite set of points $\{c_j\}_{j \in J}$. The probability of events of the form $x \in D$, for a certain set $D$, can be computed as:

$$\Pr(x \in D) = \sum_{j : c_j \in D} \Pr(x = c_j)$$

For simplicity we will focus on continuous multivariate random variables.

Basic concepts

The marginal density function of a subset of the elements of $x$, say $(x_{i_1}, \ldots, x_{i_j})'$, is obtained by integrating out the remaining components:

$$f_{x_{i_1}, \ldots, x_{i_j}}(x_{i_1}, \ldots, x_{i_j}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p) \prod_{k \notin \{i_1, \ldots, i_j\}} dx_k$$

In particular, the marginal density function of each $x_j$, for $j = 1, \ldots, p$, is given by:

$$f_{x_j}(x_j) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p) \prod_{k \ne j} dx_k$$

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ and $y = (y_1, \ldots, y_q)'$ be two multivariate random variables with density functions $f_x$ and $f_y$, respectively, and joint density function $f_{x,y}$. Then, the conditional density function of $y$ given $x$ is given by:

$$f_{y|x}(y|x) = \frac{f_{x,y}(x, y)}{f_x(x)}$$

Basic concepts

From the previous definition, we can deduce that the pdf of $(x, y)$ is given by:

$$f_{x,y}(x, y) = f_{y|x}(y|x)\, f_x(x) = f_{x|y}(x|y)\, f_y(y)$$

As a consequence:

$$f_{y|x}(y|x) = \frac{f_{x|y}(x|y)\, f_y(y)}{f_x(x)} = \frac{f_{x|y}(x|y)\, f_y(y)}{\int f_{x|y}(x|y)\, f_y(y)\, dy}$$

This is Bayes' Theorem, one of the most important results in Statistics, as it is the basis of Bayesian inference.

Basic concepts

The multivariate random variables $x$ and $y$ are independent if, and only if:

$$f_{x,y}(x, y) = f_x(x)\, f_y(y)$$

Therefore, if $x$ and $y$ are independent, then:

$$f_{y|x}(y|x) = f_y(y) \qquad \text{and} \qquad f_{x|y}(x|y) = f_x(x)$$

Independence can be interpreted as follows: knowing $y = y^0$ does not change the probability assessments on $x$, and conversely.

In general, the $p$ univariate random variables $x_1, \ldots, x_p$ are independent if, and only if:

$$f_{x_1, \ldots, x_p}(x_1, \ldots, x_p) = f_{x_1}(x_1) \cdots f_{x_p}(x_p)$$

Basic concepts

It is important to note that different multivariate pdfs may have the same marginal pdfs. For instance, it is easy to see that the bivariate pdfs given by:

$$f_{x_1,x_2}(x_1, x_2) = 1, \quad 0 < x_1, x_2 < 1$$

and

$$f_{x_1,x_2}(x_1, x_2) = 1 + 0.5\,(2x_1 - 1)(2x_2 - 1), \quad 0 < x_1, x_2 < 1$$

both have the marginal pdfs given by:

$$f_{x_1}(x_1) = 1, \quad 0 < x_1 < 1 \qquad \text{and} \qquad f_{x_2}(x_2) = 1, \quad 0 < x_2 < 1$$

respectively.
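This is easy to confirm numerically. Below is a minimal sketch (using scipy's quadrature; the two densities are exactly the bivariate pdfs above) that integrates out $x_2$ and recovers the same uniform marginal from both joints:

```python
import numpy as np
from scipy.integrate import quad

# The two joint densities on the unit square defined above.
f_indep = lambda x1, x2: 1.0
f_dep = lambda x1, x2: 1.0 + 0.5 * (2 * x1 - 1) * (2 * x2 - 1)

# Integrating out x2 at several fixed values of x1 gives the marginal of x1.
for x1 in [0.1, 0.5, 0.9]:
    m_indep, _ = quad(lambda x2: f_indep(x1, x2), 0, 1)
    m_dep, _ = quad(lambda x2: f_dep(x1, x2), 0, 1)
    print(x1, m_indep, m_dep)  # both marginals equal 1.0 for every x1
```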

Basic concepts

An elegant concept for connecting marginals with joint cdfs is given by copulae. For simplicity of presentation we concentrate on the $p = 2$ dimensional case.

A 2-dimensional copula is a function $C : [0,1]^2 \to [0,1]$ with the following properties:

1. For every $u \in [0,1]$: $C(0, u) = C(u, 0) = 0$.
2. For every $u \in [0,1]$: $C(1, u) = C(u, 1) = u$.
3. For every $(u_1, u_2), (v_1, v_2) \in [0,1] \times [0,1]$ with $u_1 \le v_1$ and $u_2 \le v_2$:

$$C(v_1, v_2) - C(v_1, u_2) - C(u_1, v_2) + C(u_1, u_2) \ge 0$$

Basic concepts

The usefulness of a copula function $C$ is explained by Sklar's Theorem.

Sklar's Theorem: Let $F_{x_1,x_2}$ be a bivariate cdf with marginal cdfs $F_{x_1}$ and $F_{x_2}$. Then, a copula $C_{x_1,x_2}$ exists with:

$$F_{x_1,x_2}(x_1, x_2) = C_{x_1,x_2}(F_{x_1}(x_1), F_{x_2}(x_2))$$

for every $(x_1, x_2) \in \mathbb{R}^2$. If $F_{x_1}$ and $F_{x_2}$ are continuous, then $C_{x_1,x_2}$ is unique. On the other hand, if $C_{x_1,x_2}$ is a copula and $F_{x_1}$ and $F_{x_2}$ are cdfs, then the function $F_{x_1,x_2}$ defined above is a bivariate cdf with marginals $F_{x_1}$ and $F_{x_2}$.

Therefore, a copula function links a multivariate distribution to its one-dimensional marginals.

Basic concepts

Theorem: Let $x_1$ and $x_2$ be random variables with cdfs $F_{x_1}$ and $F_{x_2}$, and bivariate cdf $F_{x_1,x_2}$. Then, $x_1$ and $x_2$ are independent if and only if:

$$C_{x_1,x_2}(F_{x_1}, F_{x_2}) = F_{x_1} F_{x_2}$$

The previous copula function is called the independence copula. Other copula functions will be given later in this chapter.

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ be a multivariate random variable. The expectation or mean vector of $x$ is the vector $\mu_x$ whose components are the expectations or means of the components of the random variable, i.e.:

$$\mu_x = E[x] = (E[x_1], \ldots, E[x_p])'$$

where:

$$E[x_j] = \int_{-\infty}^{\infty} x_j f_{x_j}(x_j)\, dx_j$$

and $f_{x_j}(x_j)$ is the marginal density function of $x_j$.

Basic concepts

The covariance matrix of the multivariate random variable $x$ with mean vector $\mu_x$ is a symmetric and positive semidefinite matrix given by:

$$\Sigma_x = E\left[(x - \mu_x)(x - \mu_x)'\right]$$

The diagonal elements of $\Sigma_x$ are the variances of the components, given by:

$$\sigma_{x,j}^2 = \int_{-\infty}^{\infty} (x_j - \mu_{x,j})^2 f_{x_j}(x_j)\, dx_j$$

for $j = 1, \ldots, p$. The elements outside the main diagonal are the covariances between pairs of variables:

$$\sigma_{x,jk} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_j - \mu_{x,j})(x_k - \mu_{x,k})\, f_{x_j,x_k}(x_j, x_k)\, dx_j\, dx_k$$

for $j, k = 1, \ldots, p$.

Basic concepts

The correlation matrix of the multivariate random variable $x$ with covariance matrix $\Sigma_x$ is given by:

$$\varrho_x = \Delta_x^{-1/2}\, \Sigma_x\, \Delta_x^{-1/2}$$

where $\Delta_x$ is a diagonal matrix with the variances of the components of $x$. The elements outside the main diagonal are the correlations between pairs of variables, given by:

$$\rho_{x,jk} = \frac{\sigma_{x,jk}}{\sigma_{x,j}\, \sigma_{x,k}}$$

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ be a multivariate random variable and let $(x_{i_1}, \ldots, x_{i_j})'$ be a subset of the elements of $x$. Then, the mean vector and the covariance and correlation matrices of $(x_{i_1}, \ldots, x_{i_j})'$ are obtained by extracting the corresponding elements of the mean vector and the covariance and correlation matrices of $x$.

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ and $y = (y_1, \ldots, y_q)'$ be two random variables with density functions $f_x$ and $f_y$, respectively, and let $f_{y|x}$ be the conditional density function of $y$ given $x$. The conditional expectation of $y$ given $x$ is given by:

$$E_{y|x}[y|x] = \int y\, f_{y|x}(y|x)\, dy$$

which depends on $x$. An important property of $E_{y|x}[y|x]$ is that $E_y[y] = E_x\left[E_{y|x}[y|x]\right]$. Then, to compute $E_y[y]$, we can first compute $E_{y|x}[y|x]$ and then take the expectation with respect to the distribution of $x$.

Basic concepts

Similarly, the conditional covariance and correlation matrices are the covariance and correlation matrices of the multivariate random variable $y|x$. In particular, the conditional covariance matrix contains the conditional variances $\mathrm{Var}_{y_j|x}[y_j|x]$ and the conditional covariances $\mathrm{Cov}_{y_j,y_k|x}[y_j, y_k|x]$.

An important property of $\mathrm{Var}_{y_j|x}[y_j|x]$ is that:

$$\mathrm{Var}_{y_j}[y_j] = E_x\left[\mathrm{Var}_{y_j|x}[y_j|x]\right] + \mathrm{Var}_x\left[E_{y_j|x}[y_j|x]\right]$$

This is usually called the law of total variance.

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ and $y = (y_1, \ldots, y_q)'$ be two multivariate random variables with mean vectors $\mu_x$ and $\mu_y$ and covariance matrices $\Sigma_x$ and $\Sigma_y$, respectively. The covariance matrix between $x$ and $y$ is a $p \times q$ matrix given by:

$$\mathrm{Cov}[x, y] = E\left[(x - \mu_x)(y - \mu_y)'\right]$$

Similarly, the correlation matrix between $x$ and $y$ is a $p \times q$ matrix given by:

$$\mathrm{Cor}[x, y] = \Delta_x^{-1/2}\, \mathrm{Cov}[x, y]\, \Delta_y^{-1/2}$$

where $\Delta_x$ and $\Delta_y$ are diagonal matrices whose elements are the diagonal elements of $\Sigma_x$ and $\Sigma_y$, respectively.

Basic concepts

Let $x = (x_1, \ldots, x_p)'$ be a multivariate variable with pdf $f_x$ and let $y = (y_1, \ldots, y_p)'$ be a new variable given by:

$$y = g(x)$$

where $g$ is a function with differentiable inverse given by:

$$x = g^{-1}(y) = h(y)$$

Then, the pdf of $y$ is given by:

$$f_y(y) = f_x(x) \left|\det\left(\frac{\partial x}{\partial y}\right)\right| = f_x(h(y)) \left|\det\left(\frac{\partial h(y)}{\partial y}\right)\right|$$

where $\partial x / \partial y$ is the Jacobian of the transformation, $\det(\cdot)$ stands for the determinant and $|\cdot|$ denotes the absolute value.

Basic concepts

Consider the particular case of a linear transformation, $y = Ax + b$, where $A$ is a non-singular $p \times p$ matrix and $b$ is a $p \times 1$ vector. Then, we have $x = A^{-1}(y - b)$, while $\partial x / \partial y = A^{-1}$. Therefore:

$$f_y(y) = f_x\left(A^{-1}(y - b)\right) \left|\det\left(A^{-1}\right)\right|$$

Basic concepts

The previous case only considers transformations from a $p$-dimensional random variable to another $p$-dimensional random variable. The case of transformations from a $p$-dimensional random variable to a $q$-dimensional random variable, with $p \ne q$, is more difficult to handle. Therefore, we focus on the mean vector and the covariance matrix of the transformed random variable.

Let $x = (x_1, \ldots, x_p)'$ be a multivariate random variable and let $y = (y_1, \ldots, y_q)'$ be such that:

$$y = Ax + b$$

where $A$ is a $q \times p$ matrix and $b$ is a $q \times 1$ column vector. Then, letting $\mu_x$ and $\mu_y$ be the mean vectors and $\Sigma_x$ and $\Sigma_y$ be the covariance matrices of $x$ and $y$, respectively, we have:

$$\mu_y = A\mu_x + b, \qquad \Sigma_y = A\Sigma_x A'$$
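These two identities are easy to check by simulation. A minimal sketch follows; the particular $\mu_x$, $\Sigma_x$, $A$ and $b$ are illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters with p = 3 and q = 2.
mu_x = np.array([1.0, -1.0, 0.5])
Sigma_x = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])   # q x p
b = np.array([0.5, -2.0])          # q x 1

# Exact moments of y = A x + b.
mu_y = A @ mu_x + b
Sigma_y = A @ Sigma_x @ A.T

# Monte Carlo confirmation.
x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b
print(np.allclose(y.mean(axis=0), mu_y, atol=0.05))              # should print True
print(np.allclose(np.cov(y, rowvar=False), Sigma_y, atol=0.1))   # should print True
```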

Multivariate distributions

The multivariate Gaussian distribution is a generalization to two or more dimensions of the univariate Gaussian (or Normal) distribution, which is often characterized by its resemblance to the shape of a bell and is therefore popularly referred to as the bell curve.

The Gaussian distribution is used extensively in both theoretical and applied statistics research. Although it is well known that real data rarely obey the dictates of the Gaussian distribution, it does provide us with a useful approximation to reality.

The pdf of a univariate Gaussian random variable with mean $\mu_x = E(x)$ and variance $\sigma_x^2 = \mathrm{Var}(x)$ is:

$$f_x(x) = \left(2\pi\sigma_x^2\right)^{-1/2} \exp\left(-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right), \quad -\infty < x < \infty$$

and we denote it as $x \sim N(\mu_x, \sigma_x^2)$.

Multivariate distributions

[Figure: PDFs of $N(0,1)$ in blue, $N(1,1)$ in green and $N(0,2)$ in orange.]

Multivariate distributions

Generalizing the univariate Gaussian distribution, the pdf of a multivariate Gaussian random variable $x = (x_1, \ldots, x_p)'$ with mean vector $\mu_x = E(x)$ and covariance matrix $\Sigma_x = \mathrm{Cov}(x)$ is given by:

$$f_x(x) = (2\pi)^{-p/2}\, |\Sigma_x|^{-1/2} \exp\left(-\frac{(x - \mu_x)'\, \Sigma_x^{-1}\, (x - \mu_x)}{2}\right)$$

where $-\infty < x_j < \infty$, for $j = 1, \ldots, p$. We denote it as $x \sim N_p(\mu_x, \Sigma_x)$.

The next slides show some examples of pdfs of bivariate Gaussian distributions.
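In practice this density rarely needs to be coded by hand. The sketch below evaluates it both with scipy's multivariate_normal and directly from the formula; the correlation-0.9 covariance matrix echoes the plots that follow:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Bivariate example: unit variances, correlation 0.9.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])

rv = multivariate_normal(mean=mu, cov=Sigma)
print(rv.pdf([0.0, 0.0]))  # density at the centre

# Direct evaluation of the formula above, for comparison.
x = np.array([0.0, 0.0])
q = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
pdf = (2 * np.pi) ** (-1) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-q / 2)
print(pdf)  # same value
```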

Multivariate distributions

[Figure: PDF of the bivariate standard Gaussian.]

Multivariate distributions

[Figure: PDF of a bivariate Gaussian with correlation 0.9.]

Multivariate distributions

[Figure: PDF of a bivariate Gaussian with correlation -0.9.]

Multivariate distributions

How is the $N_p(\mu_x, \Sigma_x)$ distribution related to the $N_p(0_p, I_p)$ distribution (the standard multivariate Gaussian distribution)? Through a linear transformation, as follows. Let $x \sim N_p(\mu_x, \Sigma_x)$ and $y = \Sigma_x^{-1/2}(x - \mu_x)$. Then, $y \sim N_p(0_p, I_p)$.

How can we create $N_p(\mu_x, \Sigma_x)$ variables on the basis of $N_p(0_p, I_p)$ variables? We use the inverse linear transformation:

$$x = \Sigma_x^{1/2}\, y + \mu_x$$

Additionally, it is of interest to know the distribution of a Gaussian variable after it has been linearly transformed. Let $x \sim N_p(\mu_x, \Sigma_x)$, $A$ a $q \times p$ matrix and $b$ a $q \times 1$ column vector. Then, $y = Ax + b$ has a $N_q(A\mu_x + b, A\Sigma_x A')$ distribution. Therefore, $y$ also has a Gaussian distribution.
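This is exactly how multivariate Gaussian samplers work in practice. A minimal sketch follows; it uses the Cholesky factor $L$ (with $\Sigma_x = LL'$) as the matrix square root, the usual computational stand-in for $\Sigma_x^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Any square root of Sigma works; the Cholesky factor is the usual choice.
L = np.linalg.cholesky(Sigma)

z = rng.standard_normal((10_000, 2))  # z ~ N_2(0, I)
x = z @ L.T + mu                      # x ~ N_2(mu, Sigma)

print(x.mean(axis=0))                 # close to mu
print(np.cov(x, rowvar=False))        # close to Sigma
```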

Multivariate distributions

The level curves or contours are the curves obtained by cutting the probability density function with parallel hyperplanes. In other words, the level curves are sets of points with the same density value. In the multivariate Gaussian case, their equation is given by:

$$(x - \mu_x)'\, \Sigma_x^{-1}\, (x - \mu_x) = c$$

where $c$ is a constant. Therefore, the level curves of multivariate Gaussian distributions are ellipsoids.

The next two slides show the level curves for the Gaussian distributions considered in the previous plots, without and with a sample of 100 points generated from these distributions.

Multivariate distributions

[Figure: level curves for bivariate Gaussians with correlations 0, 0.9 and -0.9.]

Multivariate distributions

[Figure: the same level curves with 100 generated points overlaid on each plot.]

Multivariate distributions

The level curves of the multivariate Gaussian distribution give us a notion of distance between points. Note that all the points in a level curve have the same density and form an ellipsoid. Therefore, it is reasonable to assume that all the points in a level curve are at the same distance from the center of the distribution.

The implied distance is the Mahalanobis distance between $x$ and $\mu_x$, given by:

$$D_M(x, \mu_x)^2 = (x - \mu_x)'\, \Sigma_x^{-1}\, (x - \mu_x)$$

If $x \sim N_p(\mu_x, \Sigma_x)$, the squared Mahalanobis distance has a $\chi_p^2$ distribution, i.e., $D_M(x, \mu_x)^2 \sim \chi_p^2$.

The Mahalanobis distance plays an important role in many problems such as outlier detection, classification, clustering and so on.
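A short sketch of the outlier-detection use: compute the squared Mahalanobis distances of a Gaussian sample and compare them with a $\chi^2_p$ quantile (the parameters below are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

p = 2
mu = np.zeros(p)
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=100)

# Squared Mahalanobis distance of each point to the centre.
diff = x - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

# Under the model d2 ~ chi^2_p, so about 5% of points exceed this cutoff.
cutoff = chi2.ppf(0.95, df=p)
print(np.mean(d2 > cutoff))  # roughly 0.05
```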

Multivariate distributions

[Figure: a random sample of 100 labelled points (left) and their squared Mahalanobis distances to the centre (right).]

Multivariate distributions

It is useful to know more about the multivariate Gaussian distribution, since it is often a good approximation in many situations. It is often of interest to partition $x$ into sub-variables. Therefore, if we partition $x$, its mean vector $\mu_x$ and its covariance matrix $\Sigma_x$ as:

$$x = \begin{pmatrix} x_{(1)} \\ x_{(2)} \end{pmatrix}, \qquad \mu_x = \begin{pmatrix} \mu_{x(1)} \\ \mu_{x(2)} \end{pmatrix}, \qquad \Sigma_x = \begin{pmatrix} \Sigma_{x(11)} & \Sigma_{x(12)} \\ \Sigma_{x(21)} & \Sigma_{x(22)} \end{pmatrix}$$

where $x_{(1)}$ and $x_{(2)}$ have dimensions $q$ and $p - q$, respectively, then $x_{(1)} \sim N_q(\mu_{x(1)}, \Sigma_{x(11)})$, $x_{(2)} \sim N_{p-q}(\mu_{x(2)}, \Sigma_{x(22)})$ and $\mathrm{Cov}(x_{(1)}, x_{(2)}) = \Sigma_{x(12)}$.

Moreover, $x_{(1)}$ and $x_{(2)}$ are independent if and only if $\Sigma_{x(12)} = 0_{(q, p-q)}$, where $0_{(q, p-q)}$ is a $q \times (p - q)$ matrix of zeros.

Multivariate distributions

If $\Sigma_{x(22)} > 0$, then the conditional distribution of $x_{(1)}$ given $x_{(2)}$ is Gaussian with mean:

$$\mu_{x(1)} + \Sigma_{x(12)}\, \Sigma_{x(22)}^{-1} \left(x_{(2)} - \mu_{x(2)}\right)$$

and covariance matrix:

$$\Sigma_{x(11)} - \Sigma_{x(12)}\, \Sigma_{x(22)}^{-1}\, \Sigma_{x(21)}$$

If $x_{(1)}$ and $x_{(2)}$ are independent and distributed as $N_q(\mu_{x(1)}, \Sigma_{x(11)})$ and $N_{p-q}(\mu_{x(2)}, \Sigma_{x(22)})$, respectively, then $x = (x_{(1)}', x_{(2)}')'$ has the multivariate Gaussian distribution:

$$N_p\left(\begin{pmatrix} \mu_{x(1)} \\ \mu_{x(2)} \end{pmatrix}, \begin{pmatrix} \Sigma_{x(11)} & 0_{(q, p-q)} \\ 0_{(p-q, q)} & \Sigma_{x(22)} \end{pmatrix}\right)$$
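A small helper that computes this conditional mean and covariance from a given partition (the partitioned parameters below are illustrative, not from the slides):

```python
import numpy as np

# Partitioned parameters: q = 1 and p - q = 2 (illustrative values).
mu1 = np.array([0.0])
mu2 = np.array([1.0, -1.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.5, 0.3]])
S22 = np.array([[1.0, 0.2],
                [0.2, 1.5]])

def conditional(x2):
    """Mean and covariance of x_(1) | x_(2) for a partitioned Gaussian."""
    S22_inv = np.linalg.inv(S22)
    mean = mu1 + S12 @ S22_inv @ (x2 - mu2)
    cov = S11 - S12 @ S22_inv @ S12.T
    return mean, cov

m, c = conditional(np.array([1.5, 0.0]))
print(m, c)  # the conditional mean shifts with x2; the covariance does not
```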

Multivariate distributions

The multivariate Gaussian distribution belongs to the large family of elliptical distributions, which has recently gained a lot of attention in financial mathematics. The simplest case of elliptical distributions is the subclass of spherical distributions.

We say that a vector variable $x = (x_1, \ldots, x_p)'$ follows a spherical distribution if its density function depends on the variable only through $x'x$. Therefore, the level curves of the distribution are spheres centered at the origin, and the distribution is invariant under rotations. In other words, if we define $y = Cx$, where $C$ is an orthogonal matrix, the density of the variable $y$ is the same as that of $x$. This is only one of the possible ways to define spherical distributions.

We can see spherical distributions as an extension of the standard multivariate Gaussian distribution $N_p(0_p, I_p)$.

Multivariate distributions

The variable $x = (x_1, \ldots, x_p)'$ follows an elliptical distribution if its density function depends on $x$ only through $(x - m)'\, V^{-1}\, (x - m)$, where $m$ is a $p \times 1$ column vector and $V$ is a $p \times p$ matrix (not necessarily the mean and the covariance matrix of $x$).

The level curves of elliptical distributions are ellipsoids centered at $m$.

The multivariate Gaussian distribution is the best known elliptical distribution. Indeed, elliptical distributions can be seen as an extension of the $N_p(\mu_x, \Sigma_x)$.

Multivariate distributions

Let $y \sim N_p(0_p, \Sigma)$ and $u \sim \chi_\nu^2$ be independent. The multivariate random variable:

$$x = \mu + \sqrt{\frac{\nu}{u}}\; y$$

has a multivariate Student's t distribution with parameters $\mu$, $\Sigma$ and $\nu$. For $\nu > 2$, the mean of the distribution is $\mu$ and the covariance matrix is $\nu/(\nu - 2)\, \Sigma$. The parameter $\nu$ is called the degrees of freedom parameter.

The density function of a multivariate Student's t distribution is given by:

$$f_x(x) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{(\pi\nu)^{p/2}\, \Gamma\left(\frac{\nu}{2}\right)}\, |\Sigma|^{-1/2} \left(1 + \frac{(x - \mu)'\, \Sigma^{-1}\, (x - \mu)}{\nu}\right)^{-\frac{\nu + p}{2}}$$

The multivariate Student's t distribution belongs to the class of elliptical distributions. In particular, if $\mu = 0_p$ and $\Sigma = I_p$, it belongs to the class of spherical distributions.
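The defining construction doubles as a sampler. A minimal sketch, assuming only the representation above (a normal vector scaled by an independent chi-squared draw):

```python
import numpy as np

rng = np.random.default_rng(3)

def rmvt(n, mu, Sigma, nu):
    """Sample the multivariate Student's t via x = mu + sqrt(nu/u) * y,
    with y ~ N_p(0, Sigma) and u ~ chi^2_nu independent."""
    p = len(mu)
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    u = rng.chisquare(nu, size=n)
    return mu + y * np.sqrt(nu / u)[:, None]

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
nu = 5

x = rmvt(100_000, mu, Sigma, nu)
print(np.cov(x, rowvar=False))  # close to nu/(nu-2) * Sigma = (5/3) I
```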

Multivariate distributions

[Figure: PDF of a bivariate Student's t distribution with 5 degrees of freedom.]

Multivariate distributions

Elliptical distributions share many properties with Gaussian distributions: marginal and conditional distributions are also elliptical, and the conditional means are a linear function of the conditioning variables.

Nevertheless, the Gaussian distribution is the only one in the family with the property that, if the covariance matrix is diagonal, all the component variables are independent.

Multivariate distributions

A distribution is called heavy-tailed if it has higher probability density in its tail area than a Gaussian distribution with the same mean vector and covariance matrix. The multivariate Student's t distribution is an example of a heavy-tailed distribution.

Other examples of heavy-tailed distributions include the multivariate generalized hyperbolic distribution, the multivariate Laplace distribution and multivariate mixtures of distributions. In particular, we briefly review multivariate mixtures of distributions.

Multivariate distributions

Mixture modelling concerns modelling a statistical distribution by a mixture (or weighted sum) of different distributions. For many choices of component density functions, the mixture model can approximate any continuous density to arbitrary accuracy, provided that the number of component density functions is sufficiently large and the parameters of the model are chosen correctly.

The density function of a multivariate random variable $x = (x_1, \ldots, x_p)'$ that follows a mixture distribution is given by:

$$f_x(x) = \sum_{g=1}^{G} \pi_g\, f_{x,g}(x)$$

where:

- $\pi_1, \ldots, \pi_G$ are weights such that $\sum_{g=1}^{G} \pi_g = 1$; and
- $f_{x,1}(x), \ldots, f_{x,G}(x)$ are multivariate pdfs.
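A minimal sketch of a two-component bivariate Gaussian mixture, with the density evaluated exactly as the weighted sum above and sampling done by first picking a component with probability $\pi_g$ (the weights, means and covariances are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)

# Two-component bivariate Gaussian mixture (illustrative parameters).
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]

def mixture_pdf(x):
    """f(x) = sum_g pi_g * f_g(x)."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

def mixture_sample(n):
    """Pick a component with probability pi_g, then draw from it."""
    g = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in g])

print(mixture_pdf(np.array([1.0, 1.0])))
x = mixture_sample(1000)
```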

Multivariate distributions

Note that mixture distributions have an interesting interpretation in terms of heterogeneous populations. Assume a population, on which we have defined the multivariate random variable $x$, that can be subdivided into $G$ more homogeneous groups. Then, $\pi_1, \ldots, \pi_G$ can be seen as the proportions of elements in groups $1, \ldots, G$, while $f_{x,1}(x), \ldots, f_{x,G}(x)$ are the multivariate pdfs associated with each subpopulation.

Multivariate distributions

[Figure: PDF of a mixture of Gaussian distributions.]

Multivariate distributions

[Figure: level curves for the mixture of Gaussian distributions.]

Multivariate distributions

[Figure: PDF of a second mixture of Gaussian distributions.]

Multivariate distributions

[Figure: level curves for the second mixture of Gaussian distributions.]

Multivariate distributions

One main problem in multivariate analysis is how to model the dependence between the components of a multivariate random variable. We have seen several multivariate distributions that model this dependence. However, these models, except perhaps mixtures, are not flexible enough to model multivariate dependence in general.

As seen before, copulae represent an elegant concept for connecting marginals with joint cumulative distribution functions. Copulas are functions that join or couple multivariate distribution functions to their one-dimensional marginal distribution functions.

Multivariate distributions

Let $x = (x_1, \ldots, x_p)'$ be a multivariate random variable and let $F_{x_j}$, for $j = 1, \ldots, p$, be the marginal distribution functions of the components of $x$.

Using copulae, the marginal distribution functions can be modelled separately from their dependence structure and then coupled together to form the multivariate distribution $F_x$.

The formal definition of a copula function in $p$ dimensions is more complex than in the 2-dimensional case. However, the intuition is the same as in the 2-dimensional case, so we do not provide its formal definition here.

Multivariate distributions

Sklar's Theorem in p dimensions: Let $F_x$ be a $p$-dimensional distribution function with marginal distribution functions $F_{x_1}, \ldots, F_{x_p}$. Then, a $p$-dimensional copula $C_x$ exists such that, for all $(x_1, \ldots, x_p) \in \mathbb{R}^p$:

$$F_x(x_1, \ldots, x_p) = C_x\left(F_{x_1}(x_1), \ldots, F_{x_p}(x_p)\right)$$

Moreover, if $F_{x_1}, \ldots, F_{x_p}$ are continuous, then $C_x$ is unique. Conversely, if $C_x$ is a copula and $F_{x_1}, \ldots, F_{x_p}$ are distribution functions, then $F_x$ defined above is a $p$-dimensional distribution function with marginals $F_{x_1}, \ldots, F_{x_p}$.

Multivariate distributions

Let $F_z$ denote the univariate standard Gaussian distribution function and $F_x$ the $p$-dimensional Gaussian distribution function with mean vector $0_p$ and covariance (as well as correlation) matrix $\Sigma_x$. Then, the function:

$$C^{\mathrm{Gauss}}_{x,\Sigma_x}(u) = F_x\left(F_z^{-1}(u_1), \ldots, F_z^{-1}(u_p)\right)$$

is the $p$-dimensional Gaussian copula with correlation matrix $\Sigma_x$, where $u = (u_1, \ldots, u_p)' \in [0,1]^p$.

If $\Sigma_x \ne I_p$, then the corresponding Gaussian copula allows one to generate joint symmetric dependence. However, it is not possible to model tail dependence, i.e., joint extreme events have a zero probability.
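Sampling from a Gaussian copula follows directly from this definition: draw $z \sim N_p(0_p, \Sigma_x)$ and apply the standard Gaussian cdf coordinate-wise. A minimal sketch with an illustrative correlation matrix:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

# Correlation matrix of the Gaussian copula (illustrative).
R = np.array([[1.0, 0.7],
              [0.7, 1.0]])

# u = (Phi(z_1), Phi(z_2)) has uniform marginals, and its joint cdf is
# the Gaussian copula with correlation matrix R.
z = rng.multivariate_normal(np.zeros(2), R, size=10_000)
u = norm.cdf(z)

print(u.min(), u.max())   # values lie in (0, 1)
print(np.corrcoef(u.T))   # dependence inherited from R
```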

Multivariate distributions

The function:

$$C^{\mathrm{GH}}_{x,\theta}(u) = \exp\left(-\left(\sum_{j=1}^{p} (-\log u_j)^{\theta}\right)^{1/\theta}\right)$$

is the $p$-dimensional Gumbel-Hougaard copula function, where $\theta \in [1, \infty)$; for $\theta = 1$ it reduces to the independence copula. Unlike the Gaussian copula, $C^{\mathrm{GH}}_{x,\theta}$ can generate an upper tail dependence.

Multivariate distributions

[Figure: PDF of a copula distribution.]

Statistical inference

In multivariate statistics, we observe the values of a multivariate random variable $x = (x_1, \ldots, x_p)'$ and obtain a sample $x_i = (x_{i1}, \ldots, x_{ip})'$, for $i = 1, \ldots, n$, summarised in a data matrix $X$.

Given a random sample $x_1, \ldots, x_n$, the idea of statistical inference is to analyse the properties of the population random variable $x$.

If we do not know the distribution of $x$, statistical inference can often be performed using some observable functions of the sample $x_1, \ldots, x_n$, i.e., statistics. Examples of statistics are the sample mean vector and the sample covariance matrix.

Statistical inference

To get an idea of the relationship between a statistic and the corresponding population counterpart, one has to derive the sampling distribution of the statistic.

Given a random sample $x_1, \ldots, x_n$ of the population random variable $x$ such that $E[x] = \mu_x$ and $\mathrm{Cov}[x] = \Sigma_x$, the sample mean vector $\bar{x}$ and the sample covariance matrix $S_x$ verify the following properties:

1. $E[\bar{x}] = \mu_x$.
2. $\mathrm{Cov}[\bar{x}] = \frac{1}{n} \Sigma_x$.
3. $E[S_x] = \Sigma_x$.

Statistical inference

Statistical inference often requires more than just the mean and/or the covariance of a statistic. We need the sampling distribution of the statistic to derive confidence intervals or to define rejection regions in hypothesis testing for a given significance level. For instance, in the Gaussian case, we have the following result.

Theorem: Let $x_1, \ldots, x_n$ be i.i.d. with $x_i \sim N(\mu_x, \Sigma_x)$. Then, $\bar{x} \sim N\left(\mu_x, \frac{1}{n} \Sigma_x\right)$.

The central limit theorem shows that even if the parent distribution is not Gaussian, when the sample size $n$ is large, the sample mean vector $\bar{x}$ has an approximate Gaussian distribution.

Central Limit Theorem (CLT): Let $x_1, \ldots, x_n$ be i.i.d. with mean vector $\mu_x$ and covariance matrix $\Sigma_x$. Then, the distribution of $\sqrt{n}(\bar{x} - \mu_x)$ is asymptotically $N(0_p, \Sigma_x)$, i.e.:

$$\sqrt{n}\,(\bar{x} - \mu_x) \stackrel{d}{\to} N(0_p, \Sigma_x) \quad \text{as } n \to \infty$$

Statistical inference

The next two slides show multivariate kernel density estimates of 2000 sample mean vectors, each computed from a sample of a certain bivariate random variable. The first slide corresponds to the case $n = 5$; the second slide corresponds to the case $n = 100$. It is easy to see that the second estimate appears to be closer to a bivariate Gaussian distribution than the first one.

Statistical inference

[Figure: kernel density estimate of the 2000 sample mean vectors for n = 5.]

Statistical inference

[Figure: kernel density estimate of the 2000 sample mean vectors for n = 100.]
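A simulation along these lines can be sketched as follows; the slides do not specify the bivariate variable, so independent exponential coordinates are assumed here:

```python
import numpy as np

rng = np.random.default_rng(6)

# 2000 sample mean vectors of a (non-Gaussian) bivariate variable:
# exponential marginals, so the n = 5 case is visibly skewed.
def sample_means(n, reps=2000):
    x = rng.exponential(1.0, size=(reps, n, 2))
    return x.mean(axis=1)

means_5 = sample_means(5)
means_100 = sample_means(100)

# The standardized means approach N(0, Sigma) as n grows.
print(means_5.std(axis=0) * np.sqrt(5))      # close to 1 (the coordinate std. dev.)
print(means_100.std(axis=0) * np.sqrt(100))  # close to 1, and more nearly Gaussian
```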

Statistical inference

If we assume that we know the distribution of the multivariate random variable $x$, then the main goal of statistical inference is to estimate the parameters of this distribution.

Let $\theta = (\theta_1, \ldots, \theta_r)'$ be the vector of parameters of a certain distribution with density function $f(\cdot|\theta)$. The aim is to estimate the vector $\theta$ from an i.i.d. sample $x_1, \ldots, x_n$ from $x$.

The most important method to carry out this task is the maximum likelihood estimation (MLE) method.

Statistical inference

Let $x_1, \ldots, x_n$ be an i.i.d. sample of $x$. Then, the joint pdf of $x_1, \ldots, x_n$ is given by:

$$f(x_1, \ldots, x_n | \theta) = \prod_{i=1}^{n} f(x_i | \theta)$$

Note that the sample is known ($X$, the data matrix) but $\theta$ is unknown. In MLE, $\theta$ is treated as a variable and $X$ is held fixed, leading to the likelihood function:

$$l(\theta | X) = \prod_{i=1}^{n} f(x_i | \theta)$$

where $x_i = (x_{i1}, \ldots, x_{ip})'$. The likelihood function is the joint pdf of the sample, viewed as a function of $\theta$ for fixed $X$.

Maximum likelihood estimation

The maximum likelihood estimate (MLE) of $\theta$, denoted by $\hat{\theta}$, is the value of $\theta$ that maximizes $l(\theta|X)$, i.e.:

$$\hat{\theta} = \arg\max_{\theta}\, l(\theta | X)$$

In other words, the MLE $\hat{\theta}$ is the value of $\theta$ that maximizes the probability of obtaining the sample under study.

Often it is easier to maximize the log of the likelihood function, called the log-likelihood function or support function:

$$L(\theta | X) = \log l(\theta | X)$$

which is equivalent, since the logarithm is a monotone one-to-one function. Hence:

$$\hat{\theta} = \arg\max_{\theta}\, l(\theta | X) = \arg\max_{\theta}\, L(\theta | X)$$

Maximum likelihood estimation

Usually, the maximisation cannot be performed analytically, and it then involves nonlinear optimization techniques. In this case, given a data matrix $X$ and the likelihood function, numerical methods are used to determine the value of $\theta$ maximising $L(\theta|X)$ or $l(\theta|X)$. These numerical methods are typically based on Newton-Raphson techniques.
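As a sketch of the numerical route (a univariate Gaussian is used so the answer can be checked against the closed-form MLE), using scipy's general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(loc=2.0, scale=1.5, size=500)

# Negative log-likelihood -L(theta | X) of a univariate Gaussian.
def negloglik(theta):
    mu, log_sigma = theta            # optimize log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(negloglik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Closed-form MLEs for comparison: the sample mean and (1/n)-scaled std. dev.
print(mu_hat, x.mean())
print(sigma_hat, x.std())  # np.std uses the 1/n divisor by default
```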

Maximum likelihood estimation

Let $x_1, \ldots, x_n$ be a simple random sample from $x \sim N(\mu_x, \Sigma_x)$. Then, the joint density function is:

$$f(x_1, \ldots, x_n | \mu_x, \Sigma_x) = \prod_{i=1}^{n} \left\{ (2\pi)^{-p/2}\, |\Sigma_x|^{-1/2} \exp\left(-\frac{(x_i - \mu_x)'\, \Sigma_x^{-1}\, (x_i - \mu_x)}{2}\right) \right\}$$

Then, the support function is given by:

$$L(\mu_x, \Sigma_x | X) = -\frac{np}{2} \log 2\pi - \frac{n}{2} \log |\Sigma_x| - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu_x)'\, \Sigma_x^{-1}\, (x_i - \mu_x)$$

Next, note that we can write:

$$\sum_{i=1}^{n} (x_i - \mu_x)'\, \Sigma_x^{-1}\, (x_i - \mu_x) = \mathrm{Tr}\left[\Sigma_x^{-1} \left(\sum_{i=1}^{n} (x_i - \mu_x)(x_i - \mu_x)'\right)\right]$$

Maximum likelihood estimation

On the other hand, adding and subtracting the sample mean vector $\bar{x}$ in $(x_i - \mu_x)$ leads to:

$$\sum_{i=1}^{n} (x_i - \mu_x)(x_i - \mu_x)' = \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - \mu_x)(x_i - \bar{x} + \bar{x} - \mu_x)' = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' + n\, (\bar{x} - \mu_x)(\bar{x} - \mu_x)'$$

because the terms $\sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - \mu_x)'$ and $\sum_{i=1}^{n} (\bar{x} - \mu_x)(x_i - \bar{x})'$ are both matrices of zeros.

Maximum likelihood estimation

Consequently:

$$\sum_{i=1}^{n} (x_i - \mu_x)'\, \Sigma_x^{-1}\, (x_i - \mu_x) = \mathrm{Tr}\left[\Sigma_x^{-1} \left(\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' + n\, (\bar{x} - \mu_x)(\bar{x} - \mu_x)'\right)\right] = \mathrm{Tr}\left[\Sigma_x^{-1} \left(\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right)\right] + n\, (\bar{x} - \mu_x)'\, \Sigma_x^{-1}\, (\bar{x} - \mu_x)$$

Maximum likelihood estimation

Therefore, the support function can be written as:

$$L(\mu_x, \Sigma_x | X) = -\frac{np}{2} \log 2\pi - \frac{n}{2} \log |\Sigma_x| - \frac{1}{2} \left(\mathrm{Tr}\left[\Sigma_x^{-1} \left(\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right)\right] + n\, (\bar{x} - \mu_x)'\, \Sigma_x^{-1}\, (\bar{x} - \mu_x)\right)$$

Now, $L(\mu_x, \Sigma_x | X)$ depends on $\mu_x$ only through the last term, and this term is maximized when $(\bar{x} - \mu_x)'\, \Sigma_x^{-1}\, (\bar{x} - \mu_x) = 0$. Therefore, the MLE of $\mu_x$ is $\hat{\mu}_x = \bar{x}$.

Maximum likelihood estimation

It remains to maximize:

$$L(\Sigma_x | X, \mu_x = \bar{x}) = -\frac{np}{2} \log 2\pi - \frac{n}{2} \log |\Sigma_x| - \frac{1}{2} \mathrm{Tr}\left[\Sigma_x^{-1} \left(\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'\right)\right]$$

For that, we need a result from matrix algebra: given a $p \times p$ symmetric positive definite matrix $B$ and a scalar $b > 0$, it follows that:

$$-b \log |\Sigma_x| - \frac{1}{2} \mathrm{Tr}\left(\Sigma_x^{-1} B\right) \le -b \log |B| + pb \log(2b) - pb$$

with equality when $\Sigma_x = \frac{1}{2b} B$. Then, taking $b = n/2$ and $B = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'$ shows that the MLE of $\Sigma_x$ is:

$$\hat{\Sigma}_x = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'$$

Note that the MLE of $\Sigma_x$ is not the sample covariance matrix but a re-scaled version of it.
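The closed-form Gaussian MLEs are one line of linear algebra each; the sketch below computes them and confirms the relation with the sample covariance matrix (which uses the divisor $n - 1$):

```python
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=200)   # n x p data matrix
n = X.shape[0]

mu_hat = X.mean(axis=0)                            # MLE of the mean vector
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n              # MLE, divisor n

S = np.cov(X, rowvar=False)                        # sample covariance, divisor n-1
print(np.allclose(Sigma_hat, S * (n - 1) / n))     # True: the MLE is a re-scaled S
```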

Maximum likelihood estimation

The next theorem gives the asymptotic sampling distribution of the MLE, which turns out to be Gaussian.

Theorem: Suppose that the sample $x_1, \ldots, x_n$ is i.i.d. If $\hat{\theta}$ is the MLE for $\theta \in \mathbb{R}^r$, i.e., $\hat{\theta} = \arg\max_{\theta} L(\theta|X)$, then under some regularity conditions, as $n \to \infty$:

$$\sqrt{n}\left(\hat{\theta} - \theta\right) \stackrel{d}{\to} N\left(0_r, F^{-1}\right)$$

where $F$ denotes the Fisher information matrix, given by:

$$F = -\frac{1}{n} E\left[\frac{\partial^2}{\partial \theta\, \partial \theta'} L(\theta | X)\right]$$

As a consequence of this theorem, we see that, under regularity conditions, the MLE is asymptotically unbiased, efficient (minimum variance) and Gaussian distributed. It is also a consistent estimator of $\theta$.

Hypothesis testing

We now turn our attention to hypothesis testing. In particular, we will go over a general methodology to construct tests, called the likelihood ratio method, and we will apply it to the case of Gaussian populations.

We assume an $r$-dimensional vector parameter $\theta$ that takes values in $\Omega \subseteq \mathbb{R}^r$. We want to test the hypothesis $H_0$ that the unknown parameter $\theta$ belongs to some subspace of $\mathbb{R}^r$. This subspace is called the null set and will be denoted by $\Omega_0 \subset \mathbb{R}^r$.

Consequently, we want to test the hypothesis:

$$H_0 : \theta \in \Omega_0$$

versus the alternative hypothesis:

$$H_1 : \theta \in \Omega$$

which supposes that $\theta$ is not restricted to $\Omega_0$.

Hypothesis testing

For example, consider a multivariate Gaussian $N(\mu_x, \Sigma_x)$. To test whether $\mu_x$ equals a certain fixed value $\mu_0$, we construct the test problem:

$$H_0 : \mu_x = \mu_0 \qquad H_1 : \text{no constraints on } \mu_x$$

In this example we have $\Omega_0 = \{\mu_0\}$ and $\Omega = \mathbb{R}^p$.

Hypothesis testing

Define $l_0^* = \max_{\theta \in \Omega_0} l(\theta|X)$ and $l^* = \max_{\theta \in \Omega} l(\theta|X)$, the values of the maximized likelihood under $H_0$ and $H_1$, respectively.

Consider the likelihood ratio (LR) given by:

$$LR = \frac{l_0^*}{l^*}$$

By construction $0 \le LR \le 1$, and one tends to favour $H_0$ if the LR is high (close to 1) and $H_1$ if the LR is low (not close to 1). The likelihood ratio test (LRT) tells us when exactly to favour $H_0$ over $H_1$. It is based on:

$$\lambda = -2 \ln LR = -2\left(\ln l_0^* - \ln l^*\right)$$

The LRT statistic $\lambda$ is asymptotically distributed as a $\chi^2$ distribution, with degrees of freedom equal to the difference between the dimensions of the spaces $\Omega$ and $\Omega_0$.

Hypothesis testing

Given a sample from a population $N(\mu_x, \Sigma_x)$, we want to test the hypothesis:

$$H_0 : \mu_x = \mu_0$$

against the alternative:

$$H_1 : \mu_x \ne \mu_0$$

It is possible to show that the likelihood ratio test statistic is given by:

$$\lambda = n \log \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}_x|}$$

where:

$$\hat{\Sigma}_0 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_0)(x_i - \mu_0)'$$

which has an asymptotic $\chi^2$ distribution with $p$ degrees of freedom.
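A direct implementation of this test as a small helper (Gaussian data assumed, as above):

```python
import numpy as np
from scipy.stats import chi2

def mean_lrt(X, mu0):
    """LRT of H0: mu_x = mu0 for Gaussian data; returns (lambda, p-value)."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Sigma_hat = (X - xbar).T @ (X - xbar) / n    # MLE of Sigma under H1
    Sigma_0 = (X - mu0).T @ (X - mu0) / n        # MLE of Sigma under H0
    lam = n * np.log(np.linalg.det(Sigma_0) / np.linalg.det(Sigma_hat))
    return lam, chi2.sf(lam, df=p)

# Usage: lam, pval = mean_lrt(X, np.zeros(X.shape[1]))
```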

Illustrative example (I)

Consider the daily log-returns (in percentages) of four major European stock indices: Germany (DAX), Switzerland (SMI), France (CAC) and UK (FTSE), from 1991 to 1998. We want to test the null hypothesis that the mean vector of returns is zero (assuming Gaussianity).

The estimated mean is given by:

$$\bar{x} = (0.065, 0.081, 0.043, 0.043)'$$

The covariance matrix under $H_0$ is given by:

$$\hat{\Sigma}_0 = \begin{pmatrix} 1.064 & 0.674 & 0.836 & 0.526 \\ 0.674 & 0.861 & 0.631 & 0.433 \\ 0.836 & 0.631 & 1.217 & 0.570 \\ 0.526 & 0.433 & 0.570 & 0.634 \end{pmatrix}$$

Illustrative example (I)

The covariance matrix under $H_1$ is given by:

$$\hat{\Sigma}_x = \begin{pmatrix} 1.060 & 0.669 & 0.834 & 0.523 \\ 0.669 & 0.855 & 0.628 & 0.430 \\ 0.834 & 0.628 & 1.216 & 0.569 \\ 0.523 & 0.430 & 0.569 & 0.632 \end{pmatrix}$$

The value of the statistic is $\lambda = 11.70$, with associated p-value 0.0196. Thus, we reject $H_0$ at the 5% significance level, but we cannot reject $H_0$ at the 1% significance level.
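The reported p-value can be checked directly from the asymptotic $\chi^2_4$ distribution:

```python
from scipy.stats import chi2

# lambda = 11.70 with p = 4 degrees of freedom:
print(chi2.sf(11.70, df=4))  # approximately 0.0197, matching the reported p-value
```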

Hypothesis testing

Given a sample of a population $N(\mu_x, \Sigma_x)$, we want to test the hypothesis:

$$H_0 : \Sigma_x = \Sigma_0$$

against the alternative:

$$H_1 : \Sigma_x \ne \Sigma_0$$

It is possible to show that the likelihood ratio test statistic is given by:

$$\lambda = n \log \frac{|\Sigma_0|}{|\hat{\Sigma}_x|} + n\, \mathrm{Tr}\left(\Sigma_0^{-1}\, \hat{\Sigma}_x\right) - np$$

which has an asymptotic $\chi^2$ distribution with $p(p+1)/2$ degrees of freedom.

Hypothesis testing

It is also of interest to know whether $\Sigma_x$ is diagonal, in which case the univariate variables are independent. In that case, we gain nothing from analyzing them jointly, since they have no information in common. Then, we test:

$$H_0 : \Sigma_x \text{ diagonal}$$

against the alternative:

$$H_1 : \Sigma_x \text{ unrestricted}$$

It is possible to show that the likelihood ratio test statistic is given by:

$$\lambda = -n \log |R_x|$$

where $R_x$ is the sample correlation matrix, which has an asymptotic $\chi^2$ distribution with $p(p-1)/2$ degrees of freedom.

Illustrative example (I)

Consider again the daily log-returns (in percentages) of the four major European stock indices. We test the null hypothesis of independence (assuming Gaussianity). The estimated correlation matrix is given by:

$$R = \begin{pmatrix} 1 & 0.703 & 0.734 & 0.639 \\ 0.703 & 1 & 0.616 & 0.584 \\ 0.734 & 0.616 & 1 & 0.648 \\ 0.639 & 0.584 & 0.648 & 1 \end{pmatrix}$$

The value of the statistic is $\lambda = 4071.87$, with associated p-value 0. Thus, we reject $H_0$ at the usual significance levels.
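The statistic can be reproduced from the correlation matrix alone. The slides do not state the sample size; $n = 1859$ daily returns is assumed below and recovers the reported value closely:

```python
import numpy as np
from scipy.stats import chi2

# Estimated correlation matrix of the four index returns (from the slide).
R = np.array([[1.000, 0.703, 0.734, 0.639],
              [0.703, 1.000, 0.616, 0.584],
              [0.734, 0.616, 1.000, 0.648],
              [0.639, 0.584, 0.648, 1.000]])

n = 1859  # assumed sample size; not stated on the slides
lam = -n * np.log(np.linalg.det(R))
p = R.shape[0]
print(lam)                               # approximately 4072
print(chi2.sf(lam, df=p * (p - 1) // 2)) # essentially 0
```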

Hypothesis testing

Assume that we have observed a sample of size $n$ of a $p$-dimensional variable $x = (x_1, \ldots, x_p)'$ that can be split into $G$ groups, so that there are $n_1$ observations in group 1, and so on. Our goal here is to check whether the means of the $G$ groups are equal or not, assuming Gaussianity and that the covariance matrix $\Sigma_x$ is the same for all the groups.

Then, the hypothesis to be tested is:

$$H_0 : \mu_1 = \cdots = \mu_G = \mu_x$$

and the alternative hypothesis is:

$$H_1 : \text{not all the } \mu_g \text{ are equal}$$

This problem is known as the multivariate analysis of variance (MANOVA).

Hypothesis testing

The likelihood ratio test method leads to the statistic:

$$\lambda = n \log \frac{|\hat{\Sigma}_x|}{|S_W|}$$

where $\hat{\Sigma}_x$ is the MLE of $\Sigma_x$ under Gaussianity, and $S_W = W/n$, where:

$$W = \sum_{g=1}^{G} \sum_{i=1}^{n_g} (x_{ig} - \bar{x}_g)(x_{ig} - \bar{x}_g)'$$

where $x_{ig}$ is the $i$-th observation in group $g$ and $\bar{x}_g$ is the sample mean vector of the observations in group $g$. $W$ is usually called the within-groups variability matrix, or the matrix of deviations with respect to the means of each group.

Hypothesis testing

The statistic $\lambda$ has an asymptotic $\chi^2_{p(G-1)}$ distribution. However, this approximation can be improved for small sample sizes. For instance, the statistic:

$$\lambda_0 = m \log \frac{|\hat{\Sigma}_x|}{|S_W|}$$

where $m = (n - 1) - (p + G)/2$, also asymptotically follows a $\chi^2_{p(G-1)}$ distribution.

Hypothesis testing

This test can be derived in an alternative way. Let:

$$T = n\, \hat{\Sigma}_x = \sum_{g=1}^{G} \sum_{i=1}^{n_g} (x_{ig} - \bar{x})(x_{ig} - \bar{x})'$$

be the total variability of the data, which measures the deviations with respect to a common mean. The matrix $T$ can be decomposed as the sum of two matrices. The first one is the matrix $W$ defined previously. The second one measures the between-groups variability, explained by the differences between means, which we denote by $B$:

$$B = \sum_{g=1}^{G} n_g\, (\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})'$$

Therefore, we can write:

$$T \text{ (Total variability)} = B \text{ (Explained variability)} + W \text{ (Residual variability)}$$

Hypothesis testing

In order to test whether the means are equal, we can compare the sizes of the matrices $T$ and $W$. One idea is to measure their size by their determinant. Then, we can propose a test based on the ratio $|T|/|W|$. For moderate sample sizes, this test is similar to the likelihood ratio test that uses the statistic $\lambda_0$, which can also be written as:

$$\lambda_0 = m \log \frac{|\hat{\Sigma}_x|}{|S_W|} = m \log \frac{|T|}{|W|}$$

Illustrative example (II)

We consider the Iris dataset, consisting of four univariate variables measured on 150 flowers of 3 different species (setosa, versicolor and virginica). There are 50 flowers of each species:

- x1: length of the sepal (in cm).
- x2: width of the sepal (in cm).
- x3: length of the petal (in cm).
- x4: width of the petal (in cm).

The next slide shows the scatterplot matrix of the dataset.

Illustrative example (II)

[Figure: scatterplot matrix of the Iris dataset.]

Illustrative example (II)

We test the equality of means for the 3 groups of the Iris dataset. The means of the 3 groups are given by:

$$\bar{x}_1 = \begin{pmatrix} 5.006 \\ 3.428 \\ 1.462 \\ 0.246 \end{pmatrix}, \qquad \bar{x}_2 = \begin{pmatrix} 5.936 \\ 2.770 \\ 4.260 \\ 1.326 \end{pmatrix}, \qquad \bar{x}_3 = \begin{pmatrix} 6.588 \\ 2.974 \\ 5.552 \\ 2.026 \end{pmatrix}$$

The value of the statistic $\lambda$ is 563.00, with associated p-value 0. Thus, we reject $H_0$. On the other hand, the value of the statistic $\lambda_0$ is 544.23, with associated p-value 0. Thus, we also reject $H_0$ with this statistic. Consequently, we reject that the three subsets of observations have the same means.
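A sketch reproducing this analysis on the Iris data (loaded here from scikit-learn, an assumed data source; the values should come out close to those reported above):

```python
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target                 # n = 150, p = 4, G = 3
n, p = X.shape
G = len(np.unique(y))

# Total and within-groups variability matrices: T = B + W.
xbar = X.mean(axis=0)
T = (X - xbar).T @ (X - xbar)
W = sum((X[y == g] - X[y == g].mean(axis=0)).T
        @ (X[y == g] - X[y == g].mean(axis=0)) for g in range(G))

# Since Sigma_hat = T/n and S_W = W/n, the ratio of determinants is |T|/|W|.
lam = n * np.log(np.linalg.det(T) / np.linalg.det(W))
m = (n - 1) - (p + G) / 2
lam0 = m * np.log(np.linalg.det(T) / np.linalg.det(W))

df = p * (G - 1)
print(lam, chi2.sf(lam, df))    # close to the 563.00 reported above
print(lam0, chi2.sf(lam0, df))  # close to the 544.23 reported above
```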

Chapter outline

1. Introduction
2. Basic concepts
3. Multivariate distributions
4. Statistical inference
5. Hypothesis testing

We are now ready for Chapter 3: Principal components.