Multivariate Statistical Analysis


Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis

Outline: Two-sample T² test

Analogous to the univariate context, we wish to determine whether the mean vectors are comparable; more formally:

H_0: µ_1 = µ_2   (1)

Suppose we let y_1i, i = 1, …, n_1 and y_2i, i = 1, …, n_2 represent independent samples from two p-variate normal distributions with mean vectors µ_1 and µ_2 but common covariance matrix Σ, unknown, provided Σ is positive definite and n > p, with sample estimators ȳ and S for the mean and covariance respectively.

We can then define

W_1 = (n_1 − 1)S_1 = ∑_{i=1}^{n_1} (y_1i − ȳ_1)(y_1i − ȳ_1)′
W_2 = (n_2 − 1)S_2 = ∑_{i=1}^{n_2} (y_2i − ȳ_2)(y_2i − ȳ_2)′

since each is an unbiased estimator of (a multiple of) the common covariance matrix, i.e. E[(n_1 − 1)S_1] = (n_1 − 1)Σ and E[(n_2 − 1)S_2] = (n_2 − 1)Σ.

The T² statistic can be calculated as:

T² = (n_1 n_2 / (n_1 + n_2)) (ȳ_1 − ȳ_2)′ S⁻¹ (ȳ_1 − ȳ_2)   (2)

where S⁻¹ is the inverse of the pooled covariance matrix given by:

S = [(n_1 − 1)S_1 + (n_2 − 1)S_2] / (n_1 + n_2 − 2) = (1/(n_1 + n_2 − 2)) (W_1 + W_2)

given the sample estimates for covariance, S_1 and S_2, in the two samples.
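As a minimal sketch, the pooled covariance and the T² statistic of equation (2) can be computed with NumPy; the data below are simulated purely for illustration, with a shifted mean in the second group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated example: two samples from p = 2 dimensional normals
# (illustrative values only; not from the lecture).
n1, n2, p = 20, 25, 2
y1 = rng.normal(size=(n1, p))
y2 = rng.normal(size=(n2, p)) + 0.5  # shifted mean in group 2

ybar1, ybar2 = y1.mean(axis=0), y2.mean(axis=0)
S1 = np.cov(y1, rowvar=False)  # unbiased sample covariance, divisor n1 - 1
S2 = np.cov(y2, rowvar=False)

# Pooled covariance: S = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2)
S_pl = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# T^2 = (n1 n2 / (n1 + n2)) (ybar1 - ybar2)' S_pl^{-1} (ybar1 - ybar2)
d = ybar1 - ybar2
T2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S_pl, d)
print(T2)
```

Note that the characteristic form (3) gives exactly the same value, since (1/n_1 + 1/n_2)⁻¹ = n_1 n_2 / (n_1 + n_2).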


Again, there is a simple relationship between the test statistic T² and the F distribution:

Theorem. If y_1i, i = 1, …, n_1 and y_2i, i = 1, …, n_2 represent independent samples from two p-variate normal distributions with mean vectors µ_1 and µ_2 but common covariance matrix Σ, provided Σ is positive definite and n_1 + n_2 − 2 ≥ p, then:

F = ((n_1 + n_2 − p − 1) / ((n_1 + n_2 − 2) p)) T²

has an F distribution on p and (n_1 + n_2 − p − 1) degrees of freedom.

Essentially, we compute the test statistic, convert it to F, and compare with the (1 − α) quantile of the F distribution on those degrees of freedom, rejecting H_0 when F exceeds that quantile. Note again that to ensure non-singularity of S, we require that n_1 + n_2 − 2 ≥ p.
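A sketch of that decision rule using scipy.stats; the value of T² and the sample sizes here are made-up placeholders, not numbers from the lecture.

```python
import numpy as np
from scipy import stats

# Placeholder values: a two-sample T^2 with n1 = 20, n2 = 25, p = 2.
n1, n2, p = 20, 25, 2
T2 = 9.8

# F = (n1 + n2 - p - 1) T^2 / ((n1 + n2 - 2) p),
# on p and (n1 + n2 - p - 1) degrees of freedom.
df1, df2 = p, n1 + n2 - p - 1
F = df2 * T2 / ((n1 + n2 - 2) * p)

alpha = 0.05
crit = stats.f.ppf(1 - alpha, df1, df2)  # (1 - alpha) quantile of F
reject = F > crit                        # reject H0 when F exceeds it
print(F, crit, reject)
```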

Characteristic form of the two-sample T² test:

T² = (ȳ_1 − ȳ_2)′ [(1/n_1 + 1/n_2) S_pl]⁻¹ (ȳ_1 − ȳ_2)   (3)


What was all that stuff about likelihood ratios about? It turns out that it is possible to show that:

Λ^(2/n) = |Σ̂| / |Σ̂_0| = (1 + T²/(n − 1))⁻¹   (4)

It is also possible to obtain T² via union–intersection methods. This is nice because it tells us a lot about the properties of the test!
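Identity (4) can be checked numerically in the one-sample setting, where Σ̂ and Σ̂_0 are the maximum-likelihood covariance estimates with the mean unrestricted and with the mean fixed at µ_0 respectively; the data below are simulated purely for verification.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
y = rng.normal(size=(n, p))
mu0 = np.zeros(p)  # hypothesised mean under H0

ybar = y.mean(axis=0)
S = np.cov(y, rowvar=False)  # unbiased sample covariance (divisor n - 1)
T2 = n * (ybar - mu0) @ np.linalg.solve(S, ybar - mu0)

# MLE covariances: unrestricted (centred at ybar) and under H0 (centred at mu0)
Sig_hat = (y - ybar).T @ (y - ybar) / n
Sig_hat0 = (y - mu0).T @ (y - mu0) / n

lhs = np.linalg.det(Sig_hat) / np.linalg.det(Sig_hat0)  # Lambda^(2/n)
rhs = 1 / (1 + T2 / (n - 1))
print(lhs, rhs)
```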

Essentially, we wish to find a region of squared Mahalanobis distance such that:

Pr( (ȳ − µ)′ S⁻¹ (ȳ − µ) ≤ c² ) = 1 − α

and we can find c² as follows:

c² = (p(n − 1) / (n(n − p))) F_(1−α),p,(n−p)

where F_(1−α),p,(n−p) is the (1 − α) quantile of the F distribution with p and n − p degrees of freedom, p represents the number of variables and n the sample size.
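As a sketch, c² is a one-liner with scipy.stats; n, p, and α below are placeholder values.

```python
from scipy import stats

n, p, alpha = 30, 3, 0.05  # placeholder sample size, dimension, level

# c^2 = p (n - 1) / (n (n - p)) * F_{(1 - alpha), p, (n - p)}
c2 = p * (n - 1) / (n * (n - p)) * stats.f.ppf(1 - alpha, p, n - p)
print(c2)
```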

- The centroid of the ellipse is at ȳ.
- The half-length of the major axis is given by √λ_1 √(p(n − 1)/(n(n − p)) F_(1−α),p,(n−p)), where λ_1 is the first (largest) eigenvalue of S.
- The half-length of the minor axis is given by √λ_2 √(p(n − 1)/(n(n − p)) F_(1−α),p,(n−p)), where λ_2 is the second eigenvalue of S.
- The ratio of these two eigenvalues gives you some idea of the elongation of the ellipse.
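A sketch of those axis computations with NumPy and scipy.stats; the sample covariance matrix and sizes below are illustrative, not from the lecture.

```python
import numpy as np
from scipy import stats

n, p, alpha = 50, 2, 0.05
S = np.array([[4.0, 1.2],          # illustrative sample covariance
              [1.2, 1.0]])

eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
lam2, lam1 = eigvals                  # lam1: largest, lam2: smallest

# Common scale factor: p (n - 1) / (n (n - p)) * F quantile
scale = p * (n - 1) / (n * (n - p)) * stats.f.ppf(1 - alpha, p, n - p)
half_major = np.sqrt(lam1 * scale)    # along the first eigenvector of S
half_minor = np.sqrt(lam2 * scale)
print(half_major, half_minor)
```

The elongation half_major / half_minor equals √(λ_1/λ_2), so it depends only on the ratio of the two eigenvalues.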

In addition to the (joint) confidence ellipse, it is possible to consider simultaneous confidence intervals: univariate confidence intervals based on linear combinations, which can be regarded as shadows (projections) of the confidence ellipse. It is also possible to carry out Bonferroni adjustments of these simultaneous intervals.
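A minimal sketch of Bonferroni-adjusted intervals for the p components of a mean vector (one-sample setting; the numbers are placeholders): each interval uses level α/p, so all p intervals hold jointly with confidence at least 1 − α.

```python
import numpy as np
from scipy import stats

n, p, alpha = 30, 3, 0.05
ybar = np.array([1.0, 2.0, 3.0])   # placeholder sample means
s2 = np.array([4.0, 1.0, 2.25])    # placeholder sample variances (diag of S)

# Bonferroni: split alpha over the p intervals
t_crit = stats.t.ppf(1 - alpha / (2 * p), n - 1)
half = t_crit * np.sqrt(s2 / n)
lower, upper = ybar - half, ybar + half
print(np.column_stack([lower, upper]))
```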

- The T² test is based upon the Mahalanobis distance and can be used for inference on mean vectors; the test can be derived via a variety of routes.
- There are important differences between univariate and multivariate inference, especially when considering confidence ellipses.
- Having determined that there is a significant difference between mean vectors, you may wish to conduct a number of follow-up investigations, and even carry out discriminant analysis.