Multivariate Statistical Analysis


Multivariate Statistical Analysis, Fall 2011. C. L. Williams, Ph.D. Lecture 3 for Applied Multivariate Analysis.

Outline
1  Reprise: vectors, vector lengths and the angle between them
2  …
3  Partial correlation coefficient
4  …
5  …

Recall
$$ r_{xy} = \frac{(\mathbf{x}-\bar{x}\mathbf{j})^{\top}(\mathbf{y}-\bar{y}\mathbf{j})}{\sqrt{\left[(\mathbf{x}-\bar{x}\mathbf{j})^{\top}(\mathbf{x}-\bar{x}\mathbf{j})\right]\left[(\mathbf{y}-\bar{y}\mathbf{j})^{\top}(\mathbf{y}-\bar{y}\mathbf{j})\right]}}, $$
where $\mathbf{j}$ is the vector of ones. Thus if the angle $\theta$ between the two centred vectors $\mathbf{x}-\bar{x}\mathbf{j}$ and $\mathbf{y}-\bar{y}\mathbf{j}$ is small, so that $\cos\theta$ is near 1, $r_{xy}$ will be close to 1. If the two vectors are perpendicular, $\cos\theta$ and $r_{xy}$ will be zero. If the two vectors have nearly opposite directions, $r_{xy}$ will be close to $-1$.

But if you now plot x and y, i.e. the three-dimensional points (1, 4, 7) and (2, 5, 6), you can measure the angle between these vectors; the cosine of the angle between their centred versions is the correlation.

Figure: Test vectors — two points in 3 dimensions, (1, 4, 7) and (2, 5, 6).

Unfortunately, it's a little harder to even imagine this vector for a conventional data set (with tens, if not hundreds, of points), but that is what you have been measuring whenever you work out the correlation coefficient. And if you think about the correlation paradox, you will appreciate that with two modestly correlated variables (i.e. with angles of the order of 50 degrees or more between them), when you measure the angle between the two outermost variables it will be greater than 90 degrees and the cosine will be negative.

By the law of cosines,
$$ \cos\theta = \frac{\mathbf{a}^{\top}\mathbf{a} + \mathbf{b}^{\top}\mathbf{b} - (\mathbf{b}-\mathbf{a})^{\top}(\mathbf{b}-\mathbf{a})}{2\sqrt{(\mathbf{a}^{\top}\mathbf{a})(\mathbf{b}^{\top}\mathbf{b})}} = \frac{\mathbf{a}^{\top}\mathbf{a} + \mathbf{b}^{\top}\mathbf{b} - (\mathbf{b}^{\top}\mathbf{b} + \mathbf{a}^{\top}\mathbf{a} - 2\,\mathbf{a}^{\top}\mathbf{b})}{2\sqrt{(\mathbf{a}^{\top}\mathbf{a})(\mathbf{b}^{\top}\mathbf{b})}} = \frac{\mathbf{a}^{\top}\mathbf{b}}{\sqrt{(\mathbf{a}^{\top}\mathbf{a})(\mathbf{b}^{\top}\mathbf{b})}}. $$
So for, say, the vectors
$$ \mathbf{a} = \begin{pmatrix} 1\\ 4\\ 7 \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} 2\\ 5\\ 6 \end{pmatrix} $$

...we have the length of $\mathbf{a}$, $L_a = \sqrt{\mathbf{a}^{\top}\mathbf{a}} = \sqrt{1^2 + 4^2 + 7^2} = \sqrt{66} = 8.12404$, the length of $\mathbf{b}$, $L_b = \sqrt{\mathbf{b}^{\top}\mathbf{b}} = \sqrt{2^2 + 5^2 + 6^2} = \sqrt{65} = 8.06226$, and $\mathbf{a}^{\top}\mathbf{b} = 1\cdot 2 + 4\cdot 5 + 7\cdot 6 = 64$, so that
$$ \cos\theta = \frac{\mathbf{a}^{\top}\mathbf{b}}{\sqrt{(\mathbf{a}^{\top}\mathbf{a})(\mathbf{b}^{\top}\mathbf{b})}} = \frac{64}{\sqrt{66}\,\sqrt{65}} = \frac{64}{8.12404 \times 8.06226} = 0.97713. $$
Taking the arccosine gives $\theta = 0.2143$ radians, which is about $12.28^{\circ}$. A short check of this arithmetic in R appears below.
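A minimal base-R sketch of the calculation above; the vectors a and b are those from the slide, everything else is standard base R, and the final lines show that the correlation corresponds to the angle between the centred vectors rather than the raw ones.

a <- c(1, 4, 7)
b <- c(2, 5, 6)

len_a <- sqrt(sum(a * a))                    # length of a: sqrt(66) ~ 8.12404
len_b <- sqrt(sum(b * b))                    # length of b: sqrt(65) ~ 8.06226
cos_theta <- sum(a * b) / (len_a * len_b)    # 64 / (8.12404 * 8.06226) ~ 0.97713
theta <- acos(cos_theta)                     # ~ 0.2143 radians
theta * 180 / pi                             # ~ 12.28 degrees

# The correlation coefficient is the cosine of the angle between the
# *centred* vectors:
cor(a, b)                                    # built-in correlation
ac <- a - mean(a); bc <- b - mean(b)
sum(ac * bc) / sqrt(sum(ac * ac) * sum(bc * bc))   # same value as cor(a, b)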

Linear Dependence
A pair of vectors $\mathbf{a}$ and $\mathbf{b}$ of the same dimension is linearly dependent if there exist constants $c_1$ and $c_2$, not both zero, such that $c_1\mathbf{a} + c_2\mathbf{b} = \mathbf{0}$. A set of $k$ vectors $\mathbf{a}, \mathbf{b}, \ldots, \mathbf{w}$ of the same dimension is linearly dependent if there exist constants $c_1, c_2, \ldots, c_k$, not all zero, such that $c_1\mathbf{a} + c_2\mathbf{b} + \cdots + c_k\mathbf{w} = \mathbf{0}$. Linear dependence implies that at least one of the vectors in the set can be written as a linear combination of the others. Vectors of the same dimension that are not linearly dependent are linearly independent.

Example of linearly independent vectors
Suppose we have the following three vectors:
$$ \mathbf{y}_1 = \begin{pmatrix} 1\\ 2\\ 1 \end{pmatrix}, \quad \mathbf{y}_2 = \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix}, \quad \mathbf{y}_3 = \begin{pmatrix} 1\\ -2\\ 1 \end{pmatrix}. $$

Setting $c_1\mathbf{y}_1 + c_2\mathbf{y}_2 + c_3\mathbf{y}_3 = \mathbf{0}$ gives
$$ \begin{aligned} c_1 + c_2 + c_3 &= 0\\ 2c_1 - 2c_3 &= 0\\ c_1 - c_2 + c_3 &= 0 \end{aligned} $$
with the unique solution $c_1 = c_2 = c_3 = 0$. So the vectors are linearly independent (a numerical check in R is sketched below).
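As a cross-check, linear independence can be verified numerically in R by looking at the rank (or determinant) of the matrix whose columns are the vectors. This is a sketch using the vectors as written above; only base R is used.

y1 <- c(1,  2,  1)
y2 <- c(1,  0, -1)
y3 <- c(1, -2,  1)

Y <- cbind(y1, y2, y3)
qr(Y)$rank       # 3 = number of vectors, so they are linearly independent

# Equivalently, the only solution of Y c = 0 is c = 0 because the
# determinant is non-zero:
det(Y)           # -8, non-zero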

                  Y1932  Y1936  Y1940  Y1960  Y1964  Y1968
Missouri             35     38     48     50     36     45
Maryland             36     37     41     46     35     42
Kentucky             40     40     42     54     36     44
Louisiana             7     11     14     29     57     23
Mississippi           4      3      4     25     87     14
"South Carolina"      2      1      4     49     59     39

> votes.data <- read.table("...\\votes.dat", header = TRUE)
> library(aplpack)
> faces(votes.data)

Partial correlation coefficient
You should be very comfortable dealing with bivariate concepts. Now let's introduce some terminology that we will use throughout in a fully multivariate setting. Conventionally:
We have $p$ variables $y_1, y_2, \ldots, y_p$ observed on $n$ individuals.
The $p$ variable means can be collected into the mean vector $\bar{\mathbf{y}}^{\top} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_p)$.
The variances of the $p$ variables are collected on the diagonal, and the $\tfrac{1}{2}p(p-1)$ covariances between every pair of variables in the off-diagonal positions, of the variance-covariance matrix $\mathbf{S}$.
Correlations between every pair of variables can be put in the off-diagonal positions of the correlation matrix $\mathbf{R}$ (whose diagonal elements are all 1).
Mean centring or standardising is conducted by carrying out the appropriate operation on each variable in turn.
The covariance matrix of the standardised data is the same as the correlation matrix of the original data. A short R illustration follows.
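A small base-R illustration of these quantities, assuming the votes.data frame read in on the earlier slide (any numeric data frame would serve).

ybar <- colMeans(votes.data)     # mean vector
S    <- cov(votes.data)          # p x p variance-covariance matrix
R    <- cor(votes.data)          # p x p correlation matrix

# Covariance matrix of the standardised data equals the correlation
# matrix of the original data:
Z <- scale(votes.data)           # mean-centre and divide by standard deviations
all.equal(cov(Z), R, check.attributes = FALSE)   # TRUE (up to rounding)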

Variance-Covariance Structures
We no longer have a single bivariate correlation coefficient (or covariance); we now have a covariance (or, more formally, a variance-covariance) matrix. The variances sit on the diagonal and the covariances in the off-diagonal positions:
$$ \mathbf{S} = \begin{pmatrix} s_1 & s_{12} & \cdots & s_{1j} & \cdots & s_{1p}\\ s_{21} & s_2 & \cdots & s_{2j} & \cdots & s_{2p}\\ \vdots & \vdots & & \vdots & & \vdots\\ s_{j1} & s_{j2} & \cdots & s_j & \cdots & s_{jp}\\ \vdots & \vdots & & \vdots & & \vdots\\ s_{p1} & s_{p2} & \cdots & s_{pj} & \cdots & s_p \end{pmatrix} $$

The population analogue is $\boldsymbol{\Sigma} = E\left[(\mathbf{y}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\mu})^{\top}\right]$:
$$ \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1j} & \cdots & \sigma_{1p}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2j} & \cdots & \sigma_{2p}\\ \vdots & \vdots & & \vdots & & \vdots\\ \sigma_{j1} & \sigma_{j2} & \cdots & \sigma_{jj} & \cdots & \sigma_{jp}\\ \vdots & \vdots & & \vdots & & \vdots\\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pj} & \cdots & \sigma_{pp} \end{pmatrix} $$


Partial correlation coefficient
We also have more subtle ways of measuring the association between variables. For quantitative data, the correlation matrix $\mathbf{R}$ shows all pairwise correlations between variables (there may be too many to interpret). The partial correlation $r_{ij.k}$ is the correlation between $x_i$ and $x_j$ when $x_k$ is held at a constant value. Any number of values can be held fixed:
$$ r_{ij.k} = \frac{r_{ij} - r_{ik}\,r_{jk}}{\sqrt{(1-r_{ik}^2)(1-r_{jk}^2)}}, \qquad r_{ij.kl} = \frac{r_{ij.k} - r_{il.k}\,r_{jl.k}}{\sqrt{(1-r_{il.k}^2)(1-r_{jl.k}^2)}}, \quad \text{etc.} $$
A partial correlation will show whether a high correlation between two variables is driven by their mutual correlation with one or more other variables. A small R sketch of the first-order formula appears below.
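A minimal sketch of the first-order partial correlation computed directly from pairwise correlations. The helper name pcor_1 is purely illustrative (not from any package), and votes.data is the data frame assumed from the earlier slide.

pcor_1 <- function(R, i, j, k) {
  # r_{ij.k}: correlation of variables i and j with variable k held fixed
  (R[i, j] - R[i, k] * R[j, k]) /
    sqrt((1 - R[i, k]^2) * (1 - R[j, k]^2))
}

R <- cor(votes.data)
pcor_1(R, 1, 2, 3)    # partial correlation of variables 1 and 2 given variable 3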

Correlation Matrices: Relationship to Covariance Matrices
$$ \mathbf{R} = \mathbf{D}_s^{-1}\,\mathbf{S}\,\mathbf{D}_s^{-1}, \qquad \mathbf{S} = \mathbf{D}_s\,\mathbf{R}\,\mathbf{D}_s, $$
where $\mathbf{D}_s$ is the diagonal matrix of standard deviations, and
$$ r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}}, \qquad \mathbf{R} = (r_{jk}) = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1j} & \cdots & r_{1p}\\ r_{21} & 1 & \cdots & r_{2j} & \cdots & r_{2p}\\ \vdots & \vdots & & \vdots & & \vdots\\ r_{j1} & r_{j2} & \cdots & 1 & \cdots & r_{jp}\\ \vdots & \vdots & & \vdots & & \vdots\\ r_{p1} & r_{p2} & \cdots & r_{pj} & \cdots & 1 \end{pmatrix} $$
A numerical check of this relationship is sketched below.
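The relationship is easy to verify in R; the sketch below assumes the votes.data frame from earlier and uses only base R (cov2cor is the built-in shortcut).

S  <- cov(votes.data)
Ds <- diag(sqrt(diag(S)))                  # diagonal matrix of standard deviations
R1 <- solve(Ds) %*% S %*% solve(Ds)        # Ds^{-1} S Ds^{-1}
R2 <- cov2cor(S)                           # built-in equivalent
all.equal(R1, R2, check.attributes = FALSE)        # TRUE

# ...and back again: S = Ds R Ds
all.equal(Ds %*% R2 %*% Ds, S, check.attributes = FALSE)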

Population correlation matrix $\boldsymbol{\rho}$
$$ \boldsymbol{\rho} = (\rho_{jk}) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1j} & \cdots & \rho_{1p}\\ \rho_{21} & 1 & \cdots & \rho_{2j} & \cdots & \rho_{2p}\\ \vdots & \vdots & & \vdots & & \vdots\\ \rho_{j1} & \rho_{j2} & \cdots & 1 & \cdots & \rho_{jp}\\ \vdots & \vdots & & \vdots & & \vdots\\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pj} & \cdots & 1 \end{pmatrix}, \qquad \text{where } \rho_{jk} = \frac{\sigma_{jk}}{\sigma_j\,\sigma_k}. $$

Multivariate statistics covers a vast and rapidly expanding discipline; developments are taking place in the analysis of gene-expression data as well as in psychometrics, to name but two areas. We therefore need to narrow down our definition to make life manageable. For the purposes of this course, our definition of multivariate is as follows:
We do not count multiple regression as a multivariate technique.
Relationships between variables (principal components, factor analysis, canonical correlation).
Relationships between individuals (cluster analysis, discriminant analysis, Hotelling's $T^2$ test, MANOVA, principal coordinates analysis).

There are some overlaps between the techniques. Also, to make life manageable, most of the techniques we will consider involve eigenanalysis (either of the covariance or the correlation matrix, or of the ratio of the between-group and within-group covariance matrices).

Firstly, the techniques we shall cover that look at the differences between individuals are as follows:
Differences between groups: Can we tell whether a vector of means $\bar{\mathbf{y}}_1$ is different from another vector $\bar{\mathbf{y}}_2$ (Hotelling's $T^2$)? What if we have more than two groups (MANOVA)?
Classification: Can we find individuals that are more alike than other individuals (cluster analysis)? If we already knew the groupings, could we investigate which variables are most important in telling the groups apart, and could we use this information to find a rule that lets us classify new observations (discriminant analysis)?

Visualisation: Do we have some way of visualising the similarities and dissimilarities between individuals (scaling / principal co-ordinates analysis)?

And the techniques which are about examining the variables are as follows:
Dimension reduction: Can we represent our data in fewer dimensions (principal components analysis)?
Relationships between variables: Can we model the relationships between variables (factor analysis)? If we have a set of variables $\mathbf{X}$, can we find a projection of them that is correlated with a projection of a set of variables $\mathbf{Y}$ (canonical correlation)?

Linear combinations
$$ \begin{aligned} z_{1i} &= \alpha_{11} y_{1i} + \alpha_{12} y_{2i} + \cdots + \alpha_{1p} y_{pi}\\ z_{2i} &= \alpha_{21} y_{1i} + \alpha_{22} y_{2i} + \cdots + \alpha_{2p} y_{pi}\\ &\;\;\vdots\\ z_{pi} &= \alpha_{p1} y_{1i} + \alpha_{p2} y_{2i} + \cdots + \alpha_{pp} y_{pi} \end{aligned} $$
Many techniques are concerned with finding useful linear combinations for a given task! A small matrix-form sketch in R follows.
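In matrix form this is $\mathbf{z}_i = \mathbf{A}\mathbf{y}_i$, where the rows of $\mathbf{A}$ hold the coefficients $\alpha_{j1}, \ldots, \alpha_{jp}$. A small sketch in base R; the toy data and coefficient choices are illustrative only.

set.seed(1)
n <- 10; p <- 3
Y <- matrix(rnorm(n * p), n, p)   # toy data: one row per individual (y_i')

A <- matrix(c(1/3, 1/3, 1/3,      # alpha_1: an average of the variables
              1,  -1,   0,        # alpha_2: a contrast
              0,   1,  -1),       # alpha_3: another contrast
            nrow = p, byrow = TRUE)

Z <- Y %*% t(A)                   # row i of Z is z_i = A y_i

# Principal components analysis chooses A to be orthogonal and to make the
# columns of Z uncorrelated with decreasing variance (after centring):
pc <- prcomp(Y)
head(pc$x)                        # the principal-component scores are such z's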