STAT 730 Chapter 1 Background


STAT 730 Chapter 1 Background
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 730: Multivariate Analysis

Logistics
Course notes will hopefully be posted the evening before lecture, based on Mardia, Kent, and Bibby. We will cover Chapters 1-14, skipping 7, in varying amounts of detail, spending about one week on each chapter. If time permits we will discuss additional topics, e.g. models specific to spatiotemporal data; decision trees; ensemble methods such as boosting, bagging, & random forests; Bayesian approaches. Your course grade will be computed based on graded homework problems.

Multivariate data
Multivariate data abounds. Often we have many variables recorded on each experimental subject.
$(f_i, q_i)$: beer drinking frequency and drinking quantity on a 4-point ordinal scale for New Mexican DUI offenders, along with abuse history, age, and gender $\mathbf{x}_i$.
Temperature $T_{ijkt}$ (°C) at different locations and depths of pig heads in an MRI coil over time; essentially functional data.
MRI $(T_{1\rho,i}, T_{2\rho,i})$ measurements on Parkinson's and non-Parkinson's patients, along with race, gender, and age $\mathbf{x}_i$.
Survival times $T_{ij}$ from diagnosis of prostate cancer by county for South Carolinian men, along with marriage status, stage, grade, race, and age $\mathbf{x}_{ij}$.
Fisher's iris data.
Dental data from STAT 771.

Fisher's iris data
Fisher (1936) introduced these data, which consist of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four measurements were taken on each flower: the length and the width of the sepals and petals (cm). R code to obtain a scatterplot matrix:
data(iris)
iris # examine data
plot(iris[,1:4],pch=c(rep(1,50),rep(2,50),rep(3,50)))

Dental data
Potthoff and Roy (1964) present data on n = 27 children, 16 boys and 11 girls. On each child, the distance (mm) from the center of the pituitary to the pterygomaxillary fissure was measured at ages 8, 10, 12, and 14 years. R code to obtain a spaghetti plot:
library(heavy) # data are in this package
data(dental) # attach data
dental # look at data
library(ggplot2) # used to make profile plot
ggplot(data=dental,aes(x=age,y=distance,group=subject,color=sex))+geom_line()
Each connected curve represents one child. Note that boys are typically larger than girls at each time point, though not uniformly. How should we model these data? How do we test whether boys and girls are different?

Related topics/courses
We will study the theory of classical multivariate analysis in STAT 730, most of it based on a multivariate normal assumption. Classical techniques are widely used. Courses related to STAT 730:
STAT 517 & STAT 704, flexible regression models.
STAT 771 Longitudinal Data Analysis (multivariate regression, MANOVA, non-normal extensions).
STAT 718 Statistical Learning (shrinkage priors in regression, dimension reduction, classification, smoothing, model selection and averaging, boosting, SVM & discrimination, clustering, random forests).
CSCE 822 Data Mining and Warehousing (classification, clustering, dimension reduction, feature selection).
CSCE 883 Machine Learning (dimension reduction, clustering, decision trees, discrimination, SVM, HMM).
CSCE 768 Pattern Recognition (discrimination, neural networks, genetic algorithms, clustering, dimension reduction, HMM).
PSYC 821 Theory of Psychological Measurement (scaling and factor theory, item analysis).
PSYC 823 Multivariate Analysis of Behavioral Data (multiple linear regression, canonical correlation, discriminant functions, multidimensional scaling).
PSYC 824 Special Topics in Quantitative Psychology (longitudinal data analysis, multilevel modeling, structural equation modeling).
Also closely related are the fields of Functional Data Analysis, where the data are curves or surfaces, and Shape Analysis.

Matrix review
Knowledge of elementary linear algebra is assumed. You may want to review:
A.2 Matrix operations
A.3 Types of matrices
A.4 Vector spaces, rank, linear equations
A.5 Linear transformations
A.6 Eigenvalues and eigenvectors, spectral decomposition
A.7 Quadratic forms
A.8-A.10 Generalized inverse, matrix differentiation, maximization, geometric ideas
Especially important: the spectral decomposition of a symmetric $\Sigma > 0$ (p. 469). Also related: the SVD.

Data matrix
To start, we have n objects (if human, subjects), with p variables measured on each. Let $x_{ij}$ be the jth variable measured on the ith object. The $n \times p$ data matrix is
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}.$$
Each row has all p variables on one object, and each column has all measurements on one variable across all objects.

Data vector
The ith object has the data vector
$$\mathbf{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}.$$
Then we can write X as
$$X = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_n]' = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix}.$$

Measurement vector
All n measurements of the jth variable are collected into
$$\mathbf{x}_{(j)} = \begin{bmatrix} x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj} \end{bmatrix}.$$
Then we can write X as
$$X = [\mathbf{x}_{(1)}\ \mathbf{x}_{(2)}\ \cdots\ \mathbf{x}_{(p)}].$$
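As a quick illustration of this notation in R (a minimal sketch using the iris data from earlier; the names X, n, and p are introduced here for convenience, not in the text):
data(iris)
X <- as.matrix(iris[,1:4]) # n x p data matrix
n <- nrow(X); p <- ncol(X) # n = 150, p = 4
x_i <- X[1,]  # data vector for object i = 1 (a row of X, length p)
x_j <- X[,2]  # measurement vector x_(2) for variable j = 2 (a column of X, length n)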

Univariate summary statistics
Let $\mathbf{1}_a = (1, 1, \ldots, 1)'$ be an $a \times 1$ vector of all 1's. The sample mean of the jth measurement is
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^n x_{ij} = \frac{1}{n}\mathbf{1}_n'\mathbf{x}_{(j)}.$$
The p sample means are collected into the sample mean vector
$$\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix} = \frac{1}{n}(\mathbf{1}_n'X)' = \frac{1}{n}X'\mathbf{1}_n.$$
The sample variance of the jth measurement is
$$s_{jj} = \frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2 = \frac{1}{n}(\mathbf{x}_{(j)} - \mathbf{1}_n\bar{x}_j)'(\mathbf{x}_{(j)} - \mathbf{1}_n\bar{x}_j) = \frac{1}{n}\|\mathbf{x}_{(j)} - \mathbf{1}_n\bar{x}_j\|^2.$$
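These matrix formulas are easy to check numerically against colMeans() and var() (a sketch only, reusing the X and n defined above; note var() divides by n-1, the book by n):
one_n <- rep(1, n) # the vector 1_n
xbar <- drop(t(X) %*% one_n / n) # sample mean vector (1/n) X'1_n
all.equal(xbar, colMeans(X)) # should be TRUE
s_jj <- colSums((X - one_n %*% t(xbar))^2) / n # sample variances with divisor n
all.equal(s_jj, apply(X, 2, var) * (n - 1)/n) # rescale var() to compare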

Covariance & correlation
The sample covariance of the ith and jth measurements is
$$s_{ij} = \frac{1}{n}\sum_{r=1}^n (x_{ri} - \bar{x}_i)(x_{rj} - \bar{x}_j) = \frac{1}{n}(\mathbf{x}_{(i)} - \mathbf{1}_n\bar{x}_i)'(\mathbf{x}_{(j)} - \mathbf{1}_n\bar{x}_j).$$
The sample correlation between measurements i and j is
$$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}.$$
The sample covariance matrix is
$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix}.$$
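Similarly, $s_{ij}$ and $r_{ij}$ can be computed directly from centered columns (a sketch, again reusing X and n; the pair i = 1, j = 2 is arbitrary):
xc <- sweep(X, 2, colMeans(X)) # centered columns x_(j) - 1_n xbar_j
s_12 <- sum(xc[,1] * xc[,2]) / n # sample covariance s_12 with divisor n
r_12 <- s_12 / sqrt((sum(xc[,1]^2)/n) * (sum(xc[,2]^2)/n)) # sample correlation r_12
all.equal(r_12, cor(X)[1,2]) # TRUE; the 1/n factors cancel in r_12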

Clever ways of writing S
Note that the ijth element of $nS$ is $(\mathbf{x}_{(i)} - \mathbf{1}_n\bar{x}_i)'(\mathbf{x}_{(j)} - \mathbf{1}_n\bar{x}_j)$. Thus
$$nS = \begin{bmatrix} (\mathbf{x}_{(1)} - \mathbf{1}_n\bar{x}_1)' \\ \vdots \\ (\mathbf{x}_{(p)} - \mathbf{1}_n\bar{x}_p)' \end{bmatrix}_{p \times n} \big[\, \mathbf{x}_{(1)} - \mathbf{1}_n\bar{x}_1 \ \cdots\ \mathbf{x}_{(p)} - \mathbf{1}_n\bar{x}_p \,\big]_{n \times p}.$$
One matrix is the transpose of the other. Verify that the matrix on the right is $(I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n')X$, where $I_n$ is the $n \times n$ identity matrix. Therefore
$$nS = [(I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n')X]'[(I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n')X] = X'(I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n')X \stackrel{\text{def}}{=} X'HX,$$
as $(I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n')$ is idempotent (verify this). So $S = \frac{1}{n}X'HX$.
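A sketch verifying $S = \frac{1}{n}X'HX$ in R (the centering matrix H is formed explicitly, which is fine for small n; X and n are as above):
H <- diag(n) - matrix(1/n, n, n) # H = I_n - (1/n) 1_n 1_n'
all.equal(H %*% H, H) # H is idempotent
S <- t(X) %*% H %*% X / n # S = (1/n) X'HX
all.equal(S, cov(X) * (n - 1)/n) # cov() uses n-1; rescale to compare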

Clever ways of writing S
Note that the ijth element of the $p \times p$ matrix $(\mathbf{x}_r - \bar{\mathbf{x}})(\mathbf{x}_r - \bar{\mathbf{x}})'$ is $(x_{ri} - \bar{x}_i)(x_{rj} - \bar{x}_j)$. Verify that
$$nS = \sum_{r=1}^n (\mathbf{x}_r - \bar{\mathbf{x}})(\mathbf{x}_r - \bar{\mathbf{x}})' = \sum_{r=1}^n \mathbf{x}_r\mathbf{x}_r' - n\,\bar{\mathbf{x}}\bar{\mathbf{x}}'.$$
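The sum-of-outer-products identities can also be checked directly (a sketch; tcrossprod(v) computes the outer product v v'):
xbar <- colMeans(X)
nS <- matrix(0, p, p)
for (r in 1:n) nS <- nS + tcrossprod(X[r,] - xbar) # sum_r (x_r - xbar)(x_r - xbar)'
all.equal(nS, t(X) %*% X - n * tcrossprod(xbar), check.attributes = FALSE) # second identity
all.equal(nS / n, cov(X) * (n - 1)/n, check.attributes = FALSE) # both equal S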

Correlation and covariance matrices
Let $R = [r_{ij}]$ be the $p \times p$ correlation matrix and let $D = \mathrm{diag}(\sqrt{s_{11}}, \ldots, \sqrt{s_{pp}})$. Then $R = D^{-1}SD^{-1}$ and $S = DRD$. Recall that the inverse of a diagonal matrix simply takes the reciprocal of each diagonal entry.
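In R this relationship can be checked as follows (a sketch; cov2cor() is the built-in shortcut for the same computation):
S <- cov(X) * (n - 1)/n # sample covariance matrix with divisor n
D <- diag(sqrt(diag(S))) # D = diag(sqrt(s_11), ..., sqrt(s_pp))
R <- solve(D) %*% S %*% solve(D) # R = D^{-1} S D^{-1}
all.equal(R, cor(X), check.attributes = FALSE) # TRUE
all.equal(D %*% R %*% D, S, check.attributes = FALSE) # S = D R D
all.equal(R, cov2cor(S), check.attributes = FALSE) # same thing via cov2cor()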

Two measures of multivariate scatter
S measures scatter about the mean, as well as pairwise association among variables about the mean. Two univariate measures of scatter derived from S are the generalized variance $|S| = \det(S)$ and the total variation $\mathrm{tr}(S) = \sum_{j=1}^p s_{jj}$. The larger either is, the more the values of $\mathbf{x}_i$ are scattered about $\bar{\mathbf{x}}$.

Cork data
Rao (1948) gives the weight of cork deposits in centigrams for n = 28 trees on the north, south, east, and west sides of each tree. The following R code reads the data and produces the sample mean vector, covariance matrix, and correlation matrix.
d=read.table("http://www.stat.sc.edu/~hansont/stat730/cork.txt",header=T)
d # show the data matrix
colMeans(d) # sample mean vector
cov(d) # a bit off from what's in your book
S=cov(d)*27/28 # book uses n instead of (n-1) in the denominator
cor(d) # correlation matrix
det(S) # generalized variance
sum(diag(S)) # total variation

Linear combinations
Let $\mathbf{a} \in \mathbb{R}^p$. A linear combination of the $\mathbf{x}_i$ is given by
$$y_i = \mathbf{a}'\mathbf{x}_i = a_1 x_{i1} + \cdots + a_p x_{ip}.$$
Then
$$\bar{y} = \frac{1}{n}\sum_{i=1}^n \mathbf{a}'\mathbf{x}_i = \mathbf{a}'\bar{\mathbf{x}}.$$
Note that
$$(y_i - \bar{y})^2 = \underbrace{\mathbf{a}'(\mathbf{x}_i - \bar{\mathbf{x}})}_{\in\,\mathbb{R}}\ \underbrace{(\mathbf{x}_i - \bar{\mathbf{x}})'\mathbf{a}}_{\in\,\mathbb{R}} = \mathbf{a}'[(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})']\mathbf{a}.$$
Then
$$s_y^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^n \mathbf{a}'(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\mathbf{a} = \mathbf{a}'S\mathbf{a}.$$
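A quick numerical check of $s_y^2 = \mathbf{a}'S\mathbf{a}$ (a sketch with an arbitrarily chosen a; var() again needs the n-1 to n rescaling):
a <- c(1, -1, 0, 0) # arbitrary coefficient vector, length p = 4
y <- drop(X %*% a) # y_i = a'x_i for each row of X
all.equal(mean(y), sum(a * colMeans(X))) # ybar = a'xbar
S <- cov(X) * (n - 1)/n
all.equal(var(y) * (n - 1)/n, drop(t(a) %*% S %*% a)) # s_y^2 = a'Sa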

A more general transformation
Now consider $\mathbf{y}_i = A\mathbf{x}_i + \mathbf{b}$, where $A \in \mathbb{R}^{q \times p}$ and $\mathbf{b} \in \mathbb{R}^q$. Then $\bar{\mathbf{y}} = A\bar{\mathbf{x}} + \mathbf{b}$ and
$$S_y = \frac{1}{n}\sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})' = ASA'.$$
You should show this. Finally, if $A \in \mathbb{R}^{p \times p}$ and $|A| \neq 0$, then $S = A^{-1}S_y(A')^{-1}$.
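This too is easy to verify numerically (a sketch; the A, b, and q = 2 below are arbitrary choices, not from the text):
set.seed(1)
A <- matrix(rnorm(2 * p), 2, p) # arbitrary 2 x p matrix
b <- rnorm(2) # arbitrary shift in R^2
Y <- t(A %*% t(X) + b) # rows of Y are y_i = A x_i + b
all.equal(colMeans(Y), drop(A %*% colMeans(X) + b), check.attributes = FALSE) # ybar = A xbar + b
all.equal(cov(Y) * (n - 1)/n, A %*% (cov(X) * (n - 1)/n) %*% t(A), check.attributes = FALSE) # S_y = A S A'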

Standardizing and Mahalanobis transformations
Each measurement can be standardized via
$$\mathbf{y}_i = D^{-1}(\mathbf{x}_i - \bar{\mathbf{x}}), \qquad D = \mathrm{diag}(s_1, \ldots, s_p),$$
where $s_j = \sqrt{s_{jj}}$ is the std. deviation of $\{x_{1j}, \ldots, x_{nj}\}$. Each standardized measurement in $\{y_{1j}, \ldots, y_{nj}\}$ has mean zero and variance one. Note that $S_y = R$.
If $S > 0$ (i.e. $S$ is positive definite) then $S^{-1}$ has a unique positive definite symmetric square root $S^{-1/2}$. What is it? Hint: use the spectral decomposition. The Mahalanobis transformation is
$$\mathbf{y}_i = S^{-1/2}(\mathbf{x}_i - \bar{\mathbf{x}}).$$
Note that $S_y = I$; the transformed measurements are uncorrelated.
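One way to compute the symmetric square root hinted at above is via the spectral decomposition of S; a sketch (eigen() with symmetric = TRUE returns orthonormal eigenvectors):
S <- cov(X) * (n - 1)/n
e <- eigen(S, symmetric = TRUE)
S_inv_sqrt <- e$vectors %*% diag(1/sqrt(e$values)) %*% t(e$vectors) # S^{-1/2}, symmetric & pos. def.
Y <- t(S_inv_sqrt %*% (t(X) - colMeans(X))) # y_i = S^{-1/2}(x_i - xbar)
all.equal(cov(Y) * (n - 1)/n, diag(p), check.attributes = FALSE) # S_y = I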

Principal components transformation
The spectral decomposition of S is $S = GLG'$, where G has the (orthonormal) eigenvectors of S as columns and the diagonal matrix $L = \mathrm{diag}(l_1, \ldots, l_p)$ holds the corresponding eigenvalues of S. The principal components rotation is
$$\mathbf{y}_i = G'(\mathbf{x}_i - \bar{\mathbf{x}}).$$
Again, the transformed measurements are uncorrelated, as $S_y = G'SG = L$. Note that the two measures of multivariate scatter are then $|S| = \prod_{i=1}^p l_i$ and $\mathrm{tr}(S) = \sum_{i=1}^p l_i$.
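A sketch of the principal components rotation in R, checking $S_y = L$ and the two scatter measures:
S <- cov(X) * (n - 1)/n
e <- eigen(S, symmetric = TRUE)
G <- e$vectors; L <- e$values # S = G diag(L) G'
Y <- t(t(G) %*% (t(X) - colMeans(X))) # y_i = G'(x_i - xbar)
all.equal(cov(Y) * (n - 1)/n, diag(L), check.attributes = FALSE) # S_y = diag(l_1, ..., l_p)
all.equal(det(S), prod(L)) # generalized variance |S|
all.equal(sum(diag(S)), sum(L)) # total variation tr(S)
The built-in prcomp(X) performs the same rotation (its rotation matrix matches G up to column signs; prcomp() uses the n-1 divisor for the variances).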

Geometry
Comparing columns of X, i.e. variables, use R-techniques: principal components analysis, factor analysis, canonical correlation analysis. Comparing rows of X, i.e. objects, use Q-techniques: discriminant analysis, clustering, multidimensional scaling.

Example from R-space
Consider $Y = HX$, the centered data matrix, i.e. each variable is shifted to be mean-zero. Then the cosine of the angle $\theta_{ij}$ between $\mathbf{y}_{(i)}$ and $\mathbf{y}_{(j)}$ is
$$\cos\theta_{ij} = \frac{\mathbf{y}_{(i)}'\mathbf{y}_{(j)}}{\|\mathbf{y}_{(i)}\|\,\|\mathbf{y}_{(j)}\|} = \frac{s_{ij}}{s_i s_j} = r_{ij}.$$
Note that when $\cos\theta_{ij} = 0$ the vectors are orthogonal, and when $\cos\theta_{ij} = 1$ they coincide.
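This is easy to see numerically (a sketch reusing the centering matrix H from before; the pair of variables 1 and 2 is arbitrary):
H <- diag(n) - matrix(1/n, n, n)
Y <- H %*% X # centered data matrix
cos_theta <- sum(Y[,1] * Y[,2]) / sqrt(sum(Y[,1]^2) * sum(Y[,2]^2)) # cosine of angle between y_(1) and y_(2)
all.equal(cos_theta, cor(X)[1,2]) # cos(theta_12) = r_12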

Example from Q-space
The n rows of X are points in $\mathbb{R}^p$. The squared Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ is
$$\|\mathbf{x}_i - \mathbf{x}_j\|^2 = (\mathbf{x}_i - \mathbf{x}_j)'(\mathbf{x}_i - \mathbf{x}_j).$$
A distance that takes correlation into account is the Mahalanobis distance $D_{ij}$, where
$$D_{ij}^2 = (\mathbf{x}_i - \mathbf{x}_j)'S^{-1}(\mathbf{x}_i - \mathbf{x}_j).$$
Mahalanobis distance underlies the Hotelling $T^2$ test and discriminant analysis; it is also used for comparing populations with different means but similar covariance.
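A sketch computing both distances for two rows of X; stats::mahalanobis() returns squared Mahalanobis distances to a fixed center, so it can serve as a check:
S <- cov(X) * (n - 1)/n
dx <- X[1,] - X[2,] # difference between rows 1 and 2
d2_euclid <- sum(dx^2) # squared Euclidean distance
d2_mahal <- drop(t(dx) %*% solve(S) %*% dx) # squared Mahalanobis distance D_12^2
all.equal(d2_mahal, mahalanobis(X[1, , drop = FALSE], center = X[2,], cov = S), check.attributes = FALSE)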

Visualizing multivariate data
Univariate displays of each variable: histograms, dotplots, boxplots, etc. These do not show association.
Scatterplot matrix: shows pairwise association.
Parallel coordinate plots, also called profile plots or spaghetti plots.
Harmonic curves.
Chernoff faces, star plots, and others.
See examples on the course webpage.

Profile plots & harmonic curves
Profile plot: each measurement variable is a point on the x-axis, and a connected line represents one row of X. The lines can be color-coded to show different groups.
Harmonic curve: each measurement is a coefficient in a harmonic expansion
$$f_{\mathbf{x}_i}(t) = \frac{x_{i1}}{\sqrt{2}} + x_{i2}\sin t + x_{i3}\cos t + x_{i4}\sin 2t + x_{i5}\cos 2t + \cdots$$
In either case, similar curves imply similar sets of measurements.

Visualize Iris data
Scatterplot matrices:
data(iris)
plot(iris[,1:4],pch=c(rep(1,50),rep(2,50),rep(3,50)))
library(lattice)
splom(~iris[1:4],groups=Species,data=iris)
Harmonic curves:
library(andrews)
data(iris)
andrews(iris,type=1,clr=5,ymax=3)
Profile plots:
library(reshape) # to turn the data matrix into long format
library(ggplot2) # easier to use than lattice directly
iris$id=1:150 # add an id variable for each iris
iris2=melt(iris,id.var=5:6,measure.vars=1:4) # one measurement per row
ggplot(data=iris2,aes(x=variable,y=value,group=id,color=Species))+geom_line()