Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques
A reminder from univariate statistics courses:
- Population: the class of things you want to learn about.
- Sample: a group representing a class (what you actually study).
- Experimental unit: an individual research subject (e.g. a location, an entity, etc.).
- Response variable: a property of the subject that you believe is the result of the predictors (what you actually measure, e.g. tree height); a.k.a. the dependent variable.
- Predictor variable(s): the environment that you believe influences the response variable (e.g. climate, soil attributes, topography, etc.); a.k.a. the independent variable(s).
- Error: the difference between an observed (or calculated) value and its true (or expected) value.
[Slide shows an example data matrix: each row is an experimental unit (Data.ID 1, 2, ...), with a Type column and columns Variable 1-4.]
The Type column can hold regions, ecosystems, forest types, treatments, etc.
In multivariate statistics, the variables are typically: frequency of species, climate variables, soil characteristics, nutrient concentrations, contaminants, etc.
Variables can be either numeric or categorical (depending on the technique).
Focus is often placed on the graphical representation of results.
Rotation-based techniques
[Slide shows the example data matrix being rotated onto new axes.]
- Find an equation to rotate the data so that a new axis explains multiple variables.
- Final results based on multiple variables give different inferences than any two variables alone.
- Repeat the rotation process to achieve the analysis objective.
Objectives of rotation-based techniques
1. Rotate so that the new axis explains the greatest amount of variation within the data: Principal Component Analysis (PCA), Factor Analysis.
2. Rotate so that the variation between groups is maximized: Multivariate Analysis of Variance (MANOVA), Discriminant Analysis (DISCRIM).
3. Rotate so that one dataset explains the most variation in another dataset: Canonical Correspondence Analysis (CCA).
Math behind rotation-based techniques: trigonometry
In general, rotating a point (x, y) by an angle α about the origin gives
x′ = x·cos(α) − y·sin(α)  (Component 1)
y′ = x·sin(α) + y·cos(α)  (Component 2)
For the original point (x, y) = (1, 0) and α = 30°, this reduces to
x′ = cos(α) ≈ 0.87 and y′ = sin(α) = 0.5,
so the point after rotation is (x′, y′) ≈ (0.87, 0.5).
Simple to understand, but difficult/time-consuming to do for more than 2 variables.
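The rotation above can be sketched in a few lines of code. This is a minimal illustration (the function name `rotate` is my own, not from the slides), using only Python's standard `math` module:

```python
# Illustrative sketch: rotate the point (x, y) = (1, 0) by alpha = 30 degrees,
# matching the trigonometry on the slide.
import math

def rotate(x, y, alpha_deg):
    """Rotate the point (x, y) counter-clockwise by alpha_deg degrees."""
    a = math.radians(alpha_deg)
    x_new = x * math.cos(a) - y * math.sin(a)
    y_new = x * math.sin(a) + y * math.cos(a)
    return x_new, y_new

x_new, y_new = rotate(1, 0, 30)
print(round(x_new, 2), round(y_new, 2))  # 0.87 0.5
```

For (1, 0) the general formula collapses to (cos α, sin α), which is why the slide only shows those two terms.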
Matrix algebra allows us to simultaneously rotate the data based on MANY variables.
A general m × n matrix A has entries a_11, a_12, ..., a_1n in its first row, down to a_m1, a_m2, ..., a_mn in its last row.
- If m = n, then A is a square matrix.
- If n = 1, then A is a column vector.
- If m = 1, then A is a row vector.
Zero matrix: 0, with every entry equal to 0.
Diagonal matrix: D, with entries d_1, d_2, ..., d_n on the diagonal and 0 elsewhere.
Transpose matrix: A′, the n × m matrix obtained by swapping rows and columns, so entry (i, j) of A′ is a_ji.
Identity matrix: I, a square matrix with 1s on the diagonal and 0s elsewhere.
Matrix algebra allows us to simultaneously rotate the data based on MANY variables.
Addition and subtraction are element-wise:
(A + B)_ij = a_ij + b_ij
(A − B)_ij = a_ij − b_ij
Addition and subtraction only work if matrices A and B are the same size (same m and n).
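The element-wise rule can be sketched with plain Python lists of lists (helper names `mat_add`/`mat_sub` are my own, not from the slides; no external libraries assumed):

```python
# Illustrative sketch of element-wise matrix addition and subtraction.

def mat_add(A, B):
    """Element-wise sum; A and B must have the same dimensions."""
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes must match"
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    """Element-wise difference; A and B must have the same dimensions."""
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes must match"
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))  # [[6, 8], [10, 12]]
print(mat_sub(A, B))  # [[-4, -4], [-4, -4]]
```

The size assertions mirror the slide's rule that the operation is only defined for same-sized matrices.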
Matrix algebra allows us to simultaneously rotate the data based on MANY variables.
When multiplying matrices, the number of columns in matrix A has to equal the number of rows in matrix B.
For an m × n matrix A and an n × p matrix B, entry (i, k) of the product is
(A * B)_ik = Ʃ_j a_ij b_jk = a_i1 b_1k + a_i2 b_2k + ... + a_in b_nk
where j is the column number in A and the row number in B: you multiply across the i-th row of A and down the k-th column of B, then sum.
The final output is an m × p matrix (rows of A by columns of B).
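The summation rule above can be sketched directly (the helper name `mat_mul` is my own; plain Python, no libraries assumed):

```python
# Illustrative sketch: entry (i, k) of A*B is the sum over j of a_ij * b_jk.

def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, giving an m x p matrix."""
    n = len(A[0])
    assert n == len(B), "columns of A must equal rows of B"
    p = len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]       # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]          # 3 x 2
print(mat_mul(A, B))  # [[4, 5], [10, 11]]  (a 2 x 2 result)
```

Note the result is 2 × 2 here: rows of A by columns of B, as stated above.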
Matrix algebra allows us to simultaneously rotate the data based on MANY variables.
Multiplying by a scalar (coefficient) k multiplies every entry: (k * A)_ij = k·a_ij.
The inverse of a 2 × 2 matrix M with rows (a, b) and (c, d) is
M⁻¹ = [ d/Δ  −b/Δ ; −c/Δ  a/Δ ], where Δ = (a * d) − (b * c).
Calculating an inverse for anything larger than a 2 × 2 matrix is tedious to do by hand; R will do it for you.
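Both operations can be sketched in plain Python (helper names `scalar_mul` and `inverse_2x2` are my own; this only covers the 2 × 2 case, as on the slide):

```python
# Illustrative sketch: scalar multiplication and the 2 x 2 inverse formula,
# with determinant det = a*d - b*c.

def scalar_mul(k, A):
    """Multiply every entry of A by the scalar k."""
    return [[k * x for x in row] for row in A]

def inverse_2x2(M):
    """Invert [[a, b], [c, d]] via the adjugate / determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    assert det != 0, "matrix is singular (determinant is 0)"
    return [[d / det, -b / det],
            [-c / det, a / det]]

M = [[4, 7], [2, 6]]
print(scalar_mul(2, M))  # [[8, 14], [4, 12]]
print(inverse_2x2(M))    # [[0.6, -0.7], [-0.2, 0.4]]
```

The determinant check matters: when Δ = 0 the matrix has no inverse, which is why the formula divides by Δ.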
Eigenvalues & eigenvectors
Consider the set of equations:
a_11 x_1 + a_12 x_2 + ... + a_1n x_n = λx_1
a_21 x_1 + a_22 x_2 + ... + a_2n x_n = λx_2
...
a_n1 x_1 + a_n2 x_2 + ... + a_nn x_n = λx_n
This can also be written in matrix form as Ax = λx, or (A − λI)x = 0, where I is the n × n identity matrix and 0 is a column vector of 0s.
The equations will only hold true for particular values of λ; each such value is an eigenvalue (up to n values).
Solving the equations for a particular eigenvalue yields a set of values x = (x_1, x_2, ..., x_n)′; this is an eigenvector.
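For a 2 × 2 matrix the eigenvalues can be found by hand: (A − λI)x = 0 has a non-zero solution only when det(A − λI) = 0, which is a quadratic in λ. A minimal sketch, assuming real eigenvalues and the generic case b ≠ 0 (the function name `eig_2x2` is my own; in practice you would use R's `eigen()` or a numerical library):

```python
# Illustrative sketch: eigenvalues/eigenvectors of a 2 x 2 matrix via the
# characteristic polynomial lambda^2 - trace*lambda + det = 0.
import math

def eig_2x2(A):
    """Return the two eigenvalues and unit eigenvectors of a 2x2 matrix."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)  # assumes real eigenvalues
    lams = [(tr + disc) / 2, (tr - disc) / 2]
    vecs = []
    for lam in lams:
        # First row of (A - lam*I) gives (a-lam)*x1 + b*x2 = 0,
        # so (x1, x2) = (b, lam - a) is a solution when b != 0.
        x1, x2 = (b, lam - a) if b != 0 else (1.0, 0.0)
        norm = math.hypot(x1, x2)
        vecs.append((x1 / norm, x2 / norm))
    return lams, vecs

lams, vecs = eig_2x2([[2, 1], [1, 2]])
print(lams)  # [3.0, 1.0]
```

For the symmetric matrix [[2, 1], [1, 2]] the eigenvalues are 3 and 1, with eigenvectors along (1, 1) and (1, −1) — exactly the "rotated axes" that rotation-based techniques such as PCA are built on.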