Vectors and Matrices
Statistics with Vectors and Matrices
Lecture 3
September 7, 2005
Analysis Lecture #3 - 9/7/2005 Slide 1 of 55
Today's Lecture
Vectors and Matrices (Supplement 2A, augmented with SAS proc iml; Chapter 2).
Statistics of Vectors and Matrices (beginning of Chapter 3, Section 3.5).
Last Time
Basics of matrix algebra:
Matrices and matrix types (e.g., symmetric, diagonal, etc.).
Vectors.
Scalars.
Basic matrix operations:
Transpose.
Addition/subtraction.
Matrix multiplication.
Scalar multiplication.
Basic matrix entities:
Identity matrix.
Zero matrix.
Vector Geometry
Topics: Vector Geometry, Scalar Multiplication, Linear Combinations, Linear Dependencies, Vector Length, Inner Product, Angle Between Vectors, Vector Projections.
Recall that a vector can be thought of as a line (with a direction) emanating from the origin and terminating at a point. For instance, take the column vector:
x = [3, 4]'
(figure: x plotted as an arrow from the origin to the point (3, 4))
Scalar Multiplication
Scalar multiplication of a vector results in changing the length of the vector:
(figure: scalar multiples of x plotted along the same line through the origin)
Linear Combinations
Vectors can be combined by adding multiples:
y = a_1 x_1 + a_2 x_2 + ... + a_k x_k
The resulting vector, y, is called a linear combination. For k vectors, the set of all possible linear combinations is called their span.
Linear Combinations
Geometrically, the linear combination (here, the sum) of two vectors looks like:
x = [3, 4]'    y = [2, 1]'
(figure: x, y, and x + y plotted from the origin; x + y terminates at (5, 5))
Linear Dependencies
A set of vectors x_1, x_2, ..., x_k is said to be linearly dependent if there exist scalars a_1, a_2, ..., a_k, not all zero, such that:
a_1 x_1 + a_2 x_2 + ... + a_k x_k = 0.
Such linear dependencies occur when a linear combination is added to the vector set.
Matrices comprised of a set of linearly dependent vectors are singular.
A set of linearly independent vectors forms what is called a basis for the vector space. Any vector in the vector space can then be expressed as a linear combination of the basis vectors.
Example Basis Vectors
The following two vectors form a basis for the Cartesian coordinate system:
u_1 = [1, 0]'    u_2 = [0, 1]'
All possible points on the graph can be represented by linear combinations of u_1 and u_2. From our previous example:
x = 3u_1 + 4u_2
y = 2u_1 + 1u_2
This hints at a major part of multivariate analysis: vector and matrix decomposition.
Vector Length
The length of a vector emanating from the origin is given by the Pythagorean formula:
L_x = sqrt(x_1^2 + x_2^2 + ... + x_k^2) = sqrt(x'x)
The length of x = sqrt(3^2 + 4^2) = 5.
This is found by forming a right triangle from scalar multiples of the basis vectors:
(figure: right triangle with legs 3u_1 and 4u_2 and hypotenuse of length L_x)
Inner Product
The inner (or dot) product of two vectors x and y is the sum of the element-by-element products:
x'y = x_1 y_1 + x_2 y_2 + ... + x_k y_k
From our example:
x'y = [3, 4][2, 1]' = 3(2) + 4(1) = 10
The inner product is used to compute the angle between vectors...
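The length and inner-product formulas can be checked directly. The slides use SAS proc iml; the following is an illustrative pure-Python sketch (no libraries needed) using the example vectors x = [3, 4] and y = [2, 1]:

```python
import math

def inner(x, y):
    # Sum of element-by-element products: x'y
    return sum(xi * yi for xi, yi in zip(x, y))

def length(x):
    # Pythagorean length: sqrt(x'x)
    return math.sqrt(inner(x, x))

x = [3, 4]
y = [2, 1]
print(inner(x, y))  # 3*2 + 4*1 = 10
print(length(x))    # sqrt(9 + 16) = 5.0
```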
Vector Angle
The angle formed between two vectors x and y satisfies:
cos(theta) = x'y / (sqrt(x'x) sqrt(y'y))
If x'y = 0, vectors x and y are perpendicular, denoted x ⊥ y.
(figure: x and y plotted from the origin with the angle theta between them)
Vector Angle
From our example:
cos(theta) = x'y / (sqrt(x'x) sqrt(y'y)) = 10 / (5 sqrt(5)) = .894
theta = cos^-1(.894) = 26.6 degrees
All basis vectors are perpendicular. For example:
u_1'u_2 = 0
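The angle computation above can be verified numerically; a minimal Python sketch with the same example vectors:

```python
import math

x = [3, 4]
y = [2, 1]
dot = sum(a * b for a, b in zip(x, y))               # x'y = 10
# cos(theta) = x'y / (sqrt(x'x) * sqrt(y'y))
cos_theta = dot / (math.hypot(*x) * math.hypot(*y))  # 10 / (5 * sqrt(5))
theta_deg = math.degrees(math.acos(cos_theta))
print(round(cos_theta, 3))  # 0.894
print(round(theta_deg, 1))  # 26.6
```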
Vector Projections
The projection of a vector x onto a vector y is given by:
(x'y / y'y) y = (x'y / L_y^2) y
From our example, the projection of x onto y is:
(x'y / y'y) y = (10/5) y = 2y = [4, 2]'
Vector Projections
From our example, the projection of x onto y is:
(x'y / y'y) y = (10/5) y = 2y = [4, 2]'
(figure: x, y, and the projection of x onto y, which terminates at (4, 2))
Vector Projections
From our example, the projection of y onto x is:
(y'x / x'x) x = (10/25) x = [1.2, 1.6]'
(figure: x, y, and the projection of y onto x, which terminates at (1.2, 1.6))
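Both projections can be reproduced with a few lines of Python; a sketch using the same x and y:

```python
def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, y):
    # Projection of x onto y: (x'y / y'y) * y
    c = inner(x, y) / inner(y, y)
    return [c * yi for yi in y]

x = [3, 4]
y = [2, 1]
print(project(x, y))  # (10/5) * y  -> [4.0, 2.0]
print(project(y, x))  # (10/25) * x -> approximately [1.2, 1.6]
```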
Vector Projections
The length of the projection of x onto y is L_x cos(theta).
Vector projections are at the root of hypothesis tests for the general linear model (ANOVA and multiple regression).
Through such projections, a set of linearly independent vectors can be created from any set of vectors. One process used to create such vectors is the Gram-Schmidt process.
Creating linearly independent vectors is useful in multivariate statistics (e.g., as a work-around for collinearity).
Matrix Division: The Inverse
Topics: Division, Singular Matrices.
Recall from basic math that:
a / b = (1/b) a = b^-1 a
And that:
a / a = 1
Matrix inverses play the role in matrix algebra that division plays in basic math.
The Inverse
For a square matrix, an inverse matrix is simply the matrix that, when pre-multiplied or post-multiplied with the original matrix, produces the identity matrix:
A^-1 A = AA^-1 = I
Learning to compute matrix inverses by hand is complicated and unnecessary, since computers are much more efficient at finding inverses of matrices.
One point of emphasis: just as division by zero is undefined in regular arithmetic, not all matrices can be inverted. Matrices that are comprised of linearly dependent column vectors cannot be inverted.
SAS
Instead of learning how to invert a matrix from Supplement 2A, let me teach you:

proc iml;
reset print;
a = {3 6, 4 -2};
a_inv = inv(a);
quit;
Singular Matrices
A matrix that cannot be inverted is called a singular matrix.
In statistics, a common cause of singular matrices is linear dependence among the rows or columns of a square matrix. Linear dependence can be caused by combinations of variables, or by variables with extreme correlations (near 1.00 or -1.00).
For example:

a = {3 6, 9 18};
a_inv = inv(a);

Gives:

8   a={3 6, 9 18};
9   a_inv = inv(a);
ERROR: (execution) Matrix should be non-singular.
 operation : INV at line 9 column 16
 operands  : A
Singular Matrices
Diagonal matrices have easy-to-compute inverses:
A = [a_11  0     0
     0     a_22  0
     0     0     a_33]
A^-1 = [1/a_11  0       0
        0       1/a_22  0
        0       0       1/a_33]
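For 2 x 2 matrices the inverse is simple enough to write by hand, which makes a good sanity check on inv(). A pure-Python sketch mirroring the SAS examples above (the adjugate formula here is standard, not from the slides):

```python
def inv2(A):
    # Inverse of a 2x2 matrix via the adjugate formula;
    # a zero determinant means the matrix is singular.
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("Matrix should be non-singular.")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(A, B):
    # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3, 6], [4, -2]]
print(matmul2(A, inv2(A)))   # approximately [[1, 0], [0, 1]]

# Diagonal matrices invert element-by-element on the diagonal:
print(inv2([[5, 0], [0, 4]]))

# The linearly dependent example is singular (row 2 = 3 * row 1):
try:
    inv2([[3, 6], [9, 18]])
except ValueError as e:
    print(e)
```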
The following are some algebraic properties of matrices:
(A + B) + C = A + (B + C)  -  Associative
A + B = B + A  -  Commutative
c(A + B) = cA + cB  -  Distributive
(c + d)A = cA + dA
(A + B)' = A' + B'
(cd)A = c(dA)
(cA)' = cA'
The following are more algebraic properties of matrices:
c(AB) = (cA)B
A(BC) = (AB)C
A(B + C) = AB + AC
(B + C)A = BA + CA
(AB)' = B'A'
For x_j such that Ax_j is defined:
sum_{j=1}^{n} Ax_j = A (sum_{j=1}^{n} x_j)
sum_{j=1}^{n} (Ax_j)(Ax_j)' = A (sum_{j=1}^{n} x_j x_j') A'
Advanced Matrix Functions/Operations
We end our matrix discussion with some advanced topics. All of these topics form the foundation of multivariate analyses.
Orthogonality
Eigenspaces
Decompositions
Quadratic Forms
Square Root Matrices
Determinants
Traces
Matrix Orthogonality
A square matrix Q is said to be orthogonal if:
QQ' = Q'Q = I
Orthogonal matrices are characterized by two properties:
1. Distinct row vectors are perpendicular: the inner product of any two different rows is zero.
2. Each row vector has unit length: the sum of its squared elements is one.
Eigenvalues and Eigenvectors
A square matrix A has an eigenvalue lambda with associated eigenvector x (x ≠ 0) if:
Ax = lambda x
From a statistical standpoint:
Principal components are linear combinations of a set of variables, weighted by the eigenvectors.
The eigenvalues represent the proportion of variance accounted for by specific principal components.
Each principal component is orthogonal to the next, producing a set of uncorrelated variables that may be used for statistical purposes (such as in multiple regression).
SAS Example #1...
Spectral Decompositions
Imagine that a matrix A is symmetric and of size k x k. A then has:
k eigenvalues: lambda_i, i = 1, ..., k.
k eigenvectors: e_i, i = 1, ..., k (each of size k x 1).
A can be expressed by:
A = sum_{i=1}^{k} lambda_i e_i e_i'
This expression is called the spectral decomposition, where A is decomposed into k parts.
SAS Example #2...
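The spectral decomposition can be checked numerically on a small case. A minimal Python sketch, assuming the symmetric matrix A = [[2, 1], [1, 2]] (a hypothetical example, not from the slides), whose eigenpairs are known in closed form: eigenvalue 3 with eigenvector (1, 1)/sqrt(2), and eigenvalue 1 with eigenvector (1, -1)/sqrt(2):

```python
import math

# Known eigenpairs of the symmetric matrix A = [[2, 1], [1, 2]]
eigen = [(3.0, [1 / math.sqrt(2), 1 / math.sqrt(2)]),
         (1.0, [1 / math.sqrt(2), -1 / math.sqrt(2)])]

# Rebuild A = sum_i lambda_i * e_i e_i'
A = [[0.0, 0.0], [0.0, 0.0]]
for lam, e in eigen:
    for i in range(2):
        for j in range(2):
            A[i][j] += lam * e[i] * e[j]

print(A)  # approximately [[2, 1], [1, 2]]
```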
Quadratic Forms
A quadratic form of a matrix A is given by:
x'Ax
A symmetric matrix A (of size k x k) is said to be positive definite if, for all x ≠ 0:
x'Ax > 0
This implies:
All eigenvalues of A are greater than zero.
The determinant of A is greater than zero (to be discussed).
The trace of A is greater than zero (to be discussed).
A is non-singular (invertible).
Quadratic forms will be discussed in more detail when we introduce the concept of statistical distance.
Square Root Matrices
The square root matrix of a positive definite matrix A can be formed by:
A^(1/2) = sum_{i=1}^{k} sqrt(lambda_i) e_i e_i' = P Lambda^(1/2) P'
The square root matrix has the following properties:
(A^(1/2))' = A^(1/2)  -  A^(1/2) is symmetric.
A^(1/2) A^(1/2) = A.
A^(-1/2) = sum_{i=1}^{k} (1/sqrt(lambda_i)) e_i e_i' = P Lambda^(-1/2) P'
A^(1/2) A^(-1/2) = A^(-1/2) A^(1/2) = I
A^(-1/2) A^(-1/2) = A^(-1)
Matrix Determinants
A square matrix can be characterized by a scalar value called a determinant:
det A = |A|
Much like the matrix inverse, calculation of the determinant is very complicated and tedious, and is best left to computers.
What can be learned from determinants is whether a matrix is singular: a matrix is singular exactly when its determinant is zero. A positive definite matrix has a determinant greater than zero, a byproduct of which is that a positive definite matrix is non-singular.
The determinant equals the product of the eigenvalues:
|A| = prod_{i=1}^{k} lambda_i
SAS Example #3...
Matrix Traces
For a square matrix A (size k x k), the trace of the matrix is defined as the sum of the diagonal elements, which also equals the sum of the eigenvalues:
tr(A) = sum_{i=1}^{k} a_ii = sum_{i=1}^{k} lambda_i
SAS Example #4...
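The determinant and trace identities can be confirmed on the same hypothetical 2 x 2 symmetric matrix used above, whose eigenvalues (3 and 1) are known:

```python
# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1
A = [[2, 1], [1, 2]]
eigenvalues = [3, 1]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2x2 determinant: ad - bc
trace = A[0][0] + A[1][1]                    # sum of diagonal elements

print(det, trace)  # 3 4
# Determinant = product of eigenvalues; trace = sum of eigenvalues
print(eigenvalues[0] * eigenvalues[1], eigenvalues[0] + eigenvalues[1])  # 3 4
```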
Random Variables
Topics: Mean, Variance, Linear Combination of x, Covariance of x and y, Cov(x, y), Corr(x, y).
A random variable is a variable whose outcome depends on the result of a chance experiment.
It can be continuous or discrete (most of our statistics deal with continuous variables).
It has a density function f(x) that indicates the relative frequency of occurrence.
We will discuss the basic ways used to summarize a random variable x.
Mean
The mean is a measure of central tendency.
The expectation E(x) describes the population mean: E(x) = mu_x.
The sample mean x-bar will almost never exactly equal the population mean:
x-bar = (1/n) sum_{i=1}^{n} x_i = (1/n) 1'x  (1 is a column vector of ones, size n x 1).
The sampling distribution of x-bar has mean mu_x and variance sigma_x^2 / n, and, as n increases, the distribution converges to a normal distribution.
Variance
Variance is a measure of spread/range.
The population variance sigma_x^2 is: E((x - mu_x)^2) = sigma_x^2.
The sample variance s_x^2 will almost never exactly equal the population variance:
s_x^2 = (1/(n-1)) sum_{i=1}^{n} (x_i - x-bar)^2 = (1/(n-1)) (x - x-bar 1)'(x - x-bar 1).
Linear Combination of x
Let's look at what happens to the mean if we compute a new variable ax.
The mean is a measure of central tendency:
E(ax) = aE(x)
The sample mean of ax is a x-bar.
Variance is a measure of spread/range:
E((ax - a mu_x)^2) = a^2 sigma_x^2
The sample variance of ax is a^2 s_x^2.
Covariance of x and y
If x and y are measured on each unit (e.g., person), we are said to have a bivariate random variable (x, y).
Expectation:
E(x + y) = E(x) + E(y)
E(xy) = E(x)E(y) ONLY IF x and y are independent.
It is possible that x and y have some relationship, so that they tend to covary:
Height and weight.
You may live closer the longer you have been going to school here.
Cov(x, y)
Population covariance (sigma_xy):
sigma_xy = E[(x - mu_x)(y - mu_y)] = E(xy) - mu_x mu_y.
E(xy) = E(x)E(y) ONLY IF x and y are independent, so the population covariance sigma_xy equals zero if x and y are independent (i.e., orthogonal).
Sample covariance (s_xy):
s_xy = sum_{i=1}^{n} (x_i - x-bar)(y_i - ȳ) / (n - 1)
The sample covariance will almost never exactly equal the population covariance.
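The sample covariance formula is easy to compute directly. A small Python sketch on hypothetical data (the values below are made up for illustration), using the n - 1 divisor:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n  # 3.0
ybar = sum(y) / n  # 4.0

# Sample covariance: sum of cross-products of deviations, divided by n - 1
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
print(s_xy)  # 1.5
```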
Corr(x, y)
One problem with the covariance is that it is dependent on scale: multiplying a variable by a constant changes the covariance.
We standardize the covariance by computing the correlation.
Correlation is the same when variables are transformed using a linear transformation.
Correlation is the same as the covariance WHEN x and y are standardized (i.e., have variance 1).
Corr(x, y)
Population:
Corr(x, y) = rho_xy = sigma_xy / (sigma_x sigma_y)
Corr(x, y) = E[(x - mu_x)(y - mu_y)] / sqrt(E[(x - mu_x)^2] E[(y - mu_y)^2])
Sample:
Corr(x, y) = r_xy = s_xy / (s_x s_y)
Corr(x, y) = sum_i (x_i - x-bar)(y_i - ȳ) / sqrt(sum_i (x_i - x-bar)^2 sum_i (y_i - ȳ)^2)
Corr(x, y)
Also, correlation is related to the angle (theta) between two centered vectors a and b (the deviations from the means), where:
cos(theta) = r_xy = (a'a + b'b - (b - a)'(b - a)) / (2 sqrt((a'a)(b'b)))
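The correlation-as-cosine relationship can be verified on the same hypothetical data used above: compute r from its usual formula and compare it to the law-of-cosines expression for the angle between the centered vectors. A pure-Python sketch:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
a = [xi - sum(x) / n for xi in x]  # centered x (deviations from the mean)
b = [yi - sum(y) / n for yi in y]  # centered y

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Correlation as usually computed
r = inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))

# ... and as cos(theta) between the centered vectors, via the law of cosines
d = [bi - ai for ai, bi in zip(a, b)]
cos_theta = (inner(a, a) + inner(b, b) - inner(d, d)) / (
    2 * math.sqrt(inner(a, a) * inner(b, b)))

print(round(r, 4), round(cos_theta, 4))  # both round to 0.7746
```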
More Dimensions
Topics: SAS Example #5, Mean, Variance/Covariance, Correlation Matrix.
Now we are going to start generalizing the univariate concepts to multivariate data.
We begin by defining multivariate data as a vector of p observations that have been taken from a single entity. We will use x to indicate a (p x 1) vector from a single entity.
We will use X to indicate an (n x p) matrix containing n entities, each with measurements on p variables.
More Dimensions
X = [x_11 x_12 ... x_1p
     x_21 x_22 ... x_2p
     ...
     x_n1 x_n2 ... x_np]
Equivalently, X stacks the n entity vectors as rows:
X = [x_1'
     x_2'
     ...
     x_n']
SAS Example #5
Using some multivariate statistical methods I have yet to discuss, I estimated a set of seven different computer abilities for 3000 students in a school district.
Mean
The mean is a measure of central tendency.
The expectation E(x) describes the vector of population means:
E(x) = [E(x_1), E(x_2), ..., E(x_p)]' = [mu_1, mu_2, ..., mu_p]'
Mean
The sample mean vector x-bar will almost never exactly equal the population mean vector:
x-bar = (1/n) sum_{i=1}^{n} x_i = (1/n) X'1
x-bar = [x-bar_1, x-bar_2, ..., x-bar_p]'
Variance/Covariance
Originally, when there was a single variable, we used the variance. Once we had two variables, we could also describe the way the two variables covaried.
Using this same idea, if there are several variables, then to fully capture the variability of our data we must describe both:
the variability of each variable, and
the covariance of each variable with all other variables.
These are stored and reported in a variance/covariance matrix.
Variance/Covariance
Population:
Sigma = [sigma_11 sigma_12 ... sigma_1p
         sigma_21 sigma_22 ... sigma_2p
         ...
         sigma_p1 sigma_p2 ... sigma_pp]
Sample:
S = [s_11 s_12 ... s_1p
     s_21 s_22 ... s_2p
     ...
     s_p1 s_p2 ... s_pp]
Variance
Variance is a measure of spread/range.
The population variance/covariance matrix Sigma is: E((x - mu)(x - mu)') = Sigma.
The sample variance/covariance matrix S will almost never exactly equal the population matrix:
S = (1/(n-1)) sum_{i=1}^{n} (x_i - x-bar)(x_i - x-bar)' = (1/(n-1)) (X - 1 x-bar')'(X - 1 x-bar').
Correlation Matrix
A problem with covariance is that it can be difficult to interpret, so just like in the bivariate case we can compute the correlation.
Recall that the correlation is:
rho_xy = sigma_xy / (sigma_x sigma_y)
Correlation Matrix
So, by substituting in, our correlation matrix should look like:
Population:
[sigma_11/(sigma_1 sigma_1)  sigma_12/(sigma_1 sigma_2)  ...  sigma_1p/(sigma_1 sigma_p)
 sigma_21/(sigma_2 sigma_1)  sigma_22/(sigma_2 sigma_2)  ...  sigma_2p/(sigma_2 sigma_p)
 ...
 sigma_p1/(sigma_p sigma_1)  sigma_p2/(sigma_p sigma_2)  ...  sigma_pp/(sigma_p sigma_p)]
Notice the pattern? Every entry of Sigma is divided by the product of two standard deviations.
It looks like something that we saw using diagonal matrices.
Correlation
Define a diagonal matrix with the inverses of the standard deviations (D^-1) as:
D^-1 = [1/sigma_1  0          ...  0
        0          1/sigma_2  ...  0
        ...
        0          0          ...  1/sigma_p]
Correlation
Then we can define our correlation matrix as:
Population: P = D^-1 Sigma D^-1
While all of this has been phrased as population quantities, we can also compute the sample correlation matrix (with D built from the sample standard deviations):
R = D^-1 S D^-1
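The R = D^-1 S D^-1 construction amounts to dividing entry (i, j) of S by s_i * s_j. A minimal Python sketch on a small hypothetical bivariate sample (made-up data for illustration):

```python
import math

# n = 5 rows (entities), p = 2 columns (variables)
X = [[1, 2], [2, 4], [3, 5], [4, 4], [5, 5]]
n, p = len(X), len(X[0])
means = [sum(row[j] for row in X) / n for j in range(p)]

# Sample variance/covariance matrix S (divisor n - 1)
S = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
      for j in range(p)] for i in range(p)]

# R = D^-1 S D^-1: divide entry (i, j) of S by s_i * s_j,
# where s_i are the sample standard deviations (diagonal of D)
sd = [math.sqrt(S[i][i]) for i in range(p)]
R = [[S[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

print(S)  # [[2.5, 1.5], [1.5, 1.5]]
print(R)  # ones on the diagonal, r_xy off the diagonal
```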
Final Thought
Matrix algebra makes the technical things in life easier. The applications of matrices will be demonstrated throughout the rest of this course.
Virtually all of statistics can be expressed with matrices. Once you learn to read and think in matrices, statistics becomes much easier.
Next Time
Geometric implications of multivariate descriptive statistics (sections of Chapter 3).
The multivariate normal distribution (Chapter 4).