Basic Concepts in Matrix Algebra

Basic Concepts in Matrix Algebra

A column array of p elements is called a vector of dimension p and is written as

    x_{p \times 1} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}

The transpose of the column vector x_{p \times 1} is the row vector

    x' = [x_1 \; x_2 \; \cdots \; x_p]

A vector can be represented in p-space as a directed line with components along the p axes.

Basic Matrix Concepts (cont'd)

Two vectors can be added if they have the same dimension. Addition is carried out elementwise:

    x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_p + y_p \end{bmatrix}

A vector can be contracted or expanded if multiplied by a constant c. Multiplication is also elementwise:

    cx = c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_p \end{bmatrix}

Examples

    x = \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} \quad \text{and} \quad x' = [2 \; 1 \; -4]

    6x = 6 \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} = \begin{bmatrix} 6 \cdot 2 \\ 6 \cdot 1 \\ 6 \cdot (-4) \end{bmatrix} = \begin{bmatrix} 12 \\ 6 \\ -24 \end{bmatrix}

    x + y = \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} + \begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 + 5 \\ 1 - 2 \\ -4 + 0 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ -4 \end{bmatrix}
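
These elementwise operations are easy to check numerically. Below is a small NumPy sketch, added for illustration (not part of the original slides), reproducing the vector arithmetic above.

```python
import numpy as np

x = np.array([2, 1, -4])
y = np.array([5, -2, 0])

print(6 * x)   # scalar multiplication: [ 12   6 -24]
print(x + y)   # elementwise addition:  [ 7 -1 -4]
```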

Basic Matrix Concepts (cont'd)

Multiplication by c > 0 does not change the direction of x. The direction is reversed if c < 0.

Basic Matrix Concepts (cont'd)

The length of a vector x is its Euclidean distance from the origin:

    L_x = \sqrt{\sum_{j=1}^{p} x_j^2}

Multiplication of a vector x by a constant c changes the length:

    L_{cx} = \sqrt{\sum_{j=1}^{p} c^2 x_j^2} = |c| \sqrt{\sum_{j=1}^{p} x_j^2} = |c| L_x

If c = L_x^{-1}, then cx is a vector of unit length.

Examples

The length of x = [2 \; 1 \; -4 \; -2]' is

    L_x = \sqrt{(2)^2 + (1)^2 + (-4)^2 + (-2)^2} = \sqrt{25} = 5

Then

    z = \frac{1}{5} \begin{bmatrix} 2 \\ 1 \\ -4 \\ -2 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.2 \\ -0.8 \\ -0.4 \end{bmatrix}

is a vector of unit length.
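
As a quick numerical check (a NumPy sketch added here, not in the original slides), the length and the unit-length rescaling can be computed directly:

```python
import numpy as np

x = np.array([2, 1, -4, -2])
L_x = np.linalg.norm(x)             # Euclidean length: 5.0
z = x / L_x                         # unit-length vector: [ 0.4  0.2 -0.8 -0.4]
print(L_x, z, np.linalg.norm(z))    # the norm of z is 1.0
```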

Angle Between Vectors

Consider two vectors x and y in two dimensions. If \theta_1 is the angle between x and the horizontal axis and \theta_2 > \theta_1 is the angle between y and the horizontal axis, then

    \cos(\theta_1) = x_1 / L_x,   \sin(\theta_1) = x_2 / L_x,
    \cos(\theta_2) = y_1 / L_y,   \sin(\theta_2) = y_2 / L_y.

If \theta is the angle between x and y, then

    \cos(\theta) = \cos(\theta_2 - \theta_1) = \cos(\theta_2)\cos(\theta_1) + \sin(\theta_2)\sin(\theta_1).

Then

    \cos(\theta) = \frac{x_1 y_1 + x_2 y_2}{L_x L_y}.

Angle Between Vectors (cont'd)

[Figure: illustration of the angle \theta between two vectors x and y.]

Inner Product

The inner product between two vectors x and y is

    x'y = \sum_{j=1}^{p} x_j y_j.

Then L_x = \sqrt{x'x}, L_y = \sqrt{y'y}, and

    \cos(\theta) = \frac{x'y}{\sqrt{x'x}\,\sqrt{y'y}}.

Since \cos(\theta) = 0 when x'y = 0, and \cos(\theta) = 0 for \theta = 90^\circ or \theta = 270^\circ, the vectors are perpendicular (orthogonal) when x'y = 0.
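
A short NumPy illustration of the inner product and the angle formula (added as a sketch; the two vectors are arbitrary, not from the original slides):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
theta_deg = np.degrees(np.arccos(cos_theta))
print(cos_theta, theta_deg)   # 0.7071..., 45.0 degrees
```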

Linear Dependence

Two vectors x and y are linearly dependent if there exist two constants c_1 and c_2, not both zero, such that

    c_1 x + c_2 y = 0.

If two vectors are linearly dependent, then one can be written as a linear combination of the other. From above (with c_1 \neq 0):

    x = -(c_2 / c_1) y.

More generally, k vectors x_1, x_2, ..., x_k are linearly dependent if there exist constants c_1, c_2, ..., c_k, not all zero, such that

    \sum_{j=1}^{k} c_j x_j = 0.

Vectors of the same dimension that are not linearly dependent are said to be linearly independent.

Linear Independence: Example

Let

    x_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.

Then c_1 x_1 + c_2 x_2 + c_3 x_3 = 0 requires

    c_1 + c_2 + c_3 = 0
    2c_1 + 0 \cdot c_2 - 2c_3 = 0
    c_1 - c_2 + c_3 = 0.

The unique solution is c_1 = c_2 = c_3 = 0, so the vectors are linearly independent.
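
One way to check this numerically (a NumPy sketch, not part of the original slides) is to stack the vectors as columns and compute the matrix rank; full column rank means the vectors are linearly independent.

```python
import numpy as np

# columns are x1, x2, x3 from the example above
X = np.column_stack([[1, 2, 1], [1, 0, -1], [1, -2, 1]])
print(np.linalg.matrix_rank(X))   # 3, so the three vectors are linearly independent
```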

Projections

The projection of x on y is defined by

    \text{Projection of } x \text{ on } y = \frac{x'y}{y'y}\, y = \frac{x'y}{L_y} \frac{1}{L_y}\, y.

The length of the projection is

    \text{Length of projection} = \frac{|x'y|}{L_y} = L_x \frac{|x'y|}{L_x L_y} = L_x |\cos(\theta)|,

where \theta is the angle between x and y.
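
A brief NumPy sketch of the projection formula (added for illustration; the vectors used here are arbitrary):

```python
import numpy as np

x = np.array([2.0, 1.0, -4.0])
y = np.array([5.0, -2.0, 0.0])

proj = (x @ y) / (y @ y) * y                 # projection of x on y
length = abs(x @ y) / np.linalg.norm(y)      # length of the projection
print(proj, length, np.linalg.norm(proj))    # the last two values agree
```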

Matrices

A matrix A is an array of elements a_{ij} with n rows and p columns:

    A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix}

The transpose A' has p rows and n columns. The j-th row of A' is the j-th column of A:

    A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{bmatrix}

Matrix Algebra

Multiplication of A by a constant c is carried out element by element:

    cA = \begin{bmatrix} ca_{11} & ca_{12} & \cdots & ca_{1p} \\ ca_{21} & ca_{22} & \cdots & ca_{2p} \\ \vdots & \vdots & & \vdots \\ ca_{n1} & ca_{n2} & \cdots & ca_{np} \end{bmatrix}

Matrix Addition

Two matrices A_{n \times p} = \{a_{ij}\} and B_{n \times p} = \{b_{ij}\} of the same dimensions can be added element by element. The resulting matrix is C_{n \times p} = \{c_{ij}\} = \{a_{ij} + b_{ij}\}:

    C = A + B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1p} + b_{1p} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2p} + b_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} + b_{n1} & a_{n2} + b_{n2} & \cdots & a_{np} + b_{np} \end{bmatrix}

Examples

    \begin{bmatrix} 2 & 1 & 4 \\ 5 & 7 & 0 \end{bmatrix}' = \begin{bmatrix} 2 & 5 \\ 1 & 7 \\ 4 & 0 \end{bmatrix}

    6 \begin{bmatrix} 2 & 1 & 4 \\ 5 & 7 & 0 \end{bmatrix} = \begin{bmatrix} 12 & 6 & 24 \\ 30 & 42 & 0 \end{bmatrix}

    \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix} + \begin{bmatrix} 2 & -1 \\ 5 & 7 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 5 & 10 \end{bmatrix}

Matrix Multiplication

Multiplication of two matrices A_{n \times p} and B_{m \times q} can be carried out only if the matrices are compatible for multiplication:

    A_{n \times p} B_{m \times q}: compatible if p = m.
    B_{m \times q} A_{n \times p}: compatible if q = n.

The element in the i-th row and the j-th column of AB is the inner product of the i-th row of A with the j-th column of B.

Multiplication Examples

    \begin{bmatrix} 2 & 0 & 1 \\ 5 & 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ -1 & 3 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 10 \\ 4 & 29 \end{bmatrix}

    \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 11 \\ 2 & 29 \end{bmatrix}

    \begin{bmatrix} 1 & 4 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} = \begin{bmatrix} 22 & 13 \\ 13 & 8 \end{bmatrix}

Note that matrix multiplication is not commutative: the last two products involve the same two matrices in opposite order and give different results.
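
A NumPy check of the last two products (an added sketch, not part of the original transcription):

```python
import numpy as np

A = np.array([[2, 1], [5, 3]])
B = np.array([[1, 4], [-1, 3]])
print(A @ B)   # [[ 1 11] [ 2 29]]
print(B @ A)   # [[22 13] [13  8]]  -- different from A @ B
```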

Identity Matrix

An identity matrix, denoted by I, is a square matrix with 1's along the main diagonal and 0's everywhere else. For example,

    I_{2 \times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad I_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

If A is a square matrix, then AI = IA = A. More generally, I_{n \times n} A_{n \times p} = A_{n \times p}, but A_{n \times p} I_{n \times n} is not defined for p \neq n.

Symmetric Matrices

A square matrix is symmetric if A = A'. If a square matrix A has elements \{a_{ij}\}, then A is symmetric if a_{ij} = a_{ji}.

Examples:

    \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}, \quad \begin{bmatrix} 5 & 1 & 3 \\ 1 & 12 & 5 \\ 3 & 5 & 9 \end{bmatrix}

Inverse Matrix

Consider two square matrices A_{k \times k} and B_{k \times k}. If

    AB = BA = I,

then B is the inverse of A, denoted A^{-1}. The inverse of A exists only if the columns of A are linearly independent. If A = \text{diag}\{a_{jj}\} is diagonal, then A^{-1} = \text{diag}\{1/a_{jj}\}.

Inverse Matrix

For a 2 \times 2 matrix A, the inverse is

    A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{\det(A)} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix},

where \det(A) = a_{11} a_{22} - a_{12} a_{21} denotes the determinant of A.
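
A quick NumPy confirmation of the 2x2 inverse formula (an added sketch; the matrix chosen here is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0], [2.0, 4.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv_formula = np.array([[A[1, 1], -A[0, 1]],
                          [-A[1, 0], A[0, 0]]]) / det
print(np.allclose(A_inv_formula, np.linalg.inv(A)))   # True
```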

Orthogonal Matrices

A square matrix Q is orthogonal if

    QQ' = Q'Q = I,

or, equivalently, Q' = Q^{-1}. If Q is orthogonal, its rows and columns have unit length (q_j'q_j = 1) and are mutually perpendicular (q_j'q_k = 0 for any j \neq k).

Eigenvalues and Eigenvectors

A square matrix A has an eigenvalue \lambda with corresponding eigenvector z \neq 0 if

    Az = \lambda z.

The eigenvalues of A are the solutions to |A - \lambda I| = 0. A normalized eigenvector (of unit length) is denoted by e.

A k \times k symmetric matrix A has k pairs of eigenvalues and eigenvectors

    \lambda_1, e_1 \quad \lambda_2, e_2 \quad \ldots \quad \lambda_k, e_k,

where e_i'e_i = 1, e_i'e_j = 0 for i \neq j, and the eigenvectors are unique up to a change in sign unless two or more eigenvalues are equal.

Spectral Decomposition

Eigenvalues and eigenvectors will play an important role in this course. For example, principal components are based on the eigenvalues and eigenvectors of sample covariance matrices.

The spectral decomposition of a k \times k symmetric matrix A is

    A = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \ldots + \lambda_k e_k e_k'
      = [e_1 \; e_2 \; \cdots \; e_k] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix} \begin{bmatrix} e_1' \\ e_2' \\ \vdots \\ e_k' \end{bmatrix}
      = P \Lambda P'
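
The decomposition can be computed with NumPy's symmetric eigensolver; this is an illustrative sketch (the matrix used is arbitrary), not part of the original slides.

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 3.0]])    # symmetric matrix
lam, P = np.linalg.eigh(A)                # eigenvalues (ascending) and orthonormal eigenvectors
A_rebuilt = P @ np.diag(lam) @ P.T        # P Lambda P'
print(np.allclose(A, A_rebuilt))          # True: the spectral decomposition recovers A
```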

Determinant and Trace

The trace of a k \times k matrix A is the sum of its diagonal elements, i.e., \text{trace}(A) = \sum_{i=1}^{k} a_{ii}.

For a square, symmetric matrix A, the trace equals the sum of the eigenvalues, i.e.,

    \text{trace}(A) = \sum_{i=1}^{k} a_{ii} = \sum_{i=1}^{k} \lambda_i,

and the determinant equals the product of the eigenvalues, i.e.,

    |A| = \prod_{i=1}^{k} \lambda_i.

Rank of a Matrix

The rank of a square matrix A is

- the number of linearly independent rows,
- the number of linearly independent columns,
- the number of non-zero eigenvalues.

The inverse of a k \times k matrix A exists if and only if rank(A) = k, i.e., there are no zero eigenvalues.

Positive Definite Matrix

For a k \times k symmetric matrix A and a vector x = [x_1, x_2, ..., x_k]', the quantity x'Ax is called a quadratic form. Note that

    x'Ax = \sum_{i=1}^{k} \sum_{j=1}^{k} a_{ij} x_i x_j.

If x'Ax \geq 0 for any vector x, both A and the quadratic form are said to be non-negative definite.

If x'Ax > 0 for any vector x \neq 0, both A and the quadratic form are said to be positive definite.

Example 2.11

Show that the matrix of the quadratic form 3x_1^2 + 2x_2^2 - 2\sqrt{2}\, x_1 x_2 is positive definite.

For

    A = \begin{bmatrix} 3 & -\sqrt{2} \\ -\sqrt{2} & 2 \end{bmatrix},

the eigenvalues are \lambda_1 = 4, \lambda_2 = 1. Then A = 4 e_1 e_1' + e_2 e_2'. Write

    x'Ax = 4 x'e_1 e_1'x + x'e_2 e_2'x = 4y_1^2 + y_2^2 \geq 0,

which is zero only for y_1 = y_2 = 0.

Example 2.11 (cont'd)

y_1 and y_2 cannot both be zero because

    \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} e_1' \\ e_2' \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P'_{2 \times 2}\, x_{2 \times 1},

with P orthonormal, so that (P')^{-1} = P. Then x = Py, and since x \neq 0 it follows that y \neq 0.

Using the spectral decomposition, we can show that:

- A is positive definite if all of its eigenvalues are positive.
- A is non-negative definite if all of its eigenvalues are \geq 0.
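
A NumPy sketch (added, not in the original slides) verifying that the Example 2.11 matrix is positive definite by inspecting its eigenvalues:

```python
import numpy as np

A = np.array([[3.0, -np.sqrt(2)], [-np.sqrt(2), 2.0]])
eigvals = np.linalg.eigvalsh(A)
print(eigvals)                 # approximately [1. 4.]
print(np.all(eigvals > 0))     # True: A is positive definite
```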

Distance and Quadratic Forms

For x = [x_1, x_2, ..., x_p]' and a p \times p positive definite matrix A,

    d^2 = x'Ax > 0 when x \neq 0.

Thus, a positive definite quadratic form can be interpreted as a squared distance of x from the origin, and vice versa. The squared distance from x to a fixed point \mu is given by the quadratic form

    (x - \mu)'A(x - \mu).

Distance and Quadratic Forms (cont'd)

We can interpret distance in terms of the eigenvalues and eigenvectors of A as well. Any point x at constant distance c from the origin satisfies

    x'Ax = x'\left(\sum_{j=1}^{p} \lambda_j e_j e_j'\right)x = \sum_{j=1}^{p} \lambda_j (x'e_j)^2 = c^2,

the expression for an ellipsoid in p dimensions.

Note that the point x = c\lambda_1^{-1/2} e_1 is at a distance c (in the direction of e_1) from the origin because it satisfies x'Ax = c^2. The same is true for the points x = c\lambda_j^{-1/2} e_j, j = 1, ..., p. Thus, all points at distance c lie on an ellipsoid with axes in the directions of the eigenvectors and with lengths proportional to \lambda_j^{-1/2}.

Distance and Quadratic Forms (cont'd)

[Figure illustrating the constant-distance ellipsoid described above.]

Square-Root Matrices

The spectral decomposition of a positive definite matrix A yields

    A = \sum_{j=1}^{p} \lambda_j e_j e_j' = P \Lambda P',

with \Lambda_{p \times p} = \text{diag}\{\lambda_j\}, all \lambda_j > 0, and P_{p \times p} = [e_1 \; e_2 \; \ldots \; e_p] an orthonormal matrix of eigenvectors. Then

    A^{-1} = P \Lambda^{-1} P' = \sum_{j=1}^{p} \frac{1}{\lambda_j} e_j e_j'.

With \Lambda^{1/2} = \text{diag}\{\lambda_j^{1/2}\}, a square-root matrix is

    A^{1/2} = P \Lambda^{1/2} P' = \sum_{j=1}^{p} \sqrt{\lambda_j}\, e_j e_j'.

Square-Root Matrices

The square root of a positive definite matrix A has the following properties:

1. Symmetry: (A^{1/2})' = A^{1/2}
2. A^{1/2} A^{1/2} = A
3. A^{-1/2} = \sum_{j=1}^{p} \lambda_j^{-1/2} e_j e_j' = P \Lambda^{-1/2} P'
4. A^{1/2} A^{-1/2} = A^{-1/2} A^{1/2} = I
5. A^{-1/2} A^{-1/2} = A^{-1}

Note that there are other ways of defining the square root of a positive definite matrix: in the Cholesky decomposition A = LL', with L a matrix of lower triangular form, L is also called a square root of A.
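
The spectral square root and its properties can be checked numerically. This NumPy sketch (added here; the matrix is arbitrary) builds A^{1/2} and A^{-1/2} from the eigendecomposition:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 3.0]])              # positive definite
lam, P = np.linalg.eigh(A)
A_half = P @ np.diag(np.sqrt(lam)) @ P.T            # A^{1/2} = P Lambda^{1/2} P'
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T  # A^{-1/2}

print(np.allclose(A_half @ A_half, A))                        # property 2
print(np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A))) # property 5
```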

Random Vectors and Matrices

A random matrix (vector) is a matrix (vector) whose elements are random variables. If X_{n \times p} is a random matrix, the expected value of X is the n \times p matrix

    E(X) = \begin{bmatrix} E(X_{11}) & E(X_{12}) & \cdots & E(X_{1p}) \\ E(X_{21}) & E(X_{22}) & \cdots & E(X_{2p}) \\ \vdots & \vdots & & \vdots \\ E(X_{n1}) & E(X_{n2}) & \cdots & E(X_{np}) \end{bmatrix},

where

    E(X_{ij}) = \int x_{ij} f_{ij}(x_{ij})\, dx_{ij},

with f_{ij}(x_{ij}) the density function of the continuous random variable X_{ij}. If X_{ij} is a discrete random variable, we compute its expectation as a sum rather than an integral.

Linear Combinations

The usual rules for expectations apply. If X and Y are two random matrices and A and B are two constant matrices of the appropriate dimensions, then

    E(X + Y) = E(X) + E(Y)
    E(AX) = A E(X)
    E(AXB) = A E(X) B
    E(AX + BY) = A E(X) + B E(Y)

Further, if c is a scalar-valued constant, then E(cX) = c E(X).

Mean Vectors and Covariance Matrices

Suppose that X is a p \times 1 (continuous) random vector drawn from some p-dimensional distribution. Each element of X, say X_j, has its own marginal distribution with marginal mean \mu_j and variance \sigma_{jj} defined in the usual way:

    \mu_j = \int x_j f_j(x_j)\, dx_j,
    \sigma_{jj} = \int (x_j - \mu_j)^2 f_j(x_j)\, dx_j.

Mean Vectors and Covariance Matrices (cont'd)

To examine the association between a pair of random variables we need to consider their joint distribution. A measure of the linear association between pairs of variables is given by the covariance

    \sigma_{jk} = E[(X_j - \mu_j)(X_k - \mu_k)]
                = \int \int (x_j - \mu_j)(x_k - \mu_k) f_{jk}(x_j, x_k)\, dx_j\, dx_k.

Mean Vectors and Covariance Matrices (cont'd)

If the joint density function f_{jk}(x_j, x_k) can be written as the product of the two marginal densities, i.e.,

    f_{jk}(x_j, x_k) = f_j(x_j) f_k(x_k),

then X_j and X_k are independent.

More generally, the p-dimensional random vector X has mutually independent elements if the p-dimensional joint density function can be written as the product of the p univariate marginal densities.

If two random variables X_j and X_k are independent, then their covariance is equal to 0. [The converse is not always true.]

Mean Vectors and Covariance Matrices (cont'd)

We use \mu to denote the p \times 1 vector of marginal population means and use \Sigma to denote the p \times p population variance-covariance matrix:

    \Sigma = E[(X - \mu)(X - \mu)'].

If we carry out the multiplication (an outer product), then \Sigma is equal to

    E \begin{bmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) & \cdots & (X_1 - \mu_1)(X_p - \mu_p) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 & \cdots & (X_2 - \mu_2)(X_p - \mu_p) \\ \vdots & \vdots & & \vdots \\ (X_p - \mu_p)(X_1 - \mu_1) & (X_p - \mu_p)(X_2 - \mu_2) & \cdots & (X_p - \mu_p)^2 \end{bmatrix}.

Mean Vectors and Covariance Matrices (cont'd)

By taking expectations element-wise we find that

    \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}.

Since \sigma_{jk} = \sigma_{kj} for all j \neq k, \Sigma is symmetric. \Sigma is also non-negative definite.

Correlation Matrix

The population correlation matrix is the p \times p matrix with off-diagonal elements equal to \rho_{jk} and diagonal elements equal to 1:

    \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}.

Since \rho_{jk} = \rho_{kj}, the correlation matrix is symmetric. The correlation matrix is also non-negative definite.

Correlation Matrix (cont'd)

The p \times p population standard deviation matrix V^{1/2} is a diagonal matrix with \sqrt{\sigma_{jj}} along the diagonal and zeros in all off-diagonal positions. Then

    \Sigma = V^{1/2} \rho V^{1/2},

and the population correlation matrix is

    \rho = (V^{1/2})^{-1} \Sigma (V^{1/2})^{-1}.

Given \Sigma, we can easily obtain the correlation matrix.
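
A NumPy sketch (added for illustration, with an arbitrary covariance matrix) of the relationship between Sigma, V^{1/2}, and the correlation matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 9.0, -3.0],
                  [2.0, -3.0, 25.0]])

V_half = np.diag(np.sqrt(np.diag(Sigma)))          # standard deviation matrix
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv              # correlation matrix
print(rho)
print(np.allclose(Sigma, V_half @ rho @ V_half))   # True: Sigma = V^{1/2} rho V^{1/2}
```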

Partitioning Random Vectors

If we partition the random p \times 1 vector X into two components X_1, X_2 of dimensions q \times 1 and (p - q) \times 1 respectively, then the mean vector and the variance-covariance matrix need to be partitioned accordingly.

Partitioned mean vector:

    E(X) = E \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}.

Partitioned variance-covariance matrix:

    \Sigma = \begin{bmatrix} Var(X_1) & Cov(X_1, X_2) \\ Cov(X_2, X_1) & Var(X_2) \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},

where \Sigma_{11} is q \times q, \Sigma_{12} is q \times (p - q), \Sigma_{21} = \Sigma_{12}', and \Sigma_{22} is (p - q) \times (p - q).

Partitioning Covariance Matrices (cont'd)

\Sigma_{11} and \Sigma_{22} are the variance-covariance matrices of the sub-vectors X_1 and X_2, respectively. The off-diagonal elements in those two matrices reflect linear associations among elements within each sub-vector.

There are no variances in \Sigma_{12}, only covariances. These covariances reflect linear associations between elements in the two different sub-vectors.

Linear Combinations of Random Variables

Let X be a p \times 1 random vector with mean \mu and variance-covariance matrix \Sigma, and let c be a p \times 1 vector of constants. Then the linear combination c'X has mean and variance

    E(c'X) = c'\mu \quad \text{and} \quad Var(c'X) = c'\Sigma c.

In general, the mean and variance of a q \times 1 vector of linear combinations Z = C_{q \times p} X_{p \times 1} are

    \mu_Z = C \mu_X \quad \text{and} \quad \Sigma_Z = C \Sigma_X C'.
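
A NumPy sketch (added here, with arbitrary choices of mu, Sigma, and C) of the mean and covariance of linear combinations Z = CX:

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 9.0, -3.0],
                  [2.0, -3.0, 25.0]])
C = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 1.0]])

mu_Z = C @ mu              # mean vector of Z = C X
Sigma_Z = C @ Sigma @ C.T  # variance-covariance matrix of Z
print(mu_Z)
print(Sigma_Z)
```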

Cauchy-Schwarz Inequality

We will need some of the results below to derive maximization results later in the course.

Cauchy-Schwarz inequality: Let b and d be any two p \times 1 vectors. Then

    (b'd)^2 \leq (b'b)(d'd),

with equality only if b = cd for some scalar constant c.

Proof: The inequality holds trivially (with equality) for b = 0 or d = 0. For other cases, consider b - cd for any constant c \neq 0. If b - cd \neq 0, we have

    0 < (b - cd)'(b - cd) = b'b - 2c(b'd) + c^2 d'd,

since b - cd must have positive length.

Cauchy-Schwarz Inequality (cont'd)

We can add and subtract (b'd)^2/(d'd) to obtain

    0 < b'b - 2c(b'd) + c^2 d'd - \frac{(b'd)^2}{d'd} + \frac{(b'd)^2}{d'd}
      = b'b - \frac{(b'd)^2}{d'd} + (d'd)\left(c - \frac{b'd}{d'd}\right)^2.

Since c can be anything, we can choose c = b'd / d'd. Then

    0 < b'b - \frac{(b'd)^2}{d'd}, \quad \text{i.e.,} \quad (b'd)^2 < (b'b)(d'd)

for b \neq cd (otherwise, we have equality).

Extended Cauchy-Schwarz Inequality

If b and d are any two p \times 1 vectors and B is a p \times p positive definite matrix, then

    (b'd)^2 \leq (b'Bb)(d'B^{-1}d),

with equality if and only if b = cB^{-1}d (or, equivalently, d = cBb) for some constant c.

Proof: Consider B^{1/2} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, e_i e_i' and B^{-1/2} = \sum_{i=1}^{p} \frac{1}{\sqrt{\lambda_i}} e_i e_i'. Then we can write

    b'd = b'Id = b'B^{1/2}B^{-1/2}d = (B^{1/2}b)'(B^{-1/2}d) = b^{*\prime} d^*,

with b^* = B^{1/2}b and d^* = B^{-1/2}d. To complete the proof, apply the Cauchy-Schwarz inequality to the vectors b^* and d^*:

    (b'd)^2 = (b^{*\prime} d^*)^2 \leq (b^{*\prime} b^*)(d^{*\prime} d^*) = (b'Bb)(d'B^{-1}d).

Optimization

Let B be positive definite and let d be any p \times 1 vector. Then

    \max_{x \neq 0} \frac{(x'd)^2}{x'Bx} = d'B^{-1}d,

attained when x = cB^{-1}d for any constant c \neq 0.

Proof: By the extended Cauchy-Schwarz inequality, (x'd)^2 \leq (x'Bx)(d'B^{-1}d). Since x \neq 0 and B is positive definite, x'Bx > 0, and we can divide both sides by x'Bx to get the upper bound

    \frac{(x'd)^2}{x'Bx} \leq d'B^{-1}d.

Differentiating the left side with respect to x, or simply substituting x = cB^{-1}d, shows that the maximum is attained at x = cB^{-1}d.

Maximization of a Quadratic Form on a Unit Sphere

Let B be positive definite with eigenvalues \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0 and associated normalized eigenvectors e_1, e_2, \ldots, e_p. Then

    \max_{x \neq 0} \frac{x'Bx}{x'x} = \lambda_1, attained when x = e_1,

    \min_{x \neq 0} \frac{x'Bx}{x'x} = \lambda_p, attained when x = e_p.

Furthermore, for k = 1, 2, \ldots, p - 1,

    \max_{x \perp e_1, e_2, \ldots, e_k} \frac{x'Bx}{x'x} = \lambda_{k+1},

attained when x = e_{k+1}.

See the proof at the end of Chapter 2 in the textbook (pages 80-81).
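
A NumPy sketch (added here, using an arbitrary symmetric positive definite matrix) checking the bounds on the Rayleigh quotient x'Bx / x'x with random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
B = np.array([[4.0, 2.0], [2.0, 3.0]])     # positive definite
lam, E = np.linalg.eigh(B)                 # eigenvalues in ascending order

# Rayleigh quotients of many random vectors stay between the smallest and largest eigenvalue
X = rng.normal(size=(1000, 2))
rq = np.einsum('ij,jk,ik->i', X, B, X) / np.einsum('ij,ij->i', X, X)
print(rq.max(), '<=', lam[-1])   # maximum Rayleigh quotient approaches lambda_1
print(rq.min(), '>=', lam[0])    # minimum Rayleigh quotient approaches lambda_p
```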