Tutorials in Optimization Richard Socher July 20, 2008
Contents

1 Linear Algebra: Bilinear Form - A Simple Optimization Problem
  1.1 Definitions
  1.2 Inner Product
  1.3 Quadratic Form and the Rayleigh-Ritz Quotient
  1.4 A real symmetric matrix has only real eigenvalues. (i)
  1.5 The eigenvectors of a real symmetric matrix form an orthonormal basis. (ii)
  1.6 Spectral decomposition of a real symmetric matrix (3)
  1.7 Equality of optimization of Rayleigh-Ritz quotient. (4)
  1.8 $\max_{\|x\|=1} \sum_{i=1}^{n} \lambda_i \langle u_i, x \rangle^2 = \lambda_{\max}$
2 Analysis
  2.1 Dual Space
  2.2 Operator Norm
  2.3 Sets
1 Linear Algebra: Bilinear Form - A Simple Optimization Problem

1.1 Definitions

Definition 1.1. A symmetric matrix $A$ such that for any (conformable) vector $x \neq 0$ the quadratic form satisfies $x^T A x \geq 0$ is called a positive semidefinite matrix.

1.2 Inner Product

Axioms: for $x, y, z \in V$ and $a, b \in F$ (e.g. $\mathbb{R}$ or $\mathbb{C}$), with $\langle \cdot, \cdot \rangle : V \times V \to F$:

Conjugate symmetry: $\langle x, y \rangle = \overline{\langle y, x \rangle}$. If $F = \mathbb{R}$, then $\langle x, y \rangle = \langle y, x \rangle$.

Linearity in first variable: $\langle ax, y \rangle = a \langle x, y \rangle$ and $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$.

Nonnegativity: $\langle x, x \rangle \geq 0$.

Nondegeneracy: $\langle x, x \rangle = 0 \iff x = 0$.

Due to linearity and conjugate symmetry: $\langle x, by \rangle = \bar{b} \langle x, y \rangle$ and $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$.

For real vector spaces, the inner product is a positive-definite nondegenerate symmetric bilinear form.

1.3 Quadratic Form and the Rayleigh-Ritz Quotient

Proposition 1.2. If $A$ is a symmetric matrix, the optimization problem

\[ \arg\max_{x} \frac{x^T A x}{x^T x} = u_{\max} \tag{1} \]

is solved by finding the eigenvector $u_{\max}$ corresponding to the largest eigenvalue of $A$:

\[ A u_i = \lambda_i u_i \tag{2} \]

where $\frac{u_{\max}^T A u_{\max}}{u_{\max}^T u_{\max}} = \lambda_{\max}$ is the eigenvalue corresponding to the eigenvector $u_{\max}$.
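As a quick numerical sanity check of Proposition 1.2, the following minimal sketch evaluates the Rayleigh-Ritz quotient at the top eigenvector and at randomly sampled vectors. The 2x2 matrix `A`, the helper name `rayleigh_quotient`, and the sampling scheme are illustrative assumptions and not part of the original tutorial; for this particular `A` the eigenvalues are 3 and 1, with top eigenvector proportional to (1, 1).

```python
import math
import random


def rayleigh_quotient(A, x):
    """Return x^T A x / x^T x for a 2x2 matrix A (nested lists) and a 2-vector x."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1]]
    numerator = x[0] * Ax[0] + x[1] * Ax[1]
    denominator = x[0] * x[0] + x[1] * x[1]
    return numerator / denominator


# Hypothetical symmetric example matrix (an assumption for illustration).
A = [[2.0, 1.0],
     [1.0, 2.0]]

# Unit eigenvector for the largest eigenvalue of this particular A.
u_max = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
lambda_max = rayleigh_quotient(A, u_max)  # approximately 3

# Sample random nonzero vectors; none should beat the top eigenvector.
random.seed(0)
samples = []
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if x[0] == 0.0 and x[1] == 0.0:
        continue  # the quotient is undefined at the zero vector
    samples.append(rayleigh_quotient(A, x))
best_sampled = max(samples)

print("quotient at u_max:", lambda_max)
print("best sampled quotient:", best_sampled)
```

The sampled quotients stay below the value at `u_max` (up to rounding), which is the behaviour the proposition predicts. Random sampling only illustrates the claim; it does not prove it.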
Proof. Because $A$ is symmetric and positive semidefinite, its eigenvalues are real (i), its eigenvectors $u_i$ form an orthonormal basis (ii), and $A$ has the eigenvalue decomposition:

\[ A = \sum_{i=1}^{n} \lambda_i u_i u_i^T \tag{3} \]

We can also see that the following two formulations are equal:

\[ \max_{x} \frac{x^T A x}{x^T x} = \max_{\|x\|=1} x^T A x \tag{4} \]

Hence,

\begin{align}
& \max_{\|x\|=1} x^T A x \tag{5} \\
={} & \max_{\|x\|=1} \langle x, Ax \rangle \tag{6} \\
={} & \max_{\|x\|=1} \Big\langle x, \Big( \sum_{i=1}^{n} \lambda_i u_i u_i^T \Big) x \Big\rangle \tag{7} \\
={} & \max_{\|x\|=1} x^T \Big( \sum_{i=1}^{n} \lambda_i u_i u_i^T \Big) x \tag{8} \\
={} & \max_{\|x\|=1} \sum_{i=1}^{n} \lambda_i x^T u_i u_i^T x \tag{9} \\
={} & \max_{\|x\|=1} \sum_{i=1}^{n} \lambda_i \langle u_i, x \rangle^2 = \lambda_{\max} \tag{10}
\end{align}

Side notes:

If a vector space $V$ over the real numbers $\mathbb{R}$ carries an inner product, then the inner product is a bilinear map $V \times V \to \mathbb{R}$. Hence, $\langle x, y \rangle = \langle y, x \rangle$.

$\langle u_i, x \rangle^2 = (u_i^T x)(u_i^T x)$

If $A$ is symmetric: $x^T A y = y^T A x$

Now, let's prove (i), (ii), (3), (4) and (10).

1.4 A real symmetric matrix has only real eigenvalues. (i)

Proposition 1.3. A real symmetric matrix has only real eigenvalues.

Proof. In general, real matrices may have complex eigenvalues. However, real symmetric matrices only have real eigenvalues. Let us start the proof by repeating equation (2):

\[ A u_i = \lambda_i u_i \tag{11} \]

Left multiply this by $\bar{u}_i^T$, the transpose of the complex conjugate of this eigenvector:

\[ \bar{u}_i^T A u_i = \lambda_i \bar{u}_i^T u_i \tag{12} \]
Let's now take the complex conjugate of both sides of (11):

\[ \bar{A} \bar{u}_i = \bar{\lambda}_i \bar{u}_i \tag{13} \]

Because $A$ consists of only reals:

\[ A \bar{u}_i = \bar{\lambda}_i \bar{u}_i \tag{14} \]

Now we left multiply by $u_i^T$:

\[ u_i^T A \bar{u}_i = \bar{\lambda}_i u_i^T \bar{u}_i \tag{15} \]

Let's now take the transpose of both sides of the last equation:

\[ \bar{u}_i^T A^T u_i = \bar{\lambda}_i \bar{u}_i^T u_i \tag{16} \]

Because $A$ is symmetric we get:

\[ \bar{u}_i^T A u_i = \bar{\lambda}_i \bar{u}_i^T u_i \tag{17} \]

Combining (12) and (17), whose left-hand sides are identical:

\[ \lambda_i \bar{u}_i^T u_i = \bar{\lambda}_i \bar{u}_i^T u_i \tag{18} \]

Since $\bar{u}_i^T u_i = \|u_i\|^2 > 0$ for an eigenvector $u_i \neq 0$, it follows that $\lambda_i = \bar{\lambda}_i$. Hence all $\lambda_i$ are real.

1.5 The eigenvectors of a real symmetric matrix form an orthonormal basis. (ii)

Proposition 1.4. The eigenvectors of a real symmetric matrix can be chosen to be orthonormal and form an orthonormal basis.

Proof. We start again with

\[ A u_i = \lambda_i u_i \tag{19} \]

Multiply each side by $u_j^T$:

\[ u_j^T A u_i = \lambda_i u_j^T u_i \tag{20} \]

From another eigenvector we get:

\[ u_i^T A u_j = \lambda_j u_i^T u_j \tag{21} \]

Transposing this equation and using that $A = A^T$:

\[ u_j^T A u_i = \lambda_j u_j^T u_i \tag{22} \]

Subtracting (22) from (20):

\[ 0 = (\lambda_i - \lambda_j) u_j^T u_i \tag{23} \]

If $\lambda_j \neq \lambda_i$, we see that both eigenvectors have to be orthogonal. If $\lambda_j = \lambda_i$, then any nonzero linear combination $\alpha u_i + \beta u_j$ is also an eigenvector with $\lambda_i$ as its eigenvalue. If $u_i$ and $u_j$ are linearly independent, we can therefore replace the second one by a combination that is orthogonal to the first. By normalizing each vector to unit length, we get an orthonormal basis.

Now let's analyze equation (3):
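As a concrete instance of equation (3) before the general argument, consider a small worked example. The matrix below is a hypothetical choice made only for illustration; its eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = 1$ with orthonormal eigenvectors $u_1$ and $u_2$.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
u_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
u_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
u_1^T u_2 = 0

\lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T
  = \tfrac{3}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
  + \tfrac{1}{2} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
  = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = A
```

Both eigenvalues are real and the two eigenvectors are orthogonal, in line with (i) and (ii), and the weighted sum of the rank-one matrices $u_i u_i^T$ reproduces $A$.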
1.6 Spectral decomposition of a real symmetric matrix (3)

Proposition 1.5. A real symmetric matrix $A$ has the following spectral decomposition:

\[ A = \sum_{i=1}^{M} \lambda_i u_i u_i^T \tag{24} \]

Proof. We start with the observation that the equations

\[ A u_i = \lambda_i u_i \tag{25} \]

for all eigenvectors $u_i$ with $i = 1, \ldots, M$, where $A \in \mathbb{R}^{M \times M}$, can be reformulated into:

\[ A U = U \Lambda \tag{26} \]

where $U$ has the normalized eigenvectors $u_i$ as columns. Note that $U$ is an orthogonal matrix!¹

\[ U U^{-1} = U U^T = U^T U = I \tag{27} \]

Angles and lengths are preserved by $U$. $\Lambda$ is a diagonal matrix with all $\lambda_i$ on its diagonal. Now, we can diagonalize $A$:

\[ U^T A U = \Lambda \tag{28} \]

or rewrite it as:

\[ A = U \Lambda U^T = \sum_{i=1}^{M} \lambda_i u_i u_i^T \tag{29} \]

¹ A real square matrix is orthogonal if and only if its columns form an orthonormal basis of the Euclidean space $\mathbb{R}^n$ with the ordinary Euclidean dot product, which is the case if and only if its rows form an orthonormal basis of $\mathbb{R}^n$.

1.7 Equality of optimization of Rayleigh-Ritz quotient. (4)

Proposition 1.6.

\[ \max_{x} \frac{x^T A x}{x^T x} = \max_{\|x\|=1} x^T A x \tag{30} \]

Intuitively one may see this because we are maximizing over the direction of $x$, not its length. Formally we note:

Proof. Consider

\[ \max_{x} \frac{x^T A x}{x^T x} \tag{31} \]

Let's rewrite the vector $x$ as a linear combination of the eigenvectors $u_i$:

\[ x = \sum_{i=1}^{n} \alpha_i u_i \tag{32} \]
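We can express the original equation as:

\[ \frac{x^T A x}{x^T x} = \frac{\big( \sum_{j=1}^{n} \alpha_j u_j \big)^T A \big( \sum_{i=1}^{n} \alpha_i u_i \big)}{\big( \sum_{j=1}^{n} \alpha_j u_j \big)^T \big( \sum_{i=1}^{n} \alpha_i u_i \big)} \tag{33} \]

Using $A u_i = \lambda_i u_i$:

\[ = \frac{\big( \sum_{j=1}^{n} \alpha_j u_j \big)^T \big( \sum_{i=1}^{n} \lambda_i \alpha_i u_i \big)}{\big( \sum_{j=1}^{n} \alpha_j u_j \big)^T \big( \sum_{i=1}^{n} \alpha_i u_i \big)} \tag{34} \]

Expanding both products:

\[ = \frac{\sum_{j=1}^{n} \sum_{i=1}^{n} \alpha_j \alpha_i \lambda_i u_j^T u_i}{\sum_{j=1}^{n} \sum_{i=1}^{n} \alpha_j \alpha_i u_j^T u_i} \tag{35} \]

By orthogonality and unit length of the eigenvectors ($u_j^T u_i = 0$ for $i \neq j$ and $u_i^T u_i = 1$), we get:

\[ = \frac{\sum_{i=1}^{n} \alpha_i^2 \lambda_i}{\sum_{i=1}^{n} \alpha_i^2} \tag{36} \]

If a vector $x$ maximizes this expression, then any vector $kx$ (for $k \neq 0$) also maximizes it. We have reduced the problem to maximizing $\sum_{i=1}^{n} \alpha_i^2 \lambda_i$ under the constraint that $\sum_{i=1}^{n} \alpha_i^2 = 1$. In the next proof, we find out how exactly this relates to our initial formulation.

1.8 $\max_{\|x\|=1} \sum_{i=1}^{n} \lambda_i \langle u_i, x \rangle^2 = \lambda_{\max}$

We are now ready to prove equation (10):

Proposition 1.7.

\[ \max_{\|x\|=1} \sum_{i=1}^{n} \lambda_i \langle u_i, x \rangle^2 = \lambda_{\max} \tag{37} \]

Proof. We want to show

\[ \max_{\|x\|=1} \sum_{i=1}^{n} \lambda_i \langle u_i, x \rangle^2 = \lambda_{\max} \tag{38} \]

However, we know that for $\|x\| = 1$:

\[ \sum_{i=1}^{n} \langle u_i, x \rangle^2 = \sum_{i=1}^{n} (u_i^T x) u_i^T x = \sum_{i=1}^{n} (u_i^T x)^T u_i^T x = \sum_{i=1}^{n} x^T u_i u_i^T x = x^T \underbrace{\Big( \sum_{i=1}^{n} u_i u_i^T \Big)}_{I} x = 1 \]

The weights $\langle u_i, x \rangle^2$ are therefore nonnegative and sum to one, so the sum in (38) is a weighted average of the eigenvalues. It can be at most $\lambda_{\max}$, and it attains this value when all weight is placed on the largest $\lambda_i$, i.e. for $x = u_{\max}$.

Usable for the generalized Rayleigh quotient and the LDA/Fisher criterion:

\[ \max_{w} J(w) = \max_{w} \frac{w^T \Sigma_b w}{w^T \Sigma_w w} \]

2 Analysis

2.1 Dual Space

A vector space $V$ has a corresponding dual space consisting of all linear functionals on $V$. For the vector space $\mathbb{R}^n$ (i.e. the space of columns of $n$ real numbers), the dual space is written as the space of rows of $n$ real numbers.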
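To make the row/column picture of the dual space concrete, here is a small worked example for $\mathbb{R}^3$. The functional $f$ and the particular numbers are hypothetical choices for illustration only: a row vector acts on a column vector by matrix multiplication and returns a scalar.

```latex
f(x) = a^T x, \qquad
a^T = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}, \qquad
x = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}

f(x) = 1 \cdot 1 + 2 \cdot 0 + 3 \cdot (-1) = -2

f(\alpha x + \beta y) = a^T (\alpha x + \beta y) = \alpha\, a^T x + \beta\, a^T y = \alpha f(x) + \beta f(y)
```

The last line shows that $f$ is linear, so each row $a^T$ defines one element of the dual space.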
2.2 Operator Norm

The operator norm is a norm defined on the space of bounded (or continuous) linear operators between two given normed vector spaces $V$ and $W$ (over base field $\mathbb{R}$ or $\mathbb{C}$). A linear map $A : V \to W$ is bounded (or continuous) iff there exists a real number $c$ such that

\[ \|Av\|_W \leq c \|v\|_V \quad \text{for all } v \in V \]

The continuous operator $A$ never lengthens any vector by more than a factor of $c$. Hence, the image of a bounded set under $A$ is also bounded. The operator norm of $A$ measures how much it lengthens a vector in the worst case:

\[ \|A\|_{op} = \min \{ c : \|Av\| \leq c \|v\| \text{ for all } v \in V \} \]

It exists because the set of all such $c$ is closed, nonempty, and bounded from below.

2.3 Sets

A set $C$ is open if every point $x$ in $C$ is an interior point, or equivalently, if the distance between any point $x$ in $C$ and the edge (boundary) of $C$ is always greater than zero.

Definition 2.1. A closed set is a set whose complement is open. The closed interval $[a, b]$ of real numbers is closed.

Definition 2.2. A subset $S$ of a metric space $(M, d)$ is bounded if it is contained in a ball of finite radius, i.e. $\exists x \in M,\ \exists r > 0 : \forall s \in S : d(x, s) < r$.

Definition 2.3. A set $C \subseteq \mathbb{R}^n$ is compact if it is closed and bounded.

On a compact set every continuous function attains its global maximum and minimum.
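The operator norm of Section 2.2 can be tied back to the Rayleigh-Ritz quotient of Section 1. Assuming the Euclidean norm on both spaces, $\|Av\|^2 / \|v\|^2 = v^T (A^T A) v / v^T v$, so the operator norm of a real matrix is the square root of the largest eigenvalue of $A^T A$. The maximum is attained because the unit sphere is compact. The sketch below checks this for a 2x2 matrix; the matrix `A` and the helper names `largest_eigenvalue_sym2`, `operator_norm_2x2` and `stretch` are hypothetical and chosen only for illustration.

```python
import math


def largest_eigenvalue_sym2(S):
    """Largest eigenvalue of a symmetric 2x2 matrix [[p, q], [q, r]] (closed form)."""
    p, q, r = S[0][0], S[0][1], S[1][1]
    return 0.5 * (p + r) + math.sqrt((0.5 * (p - r)) ** 2 + q * q)


def operator_norm_2x2(A):
    """Euclidean operator norm of a real 2x2 matrix: sqrt of lambda_max(A^T A)."""
    gram = [
        [A[0][0] * A[0][0] + A[1][0] * A[1][0], A[0][0] * A[0][1] + A[1][0] * A[1][1]],
        [A[0][1] * A[0][0] + A[1][1] * A[1][0], A[0][1] * A[0][1] + A[1][1] * A[1][1]],
    ]
    return math.sqrt(largest_eigenvalue_sym2(gram))


def stretch(A, v):
    """Factor ||Av|| / ||v|| by which A lengthens the nonzero vector v."""
    Av = [A[0][0] * v[0] + A[0][1] * v[1],
          A[1][0] * v[0] + A[1][1] * v[1]]
    return math.hypot(Av[0], Av[1]) / math.hypot(v[0], v[1])


# Hypothetical example matrix (not symmetric); A^T A = [[25, 20], [20, 25]].
A = [[3.0, 0.0],
     [4.0, 5.0]]

op_norm = operator_norm_2x2(A)

# Scan unit vectors on the circle in 0.1 degree steps and record the worst case.
steps = 3600
worst_case = max(
    stretch(A, [math.cos(2.0 * math.pi * k / steps), math.sin(2.0 * math.pi * k / steps)])
    for k in range(steps)
)

print("operator norm:", op_norm)
print("largest sampled stretch:", worst_case)
```

In this example no sampled direction is stretched by more than `op_norm`, and the worst sampled direction comes very close to it, matching the "worst case" reading of the operator norm.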