
Tutorials in Optimization
Richard Socher
July 20, 2008

Contents

1 Linear Algebra: Bilinear Form - A Simple Optimization Problem
  1.1 Definitions
  1.2 Inner Product
  1.3 Quadratic Form and the Rayleigh-Ritz Quotient
  1.4 A real symmetric matrix has only real eigenvalues. (i)
  1.5 The eigenvectors of a real symmetric matrix form an orthonormal basis. (ii)
  1.6 Spectral decomposition of a real symmetric matrix (3)
  1.7 Equality of optimization of Rayleigh-Ritz quotient. (4)
  1.8 \max_{\|x\|=1} \sum_{i=1}^n \lambda_i \langle u_i, x \rangle^2 = \lambda_{max}

2 Analysis
  2.1 Dual Space
  2.2 Operator Norm
  2.3 Sets

1 Linear Algebra: Bilinear Form - A Simple Optimization Problem

1.1 Definitions

Definition 1.1. A symmetric matrix A such that for any (conformable) vector x \neq 0 the quadratic form satisfies x^T A x \geq 0 is called a positive semidefinite matrix.

1.2 Inner Product

Axioms: for all x, y, z \in V and a, b \in F (e.g. R or C), an inner product \langle \cdot, \cdot \rangle : V \times V \to F satisfies:

Conjugate symmetry: \langle x, y \rangle = \overline{\langle y, x \rangle}. If F = R, then \langle x, y \rangle = \langle y, x \rangle.

Linearity in the first variable: \langle ax, y \rangle = a \langle x, y \rangle and \langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle.

Nonnegativity: \langle x, x \rangle \geq 0.

Nondegeneracy: \langle x, x \rangle = 0 \Leftrightarrow x = 0.

Due to linearity and conjugate symmetry: \langle x, by \rangle = \bar{b} \langle x, y \rangle and \langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle.

For real vector spaces, the inner product is a positive-definite nondegenerate symmetric bilinear form.

1.3 Quadratic Form and the Rayleigh-Ritz Quotient

Proposition 1.2. If A is a symmetric matrix, the optimization problem

\arg\max_x \frac{x^T A x}{x^T x} = u_{max}    (1)

is solved by finding the eigenvector u_{max} corresponding to the largest eigenvalue of A:

A u_i = \lambda_i u_i    (2)

where \frac{u_{max}^T A u_{max}}{u_{max}^T u_{max}} = \lambda_{max} is the eigenvalue corresponding to the eigenvector u_{max}.
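To make Proposition 1.2 concrete, here is a minimal numerical sketch of our own (not part of the original notes; it assumes numpy and uses np.linalg.eigh, the eigensolver specialized for symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric matrix (illustrative setup, not from the text).
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

# eigh returns eigenvalues in ascending order, so the last one is lambda_max.
lam, U = np.linalg.eigh(A)
u_max, lam_max = U[:, -1], lam[-1]

def rayleigh(x):
    """Rayleigh-Ritz quotient x^T A x / x^T x from equation (1)."""
    return (x @ A @ x) / (x @ x)

print(np.isclose(rayleigh(u_max), lam_max))             # True: quotient at u_max is lambda_max
xs = rng.standard_normal((1000, 4))                     # random directions
print(all(rayleigh(x) <= lam_max + 1e-12 for x in xs))  # True: no direction beats u_max
```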

Proof. Because A is symmetric, its eigenvalues are real (i), its eigenvectors u_i form an orthonormal basis (ii), and hence A has the eigenvalue decomposition

A = \sum_{i=1}^n \lambda_i u_i u_i^T    (3)

We can also see that the following two formulations are equal:

\max_x \frac{x^T A x}{x^T x}    (4)

= \max_{\|x\|=1} x^T A x    (5)

= \max_{\|x\|=1} \langle x, Ax \rangle    (6)

= \max_{\|x\|=1} \left\langle x, \left( \sum_{i=1}^n \lambda_i u_i u_i^T \right) x \right\rangle    (7)

= \max_{\|x\|=1} x^T \left( \sum_{i=1}^n \lambda_i u_i u_i^T \right) x    (8)

= \max_{\|x\|=1} \sum_{i=1}^n \lambda_i x^T u_i u_i^T x    (9)

= \max_{\|x\|=1} \sum_{i=1}^n \lambda_i \langle u_i, x \rangle^2 = \lambda_{max}    (10)

Side notes:

If a vector space V over the real numbers R carries an inner product, then the inner product is a bilinear map V \times V \to R. Hence \langle x, y \rangle = \langle y, x \rangle.

\langle u_i, x \rangle^2 = (u_i^T x)(u_i^T x)

If A is symmetric: x^T A y = y^T A x
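The chain (6)-(10) can also be checked numerically. The following sketch (our own, assuming numpy) confirms that x^T A x = \sum_i \lambda_i \langle u_i, x \rangle^2 for a symmetric A:

```python
import numpy as np

rng = np.random.default_rng(1)

B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # symmetric test matrix (our own setup)
lam, U = np.linalg.eigh(A)

x = rng.standard_normal(5)
quad = x @ A @ x                       # quadratic form, as in (6)
# right-hand side of (9)/(10): sum_i lambda_i <u_i, x>^2
via_eigs = sum(l * (u @ x) ** 2 for l, u in zip(lam, U.T))
print(np.isclose(quad, via_eigs))      # True
```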

Now, let's prove (i), (ii), (3), (4) and (10).

1.4 A real symmetric matrix has only real eigenvalues. (i)

Proposition 1.3. A real symmetric matrix has only real eigenvalues.

Proof. In general, matrices may have complex eigenvalues; symmetric matrices, however, have only real ones. Let us start the proof by repeating equation (2):

A u_i = \lambda_i u_i    (11)

Left-multiply this by \bar{u}_i^T, the transpose of the complex conjugate of the eigenvector u_i:

\bar{u}_i^T A u_i = \lambda_i \bar{u}_i^T u_i    (12)

Let's now take the complex conjugate of both sides of (11):

\bar{A} \bar{u}_i = \bar{\lambda}_i \bar{u}_i    (13)

Because A consists of only real entries, \bar{A} = A:

A \bar{u}_i = \bar{\lambda}_i \bar{u}_i    (14)

Now we left-multiply by u_i^T:

u_i^T A \bar{u}_i = \bar{\lambda}_i u_i^T \bar{u}_i    (15)

Let's now take the transpose of both sides of the last equation:

\bar{u}_i^T A^T u_i = \bar{\lambda}_i \bar{u}_i^T u_i    (16)

Because A is symmetric we get:

\bar{u}_i^T A u_i = \bar{\lambda}_i \bar{u}_i^T u_i    (17)

Combining (12) and (17):

\lambda_i \bar{u}_i^T u_i = \bar{\lambda}_i \bar{u}_i^T u_i    (18)

Since \bar{u}_i^T u_i = \|u_i\|^2 > 0, it follows that \lambda_i = \bar{\lambda}_i. Hence all \lambda_i are real.

1.5 The eigenvectors of a real symmetric matrix form an orthonormal basis. (ii)

Proposition 1.4. The eigenvectors of a real symmetric matrix can be chosen to be orthonormal and form an orthonormal basis.

Proof. We start again with

A u_i = \lambda_i u_i    (19)

Multiply each side by u_j^T:

u_j^T A u_i = \lambda_i u_j^T u_i    (20)

From another eigenvector we get:

u_i^T A u_j = \lambda_j u_i^T u_j    (21)

Transposing this equation and using A = A^T:

u_j^T A u_i = \lambda_j u_j^T u_i    (22)

Subtracting (22) from (20):

0 = (\lambda_i - \lambda_j) u_j^T u_i    (23)

If \lambda_j \neq \lambda_i, we see that the two eigenvectors have to be orthogonal. If \lambda_j = \lambda_i, then any linear combination \alpha u_i + \beta u_j is also an eigenvector with eigenvalue \lambda_i. Because the two eigenvectors are linearly independent, we can choose the second one (e.g. by Gram-Schmidt) to be orthogonal to the first. By normalizing each vector to unit length, we get an orthonormal basis.
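Both claims (i) and (ii) are easy to verify numerically; a minimal sketch of our own, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)

B = rng.standard_normal((6, 6))
A = (B + B.T) / 2

# (i) even the general (non-symmetric) eigensolver finds only real eigenvalues:
lam = np.linalg.eigvals(A)
print(np.allclose(lam.imag, 0))          # True

# (ii) the eigenvectors returned for a symmetric matrix are orthonormal:
_, U = np.linalg.eigh(A)
print(np.allclose(U.T @ U, np.eye(6)))   # True: U^T U = I
```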

Now let's analyze equation (3):

1.6 Spectral decomposition of a real symmetric matrix (3)

Proposition 1.5. A real symmetric matrix A has the following spectral decomposition:

A = \sum_{i=1}^M \lambda_i u_i u_i^T    (24)

Proof. We start with the observation that the eigenvector equations

A u_i = \lambda_i u_i    (25)

for all eigenvectors u_i with i = 1, ..., M, where A \in R^{M \times M}, can be reformulated as

A U = U \Lambda    (26)

where U has the normalized eigenvectors u_i as its columns and \Lambda is a diagonal matrix with all \lambda_i on its diagonal. Note that U is an orthogonal matrix:

U U^{-1} = U U^T = U^T U = I    (27)

(A real square matrix is orthogonal if and only if its columns form an orthonormal basis of the Euclidean space R^n with the ordinary Euclidean dot product, which is the case if and only if its rows form an orthonormal basis of R^n.) Angles and lengths are preserved by U. Now we can diagonalize A:

U^T A U = \Lambda    (28)

or rewrite it as:

A = U \Lambda U^T = \sum_{i=1}^M \lambda_i u_i u_i^T    (29)
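A numerical check of (24)/(29), again a sketch of our own assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)

B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, U = np.linalg.eigh(A)

# A = U Lambda U^T as in (29) ...
print(np.allclose(A, U @ np.diag(lam) @ U.T))    # True
# ... equivalently the rank-one sum of (24):
A_sum = sum(l * np.outer(u, u) for l, u in zip(lam, U.T))
print(np.allclose(A, A_sum))                     # True
```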

1.7 Equality of optimization of Rayleigh-Ritz quotient. (4)

Proposition 1.6.

\max_x \frac{x^T A x}{x^T x} = \max_{\|x\|=1} x^T A x    (30)

Intuitively one may see this because we are maximizing over the direction of x, not its length. Formally we note:

Proof. Consider

\max_x \frac{x^T A x}{x^T x}    (31)

Let's rewrite the vector x as a linear combination of the eigenvectors u_i:

x = \sum_{i=1}^n \alpha_i u_i    (32)

We can express the original equation as:

\frac{x^T A x}{x^T x} = \frac{\left( \sum_{j=1}^n \alpha_j u_j \right)^T A \left( \sum_{i=1}^n \alpha_i u_i \right)}{\left( \sum_{j=1}^n \alpha_j u_j \right)^T \left( \sum_{i=1}^n \alpha_i u_i \right)}    (33)

Using A u_i = \lambda_i u_i:

= \frac{\left( \sum_{j=1}^n \alpha_j u_j \right)^T \left( \sum_{i=1}^n \lambda_i \alpha_i u_i \right)}{\left( \sum_{j=1}^n \alpha_j u_j \right)^T \left( \sum_{i=1}^n \alpha_i u_i \right)}    (34)

By orthogonality and unit length of the eigenvectors, we get:

= \frac{\sum_{i=1}^n \alpha_i^2 \lambda_i}{\sum_{i=1}^n \alpha_i^2}    (35)

If a vector x maximizes this expression, then any vector kx (for k \neq 0) also maximizes it. We have thus reduced the problem to maximizing \sum_{i=1}^n \alpha_i^2 \lambda_i under the constraint \sum_{i=1}^n \alpha_i^2 = 1. In the next proof, we find out how exactly this relates to our initial formulation.

1.8 \max_{\|x\|=1} \sum_{i=1}^n \lambda_i \langle u_i, x \rangle^2 = \lambda_{max}

We are now ready to prove equation (10):

Proposition 1.7.

\max_{\|x\|=1} \sum_{i=1}^n \lambda_i \langle u_i, x \rangle^2 = \lambda_{max}    (37)

Proof. We know that:

\sum_{i=1}^n \langle u_i, x \rangle^2 = \sum_{i=1}^n (u_i^T x)^T u_i^T x = \sum_{i=1}^n x^T u_i u_i^T x = x^T \left( \sum_{i=1}^n u_i u_i^T \right) x = x^T I x = 1    (38)

using \|x\| = 1 and the fact that \sum_{i=1}^n u_i u_i^T = I for an orthonormal basis. The weights \langle u_i, x \rangle^2 therefore form a convex combination of the \lambda_i, so the sum is maximized by putting all weight on the largest \lambda_i, i.e. by choosing x = u_{max}, which yields \lambda_{max}.

This is usable for the generalized Rayleigh quotient and the LDA/Fisher criterion (a numerical sketch follows below):

\max_w J(w) = \max_w \frac{w^T \Sigma_b w}{w^T \Sigma_w w}
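The promised sketch (our own illustration, assuming numpy; Sw and Sb below are made-up stand-ins for within- and between-class scatter matrices, and the whitening reduction is our chosen technique, not spelled out in the notes): substituting v = Sw^{1/2} w turns the generalized quotient into an ordinary Rayleigh quotient for M = Sw^{-1/2} Sb Sw^{-1/2}, so Proposition 1.2 applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scatter matrices: Sw positive definite, Sb rank one.
X = rng.standard_normal((5, 5))
Sw = X @ X.T + 5 * np.eye(5)
y = rng.standard_normal((5, 1))
Sb = y @ y.T

# Whitening: form Sw^{-1/2} via the spectral decomposition of Sw (eq. 29).
lam_w, U_w = np.linalg.eigh(Sw)
Sw_inv_sqrt = U_w @ np.diag(lam_w ** -0.5) @ U_w.T
M = Sw_inv_sqrt @ Sb @ Sw_inv_sqrt       # ordinary symmetric eigenproblem

lam, V = np.linalg.eigh(M)               # ascending eigenvalues
w_star = Sw_inv_sqrt @ V[:, -1]          # map the top eigenvector back

J = (w_star @ Sb @ w_star) / (w_star @ Sw @ w_star)
print(np.isclose(J, lam[-1]))            # True: the maximum equals lambda_max of M
```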

2 Analysis

2.1 Dual Space

A vector space V has a corresponding dual space consisting of all linear functionals on V. For the vector space R^n (i.e. the space of columns of n real numbers), the dual space is written as the space of rows of n real numbers.

2.2 Operator Norm

The operator norm is a norm defined on the space of bounded (or continuous) linear operators between two given normed vector spaces V and W (over the base field R or C). A linear map A : V \to W is bounded (or continuous) iff there exists a real number c such that

\|Av\|_W \leq c \|v\|_V \quad \text{for all } v \in V

A continuous operator A never lengthens any vector by more than a factor of c. Hence, the image of a bounded set under A is also bounded. The operator norm of A measures how much A lengthens a vector in the worst case:

\|A\|_{op} = \min \{ c : \|Av\| \leq c \|v\| \text{ for all } v \in V \}

The minimum exists because the set of all such c is closed, nonempty, and bounded from below. (A numerical illustration follows after Section 2.3.)

2.3 Sets

A set C is open if every point x in C is an interior point or, equivalently, if the distance between any point x in C and the boundary of C is always greater than zero.

Definition 2.1. A closed set is a set whose complement is open. For example, the closed interval [a, b] of real numbers is closed.

Definition 2.2. A subset S of a metric space (M, d) is bounded if it is contained in a ball of finite radius, i.e. \exists x \in M, r > 0 : \forall s \in S : d(x, s) < r.

Definition 2.3. A set C \subseteq R^n is compact if it is closed and bounded.

On a compact set every continuous function attains its global maximum/minimum.
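The illustration promised in Section 2.2 (a sketch of our own, assuming numpy, for A : R^5 \to R^3 with Euclidean norms on both sides, in which case the operator norm coincides with the largest singular value of A, which is what np.linalg.norm(A, 2) computes):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))          # a map from V = R^5 to W = R^3

# With Euclidean norms, ||A||_op is the largest singular value:
op = np.linalg.norm(A, 2)

# No unit vector is stretched by more than ||A||_op, and random unit
# vectors approach the bound from below:
vs = rng.standard_normal((20000, 5))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)
stretch = np.linalg.norm(vs @ A.T, axis=1)
print(stretch.max() <= op + 1e-12)       # True
print(op, stretch.max())                 # the empirical maximum is close to the norm
```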