Basic concepts in Linear Algebra and Optimization

Yinbin Ma, GEOPHYS 211

Outline

Basic Concepts in Linear Algebra: vector space; norm; linear mapping, range, null space; matrix multiplication.
Iterative Methods for Linear Optimization: normal equation; steepest descent; conjugate gradient.
Unconstrained Nonlinear Optimization: optimality condition; methods based on a local quadratic model; line search methods.

Basic concepts - vector space

A vector space is any set V for which two operations are defined: 1) Vector addition: any vectors $x_1$ and $x_2$ in the set V can be added to give a vector $x = x_1 + x_2$ that is also in V. 2) Scalar multiplication: any vector $x$ in V can be multiplied ("scaled") by a real number $c \in \mathbb{R}$ to produce a second vector $cx$ which is also in V. In this class we only discuss the case $V \subseteq \mathbb{R}^n$, meaning each vector $x$ in the space is an $n$-dimensional column vector.

Basic concepts - norm

The model space and data space we mentioned in class are normed vector spaces. A norm is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ that maps a vector to a real number. A norm must satisfy the following: 1) $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$; 2) $\|x + y\| \le \|x\| + \|y\|$; 3) $\|ax\| = |a|\,\|x\|$; where $x$ and $y$ are vectors in the vector space V and $a \in \mathbb{R}$.

Basic concepts - norm

We will see the following norms in this course:
1) $L_2$ norm: for a vector $x$, the $L_2$ norm is defined as $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$.
2) $L_1$ norm: for a vector $x$, the $L_1$ norm is defined as $\|x\|_1 = \sum_{i=1}^{n} |x_i|$.
3) $L_\infty$ norm: for a vector $x$, the $L_\infty$ norm is defined as $\|x\|_\infty = \max_{i=1,\dots,n} |x_i|$.
The norm of a matrix is induced from a vector norm as
\[ \|A\|_a = \sup_{x \ne 0} \frac{\|Ax\|_a}{\|x\|_a} . \]
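
As a quick numerical check of these definitions (a sketch added here, not part of the original slides), the vector norms and the induced matrix 2-norm can be computed with NumPy; the vector and matrix below are arbitrary examples.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l2 = np.sqrt(np.sum(x**2))          # L2 norm: square root of the sum of squares
l1 = np.sum(np.abs(x))              # L1 norm: sum of absolute values
linf = np.max(np.abs(x))            # L-infinity norm: largest absolute entry

# Same quantities via NumPy's built-in norm
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(linf, np.linalg.norm(x, np.inf))

# Induced 2-norm of a matrix equals its largest singular value
A = np.array([[1.0, 2.0], [0.0, 3.0]])
print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
```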

Basic concepts - linear mapping, range and null space

We say a map $x \mapsto Ax$ is linear if for any $x, y \in \mathbb{R}^n$ and any $a \in \mathbb{R}$,
\[ A(x + y) = Ax + Ay, \qquad A(ax) = aAx . \]
It can be proved that every linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be expressed as multiplication by an $m \times n$ matrix.
The range of a linear operator $A \in \mathbb{R}^{m \times n}$ is the space spanned by the columns of $A$,
\[ \mathrm{range}(A) = \{\, y \ \text{such that}\ y = Ax,\ x \in \mathbb{R}^n \,\} . \]
The null space of a linear operator $A \in \mathbb{R}^{m \times n}$ is the space
\[ \mathrm{null}(A) = \{\, x \ \text{such that}\ Ax = 0 \,\} . \]
It can be shown that $\mathrm{range}(A)$ is perpendicular to $\mathrm{null}(A^T)$ (exercise).
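
The perpendicularity of $\mathrm{range}(A)$ and $\mathrm{null}(A^T)$ can also be verified numerically. The sketch below (an illustration added here, using an arbitrary random matrix) builds orthonormal bases for both subspaces from the SVD: the left singular vectors belonging to nonzero singular values span $\mathrm{range}(A)$, and the remaining ones span $\mathrm{null}(A^T)$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))      # a 5x3 operator, so range(A) lives in R^5

# Orthonormal bases from the SVD: U[:, :r] spans range(A),
# U[:, r:] spans null(A^T) (the left null space).
U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-12)                # numerical rank
range_basis = U[:, :r]
left_null_basis = U[:, r:]

# Every range vector is orthogonal to every left-null-space vector.
print(np.max(np.abs(range_basis.T @ left_null_basis)))   # ~1e-16
```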

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 1 (entry by entry):
\[ b_{ij} = \sum_{k=1}^{m} a_{ik}\, c_{kj} . \]
Here $b_{ij}$, $a_{ik}$, and $c_{kj}$ are entries of $B$, $A$, and $C$.

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 2 (column by column): write
\[ B = [\, b_1 \ \ b_2 \ \cdots \ b_n \,], \]
where $b_i$ is the $i$-th column of matrix $B$. Then
\[ B = [\, A c_1 \ \ A c_2 \ \cdots \ A c_n \,], \qquad b_i = A c_i . \]
Each column of $B$ is in the range of $A$ (defined earlier). Thus, the range of $B$ is a subset of the range of $A$.

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 3 (row by row): write
\[ B = \begin{bmatrix} \tilde b_1^T \\ \tilde b_2^T \\ \vdots \\ \tilde b_l^T \end{bmatrix}, \]
where $\tilde b_i^T$ is the $i$-th row of matrix $B$. Then $\tilde b_i^T = \tilde a_i^T C$, i.e.
\[ B = \begin{bmatrix} \tilde a_1^T C \\ \tilde a_2^T C \\ \vdots \\ \tilde a_l^T C \end{bmatrix} . \]
This form is not commonly used.

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 4 (sum of rank-one matrices):
\[ B = \sum_{i=1}^{m} a_i\, \tilde c_i^T , \]
where $a_i$ is the $i$-th column of matrix $A$ and $\tilde c_i^T$ is the $i$-th row of matrix $C$. Each term $a_i \tilde c_i^T$ is a rank-one matrix.
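
The four views above are equivalent ways of computing the same product. The following sketch (illustrative sizes and random data, not from the slides) reproduces each method in NumPy and checks it against A @ C.

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, n = 4, 3, 5
A = rng.standard_normal((l, m))
C = rng.standard_normal((m, n))
B_ref = A @ C                                   # reference product

# Method 1: entry by entry, b_ij = sum_k a_ik * c_kj
B1 = np.zeros((l, n))
for i in range(l):
    for j in range(n):
        for k in range(m):
            B1[i, j] += A[i, k] * C[k, j]

# Method 2: column by column, b_j = A c_j
B2 = np.column_stack([A @ C[:, j] for j in range(n)])

# Method 3: row by row, b_i^T = a_i^T C
B3 = np.vstack([A[i, :] @ C for i in range(l)])

# Method 4: sum of rank-one outer products, B = sum_k a_k c_k^T
B4 = sum(np.outer(A[:, k], C[k, :]) for k in range(m))

for Bk in (B1, B2, B3, B4):
    assert np.allclose(Bk, B_ref)
```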

Outline

Basic Concepts in Linear Algebra: vector space; norm; linear mapping, range, null space; matrix multiplication.
Iterative Methods for Linear Optimization: normal equation; steepest descent; conjugate gradient.
Unconstrained Nonlinear Optimization: optimality condition; search direction; line search.

Linear Optimization - normal equation

We solve a linear system having $n$ unknowns and $m > n$ equations: we want to find a vector $m \in \mathbb{R}^n$ that satisfies
\[ Fm = d , \]
where $d \in \mathbb{R}^m$ and $F \in \mathbb{R}^{m \times n}$. Reformulate the problem: define the residual $r = d - Fm$ and find the $m$ that minimizes $\|r\|_2 = \|Fm - d\|_2$. It can be proved that the residual norm is minimized when $F^T r = 0$. This is equivalent to an $n \times n$ system,
\[ F^T F\, m = F^T d , \]
which is the normal equation. We can solve the normal equation using direct methods such as LU, QR, SVD, or Cholesky decomposition.
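
As an illustration (a sketch with arbitrary test data, not from the slides), the normal equations can be formed and solved with a Cholesky factorization of $F^T F$ and compared against NumPy's least-squares solver. For ill-conditioned $F$, forming $F^T F$ squares the condition number, which is one reason QR- or SVD-based solvers are often preferred.

```python
import numpy as np

rng = np.random.default_rng(2)
m_rows, n_cols = 20, 5
F = rng.standard_normal((m_rows, n_cols))
d = rng.standard_normal(m_rows)

# Normal equations: (F^T F) m = F^T d, solved via a Cholesky factorization.
G = F.T @ F
rhs = F.T @ d
L = np.linalg.cholesky(G)                 # G = L L^T
y = np.linalg.solve(L, rhs)               # forward solve
m_normal = np.linalg.solve(L.T, y)        # backward solve

# Reference: NumPy's least-squares solver.
m_lstsq, *_ = np.linalg.lstsq(F, d, rcond=None)

print(np.allclose(m_normal, m_lstsq))               # True
# Optimality check: the residual is orthogonal to the columns of F.
print(np.max(np.abs(F.T @ (d - F @ m_normal))))     # ~1e-14
```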

Linear Optimization - steepest descent method

For the unconstrained linear optimization problem
\[ \min_m\ J(m) = \|Fm - d\|_2^2 , \]
we find the minimum of the objective function $J(m)$ iteratively with the steepest descent method: at the current point $m_k$, we update the model by moving along the negative direction of the gradient,
\[ m_{k+1} = m_k - \alpha\, \nabla J(m_k), \qquad \nabla J(m_k) = 2 F^T (F m_k - d) . \]
The gradient can be evaluated exactly, and for this quadratic objective we have an analytical formula for the optimal step length $\alpha$.
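
A minimal sketch of this iteration, assuming the standard exact line-search step length for the quadratic objective (the slide notes that an analytical formula exists but does not state it); the test problem and iteration limits are arbitrary.

```python
import numpy as np

def steepest_descent_ls(F, d, m0, n_iter=200):
    """Steepest descent for J(m) = ||F m - d||_2^2 with exact line search."""
    m = m0.copy()
    for _ in range(n_iter):
        g = 2.0 * F.T @ (F @ m - d)          # gradient of J at m
        if np.linalg.norm(g) < 1e-12:
            break
        Fg = F @ g
        # Exact minimizer of J(m - alpha * g) along the line:
        alpha = (g @ g) / (2.0 * (Fg @ Fg))
        m = m - alpha * g
    return m

rng = np.random.default_rng(3)
F = rng.standard_normal((30, 4))
d = rng.standard_normal(30)
m_sd = steepest_descent_ls(F, d, np.zeros(4))
m_ref, *_ = np.linalg.lstsq(F, d, rcond=None)
print(np.allclose(m_sd, m_ref, atol=1e-6))   # True
```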

Linear Optimization - conjugate gradient method

For the unconstrained linear optimization problem
\[ \min_m\ J(m) = \|Fm - d\|_2^2 . \]
Starting from $m_0$, we generate a sequence of search directions $\Delta m_i$ and update the model iteratively,
\[ m_i = m_{i-1} + \alpha_{i-1}\, \Delta m_{i-1}, \qquad i = 1, \dots, k . \]
The next search direction $\Delta m_k$ is sought in the space $\mathrm{span}\{\Delta m_0, \dots, \Delta m_{k-1}, \nabla J(m_k)\}$,
\[ \Delta m_k = \sum_{i=0}^{k-1} c_i\, \Delta m_i + c_k\, \nabla J(m_k) . \]
The magic is that for a linear problem $c_0 = c_1 = \cdots = c_{k-2} = 0$. We end up with the conjugate gradient method:
\[ \Delta m_k = c_{k-1}\, \Delta m_{k-1} + c_k\, \nabla J(m_k), \qquad \alpha_k = \arg\min_{\alpha} J(m_k + \alpha\, \Delta m_k), \qquad m_{k+1} = m_k + \alpha_k\, \Delta m_k . \]
Although each step looks like a plane search, in the CG method we are effectively searching within the whole space $\mathrm{span}\{\Delta m_0, \dots, \Delta m_{k-1}, \nabla J(m_k)\}$.
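
One standard way to realize this for the least-squares objective is the textbook CG recurrence applied to the normal equations $F^T F m = F^T d$. The sketch below is illustrative (the slides do not give an implementation); production codes typically use the mathematically equivalent but more stable CGLS/LSQR formulations.

```python
import numpy as np

def cg_normal_equations(F, d, m0, n_iter=30, tol=1e-10):
    """Textbook CG recurrence applied to the normal equations F^T F m = F^T d."""
    m = m0.copy()
    r = F.T @ (d - F @ m)          # residual of the normal equations
    p = r.copy()                   # first search direction
    rs_old = r @ r
    for _ in range(n_iter):
        Fp = F @ p
        alpha = rs_old / (Fp @ Fp)           # exact step length along p
        m = m + alpha * p
        r = r - alpha * (F.T @ Fp)           # updated normal-equation residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p        # new direction: residual plus scaled previous direction
        rs_old = rs_new
    return m

rng = np.random.default_rng(4)
F = rng.standard_normal((40, 6))
d = rng.standard_normal(40)
m_cg = cg_normal_equations(F, d, np.zeros(6))
m_ref, *_ = np.linalg.lstsq(F, d, rcond=None)
print(np.allclose(m_cg, m_ref))   # True
```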

Outline

Basic Concepts in Linear Algebra: vector space; norm; linear mapping, range, null space; matrix multiplication.
Iterative Methods for Linear Optimization: normal equation; steepest descent; conjugate gradient.
Unconstrained Nonlinear Optimization: optimality condition; search direction; line search.

Unconstrained Nonlinear Optimization - optimality condition

For the unconstrained nonlinear optimization problem
\[ \min_m\ J(m) , \]
where $J(m)$ is a real-valued function: how should we determine whether $m^*$ is a local minimizer?
Theorem (first-order necessary condition for a local minimum): $\nabla J(m^*) = 0$.
Theorem (second-order necessary condition for a local minimum): $s^T \nabla^2 J(m^*)\, s \ge 0$ for all $s \in \mathbb{R}^n$, i.e. the Hessian $\nabla^2 J(m^*)$ is positive semidefinite.
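
As a worked check, using the least-squares objective from the linear section (this connection is added here for illustration):

```latex
% For J(m) = \|Fm - d\|_2^2:
\[
  \nabla J(m) = 2F^{T}(Fm - d), \qquad \nabla^{2} J(m) = 2F^{T}F .
\]
% The first-order condition \nabla J(m^*) = 0 is exactly the normal equation
\[
  F^{T}F\,m^{*} = F^{T}d .
\]
% The second-order condition holds because, for any s \in \mathbb{R}^{n},
\[
  s^{T}\nabla^{2}J(m^{*})\,s = 2\,s^{T}F^{T}F\,s = 2\,\|Fs\|_{2}^{2} \ge 0 ,
\]
% so the Hessian is positive semidefinite and m^* is a minimizer.
```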

Unconstrained Nonlinear Optimization - search direction

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose the search direction $\Delta m_k$?
1) Gradient-based methods:
\[ J(m_k + \alpha_k\, \Delta m_k) \approx J(m_k) + \alpha_k\, \nabla J(m_k)^T \Delta m_k + O(\alpha_k^2 \|\Delta m_k\|_2^2) . \]
Thus $\Delta m_k = -\nabla J(m_k)$ is a descent direction. We can also use a technique similar to the CG method,
\[ \Delta m_k = c_1\, \nabla J(m_k) + c_2\, \Delta m_{k-1} , \]
where $c_1, c_2 \in \mathbb{R}$.

Unconstrained Nonlinear Optimization - search direction

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose the search direction $\Delta m_k$?
2) Methods based on a local quadratic model:
\[ J(m_k + \alpha_k\, \Delta m_k) \approx J(m_k) + \alpha_k\, \nabla J(m_k)^T \Delta m_k + \tfrac{1}{2}\, \alpha_k^2\, \Delta m_k^T\, \nabla^2 J(m_k)\, \Delta m_k . \]
Writing $p_k = \alpha_k\, \Delta m_k$, we solve the approximate problem
\[ \min_{p_k}\ \psi(p_k) = \nabla J(m_k)^T p_k + \tfrac{1}{2}\, p_k^T\, \nabla^2 J(m_k)\, p_k . \]
Setting $\nabla \psi(p_k) = 0$ reduces the approximate problem to the linear system $\nabla^2 J(m_k)\, p_k = -\nabla J(m_k)$, which can be solved exactly. Then update the model, $m_{k+1} = m_k + p_k$.
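
A minimal sketch of this quadratic-model (Newton) update with a full step, applied to a simple made-up test function (the function, starting point, and iteration cap are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical smooth test function J(m) = (m1 - 1)^4 + (m2 + 2)^2
# with analytic gradient and Hessian.
def J(m):
    return (m[0] - 1.0) ** 4 + (m[1] + 2.0) ** 2

def gradJ(m):
    return np.array([4.0 * (m[0] - 1.0) ** 3, 2.0 * (m[1] + 2.0)])

def hessJ(m):
    return np.array([[12.0 * (m[0] - 1.0) ** 2, 0.0],
                     [0.0, 2.0]])

m = np.array([3.0, 3.0])
for _ in range(40):
    g = gradJ(m)
    if np.linalg.norm(g) < 1e-12:
        break
    p = np.linalg.solve(hessJ(m), -g)   # Newton step: solve H p = -g
    m = m + p                           # full step (alpha_k = 1)

print(m)    # approaches the minimizer [1, -2]
```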

Unconstrained Nonlinear Optimization - line search

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose $\alpha_k$ for a given search direction $\Delta m_k$? Can we choose an arbitrary $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$? The answer is no. For example, take $J(m) = m^2$, $m \in \mathbb{R}^1$. We can find a sequence such that
\[ m_0 = 2, \qquad \Delta m_k = -m_k, \qquad \alpha_k = \frac{2 + 3 \cdot 2^{-(k+1)}}{1 + 2^{-k}} . \]
Then
\[ m_k = (-1)^k (1 + 2^{-k}), \qquad J(m_k) = (1 + 2^{-k})^2 \to 1 , \]
so the objective decreases at every step yet converges to 1 rather than to the minimum value 0.
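
The short sketch below simply runs this sequence and confirms the behavior: the objective decreases at every step but stalls near 1 instead of reaching the minimum.

```python
import numpy as np

def J(m):
    return m ** 2

m = 2.0
vals = [J(m)]
for k in range(30):
    dm = -m                                              # a descent direction for J(m) = m^2
    alpha = (2 + 3 * 2.0 ** -(k + 1)) / (1 + 2.0 ** -k)  # the step lengths from the slide
    m = m + alpha * dm
    vals.append(J(m))

print(all(v1 > v2 for v1, v2 in zip(vals, vals[1:])))    # True: J decreases every step
print(vals[-1])                                          # ~1.0, not the minimum value 0
```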

Unconstrained Nonlinear Optimization - line search

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose $\alpha_k$ for a given search direction $\Delta m_k$? A popular set of conditions that guarantees convergence is known as the Wolfe conditions:
\[ J(m_k + \alpha_k\, \Delta m_k) \le J(m_k) + c_1\, \alpha_k\, \nabla J(m_k)^T \Delta m_k , \]
\[ \nabla J(m_k + \alpha_k\, \Delta m_k)^T \Delta m_k \ge c_2\, \nabla J(m_k)^T \Delta m_k , \]
where $0 < c_1 < c_2 < 1$.
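
A small sketch of checking the two conditions for candidate step lengths, reusing the quadratic least-squares objective from earlier as the test function (the data, candidate step lengths, and constants c1, c2 are illustrative choices, not from the slides):

```python
import numpy as np

def wolfe_conditions_hold(J, gradJ, m, dm, alpha, c1=1e-4, c2=0.9):
    """Check the sufficient-decrease and curvature (Wolfe) conditions at step alpha."""
    sufficient_decrease = J(m + alpha * dm) <= J(m) + c1 * alpha * (gradJ(m) @ dm)
    curvature = gradJ(m + alpha * dm) @ dm >= c2 * (gradJ(m) @ dm)
    return sufficient_decrease and curvature

# Test problem: the quadratic J(m) = ||F m - d||_2^2 from the linear section.
rng = np.random.default_rng(5)
F = rng.standard_normal((10, 3))
d = rng.standard_normal(10)
J = lambda m: np.sum((F @ m - d) ** 2)
gradJ = lambda m: 2.0 * F.T @ (F @ m - d)

m = np.zeros(3)
dm = -gradJ(m)                        # steepest descent direction at m
for alpha in (1.0, 0.1, 0.01):
    print(alpha, wolfe_conditions_hold(J, gradJ, m, dm, alpha))
```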

References

Numerical Linear Algebra, Lloyd N. Trefethen and David Bau.
Numerical Optimization, Jorge Nocedal and Stephen J. Wright.
Lecture notes from Prof. Walter Murray, http://web.stanford.edu/class/cme304/