Basic concepts in Linear Algebra and Optimization

Yinbin Ma, GEOPHYS 211

Outline

Basic Concepts in Linear Algebra: vector space; norm; linear mapping, range, null space; matrix multiplication.
Iterative Methods for Linear Optimization: normal equation; steepest descent; conjugate gradient.
Unconstrained Nonlinear Optimization: optimality condition; methods based on a local quadratic model; line search methods.

Basic concepts - vector space

A vector space is any set V for which two operations are defined: 1) Vector addition: any vectors $x_1$ and $x_2$ in the set V can be added to give a vector $x = x_1 + x_2$ that is also in V. 2) Scalar multiplication: any vector $x$ in V can be multiplied ("scaled") by a real number $c \in \mathbb{R}$ to produce a second vector $cx$ which is also in V. In this class we only discuss the case $V \subseteq \mathbb{R}^n$, meaning each vector $x$ in the space is an $n$-dimensional column vector.

Basic concepts - norm

The model space and data space we mentioned in class are normed vector spaces. A norm is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ that maps a vector to a real number. A norm must satisfy the following: 1) $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$; 2) $\|x + y\| \le \|x\| + \|y\|$; 3) $\|ax\| = |a|\,\|x\|$; where $x$ and $y$ are vectors in the vector space V and $a \in \mathbb{R}$.

Basic concepts - norm

We will see the following norms in this course:
1) $L_2$ norm: for a vector $x$, the $L_2$ norm is defined as $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$.
2) $L_1$ norm: for a vector $x$, the $L_1$ norm is defined as $\|x\|_1 = \sum_{i=1}^{n} |x_i|$.
3) $L_\infty$ norm: for a vector $x$, the $L_\infty$ norm is defined as $\|x\|_\infty = \max_{i=1,\dots,n} |x_i|$.
The norm of a matrix is induced from a vector norm as
\[ \|A\|_a = \sup_{x \ne 0} \frac{\|Ax\|_a}{\|x\|_a} . \]
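
As a quick numerical check of these definitions (a sketch added here, not part of the original slides), the vector norms and the induced matrix 2-norm can be computed with NumPy; the vector and matrix below are arbitrary examples.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l2 = np.sqrt(np.sum(x**2))          # L2 norm: square root of the sum of squares
l1 = np.sum(np.abs(x))              # L1 norm: sum of absolute values
linf = np.max(np.abs(x))            # L-infinity norm: largest absolute entry

# Same quantities via NumPy's built-in norm
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(linf, np.linalg.norm(x, np.inf))

# Induced 2-norm of a matrix equals its largest singular value
A = np.array([[1.0, 2.0], [0.0, 3.0]])
print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
```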

Basic concepts - linear mapping, range and null space

We say a map $x \mapsto Ax$ is linear if for any $x, y \in \mathbb{R}^n$ and any $a \in \mathbb{R}$,
\[ A(x + y) = Ax + Ay, \qquad A(ax) = aAx . \]
It can be proved that every linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be expressed as multiplication by an $m \times n$ matrix.
The range of a linear operator $A \in \mathbb{R}^{m \times n}$ is the space spanned by the columns of $A$,
\[ \mathrm{range}(A) = \{\, y \ \text{such that}\ y = Ax,\ x \in \mathbb{R}^n \,\} . \]
The null space of a linear operator $A \in \mathbb{R}^{m \times n}$ is the space
\[ \mathrm{null}(A) = \{\, x \ \text{such that}\ Ax = 0 \,\} . \]
It can be shown that $\mathrm{range}(A)$ is perpendicular to $\mathrm{null}(A^T)$ (exercise).
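
The perpendicularity of $\mathrm{range}(A)$ and $\mathrm{null}(A^T)$ can also be verified numerically. The sketch below (an illustration added here, using an arbitrary random matrix) builds orthonormal bases for both subspaces from the SVD: the left singular vectors belonging to nonzero singular values span $\mathrm{range}(A)$, and the remaining ones span $\mathrm{null}(A^T)$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))      # a 5x3 operator, so range(A) lives in R^5

# Orthonormal bases from the SVD: U[:, :r] spans range(A),
# U[:, r:] spans null(A^T) (the left null space).
U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-12)                # numerical rank
range_basis = U[:, :r]
left_null_basis = U[:, r:]

# Every range vector is orthogonal to every left-null-space vector.
print(np.max(np.abs(range_basis.T @ left_null_basis)))   # ~1e-16
```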

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 1 (entry by entry):
\[ b_{ij} = \sum_{k=1}^{m} a_{ik}\, c_{kj} . \]
Here $b_{ij}$, $a_{ik}$, and $c_{kj}$ are entries of $B$, $A$, and $C$.

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 2 (column by column): write
\[ B = [\, b_1 \ \ b_2 \ \cdots \ b_n \,], \]
where $b_i$ is the $i$-th column of matrix $B$. Then
\[ B = [\, A c_1 \ \ A c_2 \ \cdots \ A c_n \,], \qquad b_i = A c_i . \]
Each column of $B$ is in the range of $A$ (defined earlier). Thus, the range of $B$ is a subset of the range of $A$.

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 3 (row by row): write
\[ B = \begin{bmatrix} \tilde b_1^T \\ \tilde b_2^T \\ \vdots \\ \tilde b_l^T \end{bmatrix}, \]
where $\tilde b_i^T$ is the $i$-th row of matrix $B$. Then $\tilde b_i^T = \tilde a_i^T C$, i.e.
\[ B = \begin{bmatrix} \tilde a_1^T C \\ \tilde a_2^T C \\ \vdots \\ \tilde a_l^T C \end{bmatrix} . \]
This form is not commonly used.

Basic concepts - four ways of matrix multiplication

For the matrix-matrix product $B = AC$: if $A$ is $l \times m$ and $C$ is $m \times n$, then $B$ is $l \times n$.
Matrix multiplication, method 4 (sum of rank-one matrices):
\[ B = \sum_{i=1}^{m} a_i\, \tilde c_i^T , \]
where $a_i$ is the $i$-th column of matrix $A$ and $\tilde c_i^T$ is the $i$-th row of matrix $C$. Each term $a_i \tilde c_i^T$ is a rank-one matrix.
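
The four views above are equivalent ways of computing the same product. The following sketch (illustrative sizes and random data, not from the slides) reproduces each method in NumPy and checks it against A @ C.

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, n = 4, 3, 5
A = rng.standard_normal((l, m))
C = rng.standard_normal((m, n))
B_ref = A @ C                                   # reference product

# Method 1: entry by entry, b_ij = sum_k a_ik * c_kj
B1 = np.zeros((l, n))
for i in range(l):
    for j in range(n):
        for k in range(m):
            B1[i, j] += A[i, k] * C[k, j]

# Method 2: column by column, b_j = A c_j
B2 = np.column_stack([A @ C[:, j] for j in range(n)])

# Method 3: row by row, b_i^T = a_i^T C
B3 = np.vstack([A[i, :] @ C for i in range(l)])

# Method 4: sum of rank-one outer products, B = sum_k a_k c_k^T
B4 = sum(np.outer(A[:, k], C[k, :]) for k in range(m))

for Bk in (B1, B2, B3, B4):
    assert np.allclose(Bk, B_ref)
```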

Outline

Basic Concepts in Linear Algebra: vector space; norm; linear mapping, range, null space; matrix multiplication.
Iterative Methods for Linear Optimization: normal equation; steepest descent; conjugate gradient.
Unconstrained Nonlinear Optimization: optimality condition; search direction; line search.

Linear Optimization - normal equation

We solve a linear system having $n$ unknowns and $m > n$ equations: we want to find a vector $m \in \mathbb{R}^n$ that satisfies
\[ Fm = d , \]
where $d \in \mathbb{R}^m$ and $F \in \mathbb{R}^{m \times n}$. Reformulate the problem: define the residual $r = d - Fm$ and find the $m$ that minimizes $\|r\|_2 = \|Fm - d\|_2$. It can be proved that the residual norm is minimized when $F^T r = 0$. This is equivalent to an $n \times n$ system,
\[ F^T F\, m = F^T d , \]
which is the normal equation. We can solve the normal equation using direct methods such as LU, QR, SVD, or Cholesky decomposition.
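
As an illustration (a sketch with arbitrary test data, not from the slides), the normal equations can be formed and solved with a Cholesky factorization of $F^T F$ and compared against NumPy's least-squares solver. For ill-conditioned $F$, forming $F^T F$ squares the condition number, which is one reason QR- or SVD-based solvers are often preferred.

```python
import numpy as np

rng = np.random.default_rng(2)
m_rows, n_cols = 20, 5
F = rng.standard_normal((m_rows, n_cols))
d = rng.standard_normal(m_rows)

# Normal equations: (F^T F) m = F^T d, solved via a Cholesky factorization.
G = F.T @ F
rhs = F.T @ d
L = np.linalg.cholesky(G)                 # G = L L^T
y = np.linalg.solve(L, rhs)               # forward solve
m_normal = np.linalg.solve(L.T, y)        # backward solve

# Reference: NumPy's least-squares solver.
m_lstsq, *_ = np.linalg.lstsq(F, d, rcond=None)

print(np.allclose(m_normal, m_lstsq))               # True
# Optimality check: the residual is orthogonal to the columns of F.
print(np.max(np.abs(F.T @ (d - F @ m_normal))))     # ~1e-14
```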

Linear Optimization - steepest descent method

For the unconstrained linear optimization problem
\[ \min_m\ J(m) = \|Fm - d\|_2^2 , \]
we find the minimum of the objective function $J(m)$ iteratively with the steepest descent method: at the current point $m_k$, we update the model by moving along the negative direction of the gradient,
\[ m_{k+1} = m_k - \alpha\, \nabla J(m_k), \qquad \nabla J(m_k) = 2 F^T (F m_k - d) . \]
The gradient can be evaluated exactly, and for this quadratic objective we have an analytical formula for the optimal step length $\alpha$.
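
A minimal sketch of this iteration, assuming the standard exact line-search step length for the quadratic objective (the slide notes that an analytical formula exists but does not state it); the test problem and iteration limits are arbitrary.

```python
import numpy as np

def steepest_descent_ls(F, d, m0, n_iter=200):
    """Steepest descent for J(m) = ||F m - d||_2^2 with exact line search."""
    m = m0.copy()
    for _ in range(n_iter):
        g = 2.0 * F.T @ (F @ m - d)          # gradient of J at m
        if np.linalg.norm(g) < 1e-12:
            break
        Fg = F @ g
        # Exact minimizer of J(m - alpha * g) along the line:
        alpha = (g @ g) / (2.0 * (Fg @ Fg))
        m = m - alpha * g
    return m

rng = np.random.default_rng(3)
F = rng.standard_normal((30, 4))
d = rng.standard_normal(30)
m_sd = steepest_descent_ls(F, d, np.zeros(4))
m_ref, *_ = np.linalg.lstsq(F, d, rcond=None)
print(np.allclose(m_sd, m_ref, atol=1e-6))   # True
```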

Linear Optimization - conjugate gradient method

For the unconstrained linear optimization problem
\[ \min_m\ J(m) = \|Fm - d\|_2^2 . \]
Starting from $m_0$, we generate a sequence of search directions $\Delta m_i$ and update the model iteratively,
\[ m_i = m_{i-1} + \alpha_{i-1}\, \Delta m_{i-1}, \qquad i = 1, \dots, k . \]
The next search direction $\Delta m_k$ is sought in the space $\mathrm{span}\{\Delta m_0, \dots, \Delta m_{k-1}, \nabla J(m_k)\}$,
\[ \Delta m_k = \sum_{i=0}^{k-1} c_i\, \Delta m_i + c_k\, \nabla J(m_k) . \]
The magic is that for a linear problem $c_0 = c_1 = \cdots = c_{k-2} = 0$. We end up with the conjugate gradient method:
\[ \Delta m_k = c_{k-1}\, \Delta m_{k-1} + c_k\, \nabla J(m_k), \qquad \alpha_k = \arg\min_{\alpha} J(m_k + \alpha\, \Delta m_k), \qquad m_{k+1} = m_k + \alpha_k\, \Delta m_k . \]
Although each step looks like a plane search, in the CG method we are effectively searching within the whole space $\mathrm{span}\{\Delta m_0, \dots, \Delta m_{k-1}, \nabla J(m_k)\}$.
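
One standard way to realize this for the least-squares objective is the textbook CG recurrence applied to the normal equations $F^T F m = F^T d$. The sketch below is illustrative (the slides do not give an implementation); production codes typically use the mathematically equivalent but more stable CGLS/LSQR formulations.

```python
import numpy as np

def cg_normal_equations(F, d, m0, n_iter=30, tol=1e-10):
    """Textbook CG recurrence applied to the normal equations F^T F m = F^T d."""
    m = m0.copy()
    r = F.T @ (d - F @ m)          # residual of the normal equations
    p = r.copy()                   # first search direction
    rs_old = r @ r
    for _ in range(n_iter):
        Fp = F @ p
        alpha = rs_old / (Fp @ Fp)           # exact step length along p
        m = m + alpha * p
        r = r - alpha * (F.T @ Fp)           # updated normal-equation residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p        # new direction: residual plus scaled previous direction
        rs_old = rs_new
    return m

rng = np.random.default_rng(4)
F = rng.standard_normal((40, 6))
d = rng.standard_normal(40)
m_cg = cg_normal_equations(F, d, np.zeros(6))
m_ref, *_ = np.linalg.lstsq(F, d, rcond=None)
print(np.allclose(m_cg, m_ref))   # True
```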

Outline

Basic Concepts in Linear Algebra: vector space; norm; linear mapping, range, null space; matrix multiplication.
Iterative Methods for Linear Optimization: normal equation; steepest descent; conjugate gradient.
Unconstrained Nonlinear Optimization: optimality condition; search direction; line search.

Unconstrained Nonlinear Optimization - optimality condition

For the unconstrained nonlinear optimization problem
\[ \min_m\ J(m) , \]
where $J(m)$ is a real-valued function: how should we determine whether $m^*$ is a local minimizer?
Theorem (first-order necessary condition for a local minimum): $\nabla J(m^*) = 0$.
Theorem (second-order necessary condition for a local minimum): $s^T \nabla^2 J(m^*)\, s \ge 0$ for all $s \in \mathbb{R}^n$, i.e. the Hessian $\nabla^2 J(m^*)$ is positive semidefinite.
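
As a worked check, using the least-squares objective from the linear section (this connection is added here for illustration):

```latex
% For J(m) = \|Fm - d\|_2^2:
\[
  \nabla J(m) = 2F^{T}(Fm - d), \qquad \nabla^{2} J(m) = 2F^{T}F .
\]
% The first-order condition \nabla J(m^*) = 0 is exactly the normal equation
\[
  F^{T}F\,m^{*} = F^{T}d .
\]
% The second-order condition holds because, for any s \in \mathbb{R}^{n},
\[
  s^{T}\nabla^{2}J(m^{*})\,s = 2\,s^{T}F^{T}F\,s = 2\,\|Fs\|_{2}^{2} \ge 0 ,
\]
% so the Hessian is positive semidefinite and m^* is a minimizer.
```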

Unconstrained Nonlinear Optimization - search direction

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose the search direction $\Delta m_k$?
1) Gradient-based methods:
\[ J(m_k + \alpha_k\, \Delta m_k) \approx J(m_k) + \alpha_k\, \nabla J(m_k)^T \Delta m_k + O(\alpha_k^2 \|\Delta m_k\|_2^2) . \]
Thus $\Delta m_k = -\nabla J(m_k)$ is a descent direction. We can also use a technique similar to the CG method,
\[ \Delta m_k = c_1\, \nabla J(m_k) + c_2\, \Delta m_{k-1} , \]
where $c_1, c_2 \in \mathbb{R}$.

Unconstrained Nonlinear Optimization - search direction

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose the search direction $\Delta m_k$?
2) Methods based on a local quadratic model:
\[ J(m_k + \alpha_k\, \Delta m_k) \approx J(m_k) + \alpha_k\, \nabla J(m_k)^T \Delta m_k + \tfrac{1}{2}\, \alpha_k^2\, \Delta m_k^T\, \nabla^2 J(m_k)\, \Delta m_k . \]
Writing $p_k = \alpha_k\, \Delta m_k$, we solve the approximate problem
\[ \min_{p_k}\ \psi(p_k) = \nabla J(m_k)^T p_k + \tfrac{1}{2}\, p_k^T\, \nabla^2 J(m_k)\, p_k . \]
Setting $\nabla \psi(p_k) = 0$ reduces the approximate problem to the linear system $\nabla^2 J(m_k)\, p_k = -\nabla J(m_k)$, which can be solved exactly. Then update the model, $m_{k+1} = m_k + p_k$.
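
A minimal sketch of this quadratic-model (Newton) update with a full step, applied to a simple made-up test function (the function, starting point, and iteration cap are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical smooth test function J(m) = (m1 - 1)^4 + (m2 + 2)^2
# with analytic gradient and Hessian.
def J(m):
    return (m[0] - 1.0) ** 4 + (m[1] + 2.0) ** 2

def gradJ(m):
    return np.array([4.0 * (m[0] - 1.0) ** 3, 2.0 * (m[1] + 2.0)])

def hessJ(m):
    return np.array([[12.0 * (m[0] - 1.0) ** 2, 0.0],
                     [0.0, 2.0]])

m = np.array([3.0, 3.0])
for _ in range(40):
    g = gradJ(m)
    if np.linalg.norm(g) < 1e-12:
        break
    p = np.linalg.solve(hessJ(m), -g)   # Newton step: solve H p = -g
    m = m + p                           # full step (alpha_k = 1)

print(m)    # approaches the minimizer [1, -2]
```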

Unconstrained Nonlinear Optimization - line search

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose $\alpha_k$ for a given search direction $\Delta m_k$? Can we choose an arbitrary $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$? The answer is no. For example, take $J(m) = m^2$, $m \in \mathbb{R}^1$. We can find a sequence such that
\[ m_0 = 2, \qquad \Delta m_k = -m_k, \qquad \alpha_k = \frac{2 + 3 \cdot 2^{-(k+1)}}{1 + 2^{-k}} . \]
Then
\[ m_k = (-1)^k (1 + 2^{-k}), \qquad J(m_k) = (1 + 2^{-k})^2 \to 1 , \]
so the objective decreases at every step yet converges to 1 rather than to the minimum value 0.
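
The short sketch below simply runs this sequence and confirms the behavior: the objective decreases at every step but stalls near 1 instead of reaching the minimum.

```python
import numpy as np

def J(m):
    return m ** 2

m = 2.0
vals = [J(m)]
for k in range(30):
    dm = -m                                              # a descent direction for J(m) = m^2
    alpha = (2 + 3 * 2.0 ** -(k + 1)) / (1 + 2.0 ** -k)  # the step lengths from the slide
    m = m + alpha * dm
    vals.append(J(m))

print(all(v1 > v2 for v1, v2 in zip(vals, vals[1:])))    # True: J decreases every step
print(vals[-1])                                          # ~1.0, not the minimum value 0
```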

Unconstrained Nonlinear Optimization - line search

For the unconstrained nonlinear optimization problem $\min_m J(m)$: given a model point $m_k$, we want to find a search direction $\Delta m_k$ and a real number $\alpha_k$ such that $J(m_k + \alpha_k \Delta m_k) < J(m_k)$. How do we choose $\alpha_k$ for a given search direction $\Delta m_k$? A popular set of conditions that guarantees convergence is known as the Wolfe conditions:
\[ J(m_k + \alpha_k\, \Delta m_k) \le J(m_k) + c_1\, \alpha_k\, \nabla J(m_k)^T \Delta m_k , \]
\[ \nabla J(m_k + \alpha_k\, \Delta m_k)^T \Delta m_k \ge c_2\, \nabla J(m_k)^T \Delta m_k , \]
where $0 < c_1 < c_2 < 1$.
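
A small sketch of checking the two conditions for candidate step lengths, reusing the quadratic least-squares objective from earlier as the test function (the data, candidate step lengths, and constants c1, c2 are illustrative choices, not from the slides):

```python
import numpy as np

def wolfe_conditions_hold(J, gradJ, m, dm, alpha, c1=1e-4, c2=0.9):
    """Check the sufficient-decrease and curvature (Wolfe) conditions at step alpha."""
    sufficient_decrease = J(m + alpha * dm) <= J(m) + c1 * alpha * (gradJ(m) @ dm)
    curvature = gradJ(m + alpha * dm) @ dm >= c2 * (gradJ(m) @ dm)
    return sufficient_decrease and curvature

# Test problem: the quadratic J(m) = ||F m - d||_2^2 from the linear section.
rng = np.random.default_rng(5)
F = rng.standard_normal((10, 3))
d = rng.standard_normal(10)
J = lambda m: np.sum((F @ m - d) ** 2)
gradJ = lambda m: 2.0 * F.T @ (F @ m - d)

m = np.zeros(3)
dm = -gradJ(m)                        # steepest descent direction at m
for alpha in (1.0, 0.1, 0.01):
    print(alpha, wolfe_conditions_hold(J, gradJ, m, dm, alpha))
```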

References

Numerical Linear Algebra, Lloyd N. Trefethen and David Bau.
Numerical Optimization, Jorge Nocedal and Stephen J. Wright.
Lecture notes from Prof. Walter Murray, http://web.stanford.edu/class/cme304/