Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.


Vector Norms

Chapter 7 Iterative Techniques in Matrix Algebra
Per-Olof Persson <persson@berkeley.edu>
Department of Mathematics, University of California, Berkeley
Math 128B Numerical Analysis

Definition. A vector norm on $\mathbb{R}^n$ is a function, $\|\cdot\|$, from $\mathbb{R}^n$ into $\mathbb{R}$ with the properties:
(i) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$
(ii) $\|x\| = 0$ if and only if $x = 0$
(iii) $\|\alpha x\| = |\alpha|\,\|x\|$ for all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$
(iv) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$

Definition. The Euclidean norm $l_2$ and the infinity norm $l_\infty$ for the vector $x = (x_1, x_2, \ldots, x_n)^t$ are defined by
$$\|x\|_2 = \Big\{ \sum_{i=1}^n x_i^2 \Big\}^{1/2} \quad\text{and}\quad \|x\|_\infty = \max_{1 \le i \le n} |x_i|$$

Cauchy-Bunyakovsky-Schwarz Inequality for Sums

For each $x = (x_1, x_2, \ldots, x_n)^t$ and $y = (y_1, y_2, \ldots, y_n)^t$ in $\mathbb{R}^n$,
$$x^t y = \sum_{i=1}^n x_i y_i \le \Big\{ \sum_{i=1}^n x_i^2 \Big\}^{1/2} \Big\{ \sum_{i=1}^n y_i^2 \Big\}^{1/2} = \|x\|_2 \, \|y\|_2$$

Distances

Definition. The distance between two vectors $x = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$ is the norm of the difference of the vectors. The $l_2$ and $l_\infty$ distances are
$$\|x - y\|_2 = \Big\{ \sum_{i=1}^n (x_i - y_i)^2 \Big\}^{1/2} \quad\text{and}\quad \|x - y\|_\infty = \max_{1 \le i \le n} |x_i - y_i|$$

Convergence

Definition. A sequence $\{x^{(k)}\}_{k=1}^\infty$ of vectors in $\mathbb{R}^n$ is said to converge to $x$ with respect to the norm $\|\cdot\|$ if, given any $\varepsilon > 0$, there exists an integer $N(\varepsilon)$ such that
$$\|x^{(k)} - x\| < \varepsilon, \quad\text{for all } k \ge N(\varepsilon)$$

Theorem. The sequence of vectors $\{x^{(k)}\}$ converges to $x$ in $\mathbb{R}^n$ with respect to $\|\cdot\|_\infty$ if and only if $\lim_{k\to\infty} x_i^{(k)} = x_i$ for each $i = 1, 2, \ldots, n$.

Theorem. For each $x \in \mathbb{R}^n$,
$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$$

Matrix Norms

Definition. A matrix norm on $n \times n$ matrices is a real-valued function $\|\cdot\|$ satisfying
(i) $\|A\| \ge 0$
(ii) $\|A\| = 0$ if and only if $A = 0$
(iii) $\|\alpha A\| = |\alpha|\,\|A\|$
(iv) $\|A + B\| \le \|A\| + \|B\|$
(v) $\|AB\| \le \|A\|\,\|B\|$
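As a quick numerical illustration (a minimal NumPy sketch, not part of the original slides), the code below evaluates the $l_2$ and $l_\infty$ norms and distances for two small example vectors and checks the Cauchy-Bunyakovsky-Schwarz and norm-equivalence inequalities; the vectors and the rounding tolerance are arbitrary illustrative choices.

```python
# Minimal sketch (not from the slides): l2 and l-infinity norms, distances,
# and the two inequalities above, checked numerically with NumPy.
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 0.0, -1.0])

norm2_x = np.sqrt(np.sum(x**2))          # ||x||_2
norminf_x = np.max(np.abs(x))            # ||x||_inf

dist2 = np.linalg.norm(x - y, 2)         # l2 distance
distinf = np.linalg.norm(x - y, np.inf)  # l-infinity distance

# Cauchy-Bunyakovsky-Schwarz: |x^t y| <= ||x||_2 ||y||_2
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

# Norm equivalence: ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
n = x.size
assert norminf_x <= norm2_x <= np.sqrt(n) * norminf_x + 1e-12

print(norm2_x, norminf_x, dist2, distinf)
```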

Natural Matrix Norms

Definition. If $\|\cdot\|$ is a vector norm, the natural (or induced) matrix norm is given by
$$\|A\| = \max_{\|x\| = 1} \|Ax\|$$

Corollary. For any vector $z \ne 0$, matrix $A$, and natural norm $\|\cdot\|$,
$$\|Az\| \le \|A\|\,\|z\|$$

Theorem. If $A = (a_{ij})$ is an $n \times n$ matrix, then
$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$$

Eigenvalues and Eigenvectors

Definition. The characteristic polynomial of a square matrix $A$ is
$$p(\lambda) = \det(A - \lambda I)$$

Definition. The zeros $\lambda$ of the characteristic polynomial are the eigenvalues of $A$, and any $x \ne 0$ satisfying $(A - \lambda I)x = 0$ is a corresponding eigenvector.

Definition. The spectral radius $\rho(A)$ of a matrix $A$ is
$$\rho(A) = \max |\lambda|, \quad\text{for eigenvalues } \lambda \text{ of } A$$

Theorem. If $A$ is an $n \times n$ matrix, then
(i) $\|A\|_2 = [\rho(A^t A)]^{1/2}$
(ii) $\rho(A) \le \|A\|$, for any natural norm $\|\cdot\|$

Convergent Matrices

Definition. An $n \times n$ matrix $A$ is convergent if
$$\lim_{k\to\infty} (A^k)_{ij} = 0, \quad\text{for each } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n$$

Theorem. The following statements are equivalent.
(i) $A$ is a convergent matrix
(ii) $\lim_{n\to\infty} \|A^n\| = 0$, for some natural norm
(iii) $\lim_{n\to\infty} \|A^n\| = 0$, for all natural norms
(iv) $\rho(A) < 1$
(v) $\lim_{n\to\infty} A^n x = 0$, for every $x$

Iterative Methods for Linear Systems

Direct methods for solving $Ax = b$, e.g. Gaussian elimination, compute an exact solution after a finite number of steps (in exact arithmetic). Iterative algorithms produce a sequence of approximations $x^{(1)}, x^{(2)}, \ldots$ which hopefully converges to the solution, and
- may require less memory than direct methods
- may be faster than direct methods
- may handle special structures (such as sparsity) in a simpler way

[Figure: residual $r = b - Ax$ versus iteration number, comparing the gradual decay of an iterative method with the single-step result of a direct method.]

Two Classes of Iterative Methods

Stationary methods (or classical iterative methods) find a splitting $A = M - K$ and iterate
$$x^{(k)} = M^{-1}(K x^{(k-1)} + b) = T x^{(k-1)} + c$$
Examples: Jacobi, Gauss-Seidel, Successive Over-Relaxation (SOR).

Krylov subspace methods use only multiplication by $A$ (and possibly by $A^T$) and find solutions in the Krylov subspace $\operatorname{span}\{b, Ab, A^2 b, \ldots, A^{k-1} b\}$. Examples: Conjugate Gradient (CG), Generalized Minimal Residual (GMRES), BiConjugate Gradient (BiCG), etc.

Jacobi's Method

An iterative technique to solve $Ax = b$ starts with an initial approximation $x^{(0)}$ and generates a sequence of vectors $\{x^{(k)}\}_{k=0}^\infty$ that converges to $x$. Jacobi's method solves for $x_i$ in the $i$th equation of $Ax = b$:
$$x_i = \sum_{\substack{j=1 \\ j \ne i}}^n \left( -\frac{a_{ij} x_j}{a_{ii}} \right) + \frac{b_i}{a_{ii}}, \quad\text{for } i = 1, 2, \ldots, n$$
This leads to the iteration
$$x_i^{(k)} = \frac{1}{a_{ii}} \left[ \sum_{\substack{j=1 \\ j \ne i}}^n \left( -a_{ij} x_j^{(k-1)} \right) + b_i \right], \quad\text{for } i = 1, 2, \ldots, n$$
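The componentwise Jacobi iteration above translates almost directly into code. The following is a minimal sketch (not from the slides); the function name, test matrix, stopping tolerance, and iteration cap are illustrative choices, and the example matrix is strictly diagonally dominant so convergence is guaranteed.

```python
# Minimal sketch (not from the slides) of the componentwise Jacobi iteration above.
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, maxiter=500):
    """Solve Ax = b with Jacobi's method; assumes a_ii != 0 for all i."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(maxiter):
        x_new = np.empty(n)
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant example, so Jacobi is guaranteed to converge.
A = np.array([[10.0, -1.0,  2.0],
              [-1.0, 11.0, -1.0],
              [ 2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
print(jacobi(A, b))   # should agree with np.linalg.solve(A, b)
```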

Matrix Form of Jacobi's Method

Convert $Ax = b$ into an equivalent system $x = Tx + c$, select an initial vector $x^{(0)}$, and iterate
$$x^{(k)} = T x^{(k-1)} + c$$
For Jacobi's method, split $A = D - L - U$ into its diagonal part $D$, its strictly lower-triangular part $-L$, and its strictly upper-triangular part $-U$:
$$\underbrace{\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{pmatrix}}_{D} - \underbrace{\begin{pmatrix} 0 & & & \\ -a_{21} & 0 & & \\ \vdots & \ddots & \ddots & \\ -a_{n1} & \cdots & -a_{n,n-1} & 0 \end{pmatrix}}_{L} - \underbrace{\begin{pmatrix} 0 & -a_{12} & \cdots & -a_{1n} \\ & 0 & \ddots & \vdots \\ & & \ddots & -a_{n-1,n} \\ & & & 0 \end{pmatrix}}_{U}$$
This transforms $Ax = (D - L - U)x = b$ into $Dx = (L + U)x + b$, and if $D^{-1}$ exists, this leads to the Jacobi iteration:
$$x^{(k)} = D^{-1}(L + U) x^{(k-1)} + D^{-1} b = T_j x^{(k-1)} + c_j$$
where $T_j = D^{-1}(L + U)$ and $c_j = D^{-1} b$.

The Gauss-Seidel Method

Improve Jacobi's method by, for $i > 1$, using the already updated components $x_1^{(k)}, \ldots, x_{i-1}^{(k)}$ when computing $x_i^{(k)}$:
$$x_i^{(k)} = \frac{1}{a_{ii}} \left[ -\sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} + b_i \right]$$
In matrix form, the method can be written
$$(D - L)\, x^{(k)} = U x^{(k-1)} + b$$
and if $(D - L)^{-1}$ exists, this leads to the Gauss-Seidel iteration
$$x^{(k)} = (D - L)^{-1} U x^{(k-1)} + (D - L)^{-1} b = T_g x^{(k-1)} + c_g$$
where $T_g = (D - L)^{-1} U$ and $c_g = (D - L)^{-1} b$.

General Iteration Methods

Lemma. If the spectral radius satisfies $\rho(T) < 1$, then $(I - T)^{-1}$ exists, and
$$(I - T)^{-1} = I + T + T^2 + \cdots = \sum_{j=0}^{\infty} T^j$$

Theorem. For any $x^{(0)} \in \mathbb{R}^n$, the sequence
$$x^{(k)} = T x^{(k-1)} + c$$
converges to the unique solution of $x = Tx + c$ if and only if $\rho(T) < 1$.

Corollary. If $\|T\| < 1$ for any natural matrix norm, then $x^{(k)} = T x^{(k-1)} + c$ converges for any $x^{(0)} \in \mathbb{R}^n$ to a vector $x \in \mathbb{R}^n$ such that $x = Tx + c$. The following error estimates hold:
(i) $\|x - x^{(k)}\| \le \|T\|^k \|x^{(0)} - x\|$
(ii) $\|x - x^{(k)}\| \le \dfrac{\|T\|^k}{1 - \|T\|} \|x^{(1)} - x^{(0)}\|$

Theorem. If $A$ is strictly diagonally dominant, then Jacobi and Gauss-Seidel converge for any $x^{(0)}$.

Theorem (Stein-Rosenberg). If $a_{ii} > 0$ for all $i$ and $a_{ij} < 0$ for $i \ne j$, then one and only one of the following holds:
(i) $0 \le \rho(T_g) < \rho(T_j) < 1$
(ii) $1 < \rho(T_j) < \rho(T_g)$
(iii) $\rho(T_j) = \rho(T_g) = 0$
(iv) $\rho(T_j) = \rho(T_g) = 1$

The Residual Vector

Definition. The residual vector for $\tilde{x} \in \mathbb{R}^n$ with respect to the linear system $Ax = b$ is $r = b - A\tilde{x}$.

Consider the approximate solution vector in Gauss-Seidel
$$x_i^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)}, x_i^{(k-1)}, \ldots, x_n^{(k-1)})^t$$
with residual vector $r_i^{(k)} = (r_{1i}^{(k)}, r_{2i}^{(k)}, \ldots, r_{ni}^{(k)})^t$. The Gauss-Seidel method
$$x_i^{(k)} = \frac{1}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \right]$$
can then be written as
$$x_i^{(k)} = x_i^{(k-1)} + \frac{r_{ii}^{(k)}}{a_{ii}}$$

Successive Over-Relaxation

Relaxation methods use an iteration of the form
$$x_i^{(k)} = x_i^{(k-1)} + \omega \frac{r_{ii}^{(k)}}{a_{ii}}$$
for some positive $\omega$. With $\omega > 1$, they can accelerate the convergence of the Gauss-Seidel method and are called successive over-relaxation (SOR) methods. Write the SOR method as
$$x_i^{(k)} = (1 - \omega)\, x_i^{(k-1)} + \frac{\omega}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \right]$$
which can be written in the matrix form
$$x^{(k)} = T_\omega x^{(k-1)} + c_\omega$$
where $T_\omega = (D - \omega L)^{-1} [(1 - \omega) D + \omega U]$ and $c_\omega = \omega (D - \omega L)^{-1} b$.
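The Gauss-Seidel and SOR sweeps can be sketched the same way. The code below is an illustrative implementation (not from the slides) of the componentwise SOR update above; setting omega = 1 recovers Gauss-Seidel. The function name, test matrix, relaxation parameter, and tolerance are assumptions made for the example.

```python
# Minimal sketch (not from the slides) of the componentwise SOR sweep above;
# omega = 1 reduces it to Gauss-Seidel.
import numpy as np

def sor(A, b, omega=1.25, x0=None, tol=1e-10, maxiter=500):
    """Solve Ax = b with SOR (Gauss-Seidel for omega = 1); assumes a_ii != 0."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(maxiter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]          # already-updated components x_j^(k)
            s2 = A[i, i+1:] @ x_old[i+1:]  # previous-iterate components x_j^(k-1)
            x[i] = (1 - omega) * x_old[i] + omega * (b[i] - s1 - s2) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([2.0, 4.0, 10.0])
print(sor(A, b, omega=1.0))   # Gauss-Seidel
print(sor(A, b, omega=1.2))   # over-relaxed
```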

Convergence of the SOR Method

Theorem (Kahan). If $a_{ii} \ne 0$ for all $i$, then $\rho(T_\omega) \ge |\omega - 1|$, so the SOR method can converge only if $0 < \omega < 2$.

Theorem (Ostrowski-Reich). If $A$ is positive definite and $0 < \omega < 2$, then SOR converges for any $x^{(0)}$.

Theorem. If $A$ is positive definite and tridiagonal, then $\rho(T_g) = [\rho(T_j)]^2 < 1$, and the optimal $\omega$ for SOR is
$$\omega = \frac{2}{1 + \sqrt{1 - [\rho(T_j)]^2}}$$
which gives $\rho(T_\omega) = \omega - 1$.

Error Bounds

Theorem. Suppose $Ax = b$, $A$ is nonsingular, $\tilde{x} \approx x$, and $r = b - A\tilde{x}$. Then for any natural norm,
$$\|x - \tilde{x}\| \le \|r\|\,\|A^{-1}\|$$
and, if $x, b \ne 0$,
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|r\|}{\|b\|}$$

Definition. The condition number of a nonsingular matrix $A$ in the norm $\|\cdot\|$ is
$$K(A) = \|A\|\,\|A^{-1}\|$$
In terms of $K(A)$, the error bounds can be written:
$$\|x - \tilde{x}\| \le K(A)\,\frac{\|r\|}{\|A\|}, \qquad \frac{\|x - \tilde{x}\|}{\|x\|} \le K(A)\,\frac{\|r\|}{\|b\|}$$

Iterative Refinement

Algorithm: Iterative Refinement
  Solve $A x^{(1)} = b$
  $r^{(k)} = b - A x^{(k)}$          (residual; compute accurately!)
  Solve $A y^{(k)} = r^{(k)}$        (solve for the correction)
  $x^{(k+1)} = x^{(k)} + y^{(k)}$    (improve the solution)

This allows for errors in the solution of the linear systems, provided the residual $r$ is computed accurately.

Errors in Both Matrix and Right-Hand Side

Theorem. Suppose $A$ is nonsingular and
$$\|\delta A\| < \frac{1}{\|A^{-1}\|}$$
The solution $\tilde{x}$ to
$$(A + \delta A)\,\tilde{x} = b + \delta b$$
approximates the solution $x$ of $Ax = b$ with the error estimate
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \frac{K(A)\,\|A\|}{\|A\| - K(A)\,\|\delta A\|} \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right)$$

Inner Products

Definition. The inner product for $n$-dimensional vectors $x, y$ is
$$\langle x, y \rangle = x^t y$$
For any vectors $x, y, z$ and real number $\alpha$:
(a) $\langle x, y \rangle = \langle y, x \rangle$
(b) $\langle \alpha x, y \rangle = \langle x, \alpha y \rangle = \alpha \langle x, y \rangle$
(c) $\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle$
(d) $\langle x, x \rangle \ge 0$
(e) $\langle x, x \rangle = 0 \iff x = 0$

Krylov Subspace Algorithms

Create a sequence of Krylov subspaces for $Ax = b$:
$$\mathcal{K}_k = \operatorname{span}\{b, Ab, \ldots, A^{k-1} b\}$$
and find approximate solutions $x_k$ in $\mathcal{K}_k$.
- Only matrix-vector products involved
- For SPD matrices, the most popular algorithm is the Conjugate Gradients method [Hestenes/Stiefel, 1952]
- Finds the best solution $x_k \in \mathcal{K}_k$ in the norm $\|x\|_A = \sqrt{x^t A x}$
- Only requires storage of 4 vectors (not all the $k$ vectors in $\mathcal{K}_k$)
- Remarkably simple and excellent convergence properties
- Originally invented as a direct algorithm! (converges after $n$ steps in exact arithmetic)
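The iterative-refinement loop above is short enough to sketch directly. The following NumPy code is an illustrative version (not from the slides); it re-solves with np.linalg.solve instead of reusing a stored factorization, and the random test problem, seed, and number of sweeps are arbitrary choices.

```python
# Minimal sketch (not from the slides) of the iterative-refinement loop above,
# in double precision throughout. In practice the factorization of A is computed
# once and reused, and the residual is formed in higher precision.
import numpy as np

def iterative_refinement(A, b, sweeps=3):
    x = np.linalg.solve(A, b)        # solve A x^(1) = b
    for _ in range(sweeps):
        r = b - A @ x                # residual: compute accurately!
        y = np.linalg.solve(A, r)    # solve for the correction
        x = x + y                    # improve the solution
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
x = iterative_refinement(A, b)
print(np.linalg.norm(b - A @ x))     # refined residual
print(np.linalg.cond(A))             # condition number K(A) in the 2-norm
```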

The Conjugate Gradients Method

Algorithm: Conjugate Gradients Method
  $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
  for $k = 1, 2, 3, \ldots$:
    $\alpha_k = (r_{k-1}^t r_{k-1}) / (p_{k-1}^t A p_{k-1})$    (step length)
    $x_k = x_{k-1} + \alpha_k p_{k-1}$                          (approximate solution)
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$                        (residual)
    $\beta_k = (r_k^t r_k) / (r_{k-1}^t r_{k-1})$               (improvement this step)
    $p_k = r_k + \beta_k p_{k-1}$                               (search direction)

- Only one matrix-vector product $A p_{k-1}$ per iteration
- Operation count $O(n)$ per iteration (excluding the matrix-vector product)

Properties of the Conjugate Gradients Vectors

The spaces spanned by the solutions, the search directions, and the residuals are all equal to the Krylov subspaces:
$$\mathcal{K}_k = \operatorname{span}\{x_1, x_2, \ldots, x_k\} = \operatorname{span}\{p_0, p_1, \ldots, p_{k-1}\} = \operatorname{span}\{r_0, r_1, \ldots, r_{k-1}\} = \operatorname{span}\{b, Ab, \ldots, A^{k-1} b\}$$
The residuals are orthogonal: $r_k^t r_j = 0$ for $j < k$. The search directions are A-conjugate: $p_k^t A p_j = 0$ for $j < k$.

Optimality of Conjugate Gradients

Theorem. The errors $e_k = x - x_k$ (where $x$ is the exact solution) are minimized in the A-norm.

Proof. For any other point $\tilde{x} = x_k - \Delta x \in \mathcal{K}_k$ the error is
$$\|\tilde{e}\|_A^2 = (e_k + \Delta x)^t A (e_k + \Delta x) = e_k^t A e_k + (\Delta x)^t A (\Delta x) + 2 e_k^t A (\Delta x)$$
But $e_k^t A (\Delta x) = r_k^t (\Delta x) = 0$, since $r_k$ is orthogonal to $\mathcal{K}_k$, so $\Delta x = 0$ minimizes $\|\tilde{e}\|_A$.

The convergence is monotonic, $\|e_k\|_A \le \|e_{k-1}\|_A$, and $e_k = 0$ in $k \le n$ steps.

Proof. Follows from $\mathcal{K}_k \subseteq \mathcal{K}_{k+1}$, and that $\mathcal{K}_k = \mathbb{R}^n$ for $k = n$ unless the method has already converged.

Optimization in CG

CG can be interpreted as a minimization algorithm. We know it minimizes $\|e\|_A$, but this cannot be evaluated. CG also minimizes the quadratic function $\varphi(x) = \frac{1}{2} x^t A x - x^t b$:
$$\|e_k\|_A^2 = e_k^t A e_k = (x - x_k)^t A (x - x_k) = x_k^t A x_k - 2 x_k^t A x + x^t A x = x_k^t A x_k - 2 x_k^t b + x^t b = 2\varphi(x_k) + \text{constant}$$
At each step, $\alpha_k$ is chosen to minimize $\varphi$ over $x_k = x_{k-1} + \alpha_k p_{k-1}$. The conjugated search directions $p_k$ give minimization over all of $\mathcal{K}_k$.

Optimization by Conjugate Gradients

We know that solving $Ax = b$ is equivalent to minimizing the quadratic function $\varphi(x) = \frac{1}{2} x^t A x - x^t b$. The minimization can be done by line searches, where $\varphi(x_k)$ is minimized along a search direction $p_k$. The $\alpha_{k+1}$ that minimizes $\varphi(x_k + \alpha_{k+1} p_k)$ is
$$\alpha_{k+1} = \frac{p_k^t r_k}{p_k^t A p_k}$$
with the residual $r_k = b - A x_k$.

The Method of Steepest Descent

A very simple approach: set the search direction $p_k$ to the negative gradient $r_k$, which corresponds to moving in the direction in which $\varphi(x)$ changes the most.

Algorithm: Steepest Descent
  $x_0 = 0$, $r_0 = b$
  for $k = 1, 2, 3, \ldots$:
    $\alpha_k = (r_{k-1}^t r_{k-1}) / (r_{k-1}^t A r_{k-1})$    (step length)
    $x_k = x_{k-1} + \alpha_k r_{k-1}$                          (approximate solution)
    $r_k = r_{k-1} - \alpha_k A r_{k-1}$                        (residual)

The residual is also minus the gradient of $\varphi(x_k)$: $\nabla \varphi(x_k) = A x_k - b = -r_k$. Steepest descent has poor convergence and tends to move along previous search directions.
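The sketch below (not from the slides) implements the steepest descent iteration above on an ill-conditioned 1-D Laplacian test problem (an illustrative choice, as are the tolerance and iteration cap); the large iteration count it reports is exactly the poor convergence just noted. The CG sketch that follows solves the same problem.

```python
# Minimal sketch (not from the slides) of the steepest descent iteration above.
import numpy as np

def steepest_descent(A, b, tol=1e-10, maxiter=100000):
    x = np.zeros(len(b))
    r = b.copy()                        # r_0 = b (since x_0 = 0)
    for k in range(maxiter):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)      # exact line search along the residual
        x = x + alpha * r               # move in the negative-gradient direction
        r = r - alpha * Ar              # updated residual
        if np.linalg.norm(r) < tol:
            return x, k + 1
    return x, maxiter

# Ill-conditioned SPD test problem: 1-D Laplacian
n = 50
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = steepest_descent(A, b)
print(iters, np.linalg.norm(b - A @ x))  # thousands of iterations, vs. at most n for CG
```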

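For comparison, here is a corresponding sketch (also not from the slides) of the Conjugate Gradients loop given at the start of this section, applied to the same illustrative test problem. In exact arithmetic it would terminate in at most n steps; a few extra iterations are allowed because rounding delays that exact termination.

```python
# Minimal sketch (not from the slides) of the conjugate gradients loop above
# for a symmetric positive definite A.
import numpy as np

def conjugate_gradients(A, b, tol=1e-10, maxiter=None):
    n = len(b)
    maxiter = n if maxiter is None else maxiter
    x = np.zeros(n)
    r = b.copy()                       # r_0 = b (since x_0 = 0)
    p = r.copy()                       # p_0 = r_0
    rs_old = r @ r
    for _ in range(maxiter):
        Ap = A @ p                     # the only matrix-vector product per iteration
        alpha = rs_old / (p @ Ap)      # step length
        x = x + alpha * p              # approximate solution
        r = r - alpha * Ap             # residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old         # improvement this step
        p = r + beta * p               # new search direction
        rs_old = rs_new
    return x

# Same SPD test problem as the steepest descent sketch above
n = 50
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradients(A, b, maxiter=5*n)  # extra headroom for rounding effects
print(np.linalg.norm(b - A @ x))
```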
The Method of Conjugate Directions

The optimization can be improved by better search directions. Let the search directions be A-conjugate, i.e. $p_j^t A p_k = 0$ for $j \ne k$. Then the algorithm will converge in at most $n$ steps, since the initial error can be decomposed along the $p_k$:
$$e_0 = \sum_{k=0}^{n-1} \delta_k p_k, \quad\text{with } \delta_k = \frac{p_k^t A e_0}{p_k^t A p_k}$$
But this is exactly the $\alpha$ we choose at step $k$:
$$\alpha_{k+1} = \frac{p_k^t r_k}{p_k^t A p_k} = \frac{p_k^t A e_k}{p_k^t A p_k} = \frac{p_k^t A e_0}{p_k^t A p_k}$$
since the error $e_k$ is the initial error $e_0$ plus a combination of $p_0, \ldots, p_{k-1}$, which are all A-conjugate to $p_k$. Each component $\delta_k$ is then subtracted out at step $k$, and the method converges after $n$ steps.

Choosing A-conjugate Search Directions

One way to choose $p_k$ A-conjugate to the previous search vectors is by Gram-Schmidt:
$$p_k = p_k^0 - \sum_{j=0}^{k-1} \beta_{kj} p_j, \quad\text{with } \beta_{kj} = \frac{(p_k^0)^t A p_j}{p_j^t A p_j}$$
The initial vectors $p_k^0$ should be linearly independent, for example column $k+1$ of the identity matrix. Drawback: all previous search vectors $p_j$ must be stored.

Conjugate Gradients is simply Conjugate Directions with a particular initial vector in Gram-Schmidt: $p_k^0 = r_k$. This gives orthogonal residuals $r_k^t r_j = 0$ for $j \ne k$, and $\beta_{kj} = 0$ for $k > j + 1$.

Preconditioners for Linear Systems

Main idea: instead of solving $Ax = b$, solve, using a nonsingular $n \times n$ preconditioner $M$,
$$M^{-1} A x = M^{-1} b$$
which has the same solution $x$. Convergence properties are then based on $M^{-1}A$ instead of $A$. There is a trade-off between the cost of applying $M^{-1}$ and the improvement of the convergence properties. Extreme cases:
- $M = A$: perfect conditioning, since $M^{-1}A = I$, but applying $M^{-1}$ is expensive
- $M = I$: applying $M^{-1} = I$ does nothing (cheap), but there is no improvement, since $M^{-1}A = A$

Preconditioned Conjugate Gradients

To keep symmetry, solve $(C^{-1} A C^{-t})\,(C^t x) = C^{-1} b$ with $C C^t = M$. This can be written in terms of $M^{-1}$ only, without reference to $C$:

Algorithm: Preconditioned Conjugate Gradients Method
  $x_0 = 0$, $r_0 = b$, $p_0 = M^{-1} r_0$, $z_0 = p_0$
  for $k = 1, 2, 3, \ldots$:
    $\alpha_k = (r_{k-1}^T z_{k-1}) / (p_{k-1}^T A p_{k-1})$    (step length)
    $x_k = x_{k-1} + \alpha_k p_{k-1}$                          (approximate solution)
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$                        (residual)
    $z_k = M^{-1} r_k$                                          (preconditioning)
    $\beta_k = (r_k^T z_k) / (r_{k-1}^T z_{k-1})$               (improvement this step)
    $p_k = z_k + \beta_k p_{k-1}$                               (search direction)

Commonly Used Preconditioners

A preconditioner should approximately solve the problem $Ax = b$.
- Jacobi preconditioning: $M = \operatorname{diag}(A)$, very simple and cheap, might improve certain problems but is usually insufficient
- Block-Jacobi preconditioning: use a block-diagonal $M$ instead of a diagonal one; another variant uses several diagonals (e.g. tridiagonal)
- Classical iterative methods: precondition by applying one step of Jacobi, Gauss-Seidel, SOR, or SSOR
- Incomplete factorizations: perform Gaussian elimination but ignore fill; this results in approximate factors $A \approx LU$ or $A \approx R^T R$ (more later)
- Coarse-grid approximations: for a PDE discretized on a grid, a preconditioner can be formed by transferring the solution to a coarser grid, solving a smaller problem, then transferring back (multigrid)
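A minimal sketch (not from the slides) of the preconditioned CG loop above with the Jacobi preconditioner $M = \operatorname{diag}(A)$, for which applying $M^{-1}$ is just elementwise division. The function names and the test matrix, which is given a strongly varying diagonal so that the diagonal scaling actually matters, are illustrative assumptions.

```python
# Minimal sketch (not from the slides) of preconditioned CG with M = diag(A).
import numpy as np

def preconditioned_cg(A, b, Minv_apply, tol=1e-10, maxiter=None):
    n = len(b)
    maxiter = n if maxiter is None else maxiter
    x = np.zeros(n)
    r = b.copy()                       # r_0 = b (since x_0 = 0)
    z = Minv_apply(r)                  # z_0 = M^{-1} r_0
    p = z.copy()                       # p_0 = z_0
    rz_old = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz_old / (p @ Ap)      # step length
        x = x + alpha * p              # approximate solution
        r = r - alpha * Ap             # residual
        if np.linalg.norm(r) < tol:
            break
        z = Minv_apply(r)              # preconditioning
        rz_new = r @ z
        beta = rz_new / rz_old         # improvement this step
        p = z + beta * p               # search direction
        rz_old = rz_new
    return x

# SPD test matrix with a strongly varying diagonal, so Jacobi scaling helps.
n = 100
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + np.diag(np.linspace(0.0, 100.0, n))
b = np.ones(n)
d = np.diag(A)
x = preconditioned_cg(A, b, lambda r: r / d)   # Jacobi preconditioner M = diag(A)
print(np.linalg.norm(b - A @ x))
```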