Introduction to Numerical Analysis


Université de Liège, Faculté des Sciences Appliquées

Introduction to Numerical Analysis

Edition 2015

Professor Q. Louveaux
Department of Electrical Engineering and Computer Science, Montefiore Institute


Contents

1 Introduction
2 Interpolation and Regression
  2.1 Approximation
    2.1.1 Linear regression
    2.1.2 Non-linear regression
    2.1.3 Choice of the functions basis
    2.1.4 Polynomial regression
3 Linear Systems
  3.1 Direct methods
    3.1.1 Triangular systems
    3.1.2 Gaussian elimination
    3.1.3 Algorithmic complexity of Gaussian elimination
    3.1.4 Pivot selection
    3.1.5 LU decomposition
  3.2 Error in linear systems
    3.2.1 Vector and matrix norms
    3.2.2 Effect of the perturbations in the data
    3.2.3 Rounding errors for Gaussian elimination
    3.2.4 Scale change and equation balancing
  3.3 Iterative methods
    3.3.1 Jacobi and Gauss-Seidel methods
    3.3.2 Convergence of iterative methods
  3.4 Eigenvalues
    3.4.1 Power method
    3.4.2 Eigenvalue of lowest modulus
    3.4.3 Computation of other eigenvalues
    3.4.4 QR algorithm
  3.5 Linear optimisation
    3.5.1 Standard form of linear programming
    3.5.2 Polyhedra geometry
    3.5.3 Simplex algorithm
4 Non-linear Systems
  4.1 Fixed-point method for systems
  4.2 Newton method for systems
  4.3 Quasi-Newton method
5 Numerical Differentiation and Integration
  5.1 Mathematical background
    5.1.1 Taylor's theorem
    5.1.2 Polynomial interpolation
  5.2 Differentiation
    5.2.1 First-order naive method
    5.2.2 Central differences
    5.2.3 Forward and backward differences
    5.2.4 Higher-order derivatives
    5.2.5 Error estimation
  5.3 Richardson extrapolation
    5.3.1 Richardson extrapolation
    5.3.2 Application to numerical differentiation
  5.4 Numerical integration
    5.4.1 Newton-Cotes quadrature rules
    5.4.2 Composite rules
    5.4.3 Error analysis
    5.4.4 Romberg's method
    5.4.5 Gauss-Legendre quadrature

Chapter 1

Introduction

The increase in computer performance over the last decades has dramatically changed how we can handle scientific problems. Up to the 1970s, most scientific and engineering problems were essentially tackled by performing lengthy calculations by hand or by resorting to convenient graphical methods. Nowadays virtually every scientific problem is solved using a computer. This has had a great impact on the variety but also on the size of the problems that can be solved. The goal of this lecture is to provide some essential tools to understand the basic methods that are used to solve scientific problems. In particular we will focus on a few basic issues that are taken as representative of important problems in the area of scientific computing. For example, solving a linear system is one of the basic elements that we find in many more complex problems, often as a subroutine. In this lecture, we will not cover the techniques in the detail needed to write a competitive code for each specific problem. Instead, for each problem considered, we will analyze two building blocks that are important even for more complex methods: a theoretical study of the error that is made even when the computations are carried out exactly, and a more practical study of the actual error made on a computer, or of practical aspects such as the sparsity of a matrix. In the first chapter we will consider a problem related to the approximation of an unknown function. Approximating a function that is given by a few data points is an operation that is very common in engineering. Indeed, in many cases the problems that we consider are too complicated for the engineer or the scientist to provide a full description by equations.

A good option is then to analyze the results provided by experiments. Another reason to analyze data has appeared more recently: since many systems are increasingly automated, a large quantity of data can often be readily collected, and it is then very useful to analyze it. The techniques we cover in Chapter 2 are related to such data analysis. Chapter 3 is the largest part of the lecture and deals with linear algebra in the broad sense. The core of the chapter is devoted to solving linear systems of equations. Solving a linear system is a building block of many more complicated algorithms, such as solving non-linear systems, and that is why it is so important. In Chapter 3, we also cover the numerical computation of eigenvalues. We finally cover the solution of systems of linear inequalities, leading to the simplex algorithm. Chapter 4 shows how to numerically solve non-linear systems of equations. Finally, Chapter 5 describes how to numerically evaluate the derivative or the integral of a function.

Chapter 2

Interpolation and Regression

An important capability that numerical analysis algorithms require is the approximation of functions that are only given by a few points. The importance comes from the fact that either only a few points are given by experiments, or it is simply useful to approximate a function by a simpler variant of it. The best-known example is the Taylor expansion of a function, which is a simple polynomial approximation of any differentiable function. In the first numerical analysis lecture, we considered polynomial interpolation. Interpolating a set of points consists in finding a polynomial p(x) that satisfies $p(x_i) = u(x_i)$ for a list of n pairs $(x_i, u(x_i))$. For a given list of n pairs with pairwise distinct $x_i$ values, there exists a unique polynomial of degree at most n-1 that interpolates these n pairs exactly. The main drawback of polynomial interpolation is that it behaves very badly when the number of points to interpolate increases. This leads to a polynomial of high degree that exhibits large oscillations, very often at the boundaries of the interpolation interval. The interpolating polynomial is then quite bad at generalizing the points and is therefore unusable for most purposes. This phenomenon is called overfitting. An example of the typical behavior of the interpolating polynomial is shown in Figure 2.1. We see that polynomial interpolation performs very badly in terms of generalization of the data and introduces unwanted oscillations. In the following, we show how to avoid such behavior by considering low-degree approximations of the points.

[Figure 2.1: An example of overfitting]

2.1 Approximation

Interpolation imposes that the function passes exactly through the points $(x_1, u(x_1)), \ldots, (x_n, u(x_n))$. In some cases this behavior is necessary, but not always: what if the data contain errors? This situation happens quite often, for example when trying to predict a phenomenon for which only experimental data are available: by nature, those measurements are imprecise, and the errors should be smoothed out. This is exactly the goal of approximation.

2.1.1 Linear regression

Consider a point cloud as in Figure 2.2. To the naked eye, it seems those points follow a relationship that is more or less linear. Polynomial interpolation and cubic spline interpolation are shown in Figure 2.3. Clearly, those results are less than satisfactory for predictions: the curves depend heavily on the particular measurements of the point cloud. Linear regression instead tries to find a linear model that provides the best description of the point cloud. More precisely, starting from the points $(x_1, u(x_1)), \ldots, (x_n, u(x_n))$, linear regression attempts to find the coefficients a and b of a straight line $y = ax + b$ such that $a x_i + b \approx u(x_i)$ for all i. As for polynomial interpolation, with two points those coefficients can be computed so that equality holds. However, in general, with more than two points, this regression induces some error for each point i, defined as $e_i = a x_i + b - u(x_i)$.

[Figure 2.2: A point cloud $(x_i, u(x_i))$]

[Figure 2.3: Polynomial interpolation (solid line) and cubic spline (dotted line) interpolating the point cloud]

This error must then be minimized according to some criterion. The most common one is rather easy to use: it takes a and b such that the sum of the squared errors is minimized, i.e. it minimizes $E(a, b) := \sum_{i=1}^n e_i^2$. This criterion has no obvious shortcoming, as positive and negative deviations both count as positive errors, and larger residuals are penalized more heavily. As a consequence, the resulting line typically passes between the points. Nevertheless, this criterion is not always the right choice, especially when some measurements have very large errors and should therefore be discarded. These points are called outliers.

The coefficients a and b minimize the total squared error function
$$E(a, b) = \sum_{i=1}^n (a x_i + b - u(x_i))^2.$$
It is twice continuously differentiable. A necessary condition for a minimum is thus that its gradient vanishes:
$$\frac{\partial E(a, b)}{\partial a} = 0, \qquad \frac{\partial E(a, b)}{\partial b} = 0.$$
As a consequence,
$$\sum_{i=1}^n 2 x_i (a x_i + b - u(x_i)) = 0, \qquad \sum_{i=1}^n 2 (a x_i + b - u(x_i)) = 0.$$
Since a and b are the variables, this system is linear in them. Therefore, it can be rewritten as
$$\begin{pmatrix} \sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n x_i u(x_i) \\ \sum_{i=1}^n u(x_i) \end{pmatrix}.$$
These equations are called the normal equations. It is possible to prove that their solution actually minimizes the function E(a, b). When applied to this section's example, this technique yields the line in Figure 2.4.
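As an illustration, here is a minimal sketch (not from the notes) of linear regression via the 2-by-2 normal equations above, written with numpy; the data arrays x and u are made up for the example.

import numpy as np

def linear_regression(x, u):
    """Return (a, b) minimizing sum_i (a*x_i + b - u_i)^2 via the normal equations."""
    n = len(x)
    A = np.array([[np.sum(x**2), np.sum(x)],
                  [np.sum(x),    n        ]])
    rhs = np.array([np.sum(x * u), np.sum(u)])
    a, b = np.linalg.solve(A, rhs)
    return a, b

x = np.linspace(0, 10, 20)
u = 0.5 * x - 1.0 + 0.3 * np.random.randn(20)   # noisy, roughly linear data
a, b = linear_regression(x, u)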

[Figure 2.4: Linear regression for the point cloud]

2.1.2 Non-linear regression

The previous method can be applied to any set of basis functions, as long as they are linearly independent. The same approach is then used: set to zero the partial derivative of the error function with respect to each parameter. For example, we can find the best coefficients $a_1, \ldots, a_m$ such that the function
$$\phi(x) = \sum_{j=1}^m a_j \phi_j(x)$$
is the best approximation of the points $(x_i, u(x_i))$. The hypothesis is that the functions $\phi_1(x), \ldots, \phi_m(x)$ are linearly independent. The error function is defined as
$$E(a_1, \ldots, a_m) := \sum_{i=1}^n \left( \sum_{j=1}^m a_j \phi_j(x_i) - u(x_i) \right)^2.$$
To minimize the total squared error E, the normal equations are
$$\frac{\partial E(a_1, \ldots, a_m)}{\partial a_1} = 0, \; \ldots, \; \frac{\partial E(a_1, \ldots, a_m)}{\partial a_m} = 0.$$

Computing the partial derivatives gives
$$\frac{\partial E(a_1, \ldots, a_m)}{\partial a_1} = 2 \sum_{i=1}^n \phi_1(x_i) \left( \sum_{j=1}^m a_j \phi_j(x_i) - u(x_i) \right)$$
$$\vdots$$
$$\frac{\partial E(a_1, \ldots, a_m)}{\partial a_m} = 2 \sum_{i=1}^n \phi_m(x_i) \left( \sum_{j=1}^m a_j \phi_j(x_i) - u(x_i) \right).$$
Therefore the complete system of normal equations is
$$\sum_{j=1}^m \left( \sum_{i=1}^n \phi_1(x_i) \phi_j(x_i) \right) a_j = \sum_{i=1}^n \phi_1(x_i) u(x_i) \qquad (2.1)$$
$$\vdots$$
$$\sum_{j=1}^m \left( \sum_{i=1}^n \phi_m(x_i) \phi_j(x_i) \right) a_j = \sum_{i=1}^n \phi_m(x_i) u(x_i). \qquad (2.2)$$
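A minimal sketch (not from the notes) of least-squares fitting with an arbitrary basis, assembling the normal equations (2.1)-(2.2) from a design matrix; the particular basis functions and data below are hypothetical.

import numpy as np

def fit_basis(x, u, basis):
    """basis: list of callables phi_j; returns the coefficients a_1..a_m."""
    Phi = np.column_stack([phi(x) for phi in basis])   # n-by-m design matrix
    G = Phi.T @ Phi           # G[k, j] = sum_i phi_k(x_i) phi_j(x_i)
    rhs = Phi.T @ u           # rhs[k] = sum_i phi_k(x_i) u(x_i)
    return np.linalg.solve(G, rhs)

# Example: fit u(x) with the basis {1, x, sin(x)}.
basis = [lambda t: np.ones_like(t), lambda t: t, np.sin]
x = np.linspace(0, 10, 30)
u = 1.0 + 0.5 * x + 2.0 * np.sin(x)
a = fit_basis(x, u, basis)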

2.1.3 Choice of the functions basis

A common choice for the basis functions in non-linear regression is polynomials. For example, quadratic regression seeks the best second-order polynomial to approximate a point cloud. When the degree of the polynomial tends to infinity, the function interpolates the point cloud exactly. However, we have seen that it can be dangerous to use high-order polynomials, due to their unwanted oscillatory behavior. Hence practitioners usually prefer low-degree polynomials to approximate phenomena. Even though approximation through high-degree polynomials is discouraged, this section looks at the numerical solution of the normal equations when many basis functions are used. In fact, if the functions basis is not chosen carefully enough, the system of normal equations may be ill-conditioned and cause numerical problems. To approximate a point cloud, the most natural choice for a fifth-degree polynomial is
$$\phi(x) = a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0.$$

[Figure 2.5: The basis monomials x, x^2, ..., x^5 get closer and closer to one another as the degree increases]

Equivalently, this expression uses six basis functions: $x^5, x^4, \ldots, x, 1$. However, in general, such a basis is a bad choice for numerical reasons. Indeed, plotting these monomials on the interval [0, 1] clearly shows, in Figure 2.5, that they are very similar to one another. In general, with such a choice of basis functions, the linear system has a determinant too close to zero. The chapter on linear systems explains why such equations are ill-conditioned and tedious to solve. Orthogonal polynomials, on the other hand, avoid such problems. Many families of such polynomials can be used, such as the Chebyshev polynomials, already presented in the numerical methods course. The first five are depicted in Figure 2.6. These polynomials are much less similar to one another than the natural monomials.

[Figure 2.6: The first five Chebyshev polynomials are much less akin to one another]
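A small numerical experiment (not from the notes) illustrating this point: the Gram matrix of the normal equations is far better conditioned in a Chebyshev basis than in the monomial basis. The interval [0, 1] is mapped to [-1, 1] for the Chebyshev evaluation; the printed figures are only indicative.

import numpy as np

x = np.linspace(0, 1, 50)
deg = 5

V_mono = np.vander(x, deg + 1, increasing=True)              # 1, x, ..., x^5
V_cheb = np.polynomial.chebyshev.chebvander(2 * x - 1, deg)  # T_0, ..., T_5 on [0, 1]

# Condition numbers of the normal-equation matrices G = Phi^T Phi.
print("monomial basis:  cond(G) ~", np.linalg.cond(V_mono.T @ V_mono))
print("Chebyshev basis: cond(G) ~", np.linalg.cond(V_cheb.T @ V_cheb))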

2.1.4 Polynomial regression

The previous sections showed how to find the best function approximating a point cloud when the general expression of the sought function is known. Some applications need the best polynomial without a priori knowledge of the appropriate degree. In this case, the normal equations must be solved efficiently several times in order to determine the best degree. This section presents a way of doing so using orthogonal polynomials. It first shows that computing the variance of the error can help in determining the best degree. Then it defines the needed type of orthogonal polynomials and shows how their use simplifies the successive computations.

With n experimental points, a polynomial of degree n-1 goes through all those points exactly. Such a polynomial likely has unwanted oscillations, and it is very often preferable to compute a polynomial of lower degree that best approximates these points. However, if the degree is too low, the solution of the normal equations can also be less than satisfactory. Theoretically, it is possible to compute all solutions for degrees less than n-1, which gives a sequence of polynomials $q_j(x)$ of degree j. For each such polynomial of degree j < n, the variance can be defined as
$$\sigma_j^2 = \frac{1}{n} \sum_{i=1}^n (u(x_i) - q_j(x_i))^2. \qquad (2.3)$$
Theoretical statistics show that these variances evolve monotonically:
$$\sigma_0^2 > \sigma_1^2 > \sigma_2^2 > \cdots > \sigma_{n-1}^2 = 0.$$

To find the best degree, an iterative process can consider that, as long as the polynomial of degree j is not satisfactory, the inequality $\sigma_{j+1}^2 \ll \sigma_j^2$ holds, whereas if $\sigma_{j+1}^2 \approx \sigma_j^2$, considering degrees higher than j is of little interest. To determine the optimal degree, the solution is thus to compute the successive variances $\sigma_0^2, \sigma_1^2, \ldots$ and to stop as soon as the variance no longer decreases significantly. The theory of orthogonal polynomials allows us to compute these variances very quickly.

Definition 2.1 The inner product $\langle f, g \rangle$ of two polynomials f, g is an operation satisfying the following properties:
(i) $\langle f, g \rangle = \langle g, f \rangle$
(ii) $\langle f, f \rangle \ge 0$ and $\langle f, f \rangle = 0 \Rightarrow f = 0$
(iii) $\langle af, g \rangle = a \langle f, g \rangle$ for all $a \in \mathbb{R}$
(iv) $\langle f, g + h \rangle = \langle f, g \rangle + \langle f, h \rangle$

Fixing a set of abscissas $x_1, \ldots, x_n$, the following operation defines an inner product between two polynomials p and q:
$$\sum_{i=1}^n p(x_i) q(x_i). \qquad (2.4)$$
It is easy to check that this operation satisfies all the properties of Definition 2.1 except (ii). However, when its application is limited to polynomials of degree at most n-1, property (ii) is also satisfied. In the remainder of this section, the polynomial inner product will thus be defined as $\langle f, g \rangle = \sum_{i=1}^n f(x_i) g(x_i)$. As a consequence, it is now possible to define a set of orthogonal polynomials.

Definition 2.2 The set of polynomials $(p_0, \ldots, p_t)$ is a system of orthogonal polynomials if $\langle p_i, p_j \rangle = 0$ for all $i \neq j$.

This definition is valid for any inner product. It can be used to build a family of orthogonal polynomials using the following recurrence formula.

Proposition 2.1 The recurrence
$$p_0(x) = 1, \qquad p_1(x) = x - \alpha_0, \qquad p_{i+1}(x) = x p_i(x) - \alpha_i p_i(x) - \beta_i p_{i-1}(x) \quad \text{for } i \ge 1,$$
where
$$\alpha_i = \frac{\langle x p_i, p_i \rangle}{\langle p_i, p_i \rangle}, \qquad \beta_i = \frac{\langle x p_i, p_{i-1} \rangle}{\langle p_{i-1}, p_{i-1} \rangle},$$
generates a family $(p_0, \ldots, p_k)$ of orthogonal polynomials, for all k.

The proof is left as an exercise to the reader. In particular, this proposition allows us to perform all the operations of interest in polynomial regression efficiently. Going back to the least-squares problem, and in particular to the normal equations (2.1)-(2.2), with the previous definition of the inner product the system can be rewritten with $p_0, \ldots, p_k$ as basis functions:
$$\begin{pmatrix} \langle p_0, p_0 \rangle & \langle p_0, p_1 \rangle & \cdots & \langle p_0, p_k \rangle \\ \langle p_1, p_0 \rangle & \langle p_1, p_1 \rangle & \cdots & \langle p_1, p_k \rangle \\ \vdots & \vdots & & \vdots \\ \langle p_k, p_0 \rangle & \langle p_k, p_1 \rangle & \cdots & \langle p_k, p_k \rangle \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{pmatrix} = \begin{pmatrix} \langle u, p_0 \rangle \\ \langle u, p_1 \rangle \\ \vdots \\ \langle u, p_k \rangle \end{pmatrix}. \qquad (2.5)$$
However, if we choose basis functions that are orthogonal, the system becomes even simpler:
$$\begin{pmatrix} \langle p_0, p_0 \rangle & 0 & \cdots & 0 \\ 0 & \langle p_1, p_1 \rangle & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \langle p_k, p_k \rangle \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{pmatrix} = \begin{pmatrix} \langle u, p_0 \rangle \\ \langle u, p_1 \rangle \\ \vdots \\ \langle u, p_k \rangle \end{pmatrix}.$$
The kth-order polynomial is then given by $q_k(x) = \sum_{i=0}^k a_i p_i(x)$ with
$$a_i = \frac{\langle u, p_i \rangle}{\langle p_i, p_i \rangle}. \qquad (2.6)$$
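The following is a minimal sketch (not from the notes) of the three-term recurrence of Proposition 2.1 with the discrete inner product (2.4). For simplicity, each polynomial is represented only by its values at the abscissas, which is all that is needed to evaluate the coefficients (2.6); the function names are hypothetical.

import numpy as np

def orthogonal_basis(x, k):
    """Return arrays p[i] = (p_i(x_1), ..., p_i(x_n)) for i = 0..k."""
    dot = lambda f, g: np.sum(f * g)          # discrete inner product (2.4)
    p0 = np.ones_like(x)
    p1 = x - dot(x * p0, p0) / dot(p0, p0)    # p_1(x) = x - alpha_0
    basis = [p0, p1]
    for i in range(1, k):
        pi, pim1 = basis[i], basis[i - 1]
        alpha = dot(x * pi, pi) / dot(pi, pi)
        beta = dot(x * pi, pim1) / dot(pim1, pim1)
        basis.append(x * pi - alpha * pi - beta * pim1)
    return basis[:k + 1]

def regression_coefficients(x, u, k):
    """Coefficients a_i = <u, p_i> / <p_i, p_i> of the degree-k least-squares fit."""
    return [np.sum(u * p) / np.sum(p * p) for p in orthogonal_basis(x, k)]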

This expression shows an important property: the coefficients $a_i$ do not depend on the number of polynomials in the basis. This would not be the case if the basis were not orthogonal. In particular, the polynomial interpolation of u is
$$q_{n-1}(x) = \sum_{i=0}^{n-1} a_i p_i(x). \qquad (2.7)$$
The various least-squares approximations of lower degree can be derived from this exact same sum (2.7), simply by truncation. The next step is to compute the successive variances $\sigma_0^2, \sigma_1^2, \ldots$ To this end, the following result is useful.

Proposition 2.2 The set of polynomials $(p_0, \ldots, p_j, u - q_j)$ is orthogonal for all $0 \le j \le n-1$.

Proof: It is enough to prove that the last added polynomial, $u - q_j$, is orthogonal to all the others, since the first j + 1 are orthogonal by definition. We obtain
$$\langle u - q_j, p_k \rangle = \langle u, p_k \rangle - \sum_{i=0}^j a_i \langle p_i, p_k \rangle = \langle u, p_k \rangle - a_k \langle p_k, p_k \rangle = 0$$
where the last equality is due to (2.6).

Proposition 2.3 The successive variances are given by
$$\sigma_k^2 = \frac{1}{n} \left( \langle u, u \rangle - \sum_{i=0}^k \frac{\langle u, p_i \rangle^2}{\langle p_i, p_i \rangle} \right).$$

Proof: The definition of the variance gives
$$\sigma_k^2 = \frac{1}{n} \sum_{i=1}^n (u(x_i) - q_k(x_i))^2$$

which can be rewritten as
$$\sigma_k^2 = \frac{1}{n} \langle u - q_k, u - q_k \rangle = \frac{1}{n} \left( \langle u - q_k, u \rangle - \langle u - q_k, q_k \rangle \right).$$
Using Proposition 2.2 and the fact that $q_k = \sum_{i=0}^k a_i p_i$, it follows that $\langle u - q_k, q_k \rangle = 0$. As a consequence,
$$\sigma_k^2 = \frac{1}{n} \left( \langle u, u \rangle - \langle q_k, u \rangle \right) = \frac{1}{n} \left( \langle u, u \rangle - \sum_{i=0}^k a_i \langle p_i, u \rangle \right) = \frac{1}{n} \left( \langle u, u \rangle - \sum_{i=0}^k \frac{\langle p_i, u \rangle^2}{\langle p_i, p_i \rangle} \right).$$
Once again, the computation of the variances does not depend on the final degree. They can thus be computed successively until they no longer decrease significantly, at which point the degree of the polynomial can be considered satisfactory.
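Continuing the hypothetical sketch given after Proposition 2.1, the successive variances of Proposition 2.3 can be accumulated one basis polynomial at a time, which is how the stopping criterion above would be applied in practice.

import numpy as np

def successive_variances(x, u, kmax):
    """sigma_k^2 for k = 0..kmax, via Proposition 2.3 (uses orthogonal_basis above)."""
    n = len(x)
    total = np.sum(u * u)                 # <u, u>
    sigma2 = []
    for p in orthogonal_basis(x, kmax):
        total -= np.sum(u * p) ** 2 / np.sum(p * p)
        sigma2.append(total / n)
    return sigma2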

Chapter 3

Linear Systems

Linear operators are among the simplest mathematical operators, and hence a very natural model for engineers. All mathematical problems arising from a linear structure involve linear algebra operations. These problems are frequent: some people estimate that seventy-five percent of scientific computations involve linear systems. It is thus very important to be able to solve such problems quickly and with high precision. Linear algebra is one of the best examples of the difference between classical mathematics and numerical analysis: even though the theory has been known for centuries, numerical algorithms only appeared during the last few decades. Classical rules such as Cramer's are particularly ill-suited to numerical computation: denoting by n the dimension of the problem, the rule performs on the order of n! operations, while Gaussian elimination only uses on the order of $n^3$. Similarly, inverting a matrix is very rarely done to solve a linear system: the number of operations it requires is often too high compared with the number of operations actually needed for usual problems.

3.1 Direct methods to solve linear systems

A direct method for solving a linear system of equations is a method that gives the exact solution after a finite number of steps, ignoring rounding errors. For a system Ax = b where the matrix A is dense (meaning many of its elements are non-zero), there is no better algorithm, in terms of either time complexity or numerical precision, than systematic Gaussian elimination.

However, when the matrix A is sparse (many of its elements are zero), iterative methods offer certain advantages and become very competitive for very large systems. They only provide approximate solutions, converging toward the exact solution as the number of steps tends to infinity. For systems having a special structure, iterative methods can give useful results with far fewer operations than direct methods. The choice between a direct and an iterative method depends on the proportion and distribution of non-zero elements in A. This is a very important topic, as most matrices arising in practice are sparse, but it is outside the scope of this lecture.

3.1.1 Triangular systems

A linear system of equations whose matrix is triangular is particularly simple to solve. Consider a linear system Lx = b whose matrix $L = [l_{ij}]$ is lower triangular. Under the assumption that $l_{ii} \neq 0$, $i = 1, 2, \ldots, n$, the unknowns can be determined in the order $x_1, x_2, \ldots, x_n$ using the following formula:
$$x_i = \frac{b_i - \sum_{k=1}^{i-1} l_{ik} x_k}{l_{ii}}, \qquad i = 1, 2, \ldots, n. \qquad (3.1)$$
This algorithm is called forward substitution. If the matrix is upper triangular, a similar backward substitution formula can be derived. Formula (3.1) shows that step i of a triangular solve requires i-1 multiplications, i-1 additions, and one division, which adds up to 2(i-1) + 1 operations for step i. Using the formula
$$\sum_{i=1}^n i = \frac{n(n+1)}{2}, \qquad (3.2)$$
the total number of operations for the full solution of a triangular system is
$$\sum_{i=1}^n [2(i-1) + 1] = n^2. \qquad (3.3)$$
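A minimal sketch (not from the notes) of forward substitution, formula (3.1), together with its backward counterpart for upper triangular systems; both assume non-zero diagonal entries and omit error handling.

import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower triangular L with non-zero diagonal."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def backward_substitution(U, b):
    """Solve U x = b for upper triangular U with non-zero diagonal."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x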

3.1.2 Gaussian elimination

Gaussian elimination should be familiar to the reader: it is a classical method for solving linear systems whose idea is to systematically eliminate unknowns until the system can be readily solved using the techniques of the previous section. Consider the system
$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= b_n.
\end{aligned} \qquad (3.4)$$
In the following, we assume that the matrix $A = [a_{ij}]$ is non-singular; as a consequence, the system (3.4) has a unique solution. If $a_{11} \neq 0$, $x_1$ can be eliminated from the last $(n-1)$ equations by subtracting from equation i the multiple
$$m_{i1} = \frac{a_{i1}}{a_{11}}, \qquad i = 2, 3, \ldots, n$$
of the first equation. The last $(n-1)$ equations thus become
$$\begin{aligned}
a_{22}^{(2)} x_2 + a_{23}^{(2)} x_3 + \cdots + a_{2n}^{(2)} x_n &= b_2^{(2)} \\
&\;\;\vdots \\
a_{n2}^{(2)} x_2 + a_{n3}^{(2)} x_3 + \cdots + a_{nn}^{(2)} x_n &= b_n^{(2)},
\end{aligned}$$
where the new coefficients are given by
$$a_{ij}^{(2)} = a_{ij} - m_{i1} a_{1j}, \qquad b_i^{(2)} = b_i - m_{i1} b_1, \qquad i, j = 2, 3, \ldots, n.$$
This new system has $(n-1)$ equations in the $(n-1)$ unknowns $x_2, x_3, \ldots, x_n$. If $a_{22}^{(2)} \neq 0$, the same operation can be repeated to eliminate $x_2$ from the last $(n-2)$ equations. The new system with $(n-2)$ equations and $(n-2)$ unknowns $x_3, x_4, \ldots, x_n$ is obtained with the multipliers
$$m_{i2} = \frac{a_{i2}^{(2)}}{a_{22}^{(2)}}, \qquad i = 3, 4, \ldots, n.$$
The coefficients are
$$a_{ij}^{(3)} = a_{ij}^{(2)} - m_{i2} a_{2j}^{(2)}, \qquad b_i^{(3)} = b_i^{(2)} - m_{i2} b_2^{(2)}, \qquad i, j = 3, 4, \ldots, n.$$
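A minimal sketch (not from the notes) of Gaussian elimination without pivoting, followed by backward substitution; it assumes all pivots are non-zero and is meant only to mirror the formulas above.

import numpy as np

def gaussian_elimination(A, b):
    """Reduce A x = b to upper triangular form, then back-substitute."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):                      # elimination step k
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]               # multiplier m_ik
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):              # backward substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x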

The elements $a_{11}, a_{22}^{(2)}, a_{33}^{(3)}, \ldots$ used to determine the multipliers in the successive steps of the elimination are called pivots. If all these elements are non-zero, Gaussian elimination can continue until step $(n-1)$, whose result is the single equation
$$a_{nn}^{(n)} x_n = b_n^{(n)}.$$
Gathering the first equation of each step gives the following triangular system:
$$\begin{aligned}
a_{11}^{(1)} x_1 + a_{12}^{(1)} x_2 + \cdots + a_{1n}^{(1)} x_n &= b_1^{(1)} \\
a_{22}^{(2)} x_2 + \cdots + a_{2n}^{(2)} x_n &= b_2^{(2)} \\
&\;\;\vdots \\
a_{nn}^{(n)} x_n &= b_n^{(n)},
\end{aligned} \qquad (3.5)$$
where, to keep the notation consistent, $a_{1j}^{(1)} = a_{1j}$ for all $j = 1, 2, \ldots, n$ and $b_1^{(1)} = b_1$. This upper triangular system can then be solved by backward substitution, as in the previous section. During the whole execution of the algorithm, the operations made on the rows of A are also performed on the entries of b: the vector b can be considered as an extra column of A. Likewise, if the system must be solved for several right-hand-side vectors b, the easiest method is to append the variants of b as new columns of A. The successive operations on A are not affected by these additions.

3.1.3 Algorithmic complexity of Gaussian elimination

To assess the performance of this algorithm, we must estimate the number of operations needed to perform Gaussian elimination and obtain a triangular system.

Theorem 3.1 Considering the p systems $Ax = b_i$, $i = 1, \ldots, p$, if Gaussian elimination is performed simultaneously on all of them to obtain p triangular systems, the required number of operations is
$$\frac{2}{3} n^3 + \left( p - \frac{1}{2} \right) n^2 - \left( p + \frac{1}{6} \right) n.$$

Proof: We first consider step i of Gaussian elimination, where i ranges from 1 to n-1. For each row to be eliminated, one division is performed to compute the multiplier, then (n - i + p) multiplications and (n - i + p) additions to eliminate the coefficients below the pivot. Since (n - i) rows remain at step i, a total of $(n-i)(2n - 2i + 2p + 1)$ operations must be performed at step i. The total number of operations is then
$$\sum_{i=1}^{n-1} (n-i)(2n - 2i + 2p + 1). \qquad (3.6)$$
Using the formulas
$$\sum_{i=1}^n i = \frac{n(n+1)}{2}, \qquad \sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6},$$
(3.6) becomes successively
$$\begin{aligned}
&\sum_{i=1}^{n-1} \left[ 2i^2 + (-4n - 2p - 1)i + (2n^2 + 2pn + n) \right] \\
&= \frac{(n-1)n(2n-1)}{3} + (-4n - 2p - 1)\frac{(n-1)n}{2} + (n-1)(2n^2 + 2pn + n) \\
&= \left( \tfrac{2}{3} n^3 - n^2 + \tfrac{1}{3} n \right) + \left( -2n^3 + \left( \tfrac{3}{2} - p \right) n^2 + \left( p + \tfrac{1}{2} \right) n \right) + \left( 2n^3 + (2p - 1)n^2 - (2p + 1)n \right) \\
&= \frac{2}{3} n^3 + \left( p - \frac{1}{2} \right) n^2 - \left( p + \frac{1}{6} \right) n.
\end{aligned}$$

3.1.4 Pivot selection

We observe that Gaussian elimination is no longer applicable if, for some value of k, the pivot element $a_{kk}^{(k)}$ is zero. Consider for example the system
$$\begin{aligned}
x_1 + x_2 + x_3 &= 1 \\
x_1 + x_2 + 2x_3 &= 2 \\
x_1 + 2x_2 + 2x_3 &= 1.
\end{aligned} \qquad (3.7)$$

It is non-singular and has the unique solution $x_1 = -x_2 = x_3 = 1$. Nevertheless, after the first elimination step it becomes
$$x_3 = 1$$
$$x_2 + x_3 = 0$$
so that $a_{22}^{(2)} = 0$ and the algorithm as written previously cannot be applied. The solution is to permute equations 2 and 3 before the next elimination step, which here directly gives the sought triangular system. Another way of proceeding would be to permute columns 2 and 3; the same permutation must then be applied to the order of the unknowns.

In the general case, if at step k we have $a_{kk}^{(k)} = 0$, at least one element $a_{ik}^{(k)}$, $i = k, k+1, \ldots, n$ of column k must be non-zero, otherwise the first k columns of $A^{(k)} = [a_{ij}^{(k)}]$ would be linearly dependent, which would imply that A is singular. Assuming $a_{rk}^{(k)} \neq 0$, rows k and r are permuted and the elimination can resume. Any non-singular linear system of equations can thus be reduced to triangular form using Gaussian elimination and, potentially, row permutations.

To ensure numerical stability when applying this algorithm, more permutations are often necessary: not only when an element is exactly zero, but also when it is too close to zero. For example, suppose that in system (3.7) the coefficient $a_{22}$ is modified and becomes 1.0001 instead of 1. Gaussian elimination without permutation gives the following triangular system:
$$x_1 + x_2 + x_3 = 1$$
$$0.0001\, x_2 + x_3 = 1$$
$$-9999\, x_3 = -10000$$
Backward substitution, using floating-point arithmetic with four significant digits, provides the solution
$$x_1 = 0, \quad x_2 = 0, \quad x_3 = 1.000$$
whereas the actual solution, rounded to four digits, is
$$x_1 = 1.000, \quad -x_2 = x_3 = 1.0001.$$

On the other hand, if rows 2 and 3 are permuted, the elimination yields the following triangular system:
$$x_1 + x_2 + x_3 = 1$$
$$x_2 + x_3 = 0$$
$$0.9999\, x_3 = 1$$
which gives, using backward substitution (with the same accuracy as before), the solution
$$x_1 = -x_2 = x_3 = 1.000,$$
which is correct to three digits. Roundoff will be studied in more detail in Section 3.2.3; for now, it suffices to say that, to avoid bad errors such as the one just shown, the pivot element at step k is usually chosen using one of these two strategies:

(i) Partial pivoting. Choose r as the smallest index such that
$$|a_{rk}^{(k)}| = \max_{k \le i \le n} |a_{ik}^{(k)}|$$
and permute rows k and r.

(ii) Complete pivoting. Choose r and s as the smallest indices such that
$$|a_{rs}^{(k)}| = \max_{k \le i, j \le n} |a_{ij}^{(k)}|$$
and permute rows k and r, and columns k and s.

Partial pivoting thus selects as pivot at step k the largest element in absolute value in column k, on or below the diagonal. Complete pivoting selects as pivot at step k the largest element in absolute value among all the elements yet to be processed. In practice, partial pivoting is usually sufficient, so complete pivoting is rarely used, as the search it requires is heavier.
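A minimal sketch (not from the notes) of Gaussian elimination with partial pivoting, strategy (i) above; illustrative only. Applied to the perturbed system (3.7), it reproduces the row swap that rescued the solution in the example.

import numpy as np

def gaussian_elimination_partial_pivoting(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # Pick the row r >= k with the largest |a_rk| and swap it with row k.
        r = k + np.argmax(np.abs(A[k:, k]))
        if r != k:
            A[[k, r]] = A[[r, k]]
            b[[k, r]] = b[[r, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):              # backward substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x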

22 CHAPTER 3. LINEAR SYSTEMS might solve systems Ax 1 = b 1 and Ax 2 = b 2 where b 2 is a function of x 1. LU decomposition can avoid performing once more the whole elimination process. The basic principle is that, knowing a decomposition of A in a lower triangular matrix L and a upper triangular matrix U, i.e. matrices L and U such that A = LU then the system Ax = b is equivalent to LUx = b, which can be decomposed in two triangular systems: Ly = b, Ux = y Both of them can be solved by 2 n 2 operations instead of (2/3 n 3 + 1/2 n 2 7/6 n) for a new Gaussian elimination. Such an LU decomposition does not always exist. However, considering a matrix whose Gaussian elimination could take place using at every step the diagonal pivot (without row permutation), then this LU decomposition exists, and its elements can be retrieved from Gaussian elimination. Theorem 3.2 Let A be a square matrix of order n such that the Gaussian elimination can be performed without row permutation. Then this matrix has a LU decomposition whose elements L and U are given by the elements of the Gaussian elimination. Proof: When the Gaussian elimination can be performed without row permutation, the algorithm can be described as finding the sequence of matrices A = A (1), A (2),..., A (n) using n 1 transformations A (k+1) = M k A (k), k = 1, 2,..., n 1 (3.8)

3.1. DIRECT METHODS 23 where with 1 0 0 0 0 0 1 0 0 0.......... M k = 0 0 1 0 0 0 0 m k+1,k 1 0.......... 0 0 m n,k 0 1 ( ) I 0 = I m k e T k = X I m T k = (0, 0,..., 0, m k+1,k,..., m n,k ) e T k = (0, 0,..., 1, 0, 0,..., 0) k and Formula 3.8 provides 0 0 m k+1,k X =....... 0 0 m n,k hence A (n) = M n 1 M n 2... M 2 M 1 A (1) A = A (1) = M 1 1 M 1 2... M 1 n 2 M 1 n 1 A (n). The matrix A (n) being upper triangular, the last step is to prove that the product of the matrices M 1 k is lower triangular. First notice that ( ) 1 ( ) I 0 I 0 M 1 k = = = I + m X I X I k e T k Then, let the matrices L k be L k := M 1 1 M 1 2... M 1 k

24 CHAPTER 3. LINEAR SYSTEMS Those matrices are of the form L k = I + m 1 e T 1 + m 2 e T 2 +... + m k e T k. Indeed, this is true for k = 1, as L 1 = M 1 1 = I + m 1 e T 1. If this is true for k, let us prove it for k + 1: L k+1 = L k M 1 k+1 = ( ) ( I + m 1 e T 1 +... + m k e T k I + mk+1 e T k+1 = ( I + m 1 e T 1 +... + m k e T k + m ) k+1 e T k+1 ) + ( m 1 ( e T 1 m k+1 ) + m2 ( e T 2 m k+1 ) +... + mk ( e T k m k+1 )) e T k+1 = 0 = 0 = 0 Consequently, L n 1 = M1 1 M2 1... Mn 1 1 is a lower triangular matrix. Overall, this is the LU decomposition of A, where L contains the multipliers and U the elements transformed by Gaussian elimination: ( L = (m ik ), i k, U = a (k) kj ), k j (3.9) It is also possible to prove that this LU decomposition is unique. To obtain the LU decomposition for a matrix A, it is enough to perform the Gaussian elimination, and to keep the multipliers. On a computer, the algorithm will be the following: as the multiplier m ik = a (k) ik /a(k) kk is determined in such a way that a (k+1) ik is zero, the elements of the main diagonal of L do not need to be stored, as they are all equal to one. This way, no supplementary memory is required, and Gaussian elimination thus performs the following transformation: a 11 a 12 a 1n a (n) 11 a (n) 12 a (n) 1,n 1 a (n) 1n a 21 a 22 a 2n..... m 21 a (n) 22 a (n) 2,n 1 a (n) 2n.. a n1 a n2 a nn m n1 m n2 m n,n 1 a (n) nn

3.2. ERROR IN LINEAR SYSTEMS 25 3.2 Analysis of the error when solving linear systems In practice, when solving linear systems of equations, errors come from two sources. The first one is that the elements of A and b are not necessarily exactly known to full precision: this uncertainty has some impact on the solution x, which can be measured. The other one is common in numerical algorithms: computations in floating-point arithmetic suffer from rounding errors. A correct analysis of these effects is very important, even more for systems of very large size, for which millions of operations must be performed. If x is the computed solution to the system Ax = b, the residue is the vector r = b A x. Even though r = 0 implies x = A 1 b, it is wrong to think that a small r indicates the solution x is precise. This is not always the case, as shown by the following example. Example 3.1 Consider the linear system of equations 1.2969 0.8648 0.8642 A =, b = 0.2161 0.1441 0.1440 Suppose that the following solution is obtained.. (3.10) The residue for this solution x is x = (0.9911, 0.4870) T. r = ( 10 8, 10 8 ) T Since the residue is very small, we can expect that the error on x should be small. Actually, this is wrong, as no digit of x is significant! The exact solution is x = (2, 2) T In this particular case, it is easy to see that the system (3.10) is very illconditioned: eliminating x 1 in the second equation leads to where a (2) 22 x 2 = b (2) 2 a (2) 22 = 0.1441 0.2161 0.8648 = 0.1441 0.1440999923 10 8 1.2969

26 CHAPTER 3. LINEAR SYSTEMS Obviously a small perturbation on the element a 22 = 0.1441 will have a large impact on a (2) 22, and eventually on x 2. As a result, if the coefficients of A and b are not known to a higher precision than 10 8, the computed solution to (3.10) makes no sense. 3.2.1 Vector and matrix norms To analyse errors, it will be useful to associate to each vector or matrix a nonnegative scalar that measures its length. Such a scalar, when it satisfies some axioms, can be called a norm. Definition 3.1 x is a vector norm if the following axioms are satisfied. (i) x > 0 for all x 0 and x = 0 implies x = 0 (ii) x + y x + y (iii) αx = α x for all α R The most frequent vector norms belong to the family of l p norms defined as x p = ( x 1 p + x 2 p +... + x n p ) 1/p 1 p < (3.11) The most common values of p are p = 1, x 1 = x 1 + x 2 +... + x n (3.12) p = 2, x 2 = ( x 1 2 + x 2 2 +... + x n 2 ) 1/2 (3.13) p x = max 1 i n x i (3.14) The case p = 2 corresponds to the usual Euclidean norm. In general, norms of the form (3.11) (including the limit case where p ) do satisfy the axioms (i) to (iii). Definition 3.2 A is a matrix norm if the following axioms are satisfied. (i) A > 0 for all A 0 and A = 0 implies A = 0 (ii) A + B A + B

3.2. ERROR IN LINEAR SYSTEMS 27 (iii) αa = α A for all α R If the two following axioms are also satisfied (iv) Ax A x (v) AB A B then the matrix norm A is compatible with the vector norm x. Even though axiom (v) does not use a vector norm, one can show that, if it is not satisfied, then A cannot be compatible with any vector norm. Let A be a matrix norm compatible with some vector norm x. If, for some matrix A, there is a vector x 0 such that axiom (iv) is satisfied with equality, then A is subordinate to the vector norm x. One can show that any subordinate matrix norm has a unit value for the unit matrix. Any vector norm has at least one subordinate matrix norm (as a consequence, at least a compatible matrix norm) given by Ax A = max Ax = max x =1 x 0 x (3.15) which is called the matrix norm induced by the vector norm. All matrix norms used in this course will satisfy this relationship. What is more, matrix norms induced by the vector norms (3.12) to (3.14) are given by p = 1, A 1 = max 1 j n n a ij i=1 p = 2, A 2 = (maximum eigenvalue of A T A) 1/2 n p A = max a ij. 1 i n When p = 2, considering the difficulty to compute the maximum eigenvalue, the Frobenius norm is sometimes used: j=1 ( n ) 1/2 A F = a ij 2 i,j=1

28 CHAPTER 3. LINEAR SYSTEMS One can show this norm is compatible with the Euclidean vector norm, but is not subordinate to it, as I F = n. Example 3.2 Let us compute the usual norms of the vector x = ( 1 2 3 ) T. We obtain respectively, x 1 = 1 + 2 + 3 = 6, x 2 = 1 + 4 + 9 = 14 3.74, x = max{ 1, 2, 3 } = 3. Now, let us compute a few norms of the matrix 1 2 3 A = 4 5 6. 7 8 9 Respectively, A 1 = max{1 + 4 + 7, 2 + 5 + 8, 3 + 6 + 9} = 18, A 2 = max{eigenvalues of A T A} 1/2 = max{0, 1.07, 16.85} 16.85 A = max{1 + 2 + 3, 4 + 5 + 6, 7 + 8 + 9} = 24 A F = 1 + 4 + 9 + 16 + + 81 16.88. Even though det(a) = 0, there is no impact on the value of the norms: one cannot conclude that A = 0. 3.2.2 Effect of the perturbations in the data This section will define the notion of ill-conditioned linear systems, meaning that a small perturbation in their data induces a large deviation in the solution. This effect is summarized in the condition number of the matrix. The larger the condition number, the larger the sensitivity is for systems using this matrix as left-hand-side to data variations. Definition 3.3 Let A R n n be a non-singular matrix. The condition number of A is defined as κ(a) = A A 1.

3.2. ERROR IN LINEAR SYSTEMS 29 When studying the effect of a perturbation in the data, this condition number helps bounding the error. The first step is to analyse the effect of a perturbation in the second member. Proposition 3.1 Let A R n n be a non-singular matrix, let b R n be a vector, and let x R n be the solution to the linear system Ax = b. The error on x can be bounded when solving Ax = (b + δb) instead of Ax = b by δx x κ(a) δb b. Proof: The solution to the modified system is x + δx. Hence: A(x + δx) = (b + δb). As Ax = b, δx = A 1 (δb). As a consequence, δx A 1 δb. (3.16) Dividing (3.16) by x, and using b A x, which is equivalent to x b, the result is: A δx x A 1 δb A b κ(a) δb b The second part is to study the effect of a perturbation on the matrix A. Proposition 3.2 Let A R n n be a non-singular matrix, let b R n be a vector, and let x R n be the solution to the linear system Ax = b. The error on x can be bounded when solving (A + δa)x = b instead of Ax = b by δx x + δx κ(a) δa A.

30 CHAPTER 3. LINEAR SYSTEMS Proof: The solution to the modified system is x + δx. Hence: As Ax = b, (A + δa)(x + δx) = b. Aδx + δa(x + δx) = 0, meaning that δx = A 1 δa(x + δx), which implies which can be rewritten as δx A 1 δa x + δx δx x + δx κ(a) δa A Example 3.3 The matrix A of Example 3.1 has the inverse 0.1441 0.8648 A 1 = 10 8 0.2161 1.2969 As a consequence, A 1 = 1.5130 10 8. However, A = 2.1617: the condition number is then κ(a) = 2.1617 1.5130 10 8 3.3 10 8 This system is thus very ill-conditioned. As a final note, for a matrix norm induced by a vector norm, I = 1 is always true, indicating that κ(a) 1. 3.2.3 Rounding errors for Gaussian elimination As seen previously, only by rounding errors when performing Gaussian eliminations, the solution can be completely wrong. Pivoting strategies were then proposed to obtain the true solution. The analysis of rounding errors of this section proposes a justification of those. To evaluate the actual error, the technique is to find the initial matrix that would have given the result with rounding errors.

3.2. ERROR IN LINEAR SYSTEMS 31 Theorem 3.3 Let L = ( m ik ) and Ū = (ā(n) kj ) be the triangular factors computed by Gaussian elimination. Then there is an error matrix E such that L Ū is the exact decomposition of A + E, i.e. L Ū = A + E (3.17) Using a pivoting strategy (either partial or complete) and floating-point arithmetic with machine epsilon ɛ M, this matrix E is bounded by E n 2 g n ɛ M A where g n = max i,j,k ā(k) ij max i,j a ij. (3.18) Proof: At step k of Gaussian elimination, the elements of A (k) are transformed according to m ik = a(k) ik a (k) kk, a (k+1) ij = a (k) ij m ik a (k) kj (3.19) i, j = k + 1, k + 2,..., n Denoting by a bar the values m ik and ā (k+1) ij actually computed using floatingpoint arithmetic, consider those values are obtained by exact operations like (3.19) performed on the values ā (k) ij with perturbations ε (k) ij. m ik = ā(k) ik ā (k+1) ij = ā (k) ij + ε(k) ik ā (k) kk + ε (k) ij (3.20) m ik ā (k) kj. (3.21) Taking m ii = 1, summing the equations (3.21) for k = 1, 2,..., n 1 gives the following relationships a ij = p k=1 m ik ā (k) kj e ij, e ij = r k=1 ε (k) ij (3.22) p = min(i, j), r = min(i 1, j)

32 CHAPTER 3. LINEAR SYSTEMS Obviously, the equations (3.22) are equivalent to (3.17), written component by component. The remaining step is to compute a bound on E. The elements computed by floating-point arithmetic do not satisfy (3.19) but rather where m ik = ā(k) ik ā (k+1) ij = ā (k) kk ( ā (k) ij (1 + δ 1 ) (3.23) ) m ik ā (k) kj (1 + δ 2) (1 + δ 3 ) (3.24) δ i ɛ M, i = 1, 2, 3. Comparing (3.20) and (3.23), it immediately comes that Writing (3.24) as m ik ā (k) kj ε (k) ik = ā(k) ij and injecting this result in (3.21) gives ε (k) ij = ā (k+1) ij = ā(k) ik δ 1. (3.25) ā (k+1) ij /(1 + δ 3 ) 1 + δ 2 ( 1 (1 + δ3 ) 1 (1 + δ 2 ) 1) ā (k) ( ij 1 (1 + δ2 ) 1) (3.26) Neglecting the higher powers of ɛ M, (3.25) and (3.26) provide the following upper bounds: ε (k) ik ɛ M ā (k) ik, ε(k) ij 3ɛ M max( ā (k) ij, ā(k+1) ij ), j k + 1. (3.27) These results hold without any hypothesis on the multipliers m ik. To avoid numerical defects, the important point is to avoid too large ā ik, as the multipliers have no direct effect. The choice of a pivoting strategy is thus dictated by the need to avoid large growth of the transformed elements. Back to the transformation formulas (3.19), the choice of the maximum pivot makes sense. In the following part, the assumption will be that pivoting follows a partial or complete strategy. In either case, m ik 1. Eliminating ā (k) ij out of the equations (3.21) and (3.24), ε (k) ij = ā (k+1) ( ij 1 (1 + δ3 ) 1) m ik ā (k) kj δ 2

3.2. ERROR IN LINEAR SYSTEMS 33 Neglecting the higher powers of ɛ M, as m ik 1, the new upper bound is ε (k) ij 2ɛ M max( ā (k+1) ij, ā (k) kj ), j k + 1. (3.28) Definitions (3.18) and of the maximum norm (p = ) allow writing ā (k) ij max i,j,k ā(k) ij g n max a ij g n A i,j where (3.28) and (3.27) give the following bounds: Back to (3.22), ε (k) ij g n A. { ɛm si i k + 1, j = k 2ɛ M si i k + 1, j k + 1 i 1 i 1 i j r = i 1, e ij ε (k) ij g n A 2ɛ M k=1 k=1 i > j r = j, e ij k=1 = g n A 2ɛ M (i 1) ( j j 1 ) ε (k) ij g n A 2ɛ M + ɛ M k=1 which is equivalent to ( e ij ) g n ɛ M A = g n A ɛ M (2j 1) 0 0 0 0 0 1 2 2 2 2 1 3 4 4 4..... 1 3 5 2n 4 2n 4 1 3 5 2n 3 2n 2 where the inequality is satisfied component by component. The maximum norm of the matrix in the right-hand side being n (2 j 1) = n 2 j=1 this gives the announced result.

34 CHAPTER 3. LINEAR SYSTEMS If it is clear, according to (3.27), that the growth of the transformed elements a (k) ij should be avoided, it is less obvious that the systematic choice of the maximum element as pivot will avoid it. This strategy cannot be proved as the best one; there are actually cases where it is far from optimal. However, to date, no alternative strategy was proposed. A similar analysis of the errors when solving a triangular system also gives a bound on the error. Its proof is not given here. Theorem 3.4 Let Lx = b be a linear system where the matrix L = ( l ij ) is lower triangular. The vector x obtained by forward substitution is the exact solution of the perturbed triangular system where the perturbation matrix δl is bounded by (L + δl) x = b (3.29) δl n(n + 1) 2 ɛ M max l ij (3.30) i,j 3.2.4 Scale change and equation balancing In a linear system Ax = b, the unknowns x j and the right-hand sides b i often have a physical meaning. As a consequence, a change in the units for the unknowns is equivalent to changing their scale, meaning that x j = α j x j, while on the other hand a change in the units for the right-hand-sides imply a multiplication of the equation i by a factor β i. The original system thus becomes A x = b where with A = D 2 A D 1, b = D 2 b, x = D 1 x D 1 = diag (α 1, α 2,..., α n ), D 2 = diag (β 1, β 2,..., β n ) It would seem quite natural that the precision of the solution is not impacted by these transformations. To a given extend, this is true, as indicated by the following theorem, for which no proof is given.

3.2. ERROR IN LINEAR SYSTEMS 35 Theorem 3.5 Let x and x be the solutions for the two systems Ax = b and (D 2 A D 1 ) x = D 2 b. If D 1 and D 2 are diagonal matrices whose elements are integer powers of the radix of the used arithmetic, such that the scale changes do not induce numerical round-off, then Gaussian elimination in floating-point arithmetic produces the solutions to the two systems that, if the same pivots are chosen in each case, only differ by their exponents, such that x = D 1 x. The effect of a change of scale may however have an influence on the choice of pivots. Consequently, at any sequence of pivots corresponds some scale changes such that this alteration is realized: inappropriate scale changes can lead to bad pivot choices. Example 3.4 Consider the linear system ( ) ( ) 1 10000 x1 = 1 0.0001 x 2 ( 10000 1 ) which has the following solution, correctly rounded to four figures: x 1 = x 2 = 0.9999. Partial pivoting chooses a 11 as pivot, which gives the following result, using a floating-point arithmetic to three significant figures: x 2 = 1.00 x 1 = 0.00 This solution has a low quality. However, if the first equation is multiplied by 10 4, the following system is equivalent: ( ) ( ) ( ) 0.0001 1 x1 1 =. 1 0.0001 1 This time, the pivot is a 21, and with the same arithmetic the result becomes x 2 x 2 = 1.00 x 1 = 1.00 which is much better than the previous one. It is often recommended to balance the equations before applying Gaussian elimination. Equations are said to be balanced when the following conditions are satisfied: max a ij = 1, 1 j n i = 1, 2,..., n

36 CHAPTER 3. LINEAR SYSTEMS In Example 3.4, this is exactly the goal of the transformation: multiplying by 10 4 balances the equations. However, we cannot draw conclusions prematurely: a balanced system does not automatically imply avoiding all difficulties. Indeed, scale changes that are potentially performed on the unknowns can have an influence on the balancing, and consequently some scale choices may lead to problematic situations, as shown in the following example. Example 3.5 Let Ax = b be a balanced system where ε 1 1 A = 1 1 1 A 1 = 1 0 2 2 2 1 ε 1 + ε 4 1 1 1 2 1 + ε 1 ε and where ε 1. This system is well-conditioned, as κ (A) = 3 using a maximum norm. Gaussian elimination with partial pivoting thus gives a precise solution. However, choosing a 11 = ε as pivot has bad consequences on the precision of the solution. Consider the scale change x 2 = x 2 /ε and x 3 = x 3 /ε. If the new system is also balanced, A x = b, where A = 1 1 1 1 ε ε 1 ε ε In the latter case, partial pivoting, and even complete pivoting, selects a 11 = 1 as the first pivot, which is the same pivot as for matrix A. Using Theorem 3.5, this choice of pivot leads to disastrous consequences for matrix A. The explanation is rather simple: the scale changes have modified the condition number of the matrix. 0 2 2 (A ) 1 = 1 2 1 ε 1 + ε 4 ε ε 2 1 + ε ε hence A = 3, (A ) 1 = 1 + ε 2 ε, and κ (A ) = 3 (1 + ε) 2 ε 1 ε ε κ (A)

3.3. ITERATIVE METHODS 37 which means that the system A x = b is less well-conditioned than Ax = b. 3.3 Iterative methods In many applications, very large linear systems must be solved. Typical linear systems contain hundreds of millions of rows and columns. However, in the vast majority of cases, those matrices are extremely sparse, and linear algebra operations can take advantage of their specific structure. Gaussian elimination is ill-suited to such situations: its complexity becomes prohibitive for large matrices, but also it leads to fill-in, which means that, even though the initial matrix is sparse, its LU factorization is full, and does not take advantage of the structure of A. For such matrices, iterative methods are often preferred. The basic principle of those iterative methods is, as for solving non-linear equations, to generate vectors x (1),..., x (k) that get closer to the solution x of the system Ax = b. A fixed-point method will be applied here, as will be explained in the next chapter to solve non-linear systems. To this end, the system Ax = b must be rewritten to isolate x. It is possible to use x = A 1 b, but it would imply to compute A 1, which is equivalent to solving the system. Instead, the idea is to select a nonsingular matrix Q and to write the initial system as Qx = Qx Ax + b. (3.31) For a good choice of Q the iterative methods consists in solving the system (3.31) at each iteration, and thus to write x (k) = Q 1 [(Q A)x (k 1) + b]. (3.32) The art of iterative methods is to make a good choice of Q. Two questions are important. The first one is to solve efficiently the system (3.31), or in other words to compute quickly Q 1. The other important issue is to ensure convergence of the iterative method. Ideally, Q should imply a fast convergence for the whole process. Before delving into the methods, the following proposition is a convergence analysis of the method, which gives directions to choose Q.