# Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic


2 Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66

3 QR Factorization 3 / 66

4 QR Factorization. A square matrix $Q \in \mathbb{R}^{n \times n}$ is called orthogonal if its columns and rows are orthonormal vectors. Equivalently, $Q^T Q = Q Q^T = I$. Orthogonal matrices preserve the Euclidean norm of a vector, i.e. $\|Qv\|_2^2 = v^T Q^T Q v = v^T v = \|v\|_2^2$. Hence, geometrically, we picture orthogonal matrices as reflection or rotation operators. Orthogonal matrices are very important in scientific computing, since norm-preservation implies $\operatorname{cond}(Q) = 1$. 4 / 66
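A quick numerical check of these properties, sketched in Python/NumPy (the course's own examples use Matlab; Python is used here for convenience):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a rotation, hence orthogonal

v = np.array([3.0, -4.0])
print(np.allclose(Q.T @ Q, np.eye(2)))                       # Q^T Q = I
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # norm preserved
print(np.isclose(np.linalg.cond(Q), 1.0))                    # cond(Q) = 1
```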

5 QR Factorization. A matrix $A \in \mathbb{R}^{m \times n}$, $m \ge n$, can be factorized as $A = QR$, where $Q \in \mathbb{R}^{m \times m}$ is orthogonal, $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}$, and $\hat{R} \in \mathbb{R}^{n \times n}$ is upper-triangular. As we indicated earlier, QR is very good for solving overdetermined linear least-squares problems, $Ax \simeq b$.¹ (¹ QR can also be used to solve a square system $Ax = b$, but it requires about twice as many operations as Gaussian elimination, hence it is not the standard choice.) 5 / 66

6 QR Factorization. To see why, consider the 2-norm of the least-squares residual:

$$\|r(x)\|_2^2 = \|b - Ax\|_2^2 = \left\|b - Q \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} x\right\|_2^2 = \left\|Q^T \left(b - Q \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} x\right)\right\|_2^2 = \left\|Q^T b - \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} x\right\|_2^2$$

(We used the fact that $\|Q^T z\|_2 = \|z\|_2$ in the second step.) 6 / 66

7 QR Factorization. Then, let $Q^T b = [c_1, c_2]^T$, where $c_1 \in \mathbb{R}^n$, $c_2 \in \mathbb{R}^{m-n}$, so that $\|r(x)\|_2^2 = \|c_1 - \hat{R}x\|_2^2 + \|c_2\|_2^2$. Question: Based on this expression, how do we minimize $\|r(x)\|_2$? 7 / 66

8 QR Factorization. Answer: We don't have any control over the second term, $\|c_2\|_2^2$, since it doesn't contain $x$. Hence we minimize $\|r(x)\|_2^2$ by making the first term zero. That is, we solve the $n \times n$ triangular system $\hat{R}x = c_1$; this is what Matlab does when we use backslash for least squares. Also, this tells us that $\min_{x \in \mathbb{R}^n} \|r(x)\|_2 = \|c_2\|_2$. 8 / 66
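This least-squares procedure can be sketched in Python/NumPy (the lecture's examples use Matlab; note that `np.linalg.qr` returns the reduced factorization by default):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

# Reduced QR: A = Q_hat @ R_hat, with Q_hat 50x3 and R_hat 3x3 upper-triangular
Q_hat, R_hat = np.linalg.qr(A)

# Solve R_hat x = c1 = Q_hat^T b (a triangular solve would be cheaper,
# but np.linalg.solve keeps the sketch short)
c1 = Q_hat.T @ b
x = np.linalg.solve(R_hat, c1)

# Agrees with the built-in least-squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))  # True
```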

9 QR Factorization. Recall that solving linear least-squares via the normal equations requires solving a system with the matrix $A^T A$. But using the normal equations directly is problematic, since² $\operatorname{cond}(A^T A) = \operatorname{cond}(A)^2$. The QR approach avoids this condition-number-squaring effect and is much more numerically stable! (² This can be shown via the SVD.) 9 / 66

10 QR Factorization. How do we compute the QR factorization? There are three main methods: Gram-Schmidt orthogonalization, Householder triangularization, and Givens rotations. We will cover Gram-Schmidt and Householder in class. 10 / 66

11 QR Factorization. Suppose $A \in \mathbb{R}^{m \times n}$, $m \ge n$. One way to picture the QR factorization is to construct a sequence of orthonormal vectors $q_1, q_2, \ldots$ such that $\operatorname{span}\{q_1, q_2, \ldots, q_j\} = \operatorname{span}\{a_{(:,1)}, a_{(:,2)}, \ldots, a_{(:,j)}\}$, $j = 1, \ldots, n$. We seek coefficients $r_{ij}$ such that

$$a_{(:,1)} = r_{11} q_1, \quad a_{(:,2)} = r_{12} q_1 + r_{22} q_2, \quad \ldots, \quad a_{(:,n)} = r_{1n} q_1 + r_{2n} q_2 + \cdots + r_{nn} q_n.$$

This can be done via the Gram-Schmidt process, as we'll discuss shortly. 11 / 66

12 QR Factorization. In matrix form we have:

$$\begin{bmatrix} a_{(:,1)} & a_{(:,2)} & \cdots & a_{(:,n)} \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}$$

This gives $A = \hat{Q} \hat{R}$ for $\hat{Q} \in \mathbb{R}^{m \times n}$, $\hat{R} \in \mathbb{R}^{n \times n}$. This is called the reduced QR factorization of $A$, which is slightly different from the definition we gave earlier. Note that for $m > n$, $\hat{Q}^T \hat{Q} = I$, but $\hat{Q} \hat{Q}^T \ne I$ (the latter is why the full QR is sometimes nice). 12 / 66

13 Full vs Reduced QR Factorization. The full QR factorization (defined earlier), $A = QR$, is obtained by appending $m - n$ arbitrary orthonormal columns to $\hat{Q}$ to make it an $m \times m$ orthogonal matrix. We also need to append $m - n$ rows of zeros to $\hat{R}$ to "silence" the last $m - n$ columns of $Q$, to obtain $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}$. 13 / 66

14 Full vs Reduced QR Factorization. [Diagram: block shapes of the full QR ($Q$ is $m \times m$, $R$ is $m \times n$) vs the reduced QR ($\hat{Q}$ is $m \times n$, $\hat{R}$ is $n \times n$).] 14 / 66

15 Full vs Reduced QR Factorization. Exercise: Show that the linear least-squares solution is given by $\hat{R}x = \hat{Q}^T b$ by plugging $A = \hat{Q} \hat{R}$ into the normal equations. This is equivalent to the least-squares result we showed earlier using the full QR factorization, since $c_1 = \hat{Q}^T b$. 15 / 66

16 Full vs Reduced QR Factorization. In Matlab, qr gives the full QR factorization by default:

    >> A = rand(5,3);
    >> [Q,R] = qr(A)

(Here Q is 5×5 and R is 5×3; the numerical output is omitted.) 16 / 66

17 Full vs Reduced QR Factorization. In Matlab, qr(A,0) gives the reduced QR factorization:

    >> [Q,R] = qr(A,0)

(Here Q is 5×3 and R is 3×3; the numerical output is omitted.) 17 / 66

18 Gram-Schmidt Orthogonalization. Question: How do we use the Gram-Schmidt process to compute the $q_i$, $i = 1, \ldots, n$? Answer: In step $j$, find a unit vector $q_j \in \operatorname{span}\{a_{(:,1)}, a_{(:,2)}, \ldots, a_{(:,j)}\}$ orthogonal to $\operatorname{span}\{q_1, q_2, \ldots, q_{j-1}\}$. To do this, we set $v_j \equiv a_{(:,j)} - (q_1^T a_{(:,j)}) q_1 - \cdots - (q_{j-1}^T a_{(:,j)}) q_{j-1}$, and then $q_j \equiv v_j / \|v_j\|_2$ satisfies our requirements. This gives the matrix $\hat{Q}$, and next we can determine the values of $r_{ij}$ such that $A = \hat{Q} \hat{R}$. 18 / 66

19 Gram-Schmidt Orthogonalization. We write the set of equations for the $q_i$ as

$$q_1 = \frac{a_{(:,1)}}{r_{11}}, \quad q_2 = \frac{a_{(:,2)} - r_{12} q_1}{r_{22}}, \quad \ldots, \quad q_n = \frac{a_{(:,n)} - \sum_{i=1}^{n-1} r_{in} q_i}{r_{nn}}.$$

Then from the definition of $q_j$, we see that $r_{ij} = q_i^T a_{(:,j)}$ for $i < j$, and $r_{jj} = \|a_{(:,j)} - \sum_{i=1}^{j-1} r_{ij} q_i\|_2$. The sign of $r_{jj}$ is not determined uniquely; e.g. we could choose $r_{jj} > 0$ for each $j$. 19 / 66

20 Classical Gram-Schmidt Process. The Gram-Schmidt algorithm we have described is provided in the pseudocode below:

    1: for j = 1 : n do
    2:   v_j = a_(:,j)
    3:   for i = 1 : j-1 do
    4:     r_ij = q_i^T a_(:,j)
    5:     v_j = v_j - r_ij q_i
    6:   end for
    7:   r_jj = ||v_j||_2
    8:   q_j = v_j / r_jj
    9: end for

This is referred to as the classical Gram-Schmidt (CGS) method. 20 / 66
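The CGS pseudocode above translates directly into Python/NumPy; this `cgs` helper is a sketch written for this transcript, not part of the original lecture:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: reduced QR factorization A = Q @ R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # project against the ORIGINAL column
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.vander(np.linspace(1, 2, 6), 4, increasing=True)
Q, R = cgs(A)
print(np.allclose(Q @ R, A))  # True: A is reconstructed
```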

21 Gram-Schmidt Orthogonalization. The only way the Gram-Schmidt process can fail is if $r_{jj} = \|v_j\|_2 = 0$ for some $j$. This can only happen if $a_{(:,j)} = \sum_{i=1}^{j-1} r_{ij} q_i$ for some $j$, i.e. if $a_{(:,j)} \in \operatorname{span}\{q_1, q_2, \ldots, q_{j-1}\} = \operatorname{span}\{a_{(:,1)}, a_{(:,2)}, \ldots, a_{(:,j-1)}\}$. This means that the columns of $A$ are linearly dependent. Therefore: Gram-Schmidt fails $\implies$ columns of $A$ are linearly dependent. 21 / 66

22 Gram-Schmidt Orthogonalization. Equivalently, by the contrapositive: columns of $A$ linearly independent $\implies$ Gram-Schmidt succeeds. Theorem: Every $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) of full rank has a unique reduced QR factorization $A = \hat{Q} \hat{R}$ with $r_{ii} > 0$. The only non-uniqueness in the Gram-Schmidt process was in the sign of the $r_{ii}$, hence $\hat{Q} \hat{R}$ is unique if $r_{ii} > 0$. 22 / 66

23 Gram-Schmidt Orthogonalization. Theorem: Every $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) has a full QR factorization. Case 1: $A$ has full rank. We compute the reduced QR factorization from above; to make $Q$ square we pad $\hat{Q}$ with $m - n$ arbitrary orthonormal columns, and we also pad $\hat{R}$ with $m - n$ rows of zeros to get $R$. Case 2: $A$ doesn't have full rank. At some point in computing the reduced QR factorization, we encounter $\|v_j\|_2 = 0$; at this point we pick an arbitrary $q_j$ orthogonal to $\operatorname{span}\{q_1, q_2, \ldots, q_{j-1}\}$ and then proceed as in Case 1. 23 / 66

24 Modified Gram-Schmidt Process. The classical Gram-Schmidt process is numerically unstable! (It is sensitive to rounding error, and the orthogonality of the $q_j$ degrades.) The algorithm can be reformulated to give the modified Gram-Schmidt process, which is numerically more robust. Key idea: when each new $q_j$ is computed, orthogonalize each remaining column of $A$ against it. 24 / 66

25 Modified Gram-Schmidt Process. Modified Gram-Schmidt (MGS):

    1: for i = 1 : n do
    2:   v_i = a_(:,i)
    3: end for
    4: for i = 1 : n do
    5:   r_ii = ||v_i||_2
    6:   q_i = v_i / r_ii
    7:   for j = i+1 : n do
    8:     r_ij = q_i^T v_j
    9:     v_j = v_j - r_ij q_i
    10:  end for
    11: end for

25 / 66
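The MGS pseudocode can likewise be sketched in Python/NumPy; the `mgs` helper name is introduced here for illustration:

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt: reduced QR factorization A = Q @ R."""
    m, n = A.shape
    V = A.astype(float)           # working copy of the columns
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, n):
            R[i, j] = Q[:, i] @ V[:, j]   # project against the UPDATED column
            V[:, j] -= R[i, j] * Q[:, i]
    return Q, R

A = np.vander(np.linspace(1, 2, 6), 4, increasing=True)
Q, R = mgs(A)
print(np.allclose(Q @ R, A))  # True: A is reconstructed
```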

26 Modified Gram-Schmidt Process. MGS is just a rearrangement of CGS: in MGS the outer loop is over rows, whereas in CGS the outer loop is over columns. In exact arithmetic, MGS and CGS are identical. MGS is better numerically since when we compute $r_{ij} = q_i^T v_j$, the components of $q_1, \ldots, q_{i-1}$ have already been removed from $v_j$, whereas in CGS we compute $r_{ij} = q_i^T a_{(:,j)}$, which involves some extra contamination from components of $q_1, \ldots, q_{i-1}$ in $a_{(:,j)}$.³ (³ The $q_i$'s are not exactly orthogonal in finite-precision arithmetic.) 26 / 66
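The difference is easy to observe numerically. The sketch below (an illustration written for this transcript, using an ill-conditioned Vandermonde matrix as the test case) implements both variants and compares the loss of orthogonality $\|Q^T Q - I\|$:

```python
import numpy as np

def gram_schmidt(A, modified):
    """Reduced-QR orthogonalization; modified=True updates columns in place (MGS),
    modified=False projects against the original columns (CGS)."""
    m, n = A.shape
    V = A.astype(float)
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):
            src = V[:, j] if modified else A[:, j]
            V[:, j] -= (Q[:, i] @ src) * Q[:, i]
    return Q

# An ill-conditioned matrix: columns are nearly linearly dependent
A = np.vander(np.linspace(0, 1, 12), 12, increasing=True)

for modified in (False, True):
    Q = gram_schmidt(A, modified)
    err = np.linalg.norm(Q.T @ Q - np.eye(12))
    print("MGS" if modified else "CGS", "orthogonality error:", err)
# CGS loses orthogonality far more severely than MGS on this matrix
```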

27 Operation Count. Work in MGS is dominated by lines 8 and 9, the innermost loop: $r_{ij} = q_i^T v_j$ and $v_j = v_j - r_{ij} q_i$. The first line requires $m$ multiplications and $m-1$ additions; the second requires $m$ multiplications and $m$ subtractions. Hence there are approximately $4m$ operations per single inner iteration, and the total number of operations is asymptotic to

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} 4m \sim 4m \sum_{i=1}^{n} i \sim 2mn^2$$

27 / 66

28 Householder Triangularization. The other main method for computing QR factorizations is Householder⁴ triangularization. The Householder algorithm is more numerically stable and more efficient than Gram-Schmidt. But Gram-Schmidt allows us to build up an orthogonal basis for the successive spaces spanned by columns of $A$, $\operatorname{span}\{a_{(:,1)}\}, \operatorname{span}\{a_{(:,1)}, a_{(:,2)}\}, \ldots$, which can be important in some cases. (⁴ Alston Householder, American numerical analyst.) 28 / 66

29 Householder Triangularization. Householder idea: Apply a succession of orthogonal matrices $Q_k \in \mathbb{R}^{m \times m}$ to $A$ to compute an upper-triangular matrix $R$: $Q_n \cdots Q_2 Q_1 A = R$. Hence we obtain the full QR factorization $A = QR$, where $Q \equiv Q_1^T Q_2^T \cdots Q_n^T$. 29 / 66

30 Householder Triangularization. In 1958, Householder proposed a way to choose $Q_k$ to introduce zeros below the diagonal in column $k$ while preserving the previously introduced zeros. [Diagram: sparsity patterns of $A$, $Q_1 A$, $Q_2 Q_1 A$, $Q_3 Q_2 Q_1 A$, with zeros introduced below the diagonal column by column.] This is achieved by Householder reflectors. 30 / 66

31 Householder Reflectors. We choose $Q_k = \begin{bmatrix} I & 0 \\ 0 & F \end{bmatrix}$, where $I \in \mathbb{R}^{(k-1) \times (k-1)}$ and $F \in \mathbb{R}^{(m-k+1) \times (m-k+1)}$ is a Householder reflector. The $I$ block ensures the first $k-1$ rows are unchanged, while $F$ is an orthogonal matrix that operates on the bottom $m-k+1$ rows. (Note that $F$ orthogonal $\implies Q_k$ orthogonal.) 31 / 66

32 Householder Reflectors. Let $x \in \mathbb{R}^{m-k+1}$ denote entries $k, \ldots, m$ of the $k$th column. We have two requirements for $F$: (1) $F$ is orthogonal, so we must have $\|Fx\|_2 = \|x\|_2$; (2) only the first entry of $Fx$ should be non-zero. Hence we must have $Fx = [\|x\|_2, 0, \ldots, 0]^T = \|x\|_2 e_1$. Question: How can we achieve this? 32 / 66

33 Householder Reflectors. We can see geometrically that this can be achieved via reflection across a subspace $H$. Here $H$ is the subspace orthogonal to $v \equiv \|x\|_2 e_1 - x$, and the key point is that $H$ bisects the angle between $x$ and its image $\|x\|_2 e_1$. 33 / 66

34 Householder Reflectors. We see that this bisection property holds because $x$ and $Fx$ both live on the hypersphere centered at the origin with radius $\|x\|_2$. 34 / 66

35 Householder Reflectors. Next, we need to determine the matrix $F$ which maps $x$ to $\|x\|_2 e_1$. $F$ is closely related to the orthogonal projection of $x$ onto $H$, since that projection takes us half way from $x$ to $\|x\|_2 e_1$. Hence we first consider orthogonal projection onto $H$, and subsequently derive $F$. Recall that the orthogonal projection of $a$ onto $b$ is given by $\frac{a \cdot b}{\|b\|_2^2} b$, or equivalently $\frac{b b^T}{b^T b} a$. Hence we can see that $\frac{b b^T}{b^T b}$ is a rank-1 projection matrix. 35 / 66

36 Householder Reflectors. Let $P_H \equiv I - \frac{v v^T}{v^T v}$. Then $P_H x$ gives $x$ minus the orthogonal projection of $x$ onto $v$. We can show that $P_H x$ is in fact the orthogonal projection of $x$ onto $H$, since it satisfies: (i) $P_H x \in H$, because $v^T P_H x = v^T x - v^T \frac{v v^T}{v^T v} x = v^T x - \frac{v^T v}{v^T v} v^T x = 0$; (ii) the projection error $x - P_H x$ is orthogonal to $H$, because $x - P_H x = x - x + \frac{v v^T}{v^T v} x = \frac{v^T x}{v^T v} v$, which is parallel to $v$. 36 / 66

37 Householder Reflectors. But recall that $F$ should reflect across $H$ rather than project onto $H$. Hence we obtain $F$ by going twice as far in the direction of $v$ compared to $P_H$, i.e. $F = I - 2\frac{v v^T}{v^T v}$. Exercise: Show that $F$ is an orthogonal matrix, i.e. that $F^T F = I$. 37 / 66

38 Householder Reflectors. But in fact, we can see that there are two Householder reflectors that we can choose from: one reflecting $x$ to $\|x\|_2 e_1$ and one reflecting $x$ to $-\|x\|_2 e_1$.⁵ (⁵ The picture is not to scale; the second reflection hyperplane should bisect the angle between $x$ and $-\|x\|_2 e_1$.) 38 / 66

39 Householder Reflectors. In practice, it's important to choose the better of the two reflectors. If $x$ and $\|x\|_2 e_1$ (or $x$ and $-\|x\|_2 e_1$) are close, we could obtain loss of precision due to cancellation (cf. Unit 0) when computing $v$. To ensure that $x$ and its reflection are well separated, we should choose the reflection to be $-\operatorname{sign}(x_1)\|x\|_2 e_1$ (more details on the next slide). Therefore, we want $v$ to be $v = -\operatorname{sign}(x_1)\|x\|_2 e_1 - x$. But note that the sign of $v$ does not affect $F$, hence we scale $v$ by $-1$ to get an equivalent formula without minus signs: $v = \operatorname{sign}(x_1)\|x\|_2 e_1 + x$. 39 / 66

40 Householder Reflectors. Let's compare the two options for $v$ in the potentially problematic case when $x \approx \pm\|x\|_2 e_1$, i.e. when $|x_1| \approx \|x\|_2$:

$$v_{\text{bad}} \equiv -\operatorname{sign}(x_1)\|x\|_2 e_1 + x, \qquad v_{\text{good}} \equiv \operatorname{sign}(x_1)\|x\|_2 e_1 + x$$

$$\|v_{\text{bad}}\|_2^2 = (-\operatorname{sign}(x_1)\|x\|_2 + x_1)^2 + \|x_{2:(m-k+1)}\|_2^2 = (-\operatorname{sign}(x_1)\|x\|_2 + \operatorname{sign}(x_1)|x_1|)^2 + \|x_{2:(m-k+1)}\|_2^2 \approx 0$$

$$\|v_{\text{good}}\|_2^2 = (\operatorname{sign}(x_1)\|x\|_2 + x_1)^2 + \|x_{2:(m-k+1)}\|_2^2 = (\operatorname{sign}(x_1)\|x\|_2 + \operatorname{sign}(x_1)|x_1|)^2 + \|x_{2:(m-k+1)}\|_2^2 \approx (2\operatorname{sign}(x_1)\|x\|_2)^2$$

40 / 66

41 Householder Reflectors. Recall that $v$ is computed from two vectors, each of magnitude $\|x\|_2$. The argument above shows that with $v_{\text{bad}}$ we can get $\|v\|_2 \ll \|x\|_2$, so loss of precision due to cancellation is possible. In contrast, with $v_{\text{good}}$ we always have $\|v_{\text{good}}\|_2 \ge \|x\|_2$, which rules out loss of precision due to cancellation. 41 / 66

42 Householder Triangularization. We can now write out the Householder algorithm:

    1: for k = 1 : n do
    2:   x = a_(k:m,k)
    3:   v_k = sign(x_1) ||x||_2 e_1 + x
    4:   v_k = v_k / ||v_k||_2
    5:   a_(k:m,k:n) = a_(k:m,k:n) - 2 v_k (v_k^T a_(k:m,k:n))
    6: end for

This replaces $A$ with $R$ and stores $v_1, \ldots, v_n$. Note that we don't divide by $v_k^T v_k$ in line 5 (as in the formula for $F$) since we normalize $v_k$ in line 4. The Householder algorithm requires $2mn^2 - \frac{2}{3}n^3$ operations.⁶ (⁶ Compared to $2mn^2$ for Gram-Schmidt.) 42 / 66
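The algorithm above can be sketched in Python/NumPy; `householder_qr` and `apply_qt` are helper names introduced here for illustration (the convention `sign(0) = +1` is an implementation choice):

```python
import numpy as np

def householder_qr(A):
    """Householder triangularization: returns R (upper-triangular)
    and the normalized reflector vectors v_1, ..., v_n."""
    R = A.astype(float)
    m, n = R.shape
    vs = []
    for k in range(n):
        x = R[k:, k].copy()
        sgn = 1.0 if x[0] >= 0 else -1.0   # sign(0) taken as +1
        v = x
        v[0] += sgn * np.linalg.norm(x)
        v /= np.linalg.norm(v)
        vs.append(v)
        # Rank-1 update of the trailing submatrix (line 5 of the pseudocode)
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
    return R, vs

def apply_qt(vs, b):
    """Compute Q^T b = Q_n ... Q_2 Q_1 b using the stored reflectors."""
    b = b.astype(float)
    for k, v in enumerate(vs):
        b[k:] -= 2.0 * v * (v @ b[k:])
    return b
```

Note that `x.copy()` in line `x = R[k:, k].copy()` is needed because the subsequent rank-1 update overwrites that slice of `R`.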

43 Householder Triangularization. Note that we don't explicitly form $Q$; we can use the vectors $v_1, \ldots, v_n$ to compute $Q$ in a post-processing step. Recall that $Q_k = \begin{bmatrix} I & 0 \\ 0 & F \end{bmatrix}$ and $Q \equiv (Q_n \cdots Q_2 Q_1)^T = Q_1^T Q_2^T \cdots Q_n^T$. Also, the Householder reflectors are symmetric (refer to the definition of $F$), so $Q = Q_1^T Q_2^T \cdots Q_n^T = Q_1 Q_2 \cdots Q_n$. 43 / 66

44 Householder Triangularization. Hence, we can evaluate $Qx = Q_1 Q_2 \cdots Q_n x$ using the $v_k$:

    1: for k = n : -1 : 1 do
    2:   x_(k:m) = x_(k:m) - 2 v_k (v_k^T x_(k:m))
    3: end for

Question: How can we use this to form the matrix $Q$? 44 / 66

45 Householder Triangularization. Answer: Compute $Q$ via the products $Qe_i$, $i = 1, \ldots, m$. Similarly, compute $\hat{Q}$ (the reduced QR factor) via $Qe_i$, $i = 1, \ldots, n$. However, it is often not necessary to form $Q$ or $\hat{Q}$ explicitly; e.g. to solve $Ax \simeq b$ we only need the product $Q^T b$. Note that the product $Q^T b = Q_n \cdots Q_2 Q_1 b$ can be evaluated as:

    1: for k = 1 : n do
    2:   b_(k:m) = b_(k:m) - 2 v_k (v_k^T b_(k:m))
    3: end for

45 / 66

46 Singular Value Decomposition (SVD) 46 / 66

47 Singular Value Decomposition. The Singular Value Decomposition (SVD) is a very useful matrix factorization. Motivation for the SVD: the image of the unit sphere, $S$, under any $m \times n$ matrix is a hyperellipse. A hyperellipse is obtained by stretching the unit sphere in $\mathbb{R}^m$ by factors $\sigma_1, \ldots, \sigma_m$ in orthogonal directions $u_1, \ldots, u_m$. 47 / 66

48 Singular Value Decomposition. For $A \in \mathbb{R}^{2 \times 2}$, we have: [Diagram: the unit circle $S$ mapped by $A$ to an ellipse with principal semiaxes $\sigma_1 u_1$ and $\sigma_2 u_2$.] 48 / 66

49 Singular Value Decomposition. Based on this picture, we make some definitions. Singular values: $\sigma_1, \sigma_2, \ldots, \sigma_n \ge 0$ (we typically assume $\sigma_1 \ge \sigma_2 \ge \cdots$). Left singular vectors: $\{u_1, u_2, \ldots, u_n\}$, unit vectors in the directions of the principal semiaxes of $AS$. Right singular vectors: $\{v_1, v_2, \ldots, v_n\}$, the preimages of the $u_i$, so that $Av_i = \sigma_i u_i$, $i = 1, \ldots, n$. (The names "left" and "right" come from the formula for the SVD below.) 49 / 66

50 Singular Value Decomposition. The key equation above is $Av_i = \sigma_i u_i$, $i = 1, \ldots, n$. Writing this out in matrix form, we get

$$A \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix}$$

Or more compactly: $AV = \hat{U}\Sigma$. 50 / 66

51 Singular Value Decomposition. Here $\Sigma \in \mathbb{R}^{n \times n}$ is diagonal with non-negative, real entries; $\hat{U} \in \mathbb{R}^{m \times n}$ has orthonormal columns; $V \in \mathbb{R}^{n \times n}$ has orthonormal columns. Therefore $V$ is an orthogonal matrix ($V^T V = V V^T = I$), so that we have the reduced SVD for $A \in \mathbb{R}^{m \times n}$: $A = \hat{U}\Sigma V^T$. 51 / 66
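A quick check of the reduced SVD in Python/NumPy (the lecture's own examples use Matlab; in NumPy the reduced form is requested with `full_matrices=False`):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))

# Reduced SVD: U_hat is 6x3, s holds the 3 singular values, Vt is 3x3
U_hat, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(A, U_hat @ np.diag(s) @ Vt))  # A is reconstructed
print(np.allclose(U_hat.T @ U_hat, np.eye(3)))  # orthonormal columns
print(s[0] >= s[1] >= s[2] >= 0)                # sorted, non-negative
```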

52 Singular Value Decomposition. Just as with QR, we can pad the columns of $\hat{U}$ with $m - n$ arbitrary orthonormal vectors to obtain $U \in \mathbb{R}^{m \times m}$. We then need to "silence" these arbitrary columns by adding $m - n$ rows of zeros to $\Sigma$, obtaining $\Sigma \in \mathbb{R}^{m \times n}$. This gives the full SVD for $A \in \mathbb{R}^{m \times n}$: $A = U\Sigma V^T$. 52 / 66

53 Full vs Reduced SVD. [Diagram: block shapes of the full SVD ($U$ is $m \times m$, $\Sigma$ is $m \times n$) vs the reduced SVD ($\hat{U}$ is $m \times n$, $\Sigma$ is $n \times n$).] 53 / 66

54 Singular Value Decomposition. Theorem: Every matrix $A \in \mathbb{R}^{m \times n}$ has a full singular value decomposition. Furthermore: the $\sigma_j$ are uniquely determined, and if $A$ is square and the $\sigma_j$ are distinct, the $\{u_j\}$ and $\{v_j\}$ are uniquely determined up to sign. 54 / 66

55 Singular Value Decomposition This theorem justifies the statement that the image of the unit sphere under any m n matrix is a hyperellipse Consider A = UΣV T (full SVD) applied to the unit sphere, S, in R n : 1. The orthogonal map V T preserves S 2. Σ stretches S into a hyperellipse aligned with the canonical axes e j 3. U rotates or reflects the hyperellipse without changing its shape 55 / 66

56 SVD in Matlab. Matlab's svd function computes the full SVD of a matrix:

    >> A = rand(4,2);
    >> [U,S,V] = svd(A)

(Here U is 4×4, S is 4×2, and V is 2×2; the numerical output is omitted.) 56 / 66

57 SVD in Matlab. As with qr, svd(A,0) returns the reduced SVD:

    >> [U,S,V] = svd(A,0)
    >> norm(A - U*S*V')

(Here U is 4×2 and S is 2×2; the reconstruction error norm(A - U*S*V') is on the order of machine precision.) 57 / 66

58 Matrix Properties via the SVD. The rank of $A$ is $r$, the number of nonzero singular values.⁷ Proof: In the full SVD $A = U\Sigma V^T$, $U$ and $V^T$ have full rank, hence it follows from linear algebra that $\operatorname{rank}(A) = \operatorname{rank}(\Sigma)$. Also, $\operatorname{image}(A) = \operatorname{span}\{u_1, \ldots, u_r\}$ and $\operatorname{null}(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\}$. Proof: This follows from $A = U\Sigma V^T$ together with $\operatorname{image}(\Sigma) = \operatorname{span}\{e_1, \ldots, e_r\} \subseteq \mathbb{R}^m$ and $\operatorname{null}(\Sigma) = \operatorname{span}\{e_{r+1}, \ldots, e_n\} \subseteq \mathbb{R}^n$. (⁷ This also gives us a good way to define rank in finite precision: the number of singular values larger than some (small) tolerance.) 58 / 66

59 Matrix Properties via the SVD. $\|A\|_2 = \sigma_1$. Proof: Recall that $\|A\|_2 \equiv \max_{\|v\|_2 = 1} \|Av\|_2$; geometrically, we see that $\|Av\|_2$ is maximized if $v = v_1$, giving $Av = \sigma_1 u_1$. The singular values of $A$ are the square roots of the eigenvalues of $A^T A$ or $AA^T$. Proof (the $AA^T$ case is analogous): $A^T A = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma U^T U\Sigma V^T = V(\Sigma^T\Sigma)V^T$, hence $(A^T A)V = V(\Sigma^T\Sigma)$, i.e. $(A^T A)v_{(:,j)} = \sigma_j^2 v_{(:,j)}$. 59 / 66
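Both claims are easy to verify numerically; a NumPy sketch (Python rather than the course's Matlab):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigenvalues of A^T A, descending

print(np.allclose(s, np.sqrt(lam)))            # sigma_j = sqrt(lambda_j(A^T A))
print(np.isclose(np.linalg.norm(A, 2), s[0]))  # ||A||_2 = sigma_1
```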

60 Matrix Properties via the SVD. The pseudoinverse, $A^+$, can be defined more generally in terms of the SVD. Define the pseudoinverse of a scalar $\sigma$ to be $1/\sigma$ if $\sigma \ne 0$ and zero otherwise. Define the pseudoinverse of a (possibly rectangular) diagonal matrix by transposing the matrix and taking the pseudoinverse of each entry. The pseudoinverse of $A \in \mathbb{R}^{m \times n}$ is then defined as $A^+ = V\Sigma^+ U^T$. $A^+$ exists for any matrix $A$, and it captures our definitions of pseudoinverse from Unit I. 60 / 66
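This definition can be sketched directly in NumPy; the `pinv_svd` helper and its tolerance are illustrative choices, and the result matches NumPy's built-in `np.linalg.pinv`:

```python
import numpy as np

def pinv_svd(A):
    """Pseudoinverse via the SVD: A+ = V Sigma+ U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    tol = 1e-12 * s[0]            # treat tiny singular values as zero
    s_plus = np.zeros_like(s)
    nonzero = s > tol
    s_plus[nonzero] = 1.0 / s[nonzero]
    return Vt.T @ (s_plus[:, None] * U.T)

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))  # matches NumPy's pinv
```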

61 Matrix Properties via the SVD. We generalize the condition number to rectangular matrices via the definition $\kappa(A) = \|A\| \|A^+\|$. We can use the SVD to compute the 2-norm condition number: $\|A\|_2 = \sigma_{\max}$, and the largest singular value of $A^+$ is $1/\sigma_{\min}$, so that $\|A^+\|_2 = 1/\sigma_{\min}$. Hence $\kappa(A) = \sigma_{\max}/\sigma_{\min}$. 61 / 66
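A one-line numerical check (NumPy's `np.linalg.cond` with its default norm also works for rectangular matrices and computes exactly this ratio):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(s[0] / s[-1], np.linalg.cond(A)))  # kappa = sigma_max / sigma_min
```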

62 Matrix Properties via the SVD. These results indicate the importance of the SVD, both theoretically and as a computational tool. Algorithms for calculating the SVD are an important topic in numerical linear algebra, but outside the scope of this course; computing the SVD requires approximately $4mn^2 - \frac{4}{3}n^3$ operations. For more details on algorithms, see Trefethen & Bau, or Golub & Van Loan. 62 / 66

63 Low-Rank Approximation via the SVD. One of the most useful properties of the SVD is that it allows us to obtain an optimal low-rank approximation to $A$. We can recast the SVD as $A = \sum_{j=1}^{r} \sigma_j u_j v_j^T$. This follows from writing $\Sigma$ as a sum of $r$ matrices $\Sigma_j$, where $\Sigma_j \equiv \operatorname{diag}(0, \ldots, 0, \sigma_j, 0, \ldots, 0)$. Each $u_j v_j^T$ is a rank-one matrix: each of its columns is a scaled version of $u_j$. 63 / 66

64 Low-Rank Approximation via the SVD. Theorem: For any $0 \le \nu \le r$, let $A_\nu \equiv \sum_{j=1}^{\nu} \sigma_j u_j v_j^T$; then

$$\|A - A_\nu\|_2 = \min_{B \in \mathbb{R}^{m \times n},\ \operatorname{rank}(B) \le \nu} \|A - B\|_2 = \sigma_{\nu+1}$$

That is: $A_\nu$ gives us the closest rank-$\nu$ matrix to $A$, measured in the 2-norm. The error in $A_\nu$ is given by the first omitted singular value. 64 / 66

65 Low-Rank Approximation via the SVD. A similar result holds in the Frobenius norm:

$$\|A - A_\nu\|_F = \min_{B \in \mathbb{R}^{m \times n},\ \operatorname{rank}(B) \le \nu} \|A - B\|_F = \sqrt{\sigma_{\nu+1}^2 + \cdots + \sigma_r^2}$$

65 / 66
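Both error formulas can be checked numerically with a truncated SVD; a NumPy sketch (Python rather than the course's Matlab):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

nu = 2
A_nu = U[:, :nu] @ np.diag(s[:nu]) @ Vt[:nu, :]   # rank-nu truncated SVD

# 2-norm error is the first omitted singular value sigma_{nu+1}
print(np.isclose(np.linalg.norm(A - A_nu, 2), s[nu]))
# Frobenius error is sqrt(sigma_{nu+1}^2 + ... + sigma_r^2)
print(np.isclose(np.linalg.norm(A - A_nu, 'fro'), np.sqrt(np.sum(s[nu:] ** 2))))
```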

66 Low-Rank Approximation via the SVD. These theorems indicate that the SVD is an effective way to compress data encapsulated by a matrix! If the singular values of $A$ decay rapidly, we can approximate $A$ with just a few rank-one matrices (we only need to store $\sigma_j$, $u_j$, $v_j$ for $j = 1, \ldots, \nu$). Matlab example: image compression via the SVD. 66 / 66


Preliminary Linear Algebra 1 Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 100 Notation for all there exists such that therefore because end of proof (QED) Copyright c 2012

### HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION)

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) PROFESSOR STEVEN MILLER: BROWN UNIVERSITY: SPRING 2007 1. CHAPTER 1: MATRICES AND GAUSSIAN ELIMINATION Page 9, # 3: Describe

### BASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination. A 2 x. A T m x. A 1 x A T 1. A m x

BASIC ALGORITHMS IN LINEAR ALGEBRA STEVEN DALE CUTKOSKY Matrices and Applications of Gaussian Elimination Systems of Equations Suppose that A is an n n matrix with coefficents in a field F, and x = (x,,

### 10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

### 1. General Vector Spaces

1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

### AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 13: Conditioning of Least Squares Problems; Stability of Householder Triangularization Xiangmin Jiao Stony Brook University Xiangmin Jiao

### Recall the convention that, for us, all vectors are column vectors.

Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

### UNIVERSITY of LIMERICK

UNIVERSITY of LIMERICK OLLSCOIL LUIMNIGH Faculty of Science & Engineering Department of Mathematics & Statistics END OF SEMESTER ASSESSMENT PAPER MODULE CODE: MS4105 SEMESTER: Autumn 2015 MODULE TITLE:

### 5.) For each of the given sets of vectors, determine whether or not the set spans R 3. Give reasons for your answers.

Linear Algebra - Test File - Spring Test # For problems - consider the following system of equations. x + y - z = x + y + 4z = x + y + 6z =.) Solve the system without using your calculator..) Find the

### More on SVD and Gram-Schmidt Orthogonalization

Indian Statistical Institute, Kolkata Numerical Analysis (BStat I) Instructor: Sourav Sen Gupta Scribe: Saptarshi Chakraborty(BS1504) Rohan Hore(BS1519) Date of Lecture: 05 February 2016 LECTURE 6 More

### Chapter 3 Least Squares Solution of y = A x 3.1 Introduction We turn to a problem that is dual to the overconstrained estimation problems considered s

Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology 1 1 c Chapter

### Lecture 6 Positive Definite Matrices

Linear Algebra Lecture 6 Positive Definite Matrices Prof. Chun-Hung Liu Dept. of Electrical and Computer Engineering National Chiao Tung University Spring 2017 2017/6/8 Lecture 6: Positive Definite Matrices

### Chapter Two Elements of Linear Algebra

Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to

### Theorem A.1. If A is any nonzero m x n matrix, then A is equivalent to a partitioned matrix of the form. k k n-k. m-k k m-k n-k

I. REVIEW OF LINEAR ALGEBRA A. Equivalence Definition A1. If A and B are two m x n matrices, then A is equivalent to B if we can obtain B from A by a finite sequence of elementary row or elementary column

### Numerical Linear Algebra

Numerical Linear Algebra Direct Methods Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) Linear Systems: Direct Solution Methods Fall 2017 1 / 14 Introduction The solution of linear systems is one

### Algorithms to Compute Bases and the Rank of a Matrix

Algorithms to Compute Bases and the Rank of a Matrix Subspaces associated to a matrix Suppose that A is an m n matrix The row space of A is the subspace of R n spanned by the rows of A The column space

### 1 Vectors. Notes for Bindel, Spring 2017 Numerical Analysis (CS 4220)

Notes for 2017-01-30 Most of mathematics is best learned by doing. Linear algebra is no exception. You have had a previous class in which you learned the basics of linear algebra, and you will have plenty

### Today: eigenvalue sensitivity, eigenvalue algorithms Reminder: midterm starts today

AM 205: lecture 22 Today: eigenvalue sensitivity, eigenvalue algorithms Reminder: midterm starts today Posted online at 5 PM on Thursday 13th Deadline at 5 PM on Friday 14th Covers material up to and including

### There are two things that are particularly nice about the first basis

Orthogonality and the Gram-Schmidt Process In Chapter 4, we spent a great deal of time studying the problem of finding a basis for a vector space We know that a basis for a vector space can potentially

### Linear Algebra. Carleton DeTar February 27, 2017

Linear Algebra Carleton DeTar detar@physics.utah.edu February 27, 2017 This document provides some background for various course topics in linear algebra: solving linear systems, determinants, and finding

### 5. Orthogonal matrices

L Vandenberghe EE133A (Spring 2017) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

### I. Multiple Choice Questions (Answer any eight)

Name of the student : Roll No : CS65: Linear Algebra and Random Processes Exam - Course Instructor : Prashanth L.A. Date : Sep-24, 27 Duration : 5 minutes INSTRUCTIONS: The test will be evaluated ONLY

### Linear Algebra Using MATLAB

Linear Algebra Using MATLAB MATH 5331 1 May 12, 2010 1 Selected material from the text Linear Algebra and Differential Equations Using MATLAB by Martin Golubitsky and Michael Dellnitz Contents 1 Preliminaries

### (VII.E) The Singular Value Decomposition (SVD)

(VII.E) The Singular Value Decomposition (SVD) In this section we describe a generalization of the Spectral Theorem to non-normal operators, and even to transformations between different vector spaces.

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### 13-2 Text: 28-30; AB: 1.3.3, 3.2.3, 3.4.2, 3.5, 3.6.2; GvL Eigen2

The QR algorithm The most common method for solving small (dense) eigenvalue problems. The basic algorithm: QR without shifts 1. Until Convergence Do: 2. Compute the QR factorization A = QR 3. Set A :=

### Review of Linear Algebra

Review of Linear Algebra Dr Gerhard Roth COMP 40A Winter 05 Version Linear algebra Is an important area of mathematics It is the basis of computer vision Is very widely taught, and there are many resources

### Orthogonality. Orthonormal Bases, Orthogonal Matrices. Orthogonality

Orthonormal Bases, Orthogonal Matrices The Major Ideas from Last Lecture Vector Span Subspace Basis Vectors Coordinates in different bases Matrix Factorization (Basics) The Major Ideas from Last Lecture

### Linear algebra for MATH2601 Numerical methods

Linear algebra for MATH2601 Numerical methods László Erdős August 12, 2000 Contents 1 Introduction 3 1.1 Typesoferrors... 4 1.1.1 Rounding errors... 5 1.1.2 Truncationerrors... 6 1.1.3 Conditioningerrors...

### Math Linear Algebra II. 1. Inner Products and Norms

Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

11 Adjoint and Self-adjoint Matrices In this chapter, V denotes a finite dimensional inner product space (unless stated otherwise). 11.1 Theorem (Riesz representation) Let f V, i.e. f is a linear functional

### Image Registration Lecture 2: Vectors and Matrices

Image Registration Lecture 2: Vectors and Matrices Prof. Charlene Tsai Lecture Overview Vectors Matrices Basics Orthogonal matrices Singular Value Decomposition (SVD) 2 1 Preliminary Comments Some of this

### 8. the singular value decomposition

8. the singular value decomposition cmda 3606; mark embree version of 19 February 2017 The singular value decomposition (SVD) is among the most important and widely applicable matrix factorizations. It

### Extra Problems for Math 2050 Linear Algebra I

Extra Problems for Math 5 Linear Algebra I Find the vector AB and illustrate with a picture if A = (,) and B = (,4) Find B, given A = (,4) and [ AB = A = (,4) and [ AB = 8 If possible, express x = 7 as

### 8 The SVD Applied to Signal and Image Deblurring

8 The SVD Applied to Signal and Image Deblurring We will discuss the restoration of one-dimensional signals and two-dimensional gray-scale images that have been contaminated by blur and noise. After an

### Rounding error analysis of the classical Gram-Schmidt orthogonalization process

Cerfacs Technical report TR-PA-04-77 submitted to Numerische Mathematik manuscript No. 5271 Rounding error analysis of the classical Gram-Schmidt orthogonalization process Luc Giraud 1, Julien Langou 2,

### 1. The Polar Decomposition

A PERSONAL INTERVIEW WITH THE SINGULAR VALUE DECOMPOSITION MATAN GAVISH Part. Theory. The Polar Decomposition In what follows, F denotes either R or C. The vector space F n is an inner product space with

### Mathematical Methods wk 2: Linear Operators

John Magorrian, magog@thphysoxacuk These are work-in-progress notes for the second-year course on mathematical methods The most up-to-date version is available from http://www-thphysphysicsoxacuk/people/johnmagorrian/mm

### Maths for Signals and Systems Linear Algebra for Engineering Applications

Maths for Signals and Systems Linear Algebra for Engineering Applications Lectures 1-2, Tuesday 11 th October 2016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE LONDON

### Numerical Methods Lecture 2 Simultaneous Equations

Numerical Methods Lecture 2 Simultaneous Equations Topics: matrix operations solving systems of equations pages 58-62 are a repeat of matrix notes. New material begins on page 63. Matrix operations: Mathcad

### Linear Algebra Final Exam Study Guide Solutions Fall 2012

. Let A = Given that v = 7 7 67 5 75 78 Linear Algebra Final Exam Study Guide Solutions Fall 5 explain why it is not possible to diagonalize A. is an eigenvector for A and λ = is an eigenvalue for A diagonalize

### Lecture 23: 6.1 Inner Products

Lecture 23: 6.1 Inner Products Wei-Ta Chu 2008/12/17 Definition An inner product on a real vector space V is a function that associates a real number u, vwith each pair of vectors u and v in V in such

### Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

### 6 The SVD Applied to Signal and Image Deblurring

6 The SVD Applied to Signal and Image Deblurring We will discuss the restoration of one-dimensional signals and two-dimensional gray-scale images that have been contaminated by blur and noise. After an

### ICS 6N Computational Linear Algebra Symmetric Matrices and Orthogonal Diagonalization

ICS 6N Computational Linear Algebra Symmetric Matrices and Orthogonal Diagonalization Xiaohui Xie University of California, Irvine xhx@uci.edu Xiaohui Xie (UCI) ICS 6N 1 / 21 Symmetric matrices An n n
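The QR-based least-squares procedure above — factor A = QR, form c1 = Q^T b, and back-solve the triangular system R̂x = c1 — can be sketched in NumPy. This is a minimal illustration, not code from the slides; the matrix and right-hand side below are arbitrary example data:

```python
import numpy as np

# Arbitrary overdetermined example: m > n, so Ax = b has no exact solution
rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Reduced QR: Q1 is m x n with orthonormal columns, R_hat is n x n upper-triangular
Q1, R_hat = np.linalg.qr(A, mode="reduced")

# c1 = Q1^T b; the least-squares solution solves the triangular system R_hat x = c1
c1 = Q1.T @ b
x = np.linalg.solve(R_hat, c1)

# Sanity check against the library least-squares solver (Matlab's backslash analogue)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))
```

Note that the reduced factorization keeps only the first n columns of Q, which is all that is needed: the discarded columns contribute only the fixed residual term ||c2||_2, which x cannot affect.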