Introduction to Parallel Programming in OpenMP
Dr. Yogish Sabharwal
Department of Computer Science & Engineering
Indian Institute of Technology, Delhi


Lecture - 33 Parallel LU Factorization

So, now, how do you actually do this when you are given a large matrix, right?

(Refer Slide Time: 00:07)

You are given a large matrix, Ax = b. Remember, what we want is blocked algorithms, because we want our algorithm to be cache efficient. So, we want to use blocking, right, because this is going to involve O(n^3) operations. The data size is only n^2, but the number of operations is n^3. So, I want to use a blocked algorithm in order to make efficient use of the cache.

What I am going to do is, I am going to rewrite this as follows. What am I trying to do? I am trying to decompose A = LU. What is A? A is a full matrix, and if I just write all these matrices in block form, this is what they will look like; of course, these entries of L above the diagonal will be 0, and these entries of U below the diagonal will be 0. Right? So, what I am going to do is, I am going to look at this block and call it A_00; if this is an n x n matrix, this block is b x b. The next submatrix is not a b x b block, right; its size is b x (n - b), and I am going to call it A_01. Okay?

And I am going to call this sub-block A_10. Right? So, what is the size of this? (n - b) x b. And this remaining submatrix I am going to call A_11; it is of size (n - b) x (n - b). Alright? So, these are not sub-matrices of equal size; I am just carving out a b x b block at the corner. Similarly, I am going to break L into L_00; I do not need a name for the part above the diagonal because it is all 0s; this part I am going to call L_10, and this is going to be L_11. Similarly, this is U_00, this is U_01, and this part I am going to call U_11; the part below the diagonal of U is 0. Okay?

So, let me just rewrite this for you. This is how I am viewing the matrices: A_00, A_01, A_10, A_11; L_00, L_10, L_11, with 0 above the diagonal; U_00, U_01, U_11, with 0 below the diagonal. Okay? Let me just write out the equations that we get. So, what is A_00? Multiply the first row of L with the first column of U, right. So, what do I get? L_00 multiplied by U_00 gives me A_00. What happens if I multiply the top part of L with the right part of U? I get A_01 = L_00 U_01. Now, let me multiply the lower part of L with the left part of U. So, what do I get? A_10 = L_10 U_00. And finally, A_11 = L_10 U_01 + L_11 U_11.

Alright, so now we are going to talk about the algorithm, the blocked algorithm to do LU factorization, okay. So, here is how I do it. The first thing I do is this step: compute L_00 and U_00 by factorizing A_00 = L_00 U_00. Okay, how do I do this? I can do this sequentially; the exact steps are what we saw on the previous slide, right.
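In block form, the partition and the four equations read off the slide can be written out as follows (same notation as the lecture):

```latex
A =
\begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}
=
\begin{pmatrix} L_{00} & 0 \\ L_{10} & L_{11} \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} \\ 0 & U_{11} \end{pmatrix},
\qquad A_{00}\ \text{of size}\ b \times b,
```

so that

```latex
A_{00} = L_{00} U_{00}, \qquad
A_{01} = L_{00} U_{01}, \qquad
A_{10} = L_{10} U_{00}, \qquad
A_{11} = L_{10} U_{01} + L_{11} U_{11}.
```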

(Refer Slide Time: 04:58)

What we saw over here, this entire thing, was how you do the LU factorization, right; the only thing is that we are not doing it for the entire matrix, we are just doing it for a block. So, for A_00, I pick up a b x b block and I factorize it; I do the LU factorization of that. Right? I get L_00 and U_00. Now that I have L_00 and U_00, I can compute U_01: I know that A_01 = L_00 U_01, and L_00 is lower triangular. So, let us look at this in a little more detail; what is happening over here?
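As a concrete sketch of step 1 (this is not code from the lecture; the function name lu_block and the leading-dimension parameter lda are illustrative), the sequential, in-place LU factorization of the b x b corner block by Gaussian elimination without pivoting might look like this in C:

```c
/* Illustrative sketch: in-place LU factorization (no pivoting) of a
 * b x b block stored row-major with leading dimension lda.  On exit,
 * the strict lower triangle holds L (with an implicit unit diagonal)
 * and the upper triangle, including the diagonal, holds U. */
void lu_block(double *A, int b, int lda)
{
    for (int k = 0; k < b; k++) {
        for (int i = k + 1; i < b; i++) {
            A[i * lda + k] /= A[k * lda + k];   /* multiplier l_ik */
            for (int j = k + 1; j < b; j++)     /* eliminate row i  */
                A[i * lda + j] -= A[i * lda + k] * A[k * lda + j];
        }
    }
}
```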

(Refer Slide Time: 05:39)

So, A_01 is equal to... So, I have a matrix block A; this is b x b; this is A. This is equal to L, which is lower triangular, times U, which is a full matrix. Right? U_01 is not upper triangular; U_00 is upper triangular, but U_01 is full, right, because it is the entire U matrix that is upper triangular, so the U_01 part is full. Alright, so this is what I have to solve; how do I solve it?

A little while ago, we learned how to solve Lx = b, where L was lower triangular. So, a lower triangular matrix times a vector is equal to another vector; we just learned how to do that. So, what happens if I have another set of vectors? For the same lower triangular matrix, if I have another vector to solve for, I can just append it as another column, right. So, [x_1 x_2], these are 2 different vectors, and I get [b_1 b_2], and I can solve for them, right, because what is the product of this lower triangular matrix with the first column? That gives you the first column of the right-hand side. Right? This lower triangular matrix times the second column gives you the second column, and so on. We have already seen how to do Lx = b, right; that is just forward substitution. So, here you are just doing forward substitution on a set of vectors together. Okay?

Actually, what was this operation? The vector version we called TRSV - Triangular Solve with a Vector. This operation is called TRSM - Triangular Solve with a Matrix on the right-hand side. Okay? So, these are standard operations; you will find them in all linear algebra libraries.

Okay, so what is the point? The point is, I know how to solve this: A_01 = L_00 U_01. What is the size of A_01 here? It is not b x b; instead it is of the form b x (n - b); this is equal to a b x b matrix times a b x (n - b) matrix. Okay? So, how do we do this? I know how to do TRSM; this is lower triangular, right, and this is just lots of vectors; I can just split this into blocks of b x b, and this also into blocks of b x b; and this first block is nothing but this lower triangular matrix times this first block, and this second block is nothing but this lower triangular matrix times the second block, and so on, right. So, I can do these block by block.

Alright, so let us come back to the algorithm. So, I computed U_01 by doing a triangular solve with a matrix, right: A_01 = L_00 U_01. Alright? And now I can also compute L_10, right, because A_10 = L_10 U_00. U_00 is upper triangular, so this is again the same operation - a triangular matrix solve. Okay? Just that instead of a lower triangular matrix, I have an upper triangular matrix. So, what have I computed so far? I computed L_00 and U_00 in the first step, U_01 in the second step, and L_10 in the third step.
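A minimal sketch of the TRSM idea just described (illustrative names, not the lecture's code): solve L X = B by forward substitution, one column of the right-hand side at a time, where L is the unit lower triangular factor produced by a routine like lu_block above; with B = A_01, the result X is U_01.

```c
/* Illustrative TRSM sketch: solve L * X = B for X, where L is b x b
 * unit lower triangular (diagonal entries implicitly 1, as stored by
 * lu_block) and B is b x m.  B is overwritten with the solution X.
 * Each column of B is an independent forward substitution. */
void trsm_lower(const double *L, int b, int ldl,
                double *B, int m, int ldb)
{
    for (int j = 0; j < m; j++) {          /* one right-hand side...  */
        for (int i = 0; i < b; i++) {      /* ...forward substitution */
            double s = B[i * ldb + j];
            for (int k = 0; k < i; k++)
                s -= L[i * ldl + k] * B[k * ldb + j];
            B[i * ldb + j] = s;            /* unit diagonal: no divide */
        }
    }
}
```

The L_10 step is the mirror image: X U_00 = A_10 with U_00 upper triangular, solved row by row in the same fashion.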

Now, what am I left to compute? L_11 and U_11, right. So, let me use this last equation. What do I have? I have A_11 = L_10 U_01 + L_11 U_11, which I can rewrite as L_11 U_11 = A_11 - L_10 U_01. I have already computed L_10 and U_01, right. So, I can multiply them and subtract that from A_11. Okay? So, what is this operation? This is a matrix multiply; I am simply multiplying two matrices; we have already seen how to do that in a blocked manner and how to parallelize it. So, I subtract L_10 times U_01 from A_11; what do I get? I get a new matrix. Let us call it A'_11, and A'_11 is nothing but L_11 U_11 in my original notation. So, how do I get L_11 and U_11? Well, I have a matrix A'_11 which is supposed to be L_11 U_11. Now how am I going to get L_11 and U_11?

Student: Recursively.

Recursively do the same thing over again. So, at this step, update A_11; after updating, I am calling it A'_11; and finally, recursively solve A'_11 = L_11 U_11, and that is going to give me the next set of blocks, and so on, until I get the entire L and U. Alright? So, this is the blocked algorithm to do LU factorization. Okay?

And how can we parallelize it? Let us just quickly see; I am not going to write any OpenMP programs over here, but let us just quickly see how we would parallelize this. The first operation, let me just do it sequentially; it is not the bottleneck of the code. Where is the bottleneck of the code? This update of A_11 is the most expensive part. So, let me assume that I have very large matrices, right, because for small matrices it is not so interesting; you are not so interested in speeding them up using parallelism. You are only interested when you have huge matrices, and in most simulations of scientific problems you are dealing with huge matrices. Okay? So, this is definitely the biggest bottleneck. Why? Because this is a full matrix multiply, and it involves (n - b)^3 operations. Right? At the next level, you would want to optimize steps 2 and 3; step 1 is just an LU decomposition of a b x b block, so it is too small in the whole scheme of things. Okay? So, I do not mind writing this part sequentially. Let me write it sequentially; I explained all the steps over here, right, how to decompose A = LU using Gaussian elimination. How can I parallelize steps 2 and 3?
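Step 4, the bottleneck, is just this multiply-and-subtract, A'_11 = A_11 - L_10 U_01. A plain triple-loop sketch is below (illustrative only; in practice this would be the blocked, parallel matrix multiply seen earlier in the course), assuming all blocks are views into one row-major array with leading dimension lda:

```c
/* Illustrative Schur-complement update: A11 <- A11 - L10 * U01, where
 * L10 is (n-b) x b, U01 is b x (n-b), and A11 is (n-b) x (n-b).  All
 * three are assumed to be sub-blocks of one matrix with leading
 * dimension lda, so element (i,j) of a block sits at [i*lda + j]. */
void schur_update(double *A11, const double *L10, const double *U01,
                  int nb /* = n - b */, int b, int lda)
{
    for (int i = 0; i < nb; i++)
        for (int k = 0; k < b; k++)      /* i-k-j order for locality */
            for (int j = 0; j < nb; j++)
                A11[i * lda + j] -= L10[i * lda + k] * U01[k * lda + j];
}
```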

Can steps 2 and 3 happen in parallel? Yeah; what are both of them dependent on? I need L_00 here and I need U_00 here, and both of them are computed in step 1. So, I have both of them, and there is no other dependency between step 2 and step 3. So, I can just do them in parallel, right. And even when I do them in parallel, as I said, we can divide each of these into smaller blocks, work on one block at a time, and just distribute the blocks to the threads. Right? So, in OpenMP, I would divide this b x (n - b) panel into lots of b x b blocks and, treating them as an array, use a #pragma omp for to divide them among the threads; or maybe use tasks to distribute this work to different threads. And finally, step 4 is a matrix multiply; we already know how to do that, right: you divide it into lots of b x b blocks and you do a block matrix multiply, and then you recursively go and solve the rest. Okay? So, that is LU factorization and solving a system of linear equations in parallel. Alright? That is how it is actually implemented in most scientific or high-performance computing codes that you will see on the Net; this is how it is implemented.
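To make the idea concrete, here is a hedged OpenMP sketch of how step 2 might be distributed (the lecture deliberately gives no code, so this is only one plausible realization): the b x (n - b) panel A_01 is cut into independent b x b column blocks, each solved with the trsm_lower sketch from earlier, and a #pragma omp for hands the blocks to threads. Compile with -fopenmp; b is assumed to divide n - b evenly.

```c
void trsm_lower(const double *L, int b, int ldl,
                double *B, int m, int ldb);   /* sketch from earlier */

/* Illustrative parallel panel solve for step 2: compute U01 from
 * L00 * U01 = A01, block by block.  Each b x b column block of A01
 * is an independent triangular solve on disjoint data, so the loop
 * parallelizes with no races. */
void parallel_panel_solve(const double *L00, double *A01,
                          int n, int b, int lda)
{
    int nblocks = (n - b) / b;            /* assume b divides n - b */
    #pragma omp parallel for
    for (int t = 0; t < nblocks; t++)     /* one b x b block per iteration */
        trsm_lower(L00, b, lda, A01 + t * b, b, lda);
}
```

Step 3 would be handled the same way with an upper triangular solve from the right, and the two panel solves could themselves run concurrently, for example as two OpenMP tasks or sections, since both depend only on step 1.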