Parallel Scientific Computing


IV-1 Parallel Scientific Computing. Matrix-vector multiplication. Matrix-matrix multiplication. Direct methods for solving linear systems: Gaussian elimination. Iterative methods for solving linear systems: Jacobi, Gauss-Seidel. Sparse linear systems and differential equations.

IV-2 Matrix-Matrix Multiplication
Problem: C = A*B, where A and B are n x n matrices.
Sequential code:
for i = 1 to n do
  for j = 1 to n do
    sum = 0;
    for k = 1 to n do
      sum = sum + a[i,k]*b[k,j];
    endfor
    c[i,j] = sum;
  endfor
endfor

IV-3 An example of A*B
A = [ 1 2 ]    B = [ 5 7 ]
    [ 3 4 ]        [ 6 8 ]

C = A*B = [ 1*5+2*6  1*7+2*8 ] = [ 17 23 ]
          [ 3*5+4*6  3*7+4*8 ]   [ 39 53 ]
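The worked example can be checked with a direct Python transcription of the sequential triple loop (a sketch; 0-based indexing replaces the slides' 1-based loops):

```python
def matmul(A, B):
    """Triple-loop matrix-matrix product C = A*B, as on slide IV-2."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

A = [[1, 2], [3, 4]]
B = [[5, 7], [6, 8]]
print(matmul(A, B))  # [[17, 23], [39, 53]]
```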

IV-4 Task graph of C = A*B
Partitioned code:
for i = 1 to n do
  T_i: for j = 1 to n do
         sum = 0;
         for k = 1 to n do
           sum = sum + a[i,k]*b[k,j];
         endfor
         c[i,j] = sum;
       endfor
endfor
Task T_i: reads row A_i and matrix B; writes row C_i.
Task graph: T_1, T_2, T_3, ..., T_n (all independent).

IV-5 Task and data mapping for C = A*B
SPMD code:
for i = 1 to n
  if proc_map(i) == me, do T_i
Data mapping:
A is partitioned using row-wise block mapping.
C is partitioned using row-wise block mapping.
B is replicated on all processors.
Changes in T_i's code: a[i,k] becomes a[local(i),k]; c[i,j] becomes c[local(i),j].

IV-6 Parallel SPMD code of C = A*B
for i = 1 to n do
  if proc_map(i) == me then
    for j = 1 to n do
      sum = 0;
      for k = 1 to n do
        sum = sum + a[local(i),k]*b[k,j];
      endfor
      c[local(i),j] = sum;
    endfor
  endif
endfor

IV-7 Parallel algorithm with 1D partitioning
Partitioned code:
for i = 1 to n do
  for j = 1 to n do
    T_{i,j}: sum = 0;
             for k = 1 to n do
               sum = sum + a[i,k]*b[k,j];
             endfor
             c[i,j] = sum;
  endfor
endfor
Data access: each task T_{i,j} reads row A_i and column B_j, and writes the data element c[i,j].

Task graph: n^2 independent tasks:
T_{1,1} T_{1,2} ... T_{1,n}
T_{2,1} T_{2,2} ... T_{2,n}
...
T_{n,1} T_{n,2} ... T_{n,n}
IV-8 Mapping
Matrix A is partitioned using row-wise block mapping.
Matrix C is partitioned using row-wise block mapping.
Matrix B is partitioned using column-wise block mapping.
Task T_{i,j} is mapped to the processor that owns row i of matrix A.
Cluster 1: T_{1,1} T_{1,2} ... T_{1,n}
Cluster 2: T_{2,1} T_{2,2} ... T_{2,n}
...
Cluster n: T_{n,1} T_{n,2} ... T_{n,n}

IV-9 Parallel algorithm:
For j = 1 to n
  Broadcast column B_j to all processors.
  Do tasks T_{1,j}, T_{2,j}, ..., T_{n,j} in parallel.
Endfor
Evaluation: each multiplication or addition counts one time unit w. Each task T_{i,j} costs 2nw. Assume each broadcast costs (alpha + beta*n) log p.
PT = sum_{j=1}^{n} ( (alpha + beta*n) log p + (n/p)*2nw )
   = n(alpha + beta*n) log p + (2n^3/p) w.

IV-10 Gaussian Elimination - Direct Method for Solving Linear Systems
(1)  4x1 - 9x2 + 2x3 = 2
(2)  2x1 - 4x2 + 4x3 = 3
(3)  -x1 + 2x2 + 2x3 = 1
(2) - (1)*(2/4):    (1/2)x2 + 3x3 = 2        (4)
(3) - (1)*(-1/4):   -(1/4)x2 + (5/2)x3 = 3/2 (5)
(5) - (4)*(-1/2):   4x3 = 5/2
Resulting triangular system:
4x1 - 9x2 + 2x3 = 2
(1/2)x2 + 3x3 = 2
4x3 = 5/2

IV-11 Backward substitution:
x3 = (5/2)/4 = 5/8
x2 = (2 - 3x3)/(1/2) = 1/4
x1 = (2 + 9x2 - 2x3)/4 = 3/4
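The elimination and backward-substitution steps above can be sketched in Python (exact rational arithmetic via `fractions` avoids rounding in the check; `gauss_solve` is an illustrative name, not from the slides):

```python
from fractions import Fraction as F

def gauss_solve(aug):
    """Forward elimination without pivoting, then backward substitution,
    on an n x (n+1) augmented matrix (slides IV-13/IV-14)."""
    n = len(aug)
    a = [[F(v) for v in row] for row in aug]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]          # multiplier a_ik / a_kk
            for j in range(k + 1, n + 1):
                a[i][j] -= m * a[k][j]
    x = [row[n] for row in a]              # x_i uses the space of a[i, n+1]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x

aug = [[4, -9, 2, 2], [2, -4, 4, 3], [-1, 2, 2, 1]]
print(gauss_solve(aug))  # [Fraction(3, 4), Fraction(1, 4), Fraction(5, 8)]
```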

IV-12 GE on Augmented Matrices
Use an augmented matrix (A | b) to express the elimination process for solving Ax = b:
[  4 -9 2 | 2 ]  (2):=(2)-(1)*(2/4)   [ 4  -9    2  |  2  ]      [ 4  -9  2 |  2  ]
[  2 -4 4 | 3 ]  (3):=(3)-(1)*(-1/4)  [ 0  1/2   3  |  2  ]  =>  [ 0 1/2  3 |  2  ]
[ -1  2 2 | 1 ]  ------------------>  [ 0 -1/4  5/2 | 3/2 ]      [ 0   0  4 | 5/2 ]
Column n+1 of the augmented matrix stores column b!

IV-13 Gaussian Elimination Algorithm
Forward Elimination:
For k = 1 to n-1
  For i = k+1 to n
    a[i,k] = a[i,k]/a[k,k];
    For j = k+1 to n+1
      a[i,j] = a[i,j] - a[i,k]*a[k,j];
    endfor
  endfor
endfor
Loop k controls the elimination steps. Loop i controls the row being accessed and loop j controls the column being accessed.

IV-14 Backward Substitution
Note that x_i uses the space of a[i,n+1].
For i = n to 1
  For j = i+1 to n
    x[i] = x[i] - a[i,j]*x[j];
  Endfor
  x[i] = x[i]/a[i,i];
Endfor

IV-15 Algorithm Complexity
Each division, multiplication, or subtraction counts one time unit w. Ignore loop overhead.
#Operations in forward elimination:
sum_{k=1}^{n-1} sum_{i=k+1}^{n} (1 + sum_{j=k+1}^{n+1} 2) w
  = sum_{k=1}^{n-1} sum_{i=k+1}^{n} (2(n-k)+3) w
  ~ 2w sum_{k=1}^{n-1} (n-k)^2 ~ (2n^3/3) w
#Operations in backward substitution:
sum_{i=1}^{n} (1 + sum_{j=i+1}^{n} 2) w ~ 2w sum_{i=1}^{n} (n-i) ~ n^2 w
Total #operations ~ (2n^3/3) w. Total space: n^2 double-precision numbers.
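The operation count can be verified by instrumenting the forward-elimination loops directly (a sketch; `ge_op_count` is an illustrative helper, and the small surplus over 2n^3/3 comes from the lower-order terms dropped in the estimate):

```python
def ge_op_count(n):
    """Count divisions/multiplications/subtractions performed by the
    forward-elimination loops of slide IV-13 (augmented matrix, j up to n+1)."""
    ops = 0
    for k in range(1, n):                # k = 1 .. n-1
        for i in range(k + 1, n + 1):    # i = k+1 .. n
            ops += 1                     # one division a_ik/a_kk
            ops += 2 * (n + 1 - k)       # one mult + one sub per j = k+1 .. n+1
    return ops

n = 100
print(ge_op_count(n), 2 * n**3 // 3)  # 671550 vs 666666 -- ratio -> 1 as n grows
```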

IV-16 Parallel Row-Oriented GE
For k = 1 to n-1
  For i = k+1 to n
    T_k^i: a[i,k] = a[i,k]/a[k,k]
           For j = k+1 to n+1
             a[i,j] = a[i,j] - a[i,k]*a[k,j]
           EndFor
Task T_k^i: reads rows A_k and A_i; writes row A_i.
Dependence graph (one level per step k):
k=1:   T_1^2  T_1^3  T_1^4  ...  T_1^n
k=2:          T_2^3  T_2^4  ...  T_2^n
k=3:                 T_3^4  ...  T_3^n
...
k=n-1:                           T_{n-1}^n

IV-17 Parallelism and Scheduling
Parallelism: tasks T_k^{k+1}, T_k^{k+2}, ..., T_k^n are independent.
Parallel algorithm (basic idea):
For k = 1 to n-1
  Do T_k^{k+1}, T_k^{k+2}, ..., T_k^n in parallel on p processors.

IV-18 Task Mapping
Define n clusters, where cluster C_k contains all tasks that write row k:
C_1 = {} (empty)
C_2 = {T_1^2}
C_3 = {T_1^3, T_2^3}
...
C_n = {T_1^n, T_2^n, ..., T_{n-1}^n}
Map the n clusters to p processors via proc_map(k), using block or cyclic mapping.

IV-19 Block vs. Cyclic Mapping
If block mapping is used, profile the computation load of C_2, C_3, ..., C_n: the load of cluster C_k grows steadily with k. Then
Load(P_0) < Load(P_1) < ... < Load(P_{p-1}):
load is NOT balanced among processors!
If cyclic mapping is used, load is balanced.
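The imbalance can be illustrated with a toy load model (a sketch: cluster C_k is charged k-1 units, one per task it contains; `proc_loads` and the exact cost model are assumptions, not from the slides):

```python
def proc_loads(n, p, mapping):
    """Total load per processor when cluster C_k (modeled as k-1 task units,
    slide IV-18) is assigned by block or cyclic mapping."""
    loads = [0] * p
    b = (n + p - 1) // p                       # block size
    for k in range(1, n + 1):
        owner = (k - 1) // b if mapping == "block" else (k - 1) % p
        loads[owner] += k - 1                  # cluster C_k holds k-1 tasks
    return loads

print(proc_loads(16, 4, "block"))   # [6, 22, 38, 54]  -- heavily skewed
print(proc_loads(16, 4, "cyclic"))  # [24, 28, 32, 36] -- nearly balanced
```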

IV-20
Parallel Algorithm:
Proc 0 broadcasts Row 1.
For k = 1 to n-1
  Do T_k^{k+1}, ..., T_k^n in parallel (T_k^i runs on proc_map(i)).
  Broadcast row k+1.
endfor
SPMD Code:
me = mynode();
For i = 1 to n: if proc_map(i) == me, initialize Row i;
If proc_map(1) == me, broadcast Row 1; else receive it;
For k = 1 to n-1
  For i = k+1 to n
    If proc_map(i) == me, do T_k^i;
  If proc_map(k+1) == me, broadcast Row k+1; else receive it.

IV-21 Column-Oriented GE
Interchange loops i and j of the row-oriented GE:
For k = 1 to n-1
  For i = k+1 to n
    a[i,k] = a[i,k]/a[k,k]
  EndFor
  For j = k+1 to n+1
    For i = k+1 to n
      a[i,j] = a[i,j] - a[i,k]*a[k,j]
    EndFor
  EndFor
EndFor

IV-22 Impact on data access patterns
Example (first elimination step on the augmented matrix):
[  4 -9 2 | 2 ]  (2):=(2)-(1)*(2/4)   [ 4  -9    2  |  2  ]
[  2 -4 4 | 3 ]  (3):=(3)-(1)*(-1/4)  [ 0  1/2   3  |  2  ]
[ -1  2 2 | 1 ]  ------------------>  [ 0 -1/4  5/2 | 3/2 ]
Data access (writing) sequence for row-oriented GE (eight updated positions, in write order):
1 2 3 4
5 6 7 8
Data writing sequence for column-oriented GE (same positions, annotated with when each is written):
1 3 5 7
2 4 6 8

IV-23 Column-oriented backward substitution
Interchange loops i and j in the row-oriented backward substitution code:
For j = n to 1
  x[j] = x[j]/a[j,j];
  For i = j-1 to 1
    x[i] = x[i] - a[i,j]*x[j];
  Endfor
EndFor
For example, given:
4x1 - 9x2 + 2x3 = 2
(1/2)x2 + 3x3 = 2
4x3 = 5/2.

IV-24 The row-oriented algorithm performs:
x3 = 5/8
x2 = 2 - 3*x3;  x2 = x2/0.5
x1 = 2 + 9*x2;  x1 = x1 - 2*x3;  x1 = x1/4.
The column-oriented algorithm performs:
x3 = 5/8
x2 = 2 - 3*x3;  x1 = 2 - 2*x3
x2 = x2/0.5
x1 = x1 + 9*x2
x1 = x1/4.
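The column-oriented order can be checked against the row-oriented result (a sketch; `back_sub_column` is an illustrative name, and x starts out holding the right-hand side, as on slide IV-14):

```python
def back_sub_column(a, x):
    """Column-oriented backward substitution (slide IV-23): as soon as x_j
    is known, its contribution is subtracted from every x_i above it."""
    n = len(x)
    x = x[:]
    for j in range(n - 1, -1, -1):
        x[j] /= a[j][j]
        for i in range(j - 1, -1, -1):
            x[i] -= a[i][j] * x[j]
    return x

# Upper-triangular system from slide IV-23; x initially holds b:
a = [[4, -9, 2], [0, 0.5, 3], [0, 0, 4]]
print(back_sub_column(a, [2, 2, 2.5]))  # [0.75, 0.25, 0.625]
```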

IV-25 Parallel Column-Oriented GE
Partitioned code:
For k = 1 to n-1
  T_k^k: For i = k+1 to n
           a[i,k] = a[i,k]/a[k,k]
  For j = k+1 to n+1
    T_k^j: For i = k+1 to n
             a[i,j] = a[i,j] - a[i,k]*a[k,j]

IV-26 Task graph:
k=1:   T_1^1 -> T_1^2, T_1^3, T_1^4, ..., T_1^{n+1}
k=2:   T_2^2 -> T_2^3, T_2^4, ..., T_2^{n+1}
k=3:   T_3^3 -> T_3^4, ..., T_3^{n+1}
...
k=n-1: T_{n-1}^{n-1} -> T_{n-1}^n, T_{n-1}^{n+1}
(each scaling task T_k^k precedes the update tasks T_k^j, which feed step k+1)
Schedule? SPMD code?

IV-27 Column-oriented backward substitution
Partitioning:
For j = n to 1
  S_j^x: x[j] = x[j]/a[j,j];
         For i = j-1 to 1
           x[i] = x[i] - a[i,j]*x[j];
         Endfor
EndFor
Dependence: S_n^x -> S_{n-1}^x -> ... -> S_1^x.

IV-28 Parallel Algorithm:
Execute all of these tasks (S_j^x, j = n, ..., 1) gradually on the processor that owns x (column n+1):
For j = n to 1
  If owner(column x) == me then
    Receive column j if not available; do S_j^x.
  Else if owner(column j) == me, send column j to the owner of column x.
EndFor

IV-29 Problems with the GE Method
Problem 1: a[k,k] = 0.
(1)  0*x1 +  x2 +  x3 = 2
(2)  3x1 + 2x2 - 3x3 = 2
(3)   x1 + 5x2 -  x3 = 5
[ 0 1  1 ]       [ 2 ]
[ 3 2 -3 ] x  =  [ 2 ]
[ 1 5 -1 ]       [ 5 ]
Using Gaussian elimination:
Eq(2) - (1)*(3/0)?  Eq(3) - (1)*(1/0)?  Division by zero!
Solution: at stage k, interchange rows so that |a[k,k]| is the maximum in the lower portion of column k.

IV-30 Gaussian Elimination with Pivoting
Row-oriented Forward Elimination:
For k = 1 to n-1
  Find m such that |a[m,k]| = max_{i >= k} |a[i,k]|;
  If a[m,k] == 0, no unique solution, stop;
  Swap row(k) with row(m);
  For i = k+1 to n
    a[i,k] = a[i,k]/a[k,k];
    For j = k+1 to n
      a[i,j] = a[i,j] - a[i,k]*a[k,j];
    endfor
    b[i] = b[i] - a[i,k]*b[k];
  endfor
endfor

IV-31 An example of GE with Pivoting
[ 0 1  1 | 2 ]  swap (1)<->(2)  [ 3 2 -3 | 2 ]  (3):=(3)-(1)*(1/3)  [ 3   2   -3 |  2   ]
[ 3 2 -3 | 2 ]  ------------->  [ 0 1  1 | 2 ]  ----------------->  [ 0   1    1 |  2   ]
[ 1 5 -1 | 5 ]                  [ 1 5 -1 | 5 ]                      [ 0 13/3   0 | 13/3 ]

swap (2)<->(3)  [ 3   2   -3 |  2   ]  (3):=(3)-(2)*(3/13)  [ 3   2   -3 |  2   ]
------------->  [ 0 13/3   0 | 13/3 ]  ------------------>  [ 0 13/3   0 | 13/3 ]
                [ 0   1    1 |  2   ]                       [ 0   0    1 |  1   ]
Back substitution gives x1 = 1, x2 = 1, x3 = 1.
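The pivoting example can be reproduced with a short Python sketch of slide IV-30's algorithm on the augmented matrix (exact rational arithmetic; `gauss_pivot` is an illustrative name):

```python
from fractions import Fraction as F

def gauss_pivot(aug):
    """GE with partial pivoting on an n x (n+1) augmented matrix (slide IV-30),
    followed by backward substitution."""
    n = len(aug)
    a = [[F(v) for v in row] for row in aug]
    for k in range(n - 1):
        m = max(range(k, n), key=lambda i: abs(a[i][k]))  # pivot search
        if a[m][k] == 0:
            raise ValueError("no unique solution")
        a[k], a[m] = a[m], a[k]                           # row interchange
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k + 1, n + 1):
                a[i][j] -= f * a[k][j]
    x = [row[n] for row in a]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x

aug = [[0, 1, 1, 2], [3, 2, -3, 2], [1, 5, -1, 5]]
print(gauss_pivot(aug))  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```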

IV-32 Column-Oriented GE with Pivoting
For k = 1 to n-1
  Find m such that |a[m,k]| = max_{i >= k} |a[i,k]|;
  If a[m,k] == 0, no unique solution, stop.
  Swap row(k) with row(m);
  For i = k+1 to n
    a[i,k] = a[i,k]/a[k,k]
  EndFor
  For j = k+1 to n+1
    For i = k+1 to n
      a[i,j] = a[i,j] - a[i,k]*a[k,j]
    EndFor
  EndFor
EndFor

IV-33 Parallel column-oriented GE with pivoting
Partitioned forward elimination:
For k = 1 to n-1
  P_k^k: Find m such that |a[m,k]| = max_{i >= k} |a[i,k]|;
         If a[m,k] == 0, no unique solution, stop.
  For j = k to n+1
    S_k^j: Swap a[k,j] with a[m,j];
  Endfor
  T_k^k: For i = k+1 to n
           a[i,k] = a[i,k]/a[k,k]
         endfor

IV-34
  For j = k+1 to n+1
    T_k^j: For i = k+1 to n
             a[i,j] = a[i,j] - a[i,k]*a[k,j]
           endfor
  endfor

IV-35 Dependence structure for iteration k
P_k^k: find the maximum element.
  |  broadcast swapping positions
S_k^k, S_k^{k+1}, ..., S_k^{n+1}: swap within each column.
T_k^k: scale column k.
  |  broadcast column k
T_k^{k+1}, ..., T_k^{n+1}: update columns k+1, k+2, ..., n+1.

IV-36 Combining messages and merging tasks
Define task U_k^k as performing P_k^k, S_k^k, and T_k^k: find the maximum element, swap within column k, scale column k; then broadcast the swapping positions and column k together.
Define task U_k^j as performing S_k^j and T_k^j (k+1 <= j <= n+1): swap within column j, then update column j, for j = k+1, k+2, ..., n+1.

IV-37 Parallel algorithm for pivoting
For k = 1 to n-1
  The owner of column k does U_k^k and broadcasts the swapping positions and column k.
  Do U_k^{k+1}, ..., U_k^{n+1} in parallel.
endfor

IV-38 Iterative Methods for Solving Ax = b
Ex:
(1)   6x1 - 2x2 +  x3 = 11
(2)  -2x1 + 7x2 + 2x3 = 5
(3)    x1 + 2x2 - 5x3 = -1
Solve each equation for one unknown:
x1 = 11/6 - (1/6)(-2x2 + x3)
x2 = 5/7 - (1/7)(-2x1 + 2x3)
x3 = 1/5 + (1/5)(x1 + 2x2)
Iterate:
x1^(k+1) = (1/6)(11 - (-2*x2^(k) + x3^(k)))
x2^(k+1) = (1/7)(5 - (-2*x1^(k) + 2*x3^(k)))
x3^(k+1) = (1/5)(1 + x1^(k) + 2*x2^(k))

IV-39 Initial Approximation: x1 = 0, x2 = 0, x3 = 0
Iter |  0     1      2      3      4    ...   8
x1   |  0   1.833  2.038  2.085  2.004  ... 2.000
x2   |  0   0.714  1.181  1.053  1.001  ... 1.000
x3   |  0   0.200  0.852  1.080  1.038  ... 1.000
Stop when ||x^(k+1) - x^(k)|| < 10^-4.
Need to define the norm ||x^(k+1) - x^(k)||.

IV-40 Iterative methods in a matrix format
[ x1 ]^(k+1)   [  0   2/6  -1/6 ] [ x1 ]^(k)   [ 11/6 ]
[ x2 ]       = [ 2/7   0   -2/7 ] [ x2 ]     + [ 5/7  ]
[ x3 ]         [ 1/5  2/5    0  ] [ x3 ]       [ 1/5  ]
General iterative method:
Assign an initial value to x^(0); k = 0
Do x^(k+1) = H x^(k) + d until ||x^(k+1) - x^(k)|| < eps
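The general scheme x^(k+1) = H x^(k) + d can be run directly on this example (a sketch; `jacobi` is an illustrative name, and the max-norm is used for the stopping test defined on slide IV-41):

```python
def jacobi(H, d, x0, eps=1e-4, max_iter=100):
    """Iterate x^(k+1) = H x^(k) + d until the max-norm of the change < eps."""
    x = x0[:]
    for it in range(1, max_iter + 1):
        y = [sum(Hij * xj for Hij, xj in zip(row, x)) + di
             for row, di in zip(H, d)]
        if max(abs(a - b) for a, b in zip(y, x)) < eps:
            return y, it
        x = y
    return x, max_iter

H = [[0, 2/6, -1/6], [2/7, 0, -2/7], [1/5, 2/5, 0]]
d = [11/6, 5/7, 1/5]
x, it = jacobi(H, d, [0, 0, 0])
print([round(v, 3) for v in x], it)  # converges to x = (2, 1, 1)
```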

IV-41 Norm of a Vector
Given x = (x1, x2, ..., xn):
||x||_1 = sum_{i=1}^{n} |x_i|
||x||_2 = sqrt(sum_{i=1}^{n} x_i^2)
||x||_inf = max_i |x_i|
Example: x = (-1, 1, 2):
||x||_1 = 4
||x||_2 = sqrt(1+1+4) = sqrt(6)
||x||_inf = 2
Application: measuring the error ||x^(k+1) - x^(k)|| < eps.
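The three norms are one-liners in Python (a sketch; the function names are illustrative):

```python
import math

def norm1(x):
    return sum(abs(v) for v in x)       # ||x||_1: sum of absolute values

def norm2(x):
    return math.sqrt(sum(v * v for v in x))  # ||x||_2: Euclidean norm

def norm_inf(x):
    return max(abs(v) for v in x)       # ||x||_inf: largest absolute value

x = [-1, 1, 2]
print(norm1(x), norm2(x), norm_inf(x))  # 4, sqrt(6) ~ 2.449, 2
```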

IV-42 Jacobi Method for Ax = b
x_i^(k+1) = (1/a_ii) (b_i - sum_{j != i} a_ij x_j^(k)),   i = 1, ..., n
Example:
(1)   6x1 - 2x2 +  x3 = 11
(2)  -2x1 + 7x2 + 2x3 = 5
(3)    x1 + 2x2 - 5x3 = -1
=>
x1 = 11/6 - (1/6)(-2x2 + x3)
x2 = 5/7 - (1/7)(-2x1 + 2x3)
x3 = 1/5 + (1/5)(x1 + 2x2)

IV-43 Jacobi method in a matrix-vector form
[ x1 ]^(k+1)   [  0   2/6  -1/6 ] [ x1 ]^(k)   [ 11/6 ]
[ x2 ]       = [ 2/7   0   -2/7 ] [ x2 ]     + [ 5/7  ]
[ x3 ]         [ 1/5  2/5    0  ] [ x3 ]       [ 1/5  ]

IV-44 Parallel Jacobi Method
x^(k+1) = D^(-1) B x^(k) + D^(-1) b, or in general x^(k+1) = H x^(k) + d
(where D is the diagonal part of A and B = D - A).
Parallel solution:
Distribute rows of H to processors.
Perform computation based on the owner-computes rule.
Perform all-to-all broadcast of the new x after each iteration.

IV-45 If the iteration matrix is sparse
If it contains many zeros, the code design should take advantage of this:
Do not store the known zeros.
The code should explicitly skip operations applied to zero elements.
Example: y_0 = y_{n+1} = 0 and
y_0 - 2y_1 + y_2 = h^2
y_1 - 2y_2 + y_3 = h^2
...
y_{n-1} - 2y_n + y_{n+1} = h^2

IV-46 This set of equations can be rewritten as:
[ -2  1              ] [ y_1     ]   [ h^2 ]
[  1 -2  1           ] [ y_2     ]   [ h^2 ]
[     ...            ] [ ...     ] = [ ... ]
[        1  -2   1   ] [ y_{n-1} ]   [ h^2 ]
[             1  -2  ] [ y_n     ]   [ h^2 ]
The Jacobi method in a matrix format (right-hand side):
          [ 0  1             ] [ y_1     ]^(k)        [ h^2 ]
          [ 1  0  1          ] [ y_2     ]            [ h^2 ]
0.5 *     [    ...           ] [ ...     ]     - 0.5 *[ ... ]
          [       1   0   1  ] [ y_{n-1} ]            [ h^2 ]
          [           1   0  ] [ y_n     ]            [ h^2 ]
It is too time- and space-consuming to multiply using the entire iteration matrix!

IV-47 Correct solution: write the Jacobi method as:
Repeat
  For i = 1 to n
    y_new[i] = 0.5*(y_old[i-1] + y_old[i+1] - h^2)
  Endfor
Until ||y_new - y_old|| < eps
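This stencil form stores only the solution vector, never the matrix. A minimal sketch (the name `jacobi_stencil`, the tolerance, and the choice n = 9, h = 0.1 are assumptions for illustration; ghost entries y[0] = y[n+1] = 0 encode the boundary conditions):

```python
def jacobi_stencil(n, h, eps=1e-6, max_iter=100000):
    """Jacobi for y_{i-1} - 2y_i + y_{i+1} = h^2 (slide IV-47),
    storing no matrix -- just the vector with ghost values y[0] = y[n+1] = 0."""
    y = [0.0] * (n + 2)
    for _ in range(max_iter):
        y_new = y[:]
        for i in range(1, n + 1):
            y_new[i] = 0.5 * (y[i - 1] + y[i + 1] - h * h)
        if max(abs(a - b) for a, b in zip(y_new, y)) < eps:
            return y_new
        y = y_new
    return y

n, h = 9, 0.1
y = jacobi_stencil(n, h)
# after convergence, every equation's residual should be near zero:
res = max(abs(y[i - 1] - 2 * y[i] + y[i + 1] - h * h) for i in range(1, n + 1))
print(res)
```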

IV-48 Gauss-Seidel Method
Utilize new solutions as soon as they are available.
(1)   6x1 - 2x2 +  x3 = 11
(2)  -2x1 + 7x2 + 2x3 = 5
(3)    x1 + 2x2 - 5x3 = -1
=> Jacobi method:
x1^(k+1) = (1/6)(11 - (-2*x2^(k) + x3^(k)))
x2^(k+1) = (1/7)(5 - (-2*x1^(k) + 2*x3^(k)))
x3^(k+1) = (1/5)(1 + x1^(k) + 2*x2^(k))
=> Gauss-Seidel method:
x1^(k+1) = (1/6)(11 - (-2*x2^(k) + x3^(k)))
x2^(k+1) = (1/7)(5 - (-2*x1^(k+1) + 2*x3^(k)))
x3^(k+1) = (1/5)(1 + x1^(k+1) + 2*x2^(k+1))

IV-49 With eps = 10^-4:
Iter |  0     1      2      3      4      5
x1   |  0   1.833  2.069  1.998  1.999  2.000
x2   |  0   1.238  1.002  0.995  1.000  1.000
x3   |  0   1.062  1.015  0.998  1.000  1.000
It converges faster than Jacobi's method.
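The table can be reproduced with a short Gauss-Seidel sketch that overwrites each component in place, so new values are used immediately (`gauss_seidel` is an illustrative name):

```python
def gauss_seidel(A, b, x0, eps=1e-4, max_iter=100):
    """Gauss-Seidel iteration: each new component x_i is used as soon
    as it is computed within the same sweep."""
    n = len(b)
    x = x0[:]
    for it in range(1, max_iter + 1):
        old = x[:]
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        if max(abs(a - c) for a, c in zip(x, old)) < eps:
            return x, it
    return x, max_iter

A = [[6, -2, 1], [-2, 7, 2], [1, 2, -5]]
b = [11, 5, -1]
x, it = gauss_seidel(A, b, [0, 0, 0])
print([round(v, 3) for v in x], it)  # converges to x = (2, 1, 1)
```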