Level-2 BLAS: matrix-vector operations with $O(n^2)$ operations (sequentially). BLAS notation: S = single precision, GE = general matrix, M = matrix, V = vector.


Level-2 BLAS defines SGEMV, the matrix-vector product $y := \alpha A x + \beta y$. Other Level-2 BLAS routines solve a triangular system $Lx = b$ with a triangular matrix $L$.
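For illustration, a minimal Python sketch of what SGEMV computes, assuming a dense matrix stored as a list of rows. The function name `gemv` is hypothetical; the actual BLAS routine also takes transpose flags, leading dimensions and vector strides, works in single precision and overwrites y in place, whereas this sketch returns a new list.

```python
def gemv(alpha, A, x, beta, y):
    """Return alpha*A*x + beta*y for a dense n x m matrix A (list of rows).

    Illustrative reference semantics only; not the BLAS SGEMV calling interface.
    """
    return [alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
            for row, y_i in zip(A, y)]


A = [[1.0, 2.0], [3.0, 4.0]]
print(gemv(2.0, A, [1.0, 1.0], 1.0, [10.0, 20.0]))  # [16.0, 34.0]
```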

Level-3 BLAS: matrix-matrix operations with $O(n^3)$ operations (sequentially). BLAS notation: S = single precision, GE = general matrix, M = matrix. Defines SGEMM, the matrix-matrix product $C := \alpha A B + \beta C$.
LAPACK: subroutines for solving linear equations, least-squares problems, QR decomposition, eigenvalues, and singular values, based on BLAS.

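2.2 Analysis of the Matrix-Vector Product
$A = (a_{ij}) \in \mathbb{R}^{n \times m}$, $b \in \mathbb{R}^m$, $c = Ab \in \mathbb{R}^n$, i.e. $c_i = \sum_{j=1}^{m} a_{ij} b_j$ for $i = 1, \dots, n$.
2.2.1 Vectorization
Row view: $c_i = A(i,:)\,b$ gives $n$ DOT products of length $m$. Column view: $c = \sum_{j=1}^{m} b_j A(:,j)$ gives $m$ SAXPYs of length $n$, i.e. one GAXPY.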

Pseudocode, (i,j)-form:
c = 0
for i = 1:n
  for j = 1:m
    c_i = c_i + a_ij * b_j
  end
end
DOT product: $c_i = A(i,:)\,b$, the dot product of the $i$-th row of $A$ with the vector $b$.

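Pseudocode, (j,i)-form:
c = 0
for j = 1:m
  for i = 1:n
    c_i = c_i + a_ij * b_j
  end
end
SAXPY: $c := c + b_j A(:,j)$ updates the vector $c$ with the $j$-th column of $A$.
GAXPY: a sequence of SAXPYs related to the same vector. Advantage: the vector $c$ that is updated can be kept in fast memory, so no additional data transfer is needed.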
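Both loop orders can be sketched in plain Python (0-based indices, A as a list of rows; the function names are illustrative). The sketch only shows the order of operations: in Python, the memory-locality advantage of keeping c in fast memory is not observable.

```python
def matvec_ij(A, b):
    """(i,j)-form: c_i is the DOT product of row i of A with b."""
    n, m = len(A), len(b)
    c = [0.0] * n
    for i in range(n):
        for j in range(m):
            c[i] += A[i][j] * b[j]
    return c


def matvec_ji(A, b):
    """(j,i)-form: one SAXPY c += b_j * A(:, j) per column; together a GAXPY on c."""
    n, m = len(A), len(b)
    c = [0.0] * n
    for j in range(m):
        for i in range(n):
            c[i] += b[j] * A[i][j]
    return c


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # n = 2, m = 3
b = [1.0, 0.0, -1.0]
print(matvec_ij(A, b), matvec_ji(A, b))  # [-2.0, -2.0] [-2.0, -2.0]
```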

GAXPY versus SAXPY:
SAXPY: $x := x + \alpha y$.
GAXPY: $x := x_0$; for $j = 1, \dots, k$: $x := x + \alpha_j y_j$; end.
A GAXPY is a series of SAXPYs on the same vector $x$. Advantage: less data transfer!
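A sketch of the two operations in Python, assuming plain lists and hypothetical function names; `saxpy` overwrites x in place, matching the notation $x := x + \alpha y$.

```python
def saxpy(x, alpha, y):
    """SAXPY: x := x + alpha * y, overwriting x in place."""
    for i in range(len(x)):
        x[i] += alpha * y[i]


def gaxpy(x, alphas, ys):
    """GAXPY: a series of SAXPYs that all update the same vector x."""
    for alpha, y in zip(alphas, ys):
        saxpy(x, alpha, y)
    return x


x = [1.0, 1.0]
gaxpy(x, [2.0, -1.0], [[1.0, 0.0], [0.0, 3.0]])
print(x)  # [3.0, -2.0]
```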

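2.2.2 Parallelization by Building Blocks
Reduce the matrix-vector product to smaller matrix-vector products. Partition the index sets into disjoint blocks: $\{1, \dots, n\} = I_1 \cup \dots \cup I_R$ and $\{1, \dots, m\} = J_1 \cup \dots \cup J_S$. Use a 2-dimensional array of processors $P_{rs}$; processor $P_{rs}$ gets the matrix block $A_{rs} := A(I_r, J_s)$ and $b_s := b(J_s)$, with $c_r := c(I_r)$. Then
$c_r = \sum_{s=1}^{S} A_{rs} b_s = \sum_{s=1}^{S} c_r^{(s)}$ with $c_r^{(s)} := A_{rs} b_s$.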

Pseudocode:
for r = 1:R
  for s = 1:S
    c_r^(s) = A_rs * b_s
  end
end
These are small, independent matrix-vector products: no communication is necessary during the computations!
for r = 1:R
  c_r = 0
  for s = 1:S
    c_r = c_r + c_r^(s)
  end
end
Blockwise collection and addition of vectors: rowwise communication (fan-in).
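The two phases can be simulated sequentially in Python. This is a sketch under stated assumptions: the index sets $I_r$ and $J_s$ are contiguous blocks produced by a hypothetical helper `split`, indices are 0-based, and each dictionary entry stands for the result of one processor $P_{rs}$.

```python
def split(n, parts):
    """Split range(n) into `parts` contiguous, disjoint index blocks (assumed layout)."""
    size, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        stop = start + size + (1 if p < extra else 0)
        blocks.append(list(range(start, stop)))
        start = stop
    return blocks


def block_matvec(A, b, R, S):
    """c = Ab via c_r = sum_s A_rs b_s (sequential simulation of the processors P_rs)."""
    I, J = split(len(A), R), split(len(b), S)
    # Phase 1: every "processor" (r, s) computes c_r^(s) = A_rs b_s; no communication needed.
    partial = {(r, s): [sum(A[i][j] * b[j] for j in J[s]) for i in I[r]]
               for r in range(R) for s in range(S)}
    # Phase 2: rowwise fan-in, c_r = sum over s of c_r^(s).
    c = [0.0] * len(A)
    for r in range(R):
        for s in range(S):
            for local, i in enumerate(I[r]):
                c[i] += partial[(r, s)][local]
    return c


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # n = 3, m = 4
b = [1, 1, 1, 1]
print(block_matvec(A, b, R=2, S=2))  # [10.0, 26.0, 42.0]
```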

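Special case $S = 1$: $A$ is split into row blocks $A_1, \dots, A_R$, and processor $P_r$ computes $c_r = A_r b$. No communication is necessary between the processors $P_1, \dots, P_R$, and each computation $c_r = A_r b$ is vectorizable by GAXPYs.
Special case $R = 1$: $A = (A_1, \dots, A_S)$ is split into column blocks and $c = \sum_{s=1}^{S} A_s b_s$; the partial products $A_s b_s$ are independent. The partial results from the processors $P_1, \dots, P_S$ are collected by fan-in; the final sum on one processor is vectorizable by GAXPYs.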

Rules:
1) Inner loops of a program should be simple and vectorizable.
2) Outer loops of a program should be substantial, independent, and parallelizable:
for ... (substantial and parallelizable)
  for ... (simple, vectorizable)
  end
end
3) Reuse data (cache): minimal data transfer, blocking.

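2.2.3 $Ab$ for a Banded Matrix
$A$ has (symmetric) bandwidth $\beta$ if $a_{ij} = 0$ for $|i - j| > \beta$. Such a matrix has $2\beta + 1$ possibly nonzero diagonals: the main diagonal, $\beta$ subdiagonals, and $\beta$ superdiagonals. For $\beta = 1$, $A$ is tridiagonal.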

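Storing the entries diagonal-wise uses an $n \times (2\beta + 1)$ matrix $D$ instead of $n^2$ entries: $d_{i,s} := a_{i,i+s}$ for $s = -\beta, \dots, \beta$. Row $i$ of $D$ holds the band part of row $i$ of $A$; column $s$ of $D$ holds diagonal $s$ of $A$, whose valid entries are $l_s \le i \le r_s$ with $l_s = \max\{1, 1 - s\}$ and $r_s = \min\{n, n - s\}$.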
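A sketch of the conversion to diagonal-wise storage, assuming a square n x n matrix as a list of rows and 0-based indices, so the valid rows of diagonal s are max(0, -s) <= i <= min(n-1, n-1-s). Column s + β of D holds diagonal s; positions outside the matrix are padded with zeros. The function name is hypothetical.

```python
def to_diagonal_storage(A, beta):
    """Return D with D[i][s + beta] = A[i][i + s] for s = -beta..beta (0-based)."""
    n = len(A)
    D = [[0.0] * (2 * beta + 1) for _ in range(n)]
    for s in range(-beta, beta + 1):
        # Valid rows of diagonal s: l_s <= i <= r_s, shifted to 0-based indices.
        for i in range(max(0, -s), min(n, n - s)):
            D[i][s + beta] = A[i][i + s]
    return D


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]  # tridiagonal, beta = 1
D = to_diagonal_storage(A, 1)
print(D)  # [[0.0, 2.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, 2.0, 0.0]]
```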

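Computation of the matrix-vector product based on this storage scheme on vector processors: $c_i = \sum_{s = \max\{-\beta, 1-i\}}^{\min\{\beta, n-i\}} d_{i,s}\, b_{i+s}$ for $i = 1, \dots, n$.
Diagonal-wise:
for s = -β:β
  for i = max{1-s, 1} : min{n-s, n}
    c_i = c_i + d_{i,s} * b_{i+s}
  end
end
This is a general triad, not a SAXPY (the index of b is shifted by s).
Row-wise:
for i = 1:n
  for s = max{-β, 1-i} : min{β, n-i}
    c_i = c_i + d_{i,s} * b_{i+s}
  end
end
This is a partial DOT product of length at most $2\beta + 1$.
Sparsity means fewer operations, but also a loss of efficiency.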
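Both loop orders, sketched in Python on the diagonal-wise storage D (0-based indices, D[i][s + β] = a_{i,i+s}; function names are hypothetical). The example D is the tridiagonal matrix with 2 on the diagonal and -1 on both off-diagonals.

```python
def band_matvec_diagonalwise(D, b, beta):
    """Outer loop over diagonals s; the inner loop is a general triad c_i += d_{i,s} * b_{i+s}."""
    n = len(D)
    c = [0.0] * n
    for s in range(-beta, beta + 1):
        for i in range(max(0, -s), min(n, n - s)):
            c[i] += D[i][s + beta] * b[i + s]
    return c


def band_matvec_rowwise(D, b, beta):
    """Outer loop over rows i; the inner loop is a partial DOT product of length <= 2*beta + 1."""
    n = len(D)
    c = [0.0] * n
    for i in range(n):
        for s in range(max(-beta, -i), min(beta, n - 1 - i) + 1):
            c[i] += D[i][s + beta] * b[i + s]
    return c


D = [[0.0, 2.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, 2.0, 0.0]]  # beta = 1
b = [1.0, 2.0, 3.0]
print(band_matvec_diagonalwise(D, b, 1), band_matvec_rowwise(D, b, 1))  # [0.0, 0.0, 4.0] [0.0, 0.0, 4.0]
```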

Band parallel partitioning: $\{1, \dots, n\} = \bigcup_{r=1}^{R} I_r$ (disjoint). Processor $P_r$ gets the rows with index set $I_r := [m_r, M_r]$ in order to compute its part $c(I_r)$ of the final vector $c$. Which part of the vector $b$ does processor $P_r$ need to compute its part of $c$?

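For $i \in I_r$ we need $b_j = b_{i+s}$ with $|s| \le \beta$, so $j \ge \max\{1, m_r - \beta\}$ and $j \le \min\{n, M_r + \beta\}$. Processor $P_r$ with index set $I_r$ therefore needs the entries $b_j$ with indices $j \in [\max\{1, m_r - \beta\}, \min\{n, M_r + \beta\}]$.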
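The index formula can be checked against a brute-force scan of the band; a small Python sketch using 1-based indices as in the formula (function names are hypothetical).

```python
def needed_b_range(m_r, M_r, beta, n):
    """1-based interval [lo, hi] of b needed for rows m_r..M_r of a band matrix."""
    return max(1, m_r - beta), min(n, M_r + beta)


def needed_b_brute(m_r, M_r, beta, n):
    """Same interval, found by scanning all positions (i, j) with |i - j| <= beta."""
    cols = [j for i in range(m_r, M_r + 1) for j in range(1, n + 1) if abs(i - j) <= beta]
    return min(cols), max(cols)


print(needed_b_range(4, 6, 2, 10))  # (2, 8)
```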

2.3 Analysis of the Matrix-Matrix Product
$A \in \mathbb{R}^{n \times m}$, $B \in \mathbb{R}^{m \times q}$, $C = AB \in \mathbb{R}^{n \times q}$ with $c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$ for $i = 1, \dots, n$ and $j = 1, \dots, q$.

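2.3.1 Vectorization
(i,j,k)-form, Algorithm 1:
for i = 1:n
  for j = 1:q
    for k = 1:m
      c_ij = c_ij + a_ik * b_kj
    end
  end
end
DOT product of length $m$: $c_{ij} = A(i,:)\,B(:,j)$ for all $i, j$. The entries $c_{ij}$ are fully computed one after the other. Access to $A$ and $C$ is rowwise, access to $B$ columnwise.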

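Another view of the matrix-matrix product: write $A = \sum_{k=1}^{m} A(:,k)\, e_k^T$ as a combination of its columns and $B = \sum_{k=1}^{m} e_k\, B(k,:)$ as a combination of its rows. Then
$AB = \sum_{k=1}^{m} A(:,k)\, B(k,:)$,
a sum of $m$ full $n \times q$ matrices, each being the outer product of the $k$-th column of $A$ and the $k$-th row of $B$.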
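A sketch of the outer-product view in Python (lists of rows, hypothetical function names): C is accumulated as a sum of m rank-one n x q matrices.

```python
def outer(u, v):
    """Rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matmul_by_outer_products(A, B):
    """C = sum over k of (column k of A) times (row k of B)."""
    n, m, q = len(A), len(B), len(B[0])
    C = [[0.0] * q for _ in range(n)]
    for k in range(m):
        col_k = [A[i][k] for i in range(n)]   # k-th column of A
        P = outer(col_k, B[k])                # B[k] is the k-th row of B
        for i in range(n):
            for j in range(q):
                C[i][j] += P[i][j]
    return C


print(matmul_by_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```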

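(j,k,i)-form, Algorithm 2:
for j = 1:q
  for k = 1:m
    for i = 1:n
      c_ij = c_ij + a_ik * b_kj
    end
  end
end
Vector update $C(:,j) := C(:,j) + b_{kj} A(:,k)$: a SAXPY; over $k$ this is a GAXPY, i.e. a sequence of SAXPYs for the same vector $C(:,j)$. $C$ is computed columnwise and $A$ is accessed columnwise. $B$ is also accessed columnwise, but delayed.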

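(k,j,i)-form, Algorithm 3:
for k = 1:m
  for j = 1:q
    for i = 1:n
      c_ij = c_ij + a_ik * b_kj
    end
  end
end
Vector update $C(:,j) := C(:,j) + b_{kj} A(:,k)$: a SAXPY, but no GAXPY, since consecutive SAXPYs update different vectors $C(:,j)$. $A$ is accessed columnwise, $B$ delayed. $C$ is computed via intermediate values $c_{ij}^{(k)}$, which are computed columnwise.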
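The three algorithms differ only in loop order; a Python sketch (0-based indices, lists of rows, illustrative names). All three produce the same C; what differs on a vector machine is the access pattern and the kind of vector operation in the innermost loop.

```python
def zeros(n, q):
    """n x q matrix of zeros as a list of rows."""
    return [[0.0] * q for _ in range(n)]


def matmul_ijk(A, B):
    """Algorithm 1 (i,j,k): every c_ij is a DOT product of length m."""
    n, m, q = len(A), len(B), len(B[0])
    C = zeros(n, q)
    for i in range(n):
        for j in range(q):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_jki(A, B):
    """Algorithm 2 (j,k,i): column j of C receives m SAXPYs, i.e. one GAXPY."""
    n, m, q = len(A), len(B), len(B[0])
    C = zeros(n, q)
    for j in range(q):
        for k in range(m):
            for i in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_kji(A, B):
    """Algorithm 3 (k,j,i): consecutive SAXPYs update different columns of C."""
    n, m, q = len(A), len(B), len(B[0])
    C = zeros(n, q)
    for k in range(m):
        for j in range(q):
            for i in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # n = 2, m = 3
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # m = 3, q = 2
print(matmul_ijk(A, B))  # [[4.0, 5.0], [10.0, 11.0]]
```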

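Overview of the different forms (loop order from outer to inner):
ijk: access to A by rows, to B by columns; C computed by rows, each c_ij directly; DOT, vector length m.
ikj: A accessed by scalars only, B by rows; C by rows, c_ij delayed; GAXPY, vector length q.
kij: A accessed by scalars only, B by rows; C by rows, c_ij delayed; SAXPY, vector length q.
jik: access to A by rows, to B by columns; C by columns, c_ij directly; DOT, vector length m.
jki: access to A by columns, B by scalars only; C by columns, c_ij delayed; GAXPY, vector length n.
kji: access to A by columns, B by scalars only; C by columns, c_ij delayed; SAXPY, vector length n.
Better: GAXPY and a longer vector length. Access the matrices according to their storage scheme, rowwise or columnwise.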

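2.3.2 Matrix-Matrix Product in Parallel
Partition $\{1, \dots, n\} = \bigcup_{r=1}^{R} I_r$, $\{1, \dots, m\} = \bigcup_{s=1}^{S} K_s$, $\{1, \dots, q\} = \bigcup_{t=1}^{T} J_t$. Distribute the blocks relative to the index sets $I_r$, $K_s$, and $J_t$ to a processor array $P_{rst}$: $A_{rs} := A(I_r, K_s)$, $B_{st} := B(K_s, J_t)$, $C_{rt} := C(I_r, J_t)$.
1) Processor $P_{rst}$ computes the small matrix-matrix product $C_{rt}^{(s)} = A_{rs} B_{st}$; all processors work in parallel.
2) Compute the sum by fan-in: $C_{rt} = \sum_{s=1}^{S} C_{rt}^{(s)} = \sum_{s=1}^{S} A_{rs} B_{st}$.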
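A sequential simulation of the two steps in Python, assuming contiguous index blocks from a hypothetical helper `split` and 0-based indices. The list `partials` holds the results $C_{rt}^{(s)}$ of the processors $P_{rst}$ for fixed r and t; the loop after it plays the role of the fan-in.

```python
def split(n, parts):
    """Split range(n) into `parts` contiguous, disjoint index blocks (assumed layout)."""
    size, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        stop = start + size + (1 if p < extra else 0)
        blocks.append(list(range(start, stop)))
        start = stop
    return blocks


def block_matmul(A, B, R, S, T):
    """C = AB via C_rt = sum_s A_rs B_st (sequential simulation of P_rst)."""
    n, q = len(A), len(B[0])
    I, K, J = split(n, R), split(len(B), S), split(q, T)
    C = [[0.0] * q for _ in range(n)]
    for r in range(R):
        for t in range(T):
            # Step 1: "processor" P_rst computes C_rt^(s) = A_rs B_st, independently for each s.
            partials = [[[sum(A[i][k] * B[k][j] for k in K[s]) for j in J[t]]
                         for i in I[r]]
                        for s in range(S)]
            # Step 2: fan-in over s, C_rt = sum over s of C_rt^(s).
            for s in range(S):
                for li, i in enumerate(I[r]):
                    for lj, j in enumerate(J[t]):
                        C[i][j] += partials[s][li][lj]
    return C


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # n = 3, m = 4
B = [[1, 0], [0, 1], [1, 0], [0, 1]]                # m = 4, q = 2
print(block_matmul(A, B, R=2, S=2, T=2))  # [[4.0, 6.0], [12.0, 14.0], [20.0, 22.0]]
```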

Special case $S = 1$: $C_{rt} = A(I_r,:)\,B(:,J_t)$. In this case each processor $P_{rt}$ can compute its part $C_{rt}$ of $C$ independently, without any communication. Each processor needs the full block of rows of $A$ relative to the index set $I_r$ and the full block of columns of $B$ relative to the index set $J_t$, in order to compute $C_{rt}$ relative to the rows $I_r$ and the columns $J_t$.

Special case $S = 1$ (continued): in particular, with $n \cdot q$ processors each processor has to compute one DOT product $c_{rt} = A(r,:)\,B(:,t)$, which takes $m$ parallel time steps. A fan-in with $O(n \cdot m \cdot q)$ additional processors for all these DOT products reduces the number of parallel time steps to $O(\log m)$.
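The fan-in can be illustrated with a pairwise (tree) reduction that counts the parallel rounds; a Python sketch with a hypothetical function name, assuming a non-empty input. For a DOT product of length m, the m multiplications form one parallel step, and the summation then takes $\lceil \log_2 m \rceil$ rounds.

```python
def fan_in_sum(values):
    """Sum a non-empty sequence by pairwise (tree) reduction.

    Returns (total, rounds). Within one round all pairwise additions are
    independent, so each round corresponds to one parallel time step.
    """
    vals = list(values)
    rounds = 0
    while len(vals) > 1:
        # Add disjoint neighbour pairs; an odd element is carried over unchanged.
        vals = [vals[i] + vals[i + 1] if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
        rounds += 1
    return vals[0], rounds


a_row = [1.0, 2.0, 3.0, 4.0, 5.0]
b_col = [1.0, 1.0, 1.0, 1.0, 1.0]
print(fan_in_sum(x * y for x, y in zip(a_row, b_col)))  # (15.0, 3)
```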

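Granularity for BLAS (operations per memory access):
BLAS-1, SAXPY: $\alpha x + y$; $2n$ operations, $3n$ memory accesses, granularity $2/3 < 1$.
BLAS-2, SGEMV: $\alpha A x + \beta y$; $2n^2$ operations, $n^2 + 3n$ memory accesses, granularity $\approx 2$.
BLAS-3, SGEMM: $\alpha A B + \beta C$; $2n^3$ operations, $4n^2$ memory accesses, granularity $n/2$.
BLAS-3 has the best ratio of operations to memory accesses!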
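The ratios can be evaluated for a concrete n. This Python sketch assumes the operation and memory counts 2n and 3n (SAXPY), 2n² and n² + 3n (SGEMV), and 2n³ and 4n² (SGEMM); the function name is hypothetical.

```python
def blas_granularity(n):
    """Operations per memory access for one BLAS-1, BLAS-2 and BLAS-3 call of size n."""
    return {
        "SAXPY (BLAS-1)": (2 * n) / (3 * n),
        "SGEMV (BLAS-2)": (2 * n**2) / (n**2 + 3 * n),
        "SGEMM (BLAS-3)": (2 * n**3) / (4 * n**2),
    }


for name, g in blas_granularity(1000).items():
    print(f"{name}: {g:.3f}")
```

Only the BLAS-3 ratio grows with n, which is why blocked algorithms try to cast as much work as possible into matrix-matrix products.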

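1D parallelization of $A \cdot B$: $p$ processors in a linear array; each processor gets the full matrix $A$ and a column slice of $B$, and computes the related column slice of $C = AB$.
Communication: $N^2 p$ for $A$ and $N \cdot (N/p) \cdot p = N^2$ for $B$.
Granularity: $N^3 / (N^2 (p + 1)) \approx N/p$.
[Figure: column slices of width $N/p$ in $C$ and $B$.]
Loops: for i, for j, for k: $C_{ij} = C_{ij} + A_{ik} B_{kj}$; blocking only the columns of $B$!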

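2D parallelization of $A \cdot B$: $p$ processors in a square, $q := \sqrt{p}$; each processor gets a row slice of $A$ and a column slice of $B$, and computes a full $N/q \times N/q$ subblock of $C = AB$ (compare the case $S = 1$ above).
Communication: $N^2 p^{1/2}$ for $A$ and $N^2 p^{1/2}$ for $B$.
Granularity: $N^3 / (2 N^2 p^{1/2}) = N / (2 p^{1/2})$.
Loops: for i, for j, for k: $C_{ij} = C_{ij} + A_{ik} B_{kj}$; blocking the columns of $B$ and the rows of $A$!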

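3D parallelization of $A \cdot B$: $p$ processors in a cube, $q := p^{1/3}$; each processor gets a subblock of $A$ and a subblock of $B$ and computes part of a subblock of $C = AB$; an additional fan-in collects the parts into the full subblocks of $C$.
Communication: $p \cdot N^2 / p^{2/3} = N^2 p^{1/3}$ ($p$ times the block size) for $A$ and the same for $B$; the fan-in also needs $N^2 p^{1/3}$.
Granularity: $N^3 / (3 N^2 p^{1/3}) = N / (3 p^{1/3})$.
Loops: for i, for j, for k: $C_{ij} = C_{ij} + A_{ik} B_{kj}$.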
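The three granularity formulas can be compared numerically. This Python sketch assumes a total communication volume of N²(p + 1) for 1D, 2N²p^{1/2} for 2D, and 3N²p^{1/3} for 3D (including the fan-in); the function names are hypothetical.

```python
def granularity_1d(N, p):
    """N^3 operations over N^2 p (A to every processor) + N^2 (slices of B)."""
    return N**3 / (N**2 * (p + 1))


def granularity_2d(N, p):
    """N^3 operations over N^2 p^(1/2) for A plus the same for B."""
    return N**3 / (2 * N**2 * p ** 0.5)


def granularity_3d(N, p):
    """N^3 operations over N^2 p^(1/3) each for A, B and the fan-in of C."""
    return N**3 / (3 * N**2 * p ** (1 / 3))


N = 1024
for p in (8, 64, 512):
    print(p, round(granularity_1d(N, p), 1),
          round(granularity_2d(N, p), 1), round(granularity_3d(N, p), 1))
```

Under these formulas the 3D scheme wins over 2D only when 3p^{1/3} < 2p^{1/2}, i.e. for p > (3/2)^6 ≈ 11.4; for p = 8 the 2D scheme comes out ahead.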

[Figure: 3D blocking of $A$ (red) and $B$ (blue); grey blocks belong to $C$.] The product of a red and a blue block gives part of a grey block; these parts have to be added up to give the full grey block.

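3. Gaussian Elimination: Basic Properties
3.1 Linear Equations with Dense Matrices
System of linear equations:
$a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1$
$\vdots$
$a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n = b_n$,
i.e. $Ax = b$. Solve it by generating simpler linear equations (matrices): start with $A = A^{(1)}$ and transform it to triangular form, $A^{(1)} \to A^{(2)} \to \dots \to A^{(n)} = U$.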

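First step, $A^{(1)} \to A^{(2)}$: subtract $l_{i1} := a_{i1}/a_{11}$ times row 1 from row $i$, $i = 2, \dots, n$:
$a_{ij}^{(2)} = a_{ij} - l_{i1} a_{1j}$ for $i, j = 2, \dots, n$, and $a_{i1}^{(2)} = 0$ for $i = 2, \dots, n$; row 1 is unchanged.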

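Second step, $A^{(2)} \to A^{(3)}$: with $l_{i2} := a_{i2}^{(2)}/a_{22}^{(2)}$,
$a_{ij}^{(3)} = a_{ij}^{(2)} - l_{i2} a_{2j}^{(2)}$ for $i, j = 3, \dots, n$, and $a_{i2}^{(3)} = 0$ for $i = 3, \dots, n$; rows 1 and 2 are unchanged. Continuing in this way ends with the upper triangular matrix $A^{(n)} = U$.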

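We assume (to simplify) that no pivoting is necessary, i.e. $\rho_k := a_{kk}^{(k)} \neq 0$ for $k = 1, 2, \dots, n$.
Algorithm:
for k = 1:n-1
  for i = k+1:n
    l_ik = a_ik / a_kk
  end
  for i = k+1:n
    for j = k+1:n
      a_ij = a_ij - l_ik * a_kj
    end
  end
end
In practice: include pivoting and include the right-hand side $b$. Afterwards, the triangular system $Ux = \tilde{b}$ still has to be solved!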
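A minimal Python sketch of this algorithm, assuming that no pivoting is needed (all pivots nonzero), 0-based indices, and A as a list of rows. As is common, the multipliers l_ik are stored in the positions of the eliminated entries, so the result holds L strictly below the diagonal (unit diagonal implied) and U on and above it. The function name is hypothetical.

```python
def lu_kij(A):
    """Right-looking GE (kij-form) without pivoting; returns L and U packed in one matrix."""
    a = [row[:] for row in A]   # work on a copy, A stays unchanged
    n = len(a)
    for k in range(n - 1):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                 # l_ik, stored in place of a_ik
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]   # SAXPY on row i with row k
    return a


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
print(lu_kij(A))  # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```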

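Intermediate systems $A^{(k)} x = b^{(k)}$, $k = 1, 2, \dots, n$, with $A^{(1)} = A$, $b^{(1)} = b$, and $A^{(n)} = U$. The trailing block $A^{(k)}(k:n, k:n)$ is the part of $A^{(k)}$ that will be used and changed in the following computations.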

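Define auxiliary matrices: $l_k := (0, \dots, 0, l_{k+1,k}, \dots, l_{n,k})^T$ (the $k$-th column of multipliers), $L_k := I - l_k e_k^T$, and
$L := I + \sum_{k=1}^{n-1} l_k e_k^T$, the unit lower triangular matrix with the entries $l_{ik}$ below the diagonal, and $U := A^{(n)}$.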

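Each elimination step can be written in terms of the auxiliary matrices: $A^{(k+1)} = (I - l_k e_k^T) A^{(k)}$, hence
$U = A^{(n)} = (I - l_{n-1} e_{n-1}^T) \cdots (I - l_1 e_1^T)\, A =: K A$,
with $U$ upper triangular and $K$ lower triangular.
Theorem 2: $K^{-1} = L$, and therefore $A = LU$.
Advantage: every further problem $Ax = b$ can be reduced to $LUx = b$; solve the two triangular problems $Ly = b$ and $Ux = y$.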
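Given the packed factors, the two triangular solves $Ly = b$ and $Ux = y$ can be sketched in Python (0-based indices, no pivoting; the function name is hypothetical). The example LU holds the factors of A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]] with L unit lower triangular, and b = A·(1, 1, 1)^T.

```python
def solve_packed_lu(LU, b):
    """Solve (LU) x = b, with L (unit diagonal) below and U on/above the diagonal of LU."""
    n = len(LU)
    y = list(b)
    for i in range(n):                 # forward substitution: L y = b
        for k in range(i):
            y[i] -= LU[i][k] * y[k]
    x = y[:]
    for i in reversed(range(n)):       # back substitution: U x = y
        for k in range(i + 1, n):
            x[i] -= LU[i][k] * x[k]
        x[i] /= LU[i][i]
    return x


LU = [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
print(solve_packed_lu(LU, [4.0, 10.0, 24.0]))  # [1.0, 1.0, 1.0]
```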

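Proof of Theorem 2: since the first $k$ entries of $l_k$ are zero, $e_k^T l_k = 0$, so
$(I - l_k e_k^T)(I + l_k e_k^T) = I - l_k (e_k^T l_k)\, e_k^T = I$, i.e. $(I - l_k e_k^T)^{-1} = I + l_k e_k^T$.
Therefore
$K^{-1} = (I + l_1 e_1^T)(I + l_2 e_2^T) \cdots (I + l_{n-1} e_{n-1}^T) = I + \sum_{k=1}^{n-1} l_k e_k^T = L$,
because every mixed term contains a factor $e_j^T l_k = 0$ with $j < k$. Hence $A = K^{-1} U = LU$.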

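3.2 Vectorization of GE
kij-form (standard form):
for k = 1:n-1
  for i = k+1:n
    l_ik = a_ik / a_kk   (stored in place of a_ik)
  end
  for i = k+1:n
    for j = k+1:n
      a_ij = a_ij - l_ik * a_kj
    end
  end
end
Vector operation $\alpha x + r$: a SAXPY on rows $i$ and $k$; no GAXPY. $U$ is computed rowwise, $L$ columnwise.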

[Figure legend: $L$ already computed, remains unchanged, not used any more; $U$ newly computed; $A$ updated in every step.] The standard form is also called right-looking GE.

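First elimination step: compute the first column of $L$ and update $A$.

Second step: compute the second column of $L$ and update $A^{(2)}$.

Third step: compute the third column of $L$ and update $A^{(3)}$.

$(n-1)$-st step: compute the $(n-1)$-th column of $L$ and update; the result is $U$.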

Rules for the different ijk-forms: in the following we again interchange the loops. Necessary conditions: $k < i$ and $k < j$. Furthermore, the innermost index ($i$, $j$, or $k$) determines whether the computation is done row-, column-, or block-wise. The weights $l_{ik}$ have to be computed before they are used to eliminate the related entries.

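ikj-form ($k < i$, $k < j$):
for i = 2:n
  for k = 1:i-1
    l_ik = a_ik / a_kk
    for j = k+1:n
      a_ij = a_ij - l_ik * a_kj
    end
  end
end
GAXPY: row $i$ is updated by a sequence of SAXPYs.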
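A Python sketch of the ikj-form, assuming no pivoting, 0-based indices, multipliers stored in place of the eliminated entries, and a hypothetical function name. Each row i is finished before the next one starts.

```python
def lu_ikj(A):
    """GE in ikj-form without pivoting; returns L and U packed in one matrix."""
    a = [row[:] for row in A]
    n = len(a)
    for i in range(1, n):
        for k in range(i):
            a[i][k] /= a[k][k]                 # l_ik: a_ik is already fully updated here
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]   # SAXPY on row i; over k this is a GAXPY
    return a


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
print(lu_ikj(A))  # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```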

[Figure legend: $L$ already computed, not used any more; $U$ already computed and partially used; newly computed; $A$ unchanged, not used.] $L$ and $U$ are computed rowwise: compute $l_{i1}$, then the SAXPY for the 1st and $i$-th row; then $l_{i2}$, and so on.

[Figures: ikj-form, first step, second step, ..., $(n-1)$-st step, ending with $U$.]

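ijk-form ($k < i$, $k < j$):
for i = 2:n
  for j = 2:i
    l_{i,j-1} = a_{i,j-1} / a_{j-1,j-1}
    for k = 1:j-1
      a_ij = a_ij - l_ik * a_kj
    end
  end              (new row of L; DOT product, left part)
  for j = i+1:n
    for k = 1:i-1
      a_ij = a_ij - l_ik * a_kj
    end
  end              (DOT product, right part)
end
Compute $l_{i1}$ and update $a_{i2}$; then compute $l_{i2}$ and update $a_{i3}$; and so on, accumulating.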
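A Python sketch of the ijk-form, translated to 0-based indices (so j = 2:i becomes `range(1, i + 1)`), assuming no pivoting, multipliers stored in place, and a hypothetical function name. The innermost k loop is written as a `sum`, i.e. a DOT product.

```python
def lu_ijk(A):
    """GE in ijk-form (DOT products) without pivoting; returns L and U packed."""
    a = [row[:] for row in A]
    n = len(a)
    for i in range(1, n):
        for j in range(1, i + 1):
            a[i][j - 1] /= a[j - 1][j - 1]                         # l_{i,j-1}
            a[i][j] -= sum(a[i][k] * a[k][j] for k in range(j))   # DOT, left part
        for j in range(i + 1, n):
            a[i][j] -= sum(a[i][k] * a[k][j] for k in range(i))   # DOT, right part
    return a


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
print(lu_ijk(A))  # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```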

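jki-form ($k < i$, $k < j$):
for j = 2:n
  for i = j:n
    l_{i,j-1} = a_{i,j-1} / a_{j-1,j-1}
  end
  for k = 1:j-1
    for i = k+1:n
      a_ij = a_ij - l_ik * a_kj
    end
  end
end
Vector operation $\alpha x + r$ on the new column $j$ of $A$: GAXPY.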
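A Python sketch of the left-looking jki-form, with 0-based indices, no pivoting, multipliers stored in place, and a hypothetical function name. Column j is finished by a GAXPY: one SAXPY per earlier column k of L.

```python
def lu_jki(A):
    """Left-looking GE (jki-form) without pivoting; returns L and U packed."""
    a = [row[:] for row in A]
    n = len(a)
    for j in range(1, n):
        for i in range(j, n):
            a[i][j - 1] /= a[j - 1][j - 1]     # column j-1 of L
        for k in range(j):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]   # SAXPY on column j with column k of L
    return a


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
print(lu_jki(A))  # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```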

Left-looking GE. [Figure legend: computed, not used; $U$ already computed and used; $A$ unchanged, not used; column $j-1$ of $L$ newly computed.]

[Figures: jki-form (left-looking GE), first step, second step, ..., $(n-1)$-st step, ending with $U$.]

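Overview of the GE forms (loop order from outer to inner):
kij: access to A and U by rows, L by scalars only; U computed by rows, L by columns; SAXPY, average vector length 2n/3.
kji: access to A and U by columns, to L by columns; U by rows, L by columns; SAXPY, 2n/3.
ikj: access to A and U by rows, L by scalars only; U by rows, L by rows; GAXPY, 2n/3.
ijk: access to A and U by columns, to L by rows; U by rows, L by rows; DOT, n/3.
jki: access to A and U by columns, to L by columns; U by columns, L by columns; GAXPY, 2n/3.
jik: access to A and U by columns, to L by rows; U by columns, L by columns; DOT, n/3.
Vector length: average of the occurring vector lengths. The optimal form depends on the storage of the matrices and on the vector length.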