Design of Parallel & High Performance Computing. Reasoning about Performance I. Balance Principles. Roofline Model

1 Design of Parallel & High Performance Computing: Reasoning about Performance I. Roofline Model, Balance Principles

2 Roofline Model (Williams et al.)
Resources in a microarchitecture and associated features that bound performance: memory bandwidth $\beta$ [bytes/cycle] and peak performance $\pi$ [flops/cycle].
Program: work $W$ [flops]; data transferred between cache and memory $Q$ [bytes].
For a given problem: $\min W$ = work (time) complexity; $\min Q$ = I/O complexity.
Operational intensity of a given program (assuming an initially empty cache): $I = W/Q$ [flops/byte]; as a function of the input size, $I(n) = W(n)/Q(n)$.
Intuition: high $I$ means compute bound; low $I$ means memory bound.
Example: asymptotic bounds on $I(n)$
vector sum $y = x + y$: $O(1)$
matrix-vector product $y = Ax$: $O(1)$
fast Fourier transform: $O(\log n)$
matrix product $C = AB$: $O(n)$

3 Operational Intensity Example: Matrix Multiplication
Assumptions: cache block = 8 doubles; [illegible]. Flops: $W = 2n^3$. We want to estimate $Q$ for $C = AB$.
1) Triple loop
First entry of $C$: $n + 8n$ doubles. Second entry of $C$: same. Total: $9n^3$ doubles, so $I(n) = O(1)$.
2) Blocked, with $b \times b$ blocks and $3b^2 \le \gamma$
First block of $C$: $2nb$ doubles. Second block of $C$: same. Total: $2n^3/b$ doubles, so $I(n) = O(b) = O(\sqrt{\gamma})$.
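A rough sketch of the two traffic estimates, under assumptions that go beyond what the notes state explicitly: doubles are 8 bytes; in the triple loop nothing is reused between entries of C; a row of A is contiguous while each element of a column of B pulls in a full 8-double cache block; and in the blocked version the block size b divides n and three b x b blocks fit in cache. The function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8   # assumption: double-precision data
BLOCK_DOUBLES = 8      # cache block of 8 doubles


def triple_loop_intensity(n: int) -> float:
    """Operational intensity (flops/byte) of the naive triple loop."""
    flops = 2 * n**3
    # Per entry of C: n doubles for a row of A, 8n doubles for a column of B.
    doubles_per_entry = n + BLOCK_DOUBLES * n
    q_bytes = n * n * doubles_per_entry * BYTES_PER_DOUBLE
    return flops / q_bytes


def blocked_intensity(n: int, b: int) -> float:
    """Operational intensity (flops/byte) with b x b blocking."""
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    flops = 2 * n**3
    # Per block of C: n/b blocks of A and n/b blocks of B, b*b doubles each.
    doubles_per_block = 2 * (n // b) * b * b
    q_bytes = (n // b) ** 2 * doubles_per_block * BYTES_PER_DOUBLE
    return flops / q_bytes
```

Under this counting the triple loop stays at a constant 1/36 flops/byte regardless of n, while the blocked version reaches b/8, which is why the intensity grows with the block size and hence with the cache size.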

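4 Roofline Model (Williams et al.)
Computer: CPU with peak performance $\pi$ [flops/cycle]; bandwidth $\beta$ [bytes/cycle] between memory and last-level cache (LLC).
Program: operational intensity $I = W/Q$ [flops/byte]; runtime $T$ [cycles]; performance $P = W/T$ [flops/cycle].
Roofline plot (example: $\pi = 2$, $\beta = 4$), log-log, with $I$ [flops/byte] on the horizontal axis and $P$ [flops/cycle] on the vertical axis:
1. Bound based on $\pi$: $P \le \pi$.
2. Bound based on $\beta$: $P \le \beta I$, that is, $\log P \le \log \beta + \log I$. Reason: $T \ge Q/\beta$, so $P = W/T \le \beta W/Q = \beta I$.
A program run on some input is a point $(I, P)$ in the plot, below both bounds.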
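The two roofline bounds combine into a single attainable-performance bound, $P \le \min(\pi, \beta I)$. A minimal sketch follows; the function names and the numeric values in the usage note are illustrative assumptions, not taken from the notes.

```python
def roofline_bound(intensity: float, peak: float, bandwidth: float) -> float:
    """Upper bound on performance P [flops/cycle].

    intensity: operational intensity I [flops/byte]
    peak:      peak performance pi [flops/cycle]
    bandwidth: memory bandwidth beta [bytes/cycle]
    """
    return min(peak, bandwidth * intensity)


def ridge_point(peak: float, bandwidth: float) -> float:
    """Intensity at which the two bounds meet: I = pi / beta."""
    return peak / bandwidth


def is_memory_bound(intensity: float, peak: float, bandwidth: float) -> bool:
    """True if the bandwidth bound is the tighter of the two."""
    return intensity < ridge_point(peak, bandwidth)
```

For example, with hypothetical values peak = 2 and bandwidth = 1, a program with I = 0.25 is limited to 0.25 flops/cycle by memory, while one with I = 4 is limited to 2 flops/cycle by the compute peak.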

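5 Operational intensity: upper bounds
$I(n) = W(n)/Q(n)$ in [flops / bytes transferred between memory and cache]; $x$, $y$: vectors of length $n$; $A$, $B$, $C$: $n \times n$ matrices; $\alpha$: scalar; $\gamma$: cache size.
Operation: asymptotic bound in $n$; asymptotic bound in $\gamma$; exact bound
daxpy $y = \alpha x + y$: $O(1)$; $O(1)$; $\le 1/12$
dgemv $y = Ax + y$: $O(1)$; $O(1)$; $\le 1/4$
fft: $O(\log n)$; $O(\log \gamma)$
dgemm $C = AB + C$: $O(n)$; $O(\sqrt{\gamma})$; [illegible]
The bound in $n$ for dgemm is still sharp if $3n^2 \le \gamma$. Example: $\gamma$ = [illegible] MB $\Rightarrow n \le 700$.
Back to slides.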
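The exact bounds 1/12 for daxpy and 1/4 for dgemv can be checked by counting only compulsory traffic. The sketch below assumes 8-byte doubles, that daxpy reads x and y and writes y once each, and that for dgemv only the single read of A is counted (the vectors are lower order). Function names are hypothetical.

```python
from fractions import Fraction

BYTES_PER_DOUBLE = 8  # assumption: double-precision data


def intensity_upper_bound(flops: int, doubles_moved: int) -> Fraction:
    """I = W / Q in flops per byte, with Q as compulsory traffic only."""
    return Fraction(flops, doubles_moved * BYTES_PER_DOUBLE)


def daxpy_bound(n: int) -> Fraction:
    # W = 2n (one multiply and one add per element).
    # Q >= 3n doubles (read x, read y, write y).
    return intensity_upper_bound(2 * n, 3 * n)


def dgemv_bound(n: int) -> Fraction:
    # W = 2n^2; Q >= n^2 doubles, counting only one read of A.
    return intensity_upper_bound(2 * n * n, n * n)
```

Both results are independent of n, which matches the O(1) asymptotic bounds for these two operations.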

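6 Balance Principles I (Kung 86)
Computer: CPU with peak performance $\pi$; LLC of size $\gamma$; bandwidth $\beta$ between memory and LLC.
Program: work $W$ = number of ops; data transfer $Q$.
The computer is called balanced if compute time = data transfer time, assuming perfect utilization: $W/\pi = Q/\beta$, that is, $\pi/\beta = W/Q$.
Assume $\pi$ increases: $\pi \to \alpha\pi$. How to rebalance?
a) $\beta \to \alpha\beta$, but $\beta$ grows slower than $\pi$.
b) $\gamma \to x\gamma$. What is $x$?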

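7 Example 1: Matrix multiplication $C = AB + C$
Algorithm with optimal $I = W/Q = O(\sqrt{\gamma})$. Balance requires $\pi/\beta = O(\sqrt{\gamma})$, so $\pi \to \alpha\pi$ requires $\gamma \to \alpha^2\gamma$.
Example 2: FFT / sorting
Algorithm with optimal $I = W/Q = O(\log \gamma)$. Balance requires $\pi/\beta = O(\log \gamma)$, so $\pi \to \alpha\pi$ requires $\gamma \to \gamma^{\alpha}$, which is unrealistic.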
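A small sketch of what rebalancing through the cache size implies in the two examples, assuming the bandwidth stays fixed, the peak performance grows by a factor alpha, and the hidden constants in the intensity bounds are ignored. The function names are hypothetical.

```python
import math


def rebalanced_cache_matmul(gamma: int, alpha: int) -> int:
    """Matrix multiplication: I ~ sqrt(gamma).

    sqrt(new_gamma) = alpha * sqrt(gamma)  =>  new_gamma = alpha^2 * gamma
    """
    return alpha**2 * gamma


def rebalanced_cache_fft(gamma: int, alpha: int) -> int:
    """FFT or sorting: I ~ log(gamma).

    log(new_gamma) = alpha * log(gamma)  =>  new_gamma = gamma^alpha
    """
    return gamma**alpha


if __name__ == "__main__":
    gamma, alpha = 1024, 2
    print(rebalanced_cache_matmul(gamma, alpha))  # quadratic growth in alpha
    print(rebalanced_cache_fft(gamma, alpha))     # exponential growth in alpha
    # Sanity check: the intensity really scales by alpha in both cases.
    assert math.isclose(math.sqrt(rebalanced_cache_matmul(gamma, alpha)),
                        alpha * math.sqrt(gamma))
    assert math.isclose(math.log(rebalanced_cache_fft(gamma, alpha)),
                        alpha * math.log(gamma))
```

Doubling the peak performance thus costs four times the cache for matrix multiplication but squares the cache size for FFT and sorting.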

8 Balance Principles II (Czechowski et al.)
Goal: more detailed principles for multicores, for algorithm/architecture codesign, and to assess the effect of hardware trends.
Computer: slow memory and fast memory; $\alpha$ latency; $\beta$ throughput; $p$ processors, each with peak performance $\pi$.
Algorithm (PRAM): $W$ work; $D$ depth; $Q_p$ memory transfers.
Processor is balanced if $T_{mem} \le T_{comp}$ (time for memory transfers at most compute time).
Assumptions: optimal $W$ and $W/Q$; [illegible].
The results are general trends and some exact results.

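9 Derivation of the balance principles
1) Estimate $T_{mem}$. Idea: divide the DAG into levels; level $i$ has $q_i$ memory transfers, $i = 1, \dots, D$. In each level use the $\alpha$-$\beta$ model: $\alpha + q_i/\beta$. Then $T_{mem} = \sum_{i=1}^{D} (\alpha + q_i/\beta) = \alpha D + Q/\beta$, with $Q = Q_p$.
2) Estimate $T_{comp}$ with Brent's theorem: $T_{comp} = (D + W/p)/\pi$.
Balance: $T_{mem} \le T_{comp}$, that is, $\alpha D + Q/\beta \le (D + W/p)/\pi$. Multiplying by $p\pi/Q$ gives
$\frac{p\pi}{\beta}\left(1 + \frac{\alpha\beta}{Q/D}\right) \le \frac{W}{Q}\left(1 + \frac{p}{W/D}\right)$,
where $Q/D$ is the memory parallelism, $W/D$ is the algorithmic parallelism, and $W/Q$ is the intensity of the algorithm.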
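The balance condition can be evaluated numerically. This is a sketch under an assumed form of the model: memory time alpha * D + Q / beta (one latency per DAG level plus throughput-limited transfers) and compute time (D + W / p) / pi from Brent's theorem. The parameter names and the numbers in the usage note are illustrative assumptions.

```python
def memory_time(alpha: float, beta: float, depth: int, transfers: int) -> float:
    """T_mem = alpha * D + Q / beta: latency per level plus transfer time."""
    return alpha * depth + transfers / beta


def compute_time(peak: float, processors: int, work: int, depth: int) -> float:
    """T_comp = (D + W / p) / pi, using Brent's theorem for the op count."""
    return (depth + work / processors) / peak


def is_balanced(alpha: float, beta: float, peak: float, processors: int,
                work: int, depth: int, transfers: int) -> bool:
    """Balanced if memory transfers take no longer than computation."""
    t_mem = memory_time(alpha, beta, depth, transfers)
    t_comp = compute_time(peak, processors, work, depth)
    return t_mem <= t_comp
```

With hypothetical values alpha = 100, beta = 4, peak = 2, p = 4, W = 10**6, D = 10**3 and Q = 10**5, the memory time is 125000 and the compute time is 125500, so the configuration is just balanced; raising p to 8 halves the W / p term and breaks the balance.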

10 Example 1: matrix multiplication
Use the lower bound on $Q$ from Irony et al. (2004); $I(n) = O(\sqrt{\gamma})$.
Balance principle: $p\pi/\beta \le O(\sqrt{\gamma/p})$.
If $p$ increases, rebalancing requires [illegible].
Example 2: sorting / FFT
Balance principle: $p\pi/\beta \le O(\log(\gamma/p))$.
Back to slides.

Lecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 23: Illusiveness of Parallel Performance. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 23: Illusiveness of Parallel Performance James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L23 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Your goal today Housekeeping peel

More information

OH BOY! Story. N a r r a t iv e a n d o bj e c t s th ea t e r Fo r a l l a g e s, fr o m th e a ge of 9

OH BOY! Story. N a r r a t iv e a n d o bj e c t s th ea t e r Fo r a l l a g e s, fr o m th e a ge of 9 OH BOY! O h Boy!, was or igin a lly cr eat ed in F r en ch an d was a m a jor s u cc ess on t h e Fr en ch st a ge f or young au di enc es. It h a s b een s een by ap pr ox i ma t ely 175,000 sp ect at

More information

A L A BA M A L A W R E V IE W

A L A BA M A L A W R E V IE W A L A BA M A L A W R E V IE W Volume 52 Fall 2000 Number 1 B E F O R E D I S A B I L I T Y C I V I L R I G HT S : C I V I L W A R P E N S I O N S A N D TH E P O L I T I C S O F D I S A B I L I T Y I N

More information

ERLANGEN REGIONAL COMPUTING CENTER

ERLANGEN REGIONAL COMPUTING CENTER ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,

More information

Parallel Performance Theory - 1

Parallel Performance Theory - 1 Parallel Performance Theory - 1 Parallel Computing CIS 410/510 Department of Computer and Information Science Outline q Performance scalability q Analytical performance measures q Amdahl s law and Gustafson-Barsis

More information

Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun

Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun spcl.inf.ethz.ch @spcl_eth Models: Amdahl s Law, PRAM, α-β Tal Ben-Nun Design of Parallel and High-Performance Computing Fall 2017 DPHPC Overview cache coherency memory models 2 Speedup An application

More information

(Group-theoretic) Fast Matrix Multiplication

(Group-theoretic) Fast Matrix Multiplication (Group-theoretic) Fast Matrix Multiplication Ivo Hedtke Data Structures and Efficient Algorithms Group (Prof Dr M Müller-Hannemann) Martin-Luther-University Halle-Wittenberg Institute of Computer Science

More information

A simple Concept for the Performance Analysis of Cluster-Computing

A simple Concept for the Performance Analysis of Cluster-Computing A simple Concept for the Performance Analysis of Cluster-Computing H. Kredel 1, S. Richling 2, J.P. Kruse 3, E. Strohmaier 4, H.G. Kruse 1 1 IT-Center, University of Mannheim, Germany 2 IT-Center, University

More information

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 12: Energy and Power James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L12 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today a working understanding of

More information

176 5 t h Fl oo r. 337 P o ly me r Ma te ri al s

176 5 t h Fl oo r. 337 P o ly me r Ma te ri al s A g la di ou s F. L. 462 E l ec tr on ic D ev el op me nt A i ng er A.W.S. 371 C. A. M. A l ex an de r 236 A d mi ni st ra ti on R. H. (M rs ) A n dr ew s P. V. 326 O p ti ca l Tr an sm is si on A p ps

More information

Software Process Models there are many process model s in th e li t e ra t u re, s om e a r e prescriptions and some are descriptions you need to mode

Software Process Models there are many process model s in th e li t e ra t u re, s om e a r e prescriptions and some are descriptions you need to mode Unit 2 : Software Process O b j ec t i ve This unit introduces software systems engineering through a discussion of software processes and their principal characteristics. In order to achieve the desireable

More information

Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David

Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David 1.2.05 1 Topic Overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of granularity on

More information

T h e C S E T I P r o j e c t

T h e C S E T I P r o j e c t T h e P r o j e c t T H E P R O J E C T T A B L E O F C O N T E N T S A r t i c l e P a g e C o m p r e h e n s i v e A s s es s m e n t o f t h e U F O / E T I P h e n o m e n o n M a y 1 9 9 1 1 E T

More information

Definition. special name. 72 spans 5. Last time we proved that if Ju's is. t.vtts.tt turn. This subspace is se important that we give it

Definition. special name. 72 spans 5. Last time we proved that if Ju's is. t.vtts.tt turn. This subspace is se important that we give it 1.7 Spanningsetsilinearndependencey 1.2 in text Last time we proved that if 72 148 Ju's is a set of vectors in R then is a subspace of R t.vtts.tt turn t tz the R This subspace is se important that we

More information

Parallel Performance Theory

Parallel Performance Theory AMS 250: An Introduction to High Performance Computing Parallel Performance Theory Shawfeng Dong shaw@ucsc.edu (831) 502-7743 Applied Mathematics & Statistics University of California, Santa Cruz Outline

More information

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &

More information

A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method

A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method Jee Choi 1, Aparna Chandramowlishwaran 3, Kamesh Madduri 4, and Richard Vuduc 2 1 ECE, Georgia Tech 2 CSE, Georgia

More information

A hierarchical Model for the Analysis of Efficiency and Speed-up of Multi-Core Cluster-Computers

A hierarchical Model for the Analysis of Efficiency and Speed-up of Multi-Core Cluster-Computers A hierarchical Model for the Analysis of Efficiency and Speed-up of Multi-Core Cluster-Computers H. Kredel 1, H. G. Kruse 1 retired, S. Richling2 1 IT-Center, University of Mannheim, Germany 2 IT-Center,

More information

o Alphabet Recitation

o Alphabet Recitation Letter-Sound Inventory (Record Sheet #1) 5-11 o Alphabet Recitation o Alphabet Recitation a b c d e f 9 h a b c d e f 9 h j k m n 0 p q k m n 0 p q r s t u v w x y z r s t u v w x y z 0 Upper Case Letter

More information

Algorithms and Methods for Fast Model Predictive Control

Algorithms and Methods for Fast Model Predictive Control Algorithms and Methods for Fast Model Predictive Control Technical University of Denmark Department of Applied Mathematics and Computer Science 13 April 2016 Background: Model Predictive Control Model

More information

". :'=: "t',.4 :; :::-':7'- --,r. "c:"" --; : I :. \ 1 :;,'I ~,:-._._'.:.:1... ~~ \..,i ... ~.. ~--~ ( L ;...3L-. ' f.':... I. -.1;':'.

. :'=: t',.4 :; :::-':7'- --,r. c: --; : I :. \ 1 :;,'I ~,:-._._'.:.:1... ~~ \..,i ... ~.. ~--~ ( L ;...3L-. ' f.':... I. -.1;':'. = 47 \ \ L 3L f \ / \ L \ \ j \ \ 6! \ j \ / w j / \ \ 4 / N L5 Dm94 O6zq 9 qmn j!!! j 3DLLE N f 3LLE Of ADL!N RALROAD ORAL OR AL AOAON N 5 5 D D 9 94 4 E ROL 2LL RLLAY RL AY 3 ER OLLL 832 876 8 76 L A

More information

Boolean Algebra and Digital Logic 2009, University of Colombo School of Computing

Boolean Algebra and Digital Logic 2009, University of Colombo School of Computing IT 204 Section 3.0 Boolean Algebra and Digital Logic Boolean Algebra 2 Logic Equations to Truth Tables X = A. B + A. B + AB A B X 0 0 0 0 3 Sum of Products The OR operation performed on the products of

More information

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February

More information

Efficient Deflation for Communication-Avoiding Krylov Subspace Methods

Efficient Deflation for Communication-Avoiding Krylov Subspace Methods Efficient Deflation for Communication-Avoiding Krylov Subspace Methods Erin Carson Nicholas Knight, James Demmel Univ. of California, Berkeley Monday, June 24, NASCA 2013, Calais, France Overview We derive

More information

CPU. 60%/yr. Moore s Law. Processor-Memory Performance Gap: (grows 50% / year) DRAM. 7%/yr. DRAM

CPU. 60%/yr. Moore s Law. Processor-Memory Performance Gap: (grows 50% / year) DRAM. 7%/yr. DRAM ecture 1 3 C a ch e B a s i cs a n d C a ch e P erf o rm a n ce Computer Engineering 585 F a l l 2 0 0 2 What Is emory ierarchy typical memory hierarchy today "! '& % ere we focus on 1/2/3 caches and main

More information

R e p u b lic o f th e P h ilip p in e s. R e g io n V II, C e n tra l V isa y a s. C ity o f T a g b ila ran

R e p u b lic o f th e P h ilip p in e s. R e g io n V II, C e n tra l V isa y a s. C ity o f T a g b ila ran R e p u b l f th e P h lp p e D e p rt e t f E d u t R e V, e tr l V y D V N F B H L ty f T b l r Ju ly, D V N M E M R A N D U M N. 0,. L T F E N R H G H H L F F E R N G F R 6 M P L E M E N T A T N T :,

More information

Keywords: Acinonyx jubatus/cheetah/development/diet/hand raising/health/kitten/medication

Keywords: Acinonyx jubatus/cheetah/development/diet/hand raising/health/kitten/medication L V. A W P. Ky: Ayx j//m// ///m A: A y m "My" W P 1986. S y m y y. y mm m. A 6.5 m My.. A { A N D R A S D C T A A T ' } T A K P L A N T A T { - A C A S S T 0 R Y y m T ' 1986' W P - + ' m y, m T y. j-

More information

P a g e 3 6 of R e p o r t P B 4 / 0 9

P a g e 3 6 of R e p o r t P B 4 / 0 9 P a g e 3 6 of R e p o r t P B 4 / 0 9 p r o t e c t h um a n h e a l t h a n d p r o p e r t y fr om t h e d a n g e rs i n h e r e n t i n m i n i n g o p e r a t i o n s s u c h a s a q u a r r y. J

More information

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 5: Performance and Efficiency. James C. Hoe Department of ECE Carnegie Mellon University 18 643 Lecture 5: Performance and Efficiency James C. Hoe Department of ECE Carnegie Mellon University 18 643 F17 L05 S1, James C. Hoe, CMU/ECE/CALCM, 2017 18 643 F17 L05 S2, James C. Hoe, CMU/ECE/CALCM,

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

More Science per Joule: Bottleneck Computing

More Science per Joule: Bottleneck Computing More Science per Joule: Bottleneck Computing Georg Hager Erlangen Regional Computing Center (RRZE) University of Erlangen-Nuremberg Germany PPAM 2013 September 9, 2013 Warsaw, Poland Motivation (1): Scalability

More information

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power

BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power BeiHang Short Course, Part 7: HW Acceleration: It s about Performance, Energy and Power James C. Hoe Department of ECE Carnegie Mellon niversity Eric S. Chung, et al., Single chip Heterogeneous Computing:

More information

i;\-'i frz q > R>? >tr E*+ [S I z> N g> F 'x sa :r> >,9 T F >= = = I Y E H H>tr iir- g-i I * s I!,i --' - = a trx - H tnz rqx o >.F g< s Ire tr () -s

i;\-'i frz q > R>? >tr E*+ [S I z> N g> F 'x sa :r> >,9 T F >= = = I Y E H H>tr iir- g-i I * s I!,i --' - = a trx - H tnz rqx o >.F g< s Ire tr () -s 5 C /? >9 T > ; '. ; J ' ' J. \ ;\' \.> ). L; c\ u ( (J ) \ 1 ) : C ) (... >\ > 9 e!) T C). '1!\ /_ \ '\ ' > 9 C > 9.' \( T Z > 9 > 5 P + 9 9 ) :> : + (. \ z : ) z cf C : u 9 ( :!z! Z c (! $ f 1 :.1 f.

More information

Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013

Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013 Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions James Demmel, Andrew Gearhart, Benjamin Lipshitz and Oded Schwartz Electrical Engineering and Computer Sciences University

More information

Actively analyzing performance to find microarchitectural bottlenecks and to estimate performance bounds

Actively analyzing performance to find microarchitectural bottlenecks and to estimate performance bounds hpcgarage.org/isc15 Actively analyzing performance to find microarchitectural bottlenecks and to estimate performance bounds Kenneth (Kent) Czechowski Jee Whan Choi (IBM) Jeff Young Richard (Rich) Vuduc

More information

Genealogy of Pythagorean triangles

Genealogy of Pythagorean triangles Chapter 0 Genealogy of Pythagorean triangles 0. Two ternary trees of rational numbers Consider the rational numbers in the open interval (0, ). Each of these is uniquely in the form q, for relatively prime

More information

qfu fiqrqrot sqr*lrq \rq rrrq frmrs rrwtrq, *Ea qqft-{ f,er ofrq qyfl-{ wtrd u'unflq * *,rqrrd drsq GrrR fr;t - oslogllotsfierft

qfu fiqrqrot sqr*lrq \rq rrrq frmrs rrwtrq, *Ea qqft-{ f,er ofrq qyfl-{ wtrd u'unflq * *,rqrrd drsq GrrR fr;t - oslogllotsfierft fqqt q*q q q fm wtq qyf T mcq fgmf -* : 13/ y j f;t - gfef qfu qq qf fuq q fu *{eq quznq {'q qq fu" {H cc{f cm Tc[ c; qr{ qqem q"tq qf f+** e* qfq * qqf-{ fe fq qyf{ w u'unfq * *qr q GR fq?cq qf fft R'

More information

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining

More information

Tests of an internally-fired boiler

Tests of an internally-fired boiler University of Iowa Iowa Research Online Theses and Dissertations 1910 Tests of an internally-fired boiler Ernesto Julio Aguilar State University of Iowa Vincente Camporredondo State University of Iowa

More information

the coordinates of C (3) Find the size of the angle ACB. Give your answer in degrees to 2 decimal places. (4)

the coordinates of C (3) Find the size of the angle ACB. Give your answer in degrees to 2 decimal places. (4) . The line l has equation, 2 4 3 2 + = λ r where λ is a scalar parameter. The line l 2 has equation, 2 0 5 3 9 0 + = µ r where μ is a scalar parameter. Given that l and l 2 meet at the point C, find the

More information

Guide to the Extended Step-Pyramid Periodic Table

Guide to the Extended Step-Pyramid Periodic Table Guide to the Extended Step-Pyramid Periodic Table William B. Jensen Department of Chemistry University of Cincinnati Cincinnati, OH 452201-0172 The extended step-pyramid table recognizes that elements

More information

The Design Procedure. Output Equation Determination - Derive output equations from the state table

The Design Procedure. Output Equation Determination - Derive output equations from the state table The Design Procedure Specification Formulation - Obtain a state diagram or state table State Assignment - Assign binary codes to the states Flip-Flop Input Equation Determination - Select flipflop types

More information

Load-balanced parallel banded-system solvers

Load-balanced parallel banded-system solvers Theoretical Computer Science 289 (2002) 313 334 www.elsevier.com/locate/tcs Load-balanced parallel bed-system solvers Kuo-Liang Chung a; ; 1, Wen-Ming Yan b; 2, Jung-Gen Wu c; 3 a Department of Information

More information

WORLD MATHS DAY ACTIVITY PACK. Ages worldmathsday.com UNICEF WORLD MATHS DAY Lesson Plans Age 4 10 ACTIVITY RESOURCE

WORLD MATHS DAY ACTIVITY PACK. Ages worldmathsday.com UNICEF WORLD MATHS DAY Lesson Plans Age 4 10 ACTIVITY RESOURCE UNICEF AND WORLD MATHS DAY Hp q WORLD MATHS DAY ACTIVITY PACK A 4-10 UNICEF WORLD MATHS DAY 2018 L P A 4 10 ACTIVITY RESOURCE APPENDIX 1 APPENDIX 2 G S---Bx S E f UNICEF WORLD MATHS DAY 2018 L P A 4-10

More information

Advanced Radiology Reporting and Analytics with rscriptor vrad results after 10 million radiology reports

Advanced Radiology Reporting and Analytics with rscriptor vrad results after 10 million radiology reports Av Ry R Ay wh vr 10 y I, w, v, - y y h, v, z yz y. I wh vy y v y hh-qy y z. A h h h N L P (NLP) h w y y. h w w vr Jy 2014 h h h 10 y. F h - w vr hv wk h v h qy wkw. h wh h h. I I /v h h wk wh vy y v w

More information

CSCBNO. Asymmetric Encryption

CSCBNO. Asymmetric Encryption CSCBNO Asymmetric Encryption Posted today issue YHW # 1 use all 3 types HW # 2 If no tell due me soln why Friday Last AES ( also a bunch of Zn ) Since working in Zzsg or 2512 addition is XOR Weakness Need

More information

Performance and Scalability. Lars Karlsson

Performance and Scalability. Lars Karlsson Performance and Scalability Lars Karlsson Outline Complexity analysis Runtime, speedup, efficiency Amdahl s Law and scalability Cost and overhead Cost optimality Iso-efficiency function Case study: matrix

More information

How to Multiply. 5.5 Integer Multiplication. Complex Multiplication. Integer Arithmetic. Complex multiplication. (a + bi) (c + di) = x + yi.

How to Multiply. 5.5 Integer Multiplication. Complex Multiplication. Integer Arithmetic. Complex multiplication. (a + bi) (c + di) = x + yi. How to ultiply Slides by Kevin Wayne. Copyright 5 Pearson-Addison Wesley. All rights reserved. integers, matrices, and polynomials Complex ultiplication Complex multiplication. a + bi) c + di) = x + yi.

More information

MULTIPLE PRODUCTS OBJECTIVES. If a i j,b j k,c i k, = + = + = + then a. ( b c) ) 8 ) 6 3) 4 5). If a = 3i j+ k and b 3i j k = = +, then a. ( a b) = ) 0 ) 3) 3 4) not defined { } 3. The scalar a. ( b c)

More information

Saving Energy in Sparse and Dense Linear Algebra Computations

Saving Energy in Sparse and Dense Linear Algebra Computations Saving Energy in Sparse and Dense Linear Algebra Computations P. Alonso, M. F. Dolz, F. Igual, R. Mayo, E. S. Quintana-Ortí, V. Roca Univ. Politécnica Univ. Jaume I The Univ. of Texas de Valencia, Spain

More information

energy by deforming and moving. Principle of Work And (c) Zero By substituting at = v(dv/ds) into Ft = mat, the result is

energy by deforming and moving. Principle of Work And (c) Zero By substituting at = v(dv/ds) into Ft = mat, the result is APPLICATIONS CEE 27: Applied Mechanics II, Dynamics Lecture : Ch.4, Sec. 4 Prof. Albert S. Kim Civil and Environmental Engineering, University of Hawaii at Manoa A roller coaster makes use of gravitational

More information

Math 147 Section 3.4. Application Example

Math 147 Section 3.4. Application Example Math 147 Section 3.4 Inverse of a Square Matrix Matrix Equations Determinants of Matrices 1 Application Example Set up the system of equations and then solve it by using an inverse matrix. One safe investment

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

Created by T. Madas 2D VECTORS. Created by T. Madas

Created by T. Madas 2D VECTORS. Created by T. Madas 2D VECTORS Question 1 (**) Relative to a fixed origin O, the point A has coordinates ( 2, 3). The point B is such so that AB = 3i 7j, where i and j are mutually perpendicular unit vectors lying on the

More information

/ / MET Day 000 NC1^ INRTL MNVR I E E PRE SLEEP K PRE SLEEP R E

/ / MET Day 000 NC1^ INRTL MNVR I E E PRE SLEEP K PRE SLEEP R E 05//0 5:26:04 09/6/0 (259) 6 7 8 9 20 2 22 2 09/7 0 02 0 000/00 0 02 0 04 05 06 07 08 09 0 2 ay 000 ^ 0 X Y / / / / ( %/ ) 2 /0 2 ( ) ^ 4 / Y/ 2 4 5 6 7 8 9 2 X ^ X % 2 // 09/7/0 (260) ay 000 02 05//0

More information

WORLD MATH DAY ACTIVITY PACK. Ages worldmathsday.com UNICEF WORLD MATH DAY Lesson Plans Age 4 10 ACTIVITY RESOURCE

WORLD MATH DAY ACTIVITY PACK. Ages worldmathsday.com UNICEF WORLD MATH DAY Lesson Plans Age 4 10 ACTIVITY RESOURCE UNICEF AND WORLD MATH DAY Hp qy WORLD MATH DAY ACTIVITY PACK A 4-10 UNICEF WORLD MATH DAY 2018 L P A 4 10 ACTIVITY RESOURCE APPENDIX 1 APPENDIX 2 G S---Bx Sy E f y UNICEF WORLD MATH DAY 2018 L P A 4-10

More information

trawhmmry ffimmf,f;wnt

trawhmmry ffimmf,f;wnt r nsr rwry fff,f;wn My 26, $51 Swe, k "Te Srwberry Cp f e Vr,, c) [ re ers 6 (, r " * f rn ff e # s S,r,* )er*,3n*,.\ ) x 8 2 n v c e 6 r D r, } e ;s 1 :n..< Z r : 66 3 X f; 1r_ X r { j r Z r 1r 3r B s

More information

Table of C on t en t s Global Campus 21 in N umbe r s R e g ional Capac it y D e v e lopme nt in E-L e ar ning Structure a n d C o m p o n en ts R ea

Table of C on t en t s Global Campus 21 in N umbe r s R e g ional Capac it y D e v e lopme nt in E-L e ar ning Structure a n d C o m p o n en ts R ea G Blended L ea r ni ng P r o g r a m R eg i o na l C a p a c i t y D ev elo p m ent i n E -L ea r ni ng H R K C r o s s o r d e r u c a t i o n a n d v e l o p m e n t C o p e r a t i o n 3 0 6 0 7 0 5

More information

CMSC 313 Lecture 17 Postulates & Theorems of Boolean Algebra Semiconductors CMOS Logic Gates

CMSC 313 Lecture 17 Postulates & Theorems of Boolean Algebra Semiconductors CMOS Logic Gates CMSC 313 Lecture 17 Postulates & Theorems of Boolean Algebra Semiconductors CMOS Logic Gates UMBC, CMSC313, Richard Chang Last Time Overview of second half of this course Logic gates &

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

Toward High Performance Matrix Multiplication for Exact Computation

Toward High Performance Matrix Multiplication for Exact Computation Toward High Performance Matrix Multiplication for Exact Computation Pascal Giorgi Joint work with Romain Lebreton (U. Waterloo) Funded by the French ANR project HPAC Séminaire CASYS - LJK, April 2014 Motivations

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing

More information

'NOTAS"CRITICAS PARA UNA TEDRIA DE M BUROCRACIA ESTATAL * Oscar Oszlak

'NOTASCRITICAS PARA UNA TEDRIA DE M BUROCRACIA ESTATAL * Oscar Oszlak OVí "^Ox^ OqAÍ"^ Dcument SD-11 \ 'NOTAS"CRTCAS PARA UNA TEDRA DE M BUROCRACA ESTATAL * Oscr Oszlk * El presente dcument que se reprduce pr us exclusv de ls prtcpntes de curss de Prrms de Cpctcón, se h

More information

const =Λ for simplicity, we assume N =2 L Λ= L = log N; =Λ (depth of arbitration tree) + 1 = O(log N) Λ= Tsize =2 L 1=N 1 =Λ size of arbitration tree

const =Λ for simplicity, we assume N =2 L Λ= L = log N; =Λ (depth of arbitration tree) + 1 = O(log N) Λ= Tsize =2 L 1=N 1 =Λ size of arbitration tree A Space- and Time-efcient Local-spin Spin Lock Yong-Jik Kim and James H. Anderson Department of Computer Science University of North Carolina at Chapel Hill March 2001 Abstract A simple ce transformation

More information

Parts Manual. EPIC II Critical Care Bed REF 2031

Parts Manual. EPIC II Critical Care Bed REF 2031 EPIC II Critical Care Bed REF 2031 Parts Manual For parts or technical assistance call: USA: 1-800-327-0770 2013/05 B.0 2031-109-006 REV B www.stryker.com Table of Contents English Product Labels... 4

More information

Chap. 3 Rigid Bodies: Equivalent Systems of Forces. External/Internal Forces; Equivalent Forces

Chap. 3 Rigid Bodies: Equivalent Systems of Forces. External/Internal Forces; Equivalent Forces Chap. 3 Rigid Bodies: Equivalent Systems of Forces Treatment of a body as a single particle is not always possible. In general, the size of the body and the specific points of application of the forces

More information

The Robustness of Relaxation Rates in Constraint Satisfaction Networks

The Robustness of Relaxation Rates in Constraint Satisfaction Networks Brigham Young University BYU ScholarsArchive All Faculty Publications 1999-07-16 The Robustness of Relaxation Rates in Constraint Satisfaction Networks Tony R. Martinez martinez@cs.byu.edu Dan A. Ventura

More information

Vulnerability Analysis of Feedback Systems. Nathan Woodbury Advisor: Dr. Sean Warnick

Vulnerability Analysis of Feedback Systems. Nathan Woodbury Advisor: Dr. Sean Warnick Vulnerability Analysis of Feedback Systems Nathan Woodbury Advisor: Dr. Sean Warnick Outline : Vulnerability Mathematical Preliminaries Three System Representations & Their Structures Open-Loop Results:

More information

Lab Day and Time: Instructions. 1. Do not open the exam until you are told to start.

Lab Day and Time: Instructions. 1. Do not open the exam until you are told to start. Name: Lab Day and Time: Instructions 1. Do not open the exam until you are told to start. 2. This exam is closed note and closed book. You are not allowed to use any outside material while taking this

More information

I N A C O M P L E X W O R L D

I N A C O M P L E X W O R L D IS L A M I C E C O N O M I C S I N A C O M P L E X W O R L D E x p l o r a t i o n s i n A g-b eanste d S i m u l a t i o n S a m i A l-s u w a i l e m 1 4 2 9 H 2 0 0 8 I s l a m i c D e v e l o p m e

More information

Delsarte s linear programming bound

Delsarte s linear programming bound 15-859 Coding Theory, Fall 14 December 5, 2014 Introduction For all n, q, and d, Delsarte s linear program establishes a series of linear constraints that every code in F n q with distance d must satisfy.

More information

Some notes on efficient computing and setting up high performance computing environments

Some notes on efficient computing and setting up high performance computing environments Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient

More information
