Overview: Parallelisation via Pipelining


1 Overview: Parallelisation via Pipelining

- three types of pipelines
- adding numbers (type 1)
- performance analysis of pipelines
- insertion sort (type 2)
- linear system back substitution (type 3)

Ref: chapter 5, Wilkinson and Allen

COMP00/800 L: Parallelisation via Pipelining 07

2 Pipelining

- already encountered as instruction pipelining at the CPU level
- applies to problems that can be divided into a series of sequential tasks, completed one after another, e.g. a frequency filter in which each process filters one frequency
- three typical scenarios:
  1. more than one instance of the complete problem is to be executed
  2. a series of data items must be processed, each requiring multiple operations
  3. information to start the next process can be passed forward before the process has completed all its internal operations

3 Type 1 Pipelining

[Figure: space-time diagram of a type 1 pipeline; instances of the problem pass through processes P0 to P5 over Time, with the extents p and m marked.]

4 Type 2 Pipelining

[Figure: a ten-stage pipeline P0 to P9 processing the input sequence d0 to d9; space-time diagram over Time, with the extents p and n marked.]

5 Type 3 Pipelining

[Figure: information passed to the next stage before each process has completed; processes P0 to P5 over Time, in one panel with even and in the other with uneven process execution times.]

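6 Example Type 1: Adding Numbers

Compute $s = \sum_{i=0}^{n-1} x_i$, with one number $x_i$ held by each process $P_i$ (so $n = p$).

[Figure: processes P0 to P4 in a line, each passing a partial sum to the next.]

```c
accumulation = number;                    /* this process's own number */
if (process_id > 0) {
    recv(&accumulation, process_id - 1);  /* partial sum from the predecessor */
    accumulation = accumulation + number;
}
if (process_id < p - 1)                   /* the last process holds the final sum */
    send(&accumulation, process_id + 1);
```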
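The send/recv fragment above can be checked without a message-passing library by simulating the pipeline one cycle at a time. This is a minimal sketch, not the course's code: the function name `pipelined_add`, the bound `MAX_P` and the array layout are assumptions, and plain arrays stand in for the recv/send calls. It runs m instances through p stages, one number per stage per instance.

```c
#include <assert.h>

#define MAX_P 16   /* hypothetical bound on the number of stages in this sketch */

/* Cycle-by-cycle simulation of the type 1 adding pipeline.
   numbers[k * p + s] is the number held by stage (process) s for instance k.
   Stage s works on instance k during cycle k + s, so m sums need m + p - 1
   cycles. The m results are written to sums[]; the cycle count is returned. */
int pipelined_add(const double *numbers, int m, int p, double *sums)
{
    assert(m >= 1 && p >= 1 && p <= MAX_P);
    double acc[MAX_P] = {0};   /* acc[s]: accumulation last produced by stage s */
    int cycles = m + p - 1;
    for (int c = 0; c < cycles; c++) {
        /* Visit stages last to first, so stage s still sees the value that
           stage s - 1 produced in the previous cycle (its "recv"). */
        for (int s = p - 1; s >= 0; s--) {
            int k = c - s;                    /* instance at stage s this cycle */
            if (k < 0 || k >= m)
                continue;                     /* idle while filling or draining */
            double number = numbers[k * p + s];
            acc[s] = (s == 0) ? number : acc[s - 1] + number;
            if (s == p - 1)
                sums[k] = acc[s];             /* last stage delivers the sum */
        }
    }
    return cycles;
}
```

For example, with p = 3 and two instances holding 1, 2, 3 and 4, 5, 6, the sums 6 and 15 emerge after m + p - 1 = 4 cycles.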

7 General Pipeline Analysis

- assume each process performs a similar action in each pipeline cycle
- work out the computation and communication time for one cycle
- compute the total execution time as
  $t_{total} = (\text{time for one pipeline cycle}) \times (\text{number of cycles}) = (t_{comp} + t_{comm})(m + p - 1)$,
  where $m$ is the number of instances and $p$ the number of pipeline stages (processes); the first instance leaves the pipeline after $p$ cycles and each of the remaining $m - 1$ instances one cycle later
- the average time per computation (instance) is then $t_{av} = t_{total}/m$
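The cost model translates directly into two small functions. This is a sketch under the slide's assumption of identical work per cycle; the function names are hypothetical, and times are in whatever unit $t_{comp}$ and $t_{comm}$ are measured in.

```c
#include <assert.h>

/* Pipeline cost model from the slide: each cycle costs t_comp + t_comm, and
   m instances flowing through p stages take m + p - 1 cycles. */
double pipeline_total_time(double t_comp, double t_comm, int m, int p)
{
    assert(m >= 1 && p >= 1);
    return (t_comp + t_comm) * (m + p - 1);
}

/* Average time per instance, t_av = t_total / m. */
double pipeline_average_time(double t_comp, double t_comm, int m, int p)
{
    return pipeline_total_time(t_comp, t_comm, m, p) / m;
}
```

With t_comp = 1, t_comm = 3, m = 4 and p = 5 the pipeline needs 8 cycles, so t_total = 32 and t_av = 8; as m grows, t_av approaches the cycle time of 4.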

8 Summation Analysis

($t_s$: message startup time, $t_w$: time to transfer one word, $t_f$: time for one addition)

- single instance:
  $t_{comp} = t_f$, $t_{comm} = 2(t_s + t_w)$ (one receive and one send)
  $t_{total} = (2(t_s + t_w) + t_f)\,p$, i.e. a time complexity of $O(p)$
- multiple instances:
  $t_{total} = (2(t_s + t_w) + t_f)(m + p - 1)$
  $t_{av} = t_{total}/m \approx 2(t_s + t_w) + t_f$ for $m \gg p$, i.e. $t_{av}$ is one pipeline cycle
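Dividing the multiple-instance total time by $m$ makes the approach to one pipeline cycle explicit:

```latex
t_{av} = \frac{t_{total}}{m}
       = \bigl(2(t_s + t_w) + t_f\bigr)\,\frac{m + p - 1}{m}
       = \bigl(2(t_s + t_w) + t_f\bigr)\left(1 + \frac{p - 1}{m}\right)
       \;\longrightarrow\; 2(t_s + t_w) + t_f \quad \text{as } m/p \to \infty
```

The correction factor $(p - 1)/m$ is the pipeline fill-and-drain overhead spread over all instances.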

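9 Example Type 2: Insertion Sort

- the algorithm is like moving a playing card over other cards until its correct location is found

[Figure: a sequence of numbers moving through processes P0 to P4, plotted against Time (cycles).]

- code for process i, which keeps the largest number seen so far in x (the first number received is stored directly in x; this step runs for every later one):

```c
recv(&number, process_id - 1);
if (number > x) {
    send(&x, process_id + 1);       /* pass on the smaller, previously held number */
    x = number;                     /* keep the larger one */
} else
    send(&number, process_id + 1);
```

- assuming n numbers, process i (counting from 0) will
  - receive n - i numbers
  - pass on n - i - 1 numbers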
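The compare-and-pass step can be exercised with a cycle-by-cycle simulation, one stage per number as on the slide. This is a minimal sketch under assumptions: the function name `pipelined_insertion_sort`, the bound `MAX_N` and the use of `int` values are illustrative, and arrays holding "messages in flight" replace the recv/send calls. A number sent by stage s in one cycle is processed by stage s + 1 in the next.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_N 64   /* hypothetical bound: one stage per number, as on the slide */

/* On return x[0] >= x[1] >= ... >= x[n-1]; returns the sorting-phase cycles. */
int pipelined_insertion_sort(const int *numbers, int n, int *x)
{
    assert(n >= 1 && n <= MAX_N);
    bool has_x[MAX_N] = {false};     /* has stage s kept its first number yet? */
    bool in_valid[MAX_N] = {false};  /* a number is waiting at stage s this cycle */
    int in_val[MAX_N] = {0};
    int fed = 0;                     /* numbers fed into stage 0 so far */
    int cycles = 0;
    for (;;) {
        bool next_valid[MAX_N] = {false};
        int next_val[MAX_N] = {0};
        bool busy = false;
        if (fed < n) {               /* the input sequence enters stage 0 */
            in_valid[0] = true;
            in_val[0] = numbers[fed++];
        }
        for (int s = 0; s < n; s++) {
            if (!in_valid[s])
                continue;
            busy = true;
            int number = in_val[s];              /* recv(&number, s - 1) */
            if (!has_x[s]) {                     /* first number: keep it */
                x[s] = number;
                has_x[s] = true;
                continue;
            }
            int pass = number;
            if (number > x[s]) {                 /* keep the larger number */
                pass = x[s];
                x[s] = number;
            }
            assert(s + 1 < n);                   /* stage s passes on n - s - 1 numbers */
            next_valid[s + 1] = true;            /* send(&pass, s + 1) */
            next_val[s + 1] = pass;
        }
        if (!busy)
            break;
        cycles++;
        for (int s = 0; s < n; s++) {            /* messages arrive next cycle */
            in_valid[s] = next_valid[s];
            in_val[s] = next_val[s];
        }
    }
    return cycles;
}
```

For the input 3, 1, 4, 1, 5 the stages end up holding 5, 4, 3, 1, 1 after 9 cycles, i.e. 2n - 1 cycles for n = 5.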

10 Sort Analysis

- sequential: $t_{seq} = (n-1) + (n-2) + \cdots + 2 + 1 = n(n-1)/2$, i.e. $O(n^2)$: not a great sorting algorithm!
- parallel: in each pipeline cycle $t_{comp} = 2$ and $t_{comm} = 2(t_s + t_w)$
- total execution time (note: $p = n$ here, and $m = n$ data items pass through the pipeline):
  $t_{total} = (t_{comp} + t_{comm})(2n - 1) = (2 + 2(t_s + t_w))(2n - 1)$, i.e. overall $O(n)$ scaling
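As a worked example with an illustrative $n = 1000$, writing $t_{seq}$ for the sequential time and assuming one sequential comparison and one unit of $t_{comp}$ take the same time, the pipeline wins only while the per-message cost stays moderate:

```latex
t_{seq} = \frac{1000 \cdot 999}{2} = 499\,500, \qquad 2n - 1 = 1999 \text{ cycles}

t_{total} = \bigl(2 + 2(t_s + t_w)\bigr) \cdot 1999 < 499\,500
\iff t_s + t_w < \frac{499\,500 / 1999 - 2}{2} \approx 123.9
```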

11 Pipelined Insertion Sort

[Figure: processes P0 to P4 against Time, showing a sorting phase (2n - 1 cycles) followed by a returning phase (n cycles).]

Discussion point: using the pipelining idea, we have developed a solution where the number of processing elements matches the number of data items. To what extent is this realistic? Are such algorithms still useful?

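12 Example Type 3: Linear Equations

- solve a triangular system of linear equations, in which equation $i$ involves only the unknowns $x_0, \ldots, x_i$ (in matrix form the coefficient matrix is lower triangular):

  $a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \cdots + a_{n-1,n-1}x_{n-1} = b_{n-1}$
  $\vdots$
  $a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 = b_2$
  $a_{1,0}x_0 + a_{1,1}x_1 = b_1$
  $a_{0,0}x_0 = b_0$

- the $a$ and $b$ are constants and the $x$ are the unknowns to be solved for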

13 Back Substitution

- solve for $x_0$: $x_0 = b_0 / a_{0,0}$
- solve for $x_1$ using the value of $x_0$ above: $x_1 = (b_1 - a_{1,0}x_0)/a_{1,1}$
- solve for $x_2$ using the values of $x_0$ and $x_1$: $x_2 = (b_2 - a_{2,0}x_0 - a_{2,1}x_1)/a_{2,2}$
- and so on: $x_i = \left(b_i - \sum_{j=0}^{i-1} a_{i,j}x_j\right)/a_{i,i}$
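A small worked example with made-up coefficients ($n = 3$) shows the order in which the unknowns become available:

```latex
\begin{aligned}
2x_0 &= 4 &&\Rightarrow\; x_0 = 4/2 = 2\\
x_0 + 3x_1 &= 11 &&\Rightarrow\; x_1 = (11 - 1\cdot 2)/3 = 3\\
x_0 + 2x_1 + 4x_2 &= 12 &&\Rightarrow\; x_2 = (12 - 1\cdot 2 - 2\cdot 3)/4 = 1
\end{aligned}
```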

14 Back Substitution: Pipeline Solution

[Figure: pipeline P0 to P3. P0 computes $x_0$ and passes it on; P1 computes $x_1$ and passes $x_0$, $x_1$ on; P2 computes $x_2$ and passes $x_0$, $x_1$, $x_2$ on; P3 computes $x_3$.]

15 Back Substitution Code

Sequential code:

```c
x[0] = b[0] / a[0][0];
for (i = 1; i < n; i++) {
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j] * x[j];
    x[i] = (b[i] - sum) / a[i][i];
}
```

Parallel code (one process per unknown):

```c
i = process_id;                       /* process i computes x[i] */
for (j = 0; j < i; j++) {             /* receive and forward x[0..i-1] */
    recv(&x[j], process_id - 1);
    if (process_id < n - 1)           /* the last process has no successor */
        send(&x[j], process_id + 1);
}
sum = 0;
for (j = 0; j < i; j++)
    sum = sum + a[i][j] * x[j];
x[i] = (b[i] - sum) / a[i][i];
if (process_id < n - 1)
    send(&x[i], process_id + 1);
```
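The parallel version can be checked on one machine by running the processes in order 0, 1, ..., n - 1, which is a valid schedule because every message flows forward from process i - 1 to process i. This is a minimal sketch under assumptions: the function name, the bound `MAX_N`, the row-major layout of `a` and the `inbox` array standing in for recv/send are all illustrative.

```c
#include <assert.h>

#define MAX_N 16   /* hypothetical bound on the number of equations/processes */

/* a is n x n, row-major; only entries a[i*n + j] with j <= i are used.
   inbox[i][j] holds x[j] as delivered to process i by process i - 1. */
void pipelined_back_substitution(const double *a, const double *b, int n, double *x)
{
    assert(n >= 1 && n <= MAX_N);
    double inbox[MAX_N][MAX_N];
    for (int i = 0; i < n; i++) {               /* "process" i */
        double xi[MAX_N];                       /* local copies of x[0..i] */
        for (int j = 0; j < i; j++) {
            xi[j] = inbox[i][j];                     /* recv(&x[j], i - 1) */
            if (i < n - 1)
                inbox[i + 1][j] = xi[j];             /* send(&x[j], i + 1) */
        }
        double sum = 0.0;
        for (int j = 0; j < i; j++)
            sum += a[i * n + j] * xi[j];
        xi[i] = (b[i] - sum) / a[i * n + i];
        if (i < n - 1)
            inbox[i + 1][i] = xi[i];                 /* send(&x[i], i + 1) */
        x[i] = xi[i];                                /* collect for the caller */
    }
}
```

With the rows (2), (1, 3), (1, 2, 4) and right-hand side 4, 11, 12, the sketch returns x = 2, 3, 1.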

16 Back Substitution Time Diagram

[Figure: processes P0 to P5 against Time, marking where the first value is passed and where the final value is computed.]

17 Analysis

- there is no longer constant work per pipeline stage!
- process 0 performs one divide and one send
- process i performs i receives and i sends, i multiplies and i adds, one subtraction and one division, and one final send:
  $t_{comm} = (2i + 1)(t_s + t_w)$, $t_{comp} = 2i + 2$
- much harder to analyse!

Remark: the systolic array is a pipeline-based architecture; such designs have been used to solve linear systems.
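Summing the per-process counts over all n processes, and looking at the last process alone, gives a rough picture. This sketch takes $t_{comp} = 2i + 2$ and $t_{comm} = (2i + 1)(t_s + t_w)$ at face value for every $i$:

```latex
\sum_{i=0}^{n-1} (2i + 2) = n(n+1), \qquad \sum_{i=0}^{n-1} (2i + 1) = n^2

\text{total work} = n(n+1) + n^2 (t_s + t_w) = O(n^2)

\text{last process } (i = n-1):\; t_{comp} + t_{comm} = 2n + (2n - 1)(t_s + t_w)
```

Because the last process performs its own steps one after another, the parallel time is at least $2n + (2n - 1)(t_s + t_w)$, i.e. $\Omega(n)$.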


More information

Algorithms and Data S tructures Structures Complexity Complexit of Algorithms Ulf Leser

Algorithms and Data S tructures Structures Complexity Complexit of Algorithms Ulf Leser Algorithms and Data Structures Complexity of Algorithms Ulf Leser Content of this Lecture Efficiency of Algorithms Machine Model Complexity Examples Multiplication of two binary numbers (unit cost?) Exact

More information

Lecture 1 Maths for Computer Science. Denis TRYSTRAM Lecture notes MoSIG1. sept. 2017

Lecture 1 Maths for Computer Science. Denis TRYSTRAM Lecture notes MoSIG1. sept. 2017 Lecture 1 Maths for Computer Science Denis TRYSTRAM Lecture notes MoSIG1 sept. 2017 1 / 21 Context The main idea of this preliminary lecture is to show how to obtain some results in Mathematics (in the

More information

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2 Pipelining CS 365 Lecture 12 Prof. Yih Huang CS 365 1 Traditional Execution 1 2 3 4 1 2 3 4 5 1 2 3 add ld beq CS 365 2 1 Pipelined Execution 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

More information

CISC 235: Topic 1. Complexity of Iterative Algorithms

CISC 235: Topic 1. Complexity of Iterative Algorithms CISC 235: Topic 1 Complexity of Iterative Algorithms Outline Complexity Basics Big-Oh Notation Big-Ω and Big-θ Notation Summations Limitations of Big-Oh Analysis 2 Complexity Complexity is the study of

More information

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 12: Energy and Power. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 12: Energy and Power James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L12 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today a working understanding of

More information

CE 221 Data Structures and Algorithms. Chapter 7: Sorting (Insertion Sort, Shellsort)

CE 221 Data Structures and Algorithms. Chapter 7: Sorting (Insertion Sort, Shellsort) CE 221 Data Structures and Algorithms Chapter 7: Sorting (Insertion Sort, Shellsort) Text: Read Weiss, 7.1 7.4 1 Preliminaries Main memory sorting algorithms All algorithms are Interchangeable; an array

More information

On The Energy Complexity of Parallel Algorithms

On The Energy Complexity of Parallel Algorithms On The Energy Complexity of Parallel Algorithms Vijay Anand Korthikanti Department of Computer Science University of Illinois at Urbana Champaign vkortho2@illinois.edu Gul Agha Department of Computer Science

More information

Component-Based Software Design

Component-Based Software Design Hierarchical Real-Time Scheduling lecture 4/4 March 25, 2015 Outline 1 2 3 4 of computation Given the following resource schedule with period 11 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

More information

Algorithm Analysis, Asymptotic notations CISC4080 CIS, Fordham Univ. Instructor: X. Zhang

Algorithm Analysis, Asymptotic notations CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Algorithm Analysis, Asymptotic notations CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Last class Introduction to algorithm analysis: fibonacci seq calculation counting number of computer steps recursive

More information

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations!

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations! Parallel Numerics Scope: Revise standard numerical methods considering parallel computations! Required knowledge: Numerics Parallel Programming Graphs Literature: Dongarra, Du, Sorensen, van der Vorst:

More information

Lecture 4: Linear Algebra 1

Lecture 4: Linear Algebra 1 Lecture 4: Linear Algebra 1 Sourendu Gupta TIFR Graduate School Computational Physics 1 February 12, 2010 c : Sourendu Gupta (TIFR) Lecture 4: Linear Algebra 1 CP 1 1 / 26 Outline 1 Linear problems Motivation

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] When we talk about the number of operands in an instruction (a 1-operand or a 2-operand instruction, for example), what do we mean? [2] What are the two main ways to define performance? [2] Predicting

More information

LABORATORY MANUAL MICROPROCESSOR AND MICROCONTROLLER

LABORATORY MANUAL MICROPROCESSOR AND MICROCONTROLLER LABORATORY MANUAL S u b j e c t : MICROPROCESSOR AND MICROCONTROLLER TE (E lectr onics) ( S e m V ) 1 I n d e x Serial No T i tl e P a g e N o M i c r o p r o c e s s o r 8 0 8 5 1 8 Bit Addition by Direct

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

CS 4104 Data and Algorithm Analysis. Recurrence Relations. Modeling Recursive Function Cost. Solving Recurrences. Clifford A. Shaffer.

CS 4104 Data and Algorithm Analysis. Recurrence Relations. Modeling Recursive Function Cost. Solving Recurrences. Clifford A. Shaffer. Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright c 2010,2017 by Clifford A. Shaffer Data and Algorithm Analysis Title page Data and Algorithm Analysis Clifford A. Shaffer Spring

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Faster Primal-Dual Algorithms for the Economic Lot-Sizing Problem

Faster Primal-Dual Algorithms for the Economic Lot-Sizing Problem Acknowledgment: Thomas Magnanti, Retsef Levi Faster Primal-Dual Algorithms for the Economic Lot-Sizing Problem Dan Stratila RUTCOR and Rutgers Business School Rutgers University Mihai Pătraşcu AT&T Research

More information

CSCE 222 Discrete Structures for Computing

CSCE 222 Discrete Structures for Computing CSCE 222 Discrete Structures for Computing Algorithms Dr. Philip C. Ritchey Introduction An algorithm is a finite sequence of precise instructions for performing a computation or for solving a problem.

More information

Antonio Falabella. 3 rd nternational Summer School on INtelligent Signal Processing for FrontIEr Research and Industry, September 2015, Hamburg

Antonio Falabella. 3 rd nternational Summer School on INtelligent Signal Processing for FrontIEr Research and Industry, September 2015, Hamburg INFN - CNAF (Bologna) 3 rd nternational Summer School on INtelligent Signal Processing for FrontIEr Research and Industry, 14-25 September 2015, Hamburg 1 / 44 Overview 1 2 3 4 5 2 / 44 to Computing The

More information

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1 Loop Scheduling and Software Pipelining 2008-04-24 \course\cpeg421-08s\topic-7.ppt 1 Reading List Slides: Topic 7 and 7a Other papers as assigned in class or homework: 2008-04-24 \course\cpeg421-08s\topic-7.ppt

More information

SDS developer guide. Develop distributed and parallel applications in Java. Nathanaël Cottin. version

SDS developer guide. Develop distributed and parallel applications in Java. Nathanaël Cottin. version SDS developer guide Develop distributed and parallel applications in Java Nathanaël Cottin sds@ncottin.net http://sds.ncottin.net version 0.0.3 Copyright 2007 - Nathanaël Cottin Permission is granted to

More information

EENG/INFE 212 Stacks

EENG/INFE 212 Stacks EENG/INFE 212 Stacks A stack is an ordered collection of items into which new items may be inserted and from which items may be deleted at one end called the top of the stack. A stack is a dynamic constantly

More information