Overview: Parallelisation via Pipelining
- Beverly McDaniel
- 5 years ago
Transcription
1 Overview: Parallelisation via Pipelining
- three types of pipelines
- adding numbers (type 1)
- performance analysis of pipelines
- insertion sort (type 2)
- linear system back substitution (type 3)
Ref: chapter : Wilkinson and Allen
COMP00/800 L: Parallelisation via Pipelining 07
2 Pipelining
- already encountered: instruction pipelining at the CPU level
- suits problems that can be divided into a series of sequential tasks, completed one after another
- e.g. a frequency filter in which each process filters one frequency
- three typical scenarios:
  1. more than one instance of the complete problem is to be executed
  2. a series of data items must be processed, each requiring multiple operations
  3. information needed to start the next process can be passed forward before the current process has completed all its internal operations
3 Type 1 Pipelining
[Figure: space-time diagram of the pipeline processes against Time, annotated with p and m.]
4 Type 2 Pipelining
[Figure: an input sequence of data items d (labels up to d9) fed into a pipeline of processes P0 to P9, with a space-time diagram of the items moving through the processes over Time; annotated with p and n.]
5 Type 3 Pipelining
[Figure: information passed to the next stage before a process finishes. Two space-time diagrams of processes against Time: one where the processes take even time, one where they take uneven time.]
6 Example Type 1: Adding Numbers
- compute a sum $s$ by passing a running partial sum down the pipeline
[Figure: pipeline of processes P0, P1, ..., each labelled with its partial sum.]
Code for each process:
    accumulation = number;
    if (process_id > 0) {
        recv(&accumulation, process_id - 1);
        accumulation = accumulation + number;
    }
    if (process_id < p - 1)
        send(&accumulation, process_id + 1);
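The per-process code above can be tried out on a single machine. The following is a minimal sketch, not the course's implementation: the function name `pipeline_sum` and the `channel` array that stands in for the send/recv pairs are assumptions made for illustration, and the stages run one after another rather than concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate the summation pipeline sequentially. Stage `id` holds numbers[id],
 * "receives" the running sum from stage id-1 and "sends" its own running sum
 * onward. channel[id] models the message sent by stage id (hypothetical
 * stand-in for send/recv; a real version would use message passing). */
double pipeline_sum(const double *numbers, double *channel, size_t p)
{
    assert(p > 0);
    for (size_t id = 0; id < p; id++) {
        double accumulation = numbers[id];
        if (id > 0)
            accumulation += channel[id - 1];  /* recv from stage id-1 */
        channel[id] = accumulation;           /* send to stage id+1, or final result */
    }
    return channel[p - 1];
}
```

After the call, `channel[id]` holds the partial sum that stage `id` would have forwarded, so the last entry is the complete sum.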
7 General Pipeline Analysis
- assume each process performs a similar action in each pipeline cycle
- work out the computation and communication time for one cycle
- compute the total execution time as
  $t_{total} = (\text{time for one pipeline cycle}) \times (\text{number of cycles}) = (t_{comp} + t_{comm})(m + p - 1)$
  where $m$ is the number of instances and $p$ the number of pipeline stages (processes)
- the average time for a computation is then $t_{av} = t_{total} / m$
8 Summation Analysis
- single instance:
  $t_{comp} = t_f$
  $t_{comm} = 2(t_s + t_w)$ (one receive and one send per cycle)
  $t_{total} = (2(t_s + t_w) + t_f)\,p$, a time complexity of $O(p)$
- multiple instances:
  $t_{total} = (2(t_s + t_w) + t_f)(m + p - 1)$
  $t_{av} = t_{total} / m \approx 2(t_s + t_w) + t_f$ for $m \gg p$, i.e. $t_{av}$ is one pipeline cycle
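The timing model can be checked numerically. The sketch below assumes the usual cycle count of m + p - 1 (the first instance needs p cycles to drain through the pipeline, and each further instance adds one); the function names are hypothetical, and t_s, t_w and t_f are taken to be the message startup time, per-word transfer time and floating-point operation time.

```c
#include <assert.h>

/* Number of pipeline cycles for m problem instances flowing through p stages:
 * p cycles for the first instance, plus one per additional instance. */
long pipeline_cycles(long m, long p)
{
    assert(m > 0 && p > 0);
    return m + p - 1;
}

/* Total execution time: (time for one cycle) * (number of cycles). */
double pipeline_total_time(double t_comp, double t_comm, long m, long p)
{
    return (t_comp + t_comm) * (double)pipeline_cycles(m, p);
}

/* Average time per instance; approaches one cycle time as m grows. */
double pipeline_average_time(double t_comp, double t_comm, long m, long p)
{
    return pipeline_total_time(t_comp, t_comm, m, p) / (double)m;
}
```

For example, with t_s = 3, t_w = 1 and t_f = 1 (arbitrary units) one cycle of the summation pipeline costs 2(3 + 1) + 1 = 9 units. A single instance through 4 stages then takes 36 units, while 100 instances take 927 units, an average of 9.27 units each, which is close to one cycle.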
9 Example Type 2: Insertion Sort
- the algorithm is like moving a playing card over other cards until its correct location is found
[Figure: numbers moving through processes P0, P1, ... against Time (cycles).]
Code for each process (x is the number the process currently holds):
    recv(&number, process_id - 1);
    if (number > x) {
        send(&x, process_id + 1);
        x = number;
    } else
        send(&number, process_id + 1);
- assuming n numbers, process i will
  - receive n - i numbers
  - pass on n - i - 1 numbers
10 Sort Analysis
- sequential: $t_s = (n-1) + (n-2) + \dots + 2 + 1 = n(n-1)/2$, i.e. $O(n^2)$: not a great sorting algorithm! (here $t_s$ is the sequential time; below it is the message startup time)
- parallel: each pipeline cycle has $t_{comp} = 1$ and $t_{comm} = 2(t_s + t_w)$
- total execution time (note: $p = n$ here):
  $t_{total} = (t_{comp} + t_{comm})(2n - 1) = (1 + 2(t_s + t_w))(2n - 1)$, i.e. overall $O(n)$ scaling
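The linear cycle count can be seen in a small single-machine simulation. This is a sketch under stated assumptions rather than the lecture's code: the stages are assumed to run in lock-step, the k-th input number (counting from 0) is assumed to enter P0 in cycle k + 1, and the function name and the `held` array (the value x kept by each stage) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate the pipelined insertion sort with n stages for n numbers.
 * held[i] models the value x kept by stage i. Returns the cycle in which
 * the last receive happens, assuming input k enters P0 in cycle k+1 and
 * moves one stage per cycle. */
size_t pipeline_insertion_sort(const int *input, int *held, size_t n)
{
    assert(input != NULL && held != NULL);
    size_t last_cycle = 0;
    for (size_t k = 0; k < n; k++) {
        int number = input[k];
        /* Before input k arrives, stages 0..k-1 hold a value and stage k is empty. */
        for (size_t i = 0; i <= k; i++) {
            size_t cycle = k + i + 1;       /* stage i receives in this cycle */
            if (i == k) {
                held[i] = number;           /* first value received is simply stored */
            } else if (number > held[i]) {
                int smaller = held[i];      /* keep the larger, pass on the smaller */
                held[i] = number;
                number = smaller;
            }
            if (cycle > last_cycle)
                last_cycle = cycle;
        }
    }
    return last_cycle;
}
```

For the input 4, 1, 5, 2, 3 the stages end up holding 5, 4, 3, 2, 1 (largest in P0), and the last receive happens in cycle 9, which is 2n - 1 for n = 5.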
11 Pipelined Insertion Sort
[Figure: space-time diagram of processes P0, P1, ... against Time, showing a sorting phase ($2n - 1$ cycles) followed by a returning phase ($n$ cycles).]
Discussion point: using the pipelining idea, we have developed a solution where the number of processing elements matches the number of data items. To what extent is this realistic? Are such algorithms still useful?
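12 Example Type 3: Linear Equations
- solve an upper triangular system of linear equations (upper triangular as laid out here, longest equation first; indexed by row, the coefficient matrix is lower triangular)
  $a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$
  $\vdots$
  $a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 = b_2$
  $a_{1,0}x_0 + a_{1,1}x_1 = b_1$
  $a_{0,0}x_0 = b_0$
- the $a$ and $b$ are constants and the $x$ are the unknowns to be solved for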
13 Back Substitution
- solve for $x_0$: $x_0 = b_0 / a_{0,0}$
- solve for $x_1$ using the above value for $x_0$: $x_1 = (b_1 - a_{1,0}x_0) / a_{1,1}$
- solve for $x_2$ using the above values for $x_0$ and $x_1$: $x_2 = (b_2 - a_{2,0}x_0 - a_{2,1}x_1) / a_{2,2}$
- etc.: $x_i = \left(b_i - \sum_{j=0}^{i-1} a_{i,j}x_j\right) / a_{i,i}$
14 Back Substitution: Pipeline Solution
[Figure: pipeline of processes P0, P1, .... P0 computes $x_0$ and passes it on; each later process receives the earlier values, forwards them, computes its own $x_i$ and passes that on as well.]
15 Back Substitution Code
Sequential code:
    x[0] = b[0] / a[0][0];
    for (i = 1; i < n; i++) {
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j] * x[j];
        x[i] = (b[i] - sum) / a[i][i];
    }
Parallel code (process i):
    i = process_id;
    for (j = 0; j < i; j++) {
        recv(&x[j], process_id - 1);
        send(&x[j], process_id + 1);
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j] * x[j];
    x[i] = (b[i] - sum) / a[i][i];
    send(&x[i], process_id + 1);
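A self-contained version of the sequential algorithm is sketched below so that it can be compiled and checked on a small system. The function name `back_substitute` and the flat row-major storage of `a` are assumptions of this sketch, not part of the slides; the loop starts at row 0, which folds the special case for x[0] into the general formula.

```c
#include <assert.h>
#include <stddef.h>

/* Solve a triangular system row by row. `a` is an n x n matrix stored
 * row-major (a[i*n + j] is the coefficient of x[j] in equation i); only
 * entries with j <= i are read. */
void back_substitute(size_t n, const double *a, const double *b, double *x)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < i; j++)
            sum += a[i * n + j] * x[j];   /* uses the already computed x[0..i-1] */
        assert(a[i * n + i] != 0.0);      /* diagonal entries must be non-zero */
        x[i] = (b[i] - sum) / a[i * n + i];
    }
}
```

For the system 2x0 = 4, x0 + x1 = 5, x0 + 2x1 + 4x2 = 17 this yields x0 = 2, x1 = 3 and x2 = 2.25. In the pipelined version, iteration i of the outer loop is what process i does once it has received x[0] to x[i-1].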
16 Back Substitution Time Diagram
[Figure: processes P0, P1, ... against Time, marking where the first value is passed and where the final value is computed.]
17 Analysis
- no longer constant work per pipeline stage!
- process 0 performs one divide and one send
- process $i$ performs $i$ sends and receives, $i$ multiply/adds, one division/subtract, and one final send:
  $t_{comm} = (2i + 1)(t_s + t_w)$
  $t_{comp} = 2i + 2$
- much harder to analyse!
Remark: the systolic array is a pipeline-based architecture. Designs have been used to solve linear systems.
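The uneven work per stage can be tabulated with a few helper functions. This is only a sketch: the function names are hypothetical, each arithmetic step is assumed to cost one time unit, and the counts follow the per-process description above (i receives, i forwarding sends and one final send; i multiply/add pairs plus a subtract and a divide). For process 0 the general count of two computational steps slightly overstates the single divide it actually performs.

```c
#include <assert.h>

/* Messages handled by process i: i receives, i forwarding sends, one final send. */
long backsub_messages(long i)
{
    assert(i >= 0);
    return 2 * i + 1;
}

/* Computational steps of process i: i multiplies, i adds, one subtract, one divide. */
long backsub_comp_steps(long i)
{
    assert(i >= 0);
    return 2 * i + 2;
}

/* Time spent by process i, with t_s the startup time and t_w the per-word
 * time of a message, and one time unit per arithmetic step (an assumption). */
double backsub_process_time(long i, double t_s, double t_w)
{
    return (double)backsub_messages(i) * (t_s + t_w)
         + (double)backsub_comp_steps(i);
}
```

With t_s = 3 and t_w = 1, process 3 handles 7 messages and 8 computational steps, 36 time units in total, whereas process 0 handles a single message. This growth with i is why the constant-cycle analysis used for the earlier examples does not apply directly.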