Program Performance Metrics

1 Program Performance Metrics
- The parallel run time ($T_{par}$) is the time from the moment the computation starts to the moment the last processor finishes its execution.
- The speedup ($S$) is the ratio of the time needed to solve the problem on a single processor ($T_{seq}$) to the time required to solve the same problem on a parallel system with $p$ processors ($T_{par}$): $S = T_{seq} / T_{par}$. Three variants differ in how $T_{seq}$ is measured:
  - relative: $T_{seq}$ is the execution time of the parallel algorithm running on one processor of the parallel computer
  - real: $T_{seq}$ is the execution time of the best-known algorithm running on one processor of the parallel computer
  - absolute: $T_{seq}$ is the execution time of the best-known algorithm running on the best-known computer

2 Program Performance Metrics
- The efficiency ($E$) of a parallel program is the ratio of the speedup to the number of processors: $E = S / p$.
- The cost is usually defined as the product of the parallel run time and the number of processors: $T_{par} \cdot p$.
- The scalability of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processors.
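The definitions of speedup, efficiency, and cost above can be written out directly. The following is a minimal sketch; the function names and the sample timings are illustrative assumptions, not values from the slides.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """S = T_seq / T_par."""
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, p: int) -> float:
    """E = S / p, the speedup per processor."""
    return speedup(t_seq, t_par) / p


def cost(t_par: float, p: int) -> float:
    """Cost = T_par * p, the total processor time consumed."""
    return t_par * p


if __name__ == "__main__":
    # Hypothetical measurements: 100 s sequential, 16 s on 8 processors.
    t_seq, t_par, p = 100.0, 16.0, 8
    print("speedup   :", speedup(t_seq, t_par))
    print("efficiency:", efficiency(t_seq, t_par, p))
    print("cost      :", cost(t_par, p))
```

Whether the result is a relative, real, or absolute speedup depends only on which measurement is passed as `t_seq`.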

3 Communication costs in static interconnection networks
Principal parameters:
- startup time ($t_s$)
- per-hop time ($t_h$)
- per-word transfer time ($t_w$)
Routing techniques:
- store-and-forward routing
- cut-through routing

4 Communication costs depend on the routing strategy
- Store-and-forward routing: the message is passed from processor to processor, and each intermediate processor stores it in local memory until it has received the whole message. For a message of $m$ words traversing $l$ links:
  $t_{comm} = t_s + (m t_w + t_h)\, l$
- Cut-through routing: the message is divided into parts that are forwarded between processors without waiting for the whole message:
  $t_{comm} = t_s + l\, t_h + m\, t_w$
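The two cost formulas above can be compared numerically. This is a small sketch under the assumption that all times are expressed in the same unit; the function names and the sample parameter values are made up for illustration.

```python
def t_store_and_forward(t_s: float, t_h: float, t_w: float, m: int, l: int) -> float:
    """t_comm = t_s + (m * t_w + t_h) * l: the whole message is retransmitted on every link."""
    return t_s + (m * t_w + t_h) * l


def t_cut_through(t_s: float, t_h: float, t_w: float, m: int, l: int) -> float:
    """t_comm = t_s + l * t_h + m * t_w: only the per-hop delay grows with the distance."""
    return t_s + l * t_h + m * t_w


if __name__ == "__main__":
    # Hypothetical parameters: startup 10, per-hop 1, per-word 2, message of 100 words.
    for links in (1, 2, 4, 8):
        sf = t_store_and_forward(10, 1, 2, 100, links)
        ct = t_cut_through(10, 1, 2, 100, links)
        print(f"l={links}: store-and-forward={sf}, cut-through={ct}")
```

For a single link the two expressions coincide; the gap grows with the number of links because store-and-forward pays the $m t_w$ term on every hop.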

5 Basic communication operations
- Simple message transfer between two processors
- One-to-all broadcast
- All-to-all broadcast
- One-to-all personalized communication
- All-to-all personalized communication
- Circular shift

6 Communication operations and their duals
[Figure: message distribution among processors 0, 1, ..., p-1 before and after each operation]
- One-to-all broadcast (message $M$ copied to every processor); its dual is single-node accumulation
- All-to-all broadcast (messages $M_0, M_1, \ldots, M_{p-1}$ delivered to every processor); its dual is multinode accumulation
- One-to-all personalized communication (a distinct message $M_0, M_1, \ldots, M_{p-1}$ for each processor); its dual is single-node gather
- All-to-all personalized communication (messages $M_{i,j}$ for $i, j = 0, \ldots, p-1$); its dual is multinode gather

7 One-to-all broadcast - SF
[Figure: a) in a ring with an even number of processors, b) in a ring with an odd number of processors]
$T_{one\_to\_all\_b} = (t_s + t_w m) \lceil p/2 \rceil$

8 One-to-all broadcast - SF
in a mesh with wraparound
$T_{one\_to\_all\_b} = 2 (t_s + t_w m) \lceil \sqrt{p}/2 \rceil$

9 One-to-all broadcast - SF
in a hypercube
[Figure: three-dimensional hypercube with processors 0 (000) through 7 (111)]
$T_{one\_to\_all\_b} = (t_s + t_w m) \log p$

10 One-to-all broadcast - SF

    procedure ONE_TO_ALL_BC(d, my_id, X);
    begin
      mask := 2^d - 1;
      for i := d-1 downto 0 do
      begin
        mask := mask XOR 2^i;
        if (my_id AND mask) = 0 then
          if (my_id AND 2^i) = 0 then
          begin
            msg_destination := my_id XOR 2^i;
            send X to msg_destination;
          endif
          else
          begin
            msg_source := my_id XOR 2^i;
            receive X from msg_source;
          endelse;
      endfor;
    end ONE_TO_ALL_BC

Code of the one-to-all broadcast operation in a $d$-dimensional hypercube (the processor with label 0 broadcasts its message X).
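The mask logic of the hypercube broadcast procedure above is easier to follow when it is traced step by step. The sketch below is a sequential simulation, not a message-passing program: the list `has` standing in for the processors' memories and the recorded `steps` are assumptions made for illustration.

```python
def one_to_all_broadcast(d: int, message):
    """Simulate the hypercube broadcast from processor 0 on 2**d processors.

    Returns the final contents of every processor and, for each step,
    the list of (sender, receiver) pairs that communicate.
    """
    p = 1 << d
    has = [None] * p          # has[k] is the message held by processor k
    has[0] = message
    steps = []
    mask = p - 1              # all d bits set
    for i in range(d - 1, -1, -1):
        bit = 1 << i
        mask ^= bit           # clear bit i: the low i bits remain in the mask
        transfers = []
        for my_id in range(p):
            # Only processors whose low i bits are zero take part in this step;
            # of those, the ones with bit i clear already hold the message.
            if my_id & mask == 0 and my_id & bit == 0:
                transfers.append((my_id, my_id ^ bit))
        for src, dst in transfers:
            has[dst] = has[src]
        steps.append(transfers)
    return has, steps


if __name__ == "__main__":
    final, steps = one_to_all_broadcast(3, "X")
    for number, transfers in enumerate(steps, start=1):
        print(f"step {number}: {transfers}")
    print(final)
```

The number of senders doubles in every step, which is why the broadcast finishes in $\log p$ steps.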

11 One-to-all broadcast - CT
in a ring
$T_{onetoallbc} = t_s \log p + t_w m \log p + t_h (p - 1)$

12 One-to-all broadcast - CT
in a mesh with wraparound
$T_{onetoallbc} = (t_s + t_w m) \log p + 2 t_h (\sqrt{p} - 1)$

13 One-to-all broadcast - CT
in a balanced binary tree
$T_{onetoallbc} = (t_s + t_w m + t_h (\log p + 1)) \log p$
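To see how the three cut-through broadcast costs (ring, mesh with wraparound, balanced binary tree) relate, they can be evaluated side by side. This is a sketch that assumes base-2 logarithms and a processor count that is both a power of two and a perfect square; the function names and sample parameters are illustrative.

```python
import math


def ct_broadcast_ring(t_s: float, t_w: float, t_h: float, m: int, p: int) -> float:
    """t_s log p + t_w m log p + t_h (p - 1)."""
    log_p = math.log2(p)
    return t_s * log_p + t_w * m * log_p + t_h * (p - 1)


def ct_broadcast_mesh(t_s: float, t_w: float, t_h: float, m: int, p: int) -> float:
    """(t_s + t_w m) log p + 2 t_h (sqrt(p) - 1), for a wraparound mesh."""
    return (t_s + t_w * m) * math.log2(p) + 2 * t_h * (math.sqrt(p) - 1)


def ct_broadcast_tree(t_s: float, t_w: float, t_h: float, m: int, p: int) -> float:
    """(t_s + t_w m + t_h (log p + 1)) log p, for a balanced binary tree."""
    log_p = math.log2(p)
    return (t_s + t_w * m + t_h * (log_p + 1)) * log_p


if __name__ == "__main__":
    # Hypothetical parameters: startup 10, per-word 1, per-hop 1, message of 4 words.
    for p in (16, 64, 256):
        print(p,
              ct_broadcast_ring(10, 1, 1, 4, p),
              ct_broadcast_mesh(10, 1, 1, 4, p),
              ct_broadcast_tree(10, 1, 1, 4, p))
```

The startup and transfer terms are the same in the ring and the mesh; only the per-hop term differs, growing with $p$ in the ring and with $\sqrt{p}$ in the mesh.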

14 All-to-all broadcast - SF
[Figure: all-to-all broadcast in an eight-processor ring, showing the 1st, 2nd, and 7th communication steps; after the 7th step every processor holds the messages (0..7)]

15 All-to-all broadcast - SF

    procedure ALL_TO_ALL_BC_RING(my_id, my_msg, p, result);
    begin
      left := (my_id - 1) mod p;
      right := (my_id + 1) mod p;
      result := my_msg;
      msg := result;
      for i := 1 to p-1 do
      begin
        send msg to right;
        receive msg from left;
        result := result ∪ msg;
      endfor;
    end ALL_TO_ALL_BC_RING;

$T_{alltoallbc} = (t_s + t_w m)(p - 1)$
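The ring procedure above can be traced with a short sequential simulation. In this sketch every processor forwards the message it received in the previous step, and all sends of one step are modeled as a single simultaneous shift; the list-based representation is an assumption made for illustration.

```python
def all_to_all_broadcast_ring(messages):
    """Simulate all-to-all broadcast on a ring of len(messages) processors.

    result[k] lists the messages known to processor k, in order of arrival.
    """
    p = len(messages)
    result = [[own] for own in messages]   # each processor starts with its own message
    msg = list(messages)                   # msg[k] is what processor k sends next
    for _ in range(p - 1):
        # Every processor sends msg to its right neighbor and receives from its left one.
        msg = [msg[(my_id - 1) % p] for my_id in range(p)]
        for my_id in range(p):
            result[my_id].append(msg[my_id])
    return result


if __name__ == "__main__":
    for my_id, known in enumerate(all_to_all_broadcast_ring(["a", "b", "c", "d"])):
        print(my_id, known)
```

Each of the $p - 1$ steps moves one message of size $m$ per processor, which matches the $(t_s + t_w m)(p - 1)$ cost.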

16 All-to-all broadcast - SF
[Figure: all-to-all broadcast in a 4 × 4 mesh with processors 0 to 15; after the first phase the processors of each row hold the messages of that row: (0..3), (4..7), (8..11), (12..15)]

17 All-to-all broadcast - SF

    procedure ALL_TO_ALL_BC_MESH(my_id, my_msg, p, result);
    begin
      left := {...}; right := {...};
      result := my_msg;
      msg := result;
      for i := 1 to √p - 1 do
      begin
        send msg to right;
        receive msg from left;
        result := result ∪ msg;
      endfor;
      left := {...}; right := {...};
      msg := result;
      for i := 1 to √p - 1 do
      begin
        send msg to right;
        receive msg from left;
        result := result ∪ msg;
      endfor;
    end ALL_TO_ALL_BC_MESH;

The first loop circulates messages along one dimension of the mesh and the second loop along the other, each with its own left and right neighbors.

$T_{alltoallbc} = 2 t_s (\sqrt{p} - 1) + t_w m (p - 1)$

18 All-to-all broadcast - SF
[Figure: all-to-all broadcast in an eight-processor hypercube: a) initial distribution of messages, b) distribution before the second step, c) distribution before the third step, d) final distribution of messages, with every processor holding (0..7)]
$T_{alltoallbc} = t_s \log p + t_w m (p - 1)$

19 All-to-all broadcast with reduction - SF

    procedure ALL_TO_ALL_BC_HCUBE(my_id, my_msg, d, result);
    begin
      result := my_msg;
      for i := 0 to d-1 do
      begin
        partner := my_id XOR 2^i;
        send result to partner;
        receive msg from partner;
        result := result ⊕ msg;
      endfor;
    end ALL_TO_ALL_BC_HCUBE;

Here ⊕ denotes the associative operator of the reduction, so the message size stays $m$ in every step.

$T_{alltoallbc} = (t_s + t_w m) \log p$
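The pairwise exchange pattern of the hypercube procedure above can be simulated sequentially. The sketch below assumes that the combining operator is associative and commutative and that the number of processors is a power of two; the function name and the `op` parameter are illustrative choices.

```python
def all_reduce_hypercube(values, op):
    """Simulate the hypercube exchange: after log2(p) steps every processor
    holds the combination of all p initial values."""
    p = len(values)
    d = p.bit_length() - 1
    if p < 1 or (1 << d) != p:
        raise ValueError("number of processors must be a power of two")
    result = list(values)
    for i in range(d):
        bit = 1 << i
        # Each processor exchanges its current result with the partner whose
        # label differs in bit i, then combines the two values.
        result = [op(result[my_id], result[my_id ^ bit]) for my_id in range(p)]
    return result


if __name__ == "__main__":
    # Reduction with addition: the message size stays constant.
    print(all_reduce_hypercube(list(range(8)), lambda a, b: a + b))
    # Plain all-to-all broadcast: set union makes the message double in every step.
    print(all_reduce_hypercube([{k} for k in range(4)], lambda a, b: a | b))
```

With addition as the operator the exchanged data never grows, which corresponds to the $(t_s + t_w m) \log p$ cost; with set union the data doubles in every step, which corresponds to the $t_s \log p + t_w m (p - 1)$ cost of the plain all-to-all broadcast.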

20 One-to-all personalized - SF
[Figure: one-to-all personalized communication in an eight-processor ring, showing the 1st, 2nd, and 7th communication steps]
$T_{onetoall\_pers} = (t_s + t_w m)(p - 1)$

21 One-to-all personalized - SF
[Figure: one-to-all personalized communication in a 4 × 4 mesh with processors 0 to 15; the messages are handled in the groups (0..3), (4..7), (8..11), (12..15)]
$T_{onetoall\_pers} = 2 t_s (\sqrt{p} - 1) + t_w m (p - 1)$

22 All-to-all personalized - SF
[Figure: all-to-all personalized communication in a six-processor ring (processors 0 to 5), 1st and 2nd communication steps; each message is labeled with a pair of processor indices such as (0,1) or (5,0)]

23 All-to-all personalized - SF
[Figure: all-to-all personalized communication in a six-processor ring, continued: 3rd, 4th, and 5th communication steps]

24 All-to-all personalized - SF
[Figure: all-to-all personalized communication in a 3 × 3 mesh (processors 0 to 8), first phase; each processor's messages are bundled in groups of three, e.g. processor 6 holds {(6,0),(6,3),(6,6)}, {(6,1),(6,4),(6,7)}, {(6,2),(6,5),(6,8)}]

25 All-to-all personalized - SF
[Figure: all-to-all personalized communication in a 3 × 3 mesh (processors 0 to 8), second phase, with the message bundles regrouped among the processors]
$T_{alltoall\_pers} = (2 t_s + t_w m p)(\sqrt{p} - 1)$


Unreliable Failure Detectors for Reliable Distributed Systems Unreliable Failure Detectors for Reliable Distributed Systems A different approach Augment the asynchronous model with an unreliable failure detector for crash failures Define failure detectors in terms

More information

2. One-To-All Broadcast and All-To-One Reduction. 1. Chapter 4 : Efficient Collective Communication

2. One-To-All Broadcast and All-To-One Reduction. 1. Chapter 4 : Efficient Collective Communication 1. Chater : Efficient Collective Communication Collective communication: comm amongst collection of nodes (not just sender & recver. One-to-all (bcast, all-to-one (reduc, all-to-all, scatter/gather, etc.

More information

Network Congestion Measurement and Control

Network Congestion Measurement and Control Network Congestion Measurement and Control Wasim El-Hajj Dionysios Kountanis Anupama Raju email: {welhajj, kountan, araju}@cs.wmich.edu Department of Computer Science Western Michigan University Kalamazoo,

More information

On The Energy Complexity of Parallel Algorithms

On The Energy Complexity of Parallel Algorithms On The Energy Complexity of Parallel Algorithms Vijay Anand Korthikanti Department of Computer Science University of Illinois at Urbana Champaign vkortho2@illinois.edu Gul Agha Department of Computer Science

More information

Combinatorial algorithms

Combinatorial algorithms Combinatorial algorithms computing subset rank and unrank, Gray codes, k-element subset rank and unrank, computing permutation rank and unrank Jiří Vyskočil, Radek Mařík 2012 Combinatorial Generation definition:

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

Gossip Latin Square and The Meet-All Gossipers Problem

Gossip Latin Square and The Meet-All Gossipers Problem Gossip Latin Square and The Meet-All Gossipers Problem Nethanel Gelernter and Amir Herzberg Department of Computer Science Bar Ilan University Ramat Gan, Israel 52900 Email: firstname.lastname@gmail.com

More information

Dynamic Programming. Data Structures and Algorithms Andrei Bulatov

Dynamic Programming. Data Structures and Algorithms Andrei Bulatov Dynamic Programming Data Structures and Algorithms Andrei Bulatov Algorithms Dynamic Programming 18-2 Weighted Interval Scheduling Weighted interval scheduling problem. Instance A set of n jobs. Job j

More information

VNS for the TSP and its variants

VNS for the TSP and its variants VNS for the TSP and its variants Nenad Mladenović, Dragan Urošević BALCOR 2011, Thessaloniki, Greece September 23, 2011 Mladenović N 1/37 Variable neighborhood search for the TSP and its variants Problem

More information

Some mathematical properties of Cayley digraphs with applications to interconnection network design

Some mathematical properties of Cayley digraphs with applications to interconnection network design International Journal of Computer Mathematics Vol. 82, No. 5, May 2005, 521 528 Some mathematical properties of Cayley digraphs with applications to interconnection network design WENJUN XIAO and BEHROOZ

More information

Multi-join Query Evaluation on Big Data Lecture 2

Multi-join Query Evaluation on Big Data Lecture 2 Multi-join Query Evaluation on Big Data Lecture 2 Dan Suciu March, 2015 Dan Suciu Multi-Joins Lecture 2 March, 2015 1 / 34 Multi-join Query Evaluation Outline Part 1 Optimal Sequential Algorithms. Thursday

More information

Multicore Semantics and Programming

Multicore Semantics and Programming Multicore Semantics and Programming Peter Sewell Tim Harris University of Cambridge Oracle October November, 2015 p. 1 These Lectures Part 1: Multicore Semantics: the concurrency of multiprocessors and

More information

Lecture 4: Divide and Conquer: van Emde Boas Trees

Lecture 4: Divide and Conquer: van Emde Boas Trees Lecture 4: Divide and Conquer: van Emde Boas Trees Series of Improved Data Structures Insert, Successor Delete Space This lecture is based on personal communication with Michael Bender, 001. Goal We want

More information

Construction of Vertex-Disjoint Paths in Alternating Group Networks

Construction of Vertex-Disjoint Paths in Alternating Group Networks Construction of Vertex-Disjoint Paths in Alternating Group Networks (April 22, 2009) Shuming Zhou Key Laboratory of Network Security and Cryptology, Fujian Normal University, Fuzhou, Fujian 350108, P.R.

More information

In Some Curved Spaces, We Can Solve NP-Hard Problems in Polynomial Time: Towards Matiyasevich s Dream

In Some Curved Spaces, We Can Solve NP-Hard Problems in Polynomial Time: Towards Matiyasevich s Dream In Some Curved Spaces, We Can Solve NP-Hard Problems in Polynomial Time: Towards Matiyasevich s Dream Vladik Kreinovich 1 and Maurice Margenstern 2 1 Department of Computer Science University of Texas

More information

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC Hybrid static/dynamic scheduling for already optimized dense matrix factorization Simplice Donfack, Laura Grigori, INRIA, France Bill Gropp, Vivek Kale UIUC, USA Joint Laboratory for Petascale Computing,

More information

MPI parallel implementation of CBF preconditioning for 3D elasticity problems 1

MPI parallel implementation of CBF preconditioning for 3D elasticity problems 1 Mathematics and Computers in Simulation 50 (1999) 247±254 MPI parallel implementation of CBF preconditioning for 3D elasticity problems 1 Ivan Lirkov *, Svetozar Margenov Central Laboratory for Parallel

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/

More information

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics Impression Store: Compressive Sensing-based Storage for Big Data Analytics Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda & Zheng Zhang Microsoft Research The Curse of O(N) in

More information

Parallel Algorithms. A. Legrand. Arnaud Legrand, CNRS, University of Grenoble. LIG laboratory, October 2, / 235

Parallel Algorithms. A. Legrand. Arnaud Legrand, CNRS, University of Grenoble. LIG laboratory, October 2, / 235 Parallel Arnaud Legrand, CNRS, University of Grenoble LIG laboratory, arnaud.legrand@imag.fr October 2, 2011 1 / 235 Outline Parallel Part I Models Part II Communications on a Ring Part III Speedup and

More information

On Detecting Multiple Faults in Baseline Interconnection Networks

On Detecting Multiple Faults in Baseline Interconnection Networks On Detecting Multiple Faults in Baseline Interconnection Networks SHUN-SHII LIN 1 AND SHAN-TAI CHEN 2 1 National Taiwan Normal University, Taipei, Taiwan, ROC 2 Chung Cheng Institute of Technology, Tao-Yuan,

More information

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class

INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class Dr. Thomas E. Hicks Data Abstractions Homework - Hashing -1 - INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University Data Set - SSN's from UTSA Class 467 13 3881 498 66 2055 450 27 3804 456 49 5261

More information

Coding for loss tolerant systems

Coding for loss tolerant systems Coding for loss tolerant systems Workshop APRETAF, 22 janvier 2009 Mathieu Cunche, Vincent Roca INRIA, équipe Planète INRIA Rhône-Alpes Mathieu Cunche, Vincent Roca The erasure channel Erasure codes Reed-Solomon

More information

Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined Bus System

Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined Bus System IEEE TRASACTIOS O PARALLEL AD DISTRIBUTED SYSTEMS, VOL. 9, O. 8, AUGUST 1998 705 Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined

More information

NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR. Sp ' 00

NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR. Sp ' 00 NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR Sp ' 00 May 3 OPEN BOOK exam (students are permitted to bring in textbooks, handwritten notes, lecture notes

More information

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution

More information