Barrier. Overview: Synchronous Computations. Barriers. Counter-based or Linear Barriers


Overview: Synchronous Computations
- barriers: linear, tree-based and butterfly
- degrees of synchronization
- synchronous example 1: Jacobi iterations
  - serial and parallel code, performance analysis
- synchronous example 2: heat distribution
  - serial and parallel code
  - comparison of block and strip partitioning methods
  - safety
  - ghost points
- synchronous example 3: advection (assignment)
Ref: Chapter 6, Wilkinson and Allen

COMP4300/8300 L2: Synchronous Computations 2017

Barriers
- barrier: a point at which all processes must wait until all other processes have reached that point
- mutual exclusion: a different kind of barrier, one that prevents other processes from entering the following region while another process is already in that region
- common in shared-memory parallel programs
- necessary for some MPI-2 operations
- both are possible sources of overhead

Counter-based or Linear Barriers
- one process counts the arrival of the other processes
- when all processes have arrived, they are each sent a release message
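A minimal sketch of the counter-based barrier, in Python, using threads as processes and one `queue.Queue` per process as a hypothetical message inbox (an assumption standing in for real sends and receives). Process 0 plays the central counter.

```python
import queue
import threading


def counter_barrier(rank, p, inboxes):
    """Counter-based (linear) barrier for process `rank` out of `p`.

    Process 0 is the central counter. inboxes[r] is a hypothetical
    per-process message queue standing in for real message passing.
    """
    if rank == 0:
        # arrival phase: count one arrival message from each other process
        for _ in range(p - 1):
            inboxes[0].get()
        # departure phase: send every other process a release message
        for r in range(1, p):
            inboxes[r].put("release")
    else:
        inboxes[0].put(("arrive", rank))  # tell the counter we have arrived
        inboxes[rank].get()               # block until released


def run(p=4):
    """Run p threads that each log an event before and after the barrier."""
    inboxes = [queue.Queue() for _ in range(p)]
    log = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            log.append(("before", rank))
        counter_barrier(rank, p, inboxes)
        with lock:
            log.append(("after", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

`run(4)` returns a log in which every "before" entry precedes every "after" entry. Process 0 handles the p - 1 arrivals one at a time, which is why the cost grows as O(p). Because messages are queued, an early arrival for a second barrier simply waits in process 0's inbox until the next count begins.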

Implementation
- arrival phase: each process sends a message to the central counter
- departure phase: each process receives a message from the central counter
- implementations must handle possible time delays, e.g. two barriers in quick succession
- cost is O(p)

Tree-Based Barriers
- note: a broadcast does not ensure synchronization
- cost is 2 lg p, i.e. O(lg p)

Butterfly Barrier (Butterfly/Omega Network)
- cost is 2 lg p, i.e. O(lg p)

Degrees of Synchronization
- from fully to loosely synchronous
- the more synchronous your computation, the more potential overhead
- SIMD: synchronized at the instruction level
  - provides ease of programming (one program)
  - well suited for data decomposition
  - applicable to many numerical problems
  - the forall statement was introduced to specify data-parallel operations, e.g. forall (i = 0; i < n; i++) { ... }
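To see why the butterfly barrier needs only lg p stages, the sketch below simulates just its message pattern, assuming p is a power of two: at each stage, process i exchanges a message with partner i XOR distance (distance = 1, 2, 4, ...), and `knows[i]` is a hypothetical record of which processes' arrivals i has heard of, directly or indirectly.

```python
def butterfly_barrier_trace(p):
    """Simulate the pairwise exchanges of a butterfly barrier on p processes.

    Assumes p is a power of two. Returns the number of stages and, for each
    process, the set of processes whose arrival it has (transitively) learned of.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    knows = [{i} for i in range(p)]
    stages = 0
    distance = 1
    while distance < p:
        # every pair (i, i ^ distance) swaps everything it has heard so far;
        # the comprehension reads the previous stage's sets before replacing them
        knows = [knows[i] | knows[i ^ distance] for i in range(p)]
        distance *= 2
        stages += 1
    return stages, knows
```

For p = 8 it reports 3 stages, after which every process has heard from all 8 and may leave the barrier. The number of stages grows as lg p, consistent with the O(lg p) cost quoted above, and no process handles more than one exchange per stage.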

Synchronous Example: Jacobi Iterations
- the Jacobi iteration solves a system of linear equations iteratively:
  $a_{0,0}x_0 + a_{0,1}x_1 + a_{0,2}x_2 + \cdots + a_{0,n-1}x_{n-1} = b_0$
  $a_{1,0}x_0 + a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n-1}x_{n-1} = b_1$
  $a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n-1}x_{n-1} = b_2$
  $\vdots$
  $a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \cdots + a_{n-1,n-1}x_{n-1} = b_{n-1}$
  where there are n equations and n unknowns $(x_0, x_1, x_2, \ldots, x_{n-1})$

Jacobi Iterations
- consider equation i as: $a_{i,0}x_0 + a_{i,1}x_1 + a_{i,2}x_2 + \cdots + a_{i,n-1}x_{n-1} = b_i$
- which we can re-cast as:
  $x_i = \frac{1}{a_{i,i}}\left[b_i - (a_{i,0}x_0 + a_{i,1}x_1 + \cdots + a_{i,i-1}x_{i-1} + a_{i,i+1}x_{i+1} + \cdots + a_{i,n-1}x_{n-1})\right]$
  i.e. $x_i = \frac{1}{a_{i,i}}\left[b_i - \sum_{j \neq i} a_{i,j}x_j\right]$
- strategy: guess x, then iterate and hope it converges!
- converges if the matrix is diagonally dominant: $\sum_{j \neq i} |a_{i,j}| < |a_{i,i}|$
- terminate when convergence is achieved: $|x^t - x^{t-1}| <$ error tolerance

Sequential Jacobi Code
- ignoring convergence testing: for a fixed number of iterations, compute every new $x_i$ from the formula above using only the previous iterate, then copy the new values into x

Parallel Jacobi Code
- ignoring convergence testing and assuming parallelisation over n processes: in each iteration, process i computes its new $x_i$ from the formula above, then performs an all-gather followed by a barrier
- the all-gather sends the local new value to all processes and collects their new values
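A minimal sequential Python sketch of the Jacobi iteration, this time including the convergence test. The starting guess x = b, the tolerance and the iteration cap are assumptions of the sketch. Each new value depends only on the previous iterate, which is what lets process i compute its own entry independently in the parallel version before the all-gather.

```python
def jacobi(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b by Jacobi iteration, starting from the guess x = b.

    Converges when `a` is diagonally dominant. Returns (x, iterations used).
    """
    n = len(b)
    x = list(b)  # initial guess (an assumption; any guess will do)
    for it in range(1, max_iter + 1):
        # x_i = (b_i - sum_{j != i} a_ij x_j) / a_ii, using only the old x;
        # in the parallel code, entry i is computed by process i and then
        # shared with every other process by the all-gather
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        # convergence test: largest change between successive iterates
        if max(abs(new_x[i] - x[i]) for i in range(n)) < tol:
            return new_x, it
        x = new_x
    return x, max_iter
```

For example, with a = [[4, 1, 1], [1, 5, 2], [1, 2, 6]] and b = [9, 17, 23], a diagonally dominant system whose solution is x = (1, 2, 3), `jacobi(a, b)` converges to (1, 2, 3) to within roughly the tolerance.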

Broadcast / Gather
- an all-gather can be (simplistically) implemented as a loop in which each process sends its value to every process, followed by a loop in which it receives a value from every process
- question: do we really need the barrier as well as this?

Partitioning
- normally the number of processes is much less than the number of data items
  - block partitioning: allocate groups of consecutive unknowns to processes
  - cyclic partitioning: allocate in a round-robin fashion
- analysis: τ iterations, n/p unknowns per process
  - computation decreases with p
  - communication increases with p
  - the total has an overall minimum
- $t_{comp} = \tau(2n + 4)(n/p)t_f$
- $t_{comm} = p(t_s + (n/p)t_w)\tau = (p t_s + n t_w)\tau$
- $t_{tot} = ((2n + 4)(n/p)t_f + p t_s + n t_w)\tau$
- question: can we do an all-gather faster than $p t_s + n t_w$?

[Figure: Parallel Jacobi Iteration Time; parameters $t_s = 10^5 t_f$, $t_w = 50 t_f$, $n = 1000$]

Locally Synchronous Example: Heat Distribution Problem
- consider a metal sheet with a fixed temperature along the sides but unknown temperatures in the middle; find the temperature in the middle
- finite difference approximation to the Laplace equation:
  $\frac{\partial^2 T(x,y)}{\partial x^2} + \frac{\partial^2 T(x,y)}{\partial y^2} = 0$
  $\frac{T(x+\delta x,y) - 2T(x,y) + T(x-\delta x,y)}{\delta x^2} + \frac{T(x,y+\delta y) - 2T(x,y) + T(x,y-\delta y)}{\delta y^2} = 0$
- assuming an even grid (i.e. $\delta x = \delta y$) of $n \times n$ points (denoted as $h_{i,j}$), the temperature at any point is the average of the surrounding points:
  $h_{i,j} = \frac{h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}}{4}$
- the problem is very similar to the Game of Life, i.e. what happens in a cell depends upon its neighbours
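The partitioning analysis above can be evaluated directly to locate the minimum of $t_{tot}$. This Python sketch transcribes the three formulas, assuming $t_f = 1$ as the time unit and τ = 1, and uses the parameters quoted for the iteration-time figure ($t_s = 10^5 t_f$, $t_w = 50 t_f$, n = 1000). Setting the derivative with respect to p to zero gives $p^* = \sqrt{(2n + 4)\,n\,t_f / t_s}$. The helper names are illustrative.

```python
import math


def t_tot(p, n, tau, t_f, t_s, t_w):
    """Time model from the Partitioning analysis:
    t_tot = ((2n + 4)(n/p) t_f + p t_s + n t_w) * tau."""
    t_comp = tau * (2 * n + 4) * (n / p) * t_f   # shrinks as p grows
    t_comm = (p * t_s + n * t_w) * tau           # all-gather cost grows with p
    return t_comp + t_comm


def best_p(n, tau, t_f, t_s, t_w, p_max):
    """Integer number of processes in [1, p_max] that minimises t_tot."""
    return min(range(1, p_max + 1), key=lambda p: t_tot(p, n, tau, t_f, t_s, t_w))


def p_continuous(n, t_f, t_s):
    """Stationary point of t_tot in p: d/dp [(2n+4) n t_f / p + p t_s] = 0."""
    return math.sqrt((2 * n + 4) * n * t_f / t_s)
```

With these parameters `p_continuous(1000, 1.0, 1e5)` is about 4.48 and `best_p(1000, 1, 1.0, 1e5, 50.0, 64)` returns 5: each extra process adds a full startup cost $t_s$ to the all-gather, so the modelled minimum comes at a small p.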

Array Ordering
- with the mesh points numbered row by row, k points per row, the four neighbours of point $x_i$ are $x_{i-1}$, $x_{i+1}$, $x_{i-k}$ and $x_{i+k}$
- we will solve iteratively: $x_i = \frac{x_{i-1} + x_{i+1} + x_{i-k} + x_{i+k}}{4}$
- but this problem may also be written as a system of linear equations: $x_{i-k} + x_{i-1} - 4x_i + x_{i+1} + x_{i+k} = 0$

Heat Equation: Sequential Code
- assume a fixed number of iterations and a square mesh
- beware of what happens at the edges!
- each iteration computes a new value for every interior point from its four neighbours, then copies the new values back

Heat Equation: Parallel Code
- one point per process, assuming non-blocking sends: in each iteration a process computes its new value as the average of its four neighbours' values, sends it to each of its four neighbours, then receives their new values
- the sends and receives provide a local barrier: each process synchronizes with its 4 surrounding processes

Heat Equation: Partitioning
- normally more than one point per process
- option of either block or strip partitioning
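A sequential Python sketch of the heat-distribution relaxation. It assumes the mesh is stored with an extra border of fixed edge temperatures, so only interior points are updated; this is one way of handling the edges. `make_mesh` and `heat_jacobi` are illustrative names.

```python
def make_mesh(n, edge, interior=0.0):
    """(n+2) x (n+2) mesh: a border of fixed edge temperatures around n x n interior points."""
    size = n + 2
    return [[edge if i in (0, size - 1) or j in (0, size - 1) else interior
             for j in range(size)] for i in range(size)]


def heat_jacobi(h, iterations):
    """Replace every interior point by the average of its four neighbours
    from the previous iteration, a fixed number of times. Edge values never change."""
    size = len(h)
    h = [row[:] for row in h]            # don't modify the caller's mesh
    for _ in range(iterations):
        g = [row[:] for row in h]        # new values; edges copied unchanged
        for i in range(1, size - 1):     # interior rows only
            for j in range(1, size - 1): # interior columns only
                g[i][j] = 0.25 * (h[i - 1][j] + h[i + 1][j] + h[i][j - 1] + h[i][j + 1])
        h = g                            # the "copy back" step
    return h
```

On a 3 x 3 interior with every edge at 100 and the interior at 0, one iteration gives 50 at the interior corners, 25 at the interior edge midpoints and 0 at the centre; with more iterations every interior point approaches 100.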

Block/Strip Communication Comparison
- block partitioning: four edges exchanged ($n^2$ data points, p processes):
  $t_{comm} = 8\left(t_s + \frac{n}{\sqrt{p}}t_w\right)$
- strip partitioning: two edges exchanged:
  $t_{comm} = 4(t_s + n t_w)$

Block/Strip Optimum
- block communication is larger than strip if:
  $8\left(t_s + \frac{n}{\sqrt{p}}t_w\right) > 4(t_s + n t_w)$
  i.e. if $t_s > n\left(1 - \frac{2}{\sqrt{p}}\right)t_w$

Safety and Deadlock
- with all processes sending and then receiving data, the code is unsafe: it relies on local buffering in the send() function
- potential for deadlock (as in Prac, Ex 3)!
- alternative #1: re-order sends and receives, e.g. for strip partitioning, even-numbered processes send first and odd-numbered processes receive first:
  if (i % 2 == 0) { send(); recv(); send(); recv(); } else { recv(); send(); recv(); send(); }

Alt #2: Asynchronous Communication using Ghost Points
- assign extra receive buffers for edges where data is exchanged
- typically these are implemented as extra rows and columns in each process's local array (known as a halo)
- can use asynchronous calls (e.g. MPI_Isend())
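Ghost points can be sketched sequentially in Python. Each simulated process holds its strip of interior rows plus one ghost row above and one below. The halo exchange copies neighbouring edge rows into those ghost rows, and then each strip is relaxed independently. The sketch assumes strip partitioning with p dividing the number of interior rows; the list copies stand in for the sends and receives, and all names are illustrative.

```python
def relax(s):
    """One Jacobi update of every row except the first and last, which are
    read but not written (fixed edges, or ghost rows inside a strip)."""
    g = [row[:] for row in s]
    for i in range(1, len(s) - 1):
        for j in range(1, len(s[i]) - 1):
            g[i][j] = 0.25 * (s[i - 1][j] + s[i + 1][j] + s[i][j - 1] + s[i][j + 1])
    return g


def split_into_strips(h, p):
    """Give each of p processes its interior rows plus one ghost row above
    and one below (its halo). Assumes p divides the number of interior rows."""
    interior = len(h) - 2
    if interior % p:
        raise ValueError("p must divide the number of interior rows")
    rows = interior // p
    return [[row[:] for row in h[k * rows:(k + 1) * rows + 2]] for k in range(p)]


def exchange_halos(strips):
    """Copy edge rows into neighbouring ghost rows. In a message-passing code
    each copy would be one send/receive pair: alternative #1 orders blocking
    calls by even/odd rank, alternative #2 uses non-blocking calls."""
    for k in range(len(strips) - 1):
        upper, lower = strips[k], strips[k + 1]
        upper[-1] = lower[1][:]   # lower's first owned row -> upper's bottom ghost row
        lower[0] = upper[-2][:]   # upper's last owned row -> lower's top ghost row


def heat_strips(h, p, iterations):
    """Strip-partitioned heat relaxation on p simulated processes."""
    strips = split_into_strips(h, p)
    for _ in range(iterations):
        exchange_halos(strips)
        strips = [relax(s) for s in strips]
    # reassemble: top edge, each strip's owned rows, bottom edge
    out = [strips[0][0][:]]
    for s in strips:
        out.extend(row[:] for row in s[1:-1])
    out.append(strips[-1][-1][:])
    return out
```

Because each strip sees its neighbours' values only through its halo, `heat_strips(h, p, k)` reproduces k whole-mesh updates exactly for any valid p, including p = 1.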


More information

Department of Electrical & Electronics EE-333 DIGITAL SYSTEMS

Department of Electrical & Electronics EE-333 DIGITAL SYSTEMS Department of Electrical & Electronics EE-333 DIGITAL SYSTEMS 1) Given the two binary numbers X = 1010100 and Y = 1000011, perform the subtraction (a) X -Y and (b) Y - X using 2's complements. a) X = 1010100

More information

Computer Architecture 10. Fast Adders

Computer Architecture 10. Fast Adders Computer Architecture 10 Fast s Ma d e wi t h Op e n Of f i c e. o r g 1 Carry Problem Addition is primary mechanism in implementing arithmetic operations Slow addition directly affects the total performance

More information

Zacros. Software Package Development: Pushing the Frontiers of Kinetic Monte Carlo Simulation in Catalysis

Zacros. Software Package Development: Pushing the Frontiers of Kinetic Monte Carlo Simulation in Catalysis Zacros Software Package Development: Pushing the Frontiers of Kinetic Monte Carlo Simulation in Catalysis Jens H Nielsen, Mayeul D'Avezac, James Hetherington & Michail Stamatakis Introduction to Zacros

More information

MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF 2D AND 3D ISING MODEL

MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF 2D AND 3D ISING MODEL Journal of Optoelectronics and Advanced Materials Vol. 5, No. 4, December 003, p. 971-976 MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF D AND 3D ISING MODEL M. Diaconu *, R. Puscasu, A. Stancu

More information

The Design Procedure. Output Equation Determination - Derive output equations from the state table

The Design Procedure. Output Equation Determination - Derive output equations from the state table The Design Procedure Specification Formulation - Obtain a state diagram or state table State Assignment - Assign binary codes to the states Flip-Flop Input Equation Determination - Select flipflop types

More information

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy) Oct. 5, 2017 Lecture 12: Time and Ordering All slides IG Why Synchronization? You want to catch a bus at 6.05 pm, but your watch is

More information

Searching. Sorting. Lambdas

Searching. Sorting. Lambdas .. s Babes-Bolyai University arthur@cs.ubbcluj.ro Overview 1 2 3 Feedback for the course You can write feedback at academicinfo.ubbcluj.ro It is both important as well as anonymous Write both what you

More information

COL 730: Parallel Programming

COL 730: Parallel Programming COL 730: Parallel Programming PARALLEL SORTING Bitonic Merge and Sort Bitonic sequence: {a 0, a 1,, a n-1 }: A sequence with a monotonically increasing part and a monotonically decreasing part For some

More information

Module 5: CPU Scheduling

Module 5: CPU Scheduling Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained

More information

EET 310 Flip-Flops 11/17/2011 1

EET 310 Flip-Flops 11/17/2011 1 EET 310 Flip-Flops 11/17/2011 1 FF s and some Definitions Clock Input: FF s are controlled by a trigger or Clock signal. All FF s have a clock input. If a device which attempts to do a FF s task does not

More information

Parallel Program Performance Analysis

Parallel Program Performance Analysis Parallel Program Performance Analysis Chris Kauffman CS 499: Spring 2016 GMU Logistics Today Final details of HW2 interviews HW2 timings HW2 Questions Parallel Performance Theory Special Office Hours Mon

More information

Theory of Parallel Hardware May 11, 2004 Massachusetts Institute of Technology Charles Leiserson, Michael Bender, Bradley Kuszmaul

Theory of Parallel Hardware May 11, 2004 Massachusetts Institute of Technology Charles Leiserson, Michael Bender, Bradley Kuszmaul Theory of Parallel Hardware May 11, 2004 Massachusetts Institute of Technology 6.896 Charles Leiserson, Michael Bender, Bradley Kuszmaul Final Examination Final Examination ffl Do not oen this exam booklet

More information

Iterative Solvers. Lab 6. Iterative Methods

Iterative Solvers. Lab 6. Iterative Methods Lab 6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require

More information

Optimal Resilience Asynchronous Approximate Agreement

Optimal Resilience Asynchronous Approximate Agreement Optimal Resilience Asynchronous Approximate Agreement Ittai Abraham, Yonatan Amit, and Danny Dolev School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel {ittaia, mitmit,

More information

Divide & Conquer. Jordi Cortadella and Jordi Petit Department of Computer Science

Divide & Conquer. Jordi Cortadella and Jordi Petit Department of Computer Science Divide & Conquer Jordi Cortadella and Jordi Petit Department of Computer Science Divide-and-conquer algorithms Strategy: Divide the problem into smaller subproblems of the same type of problem Solve the

More information

Operating Systems. VII. Synchronization

Operating Systems. VII. Synchronization Operating Systems VII. Synchronization Ludovic Apvrille ludovic.apvrille@telecom-paristech.fr Eurecom, office 470 http://soc.eurecom.fr/os/ @OS Eurecom Outline Synchronization issues 2/22 Fall 2017 Institut

More information

Chapter 6: CPU Scheduling

Chapter 6: CPU Scheduling Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 6.1 Basic Concepts Maximum CPU utilization obtained

More information

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Jean-Claude Bermond a,, Bi Li a,b, Nicolas Nisse a, Hervé Rivano c, Min-Li Yu d a Coati Project, INRIA I3S(CNRS/UNSA), Sophia

More information

CSI Mathematical Induction. Many statements assert that a property of the form P(n) is true for all integers n.

CSI Mathematical Induction. Many statements assert that a property of the form P(n) is true for all integers n. CSI 2101- Mathematical Induction Many statements assert that a property of the form P(n) is true for all integers n. Examples: For every positive integer n: n! n n Every set with n elements, has 2 n Subsets.

More information

Termination Problem of the APO Algorithm

Termination Problem of the APO Algorithm Termination Problem of the APO Algorithm Tal Grinshpoun, Moshe Zazon, Maxim Binshtok, and Amnon Meisels Department of Computer Science Ben-Gurion University of the Negev Beer-Sheva, Israel Abstract. Asynchronous

More information

The Sieve of Erastothenes

The Sieve of Erastothenes The Sieve of Erastothenes Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 25, 2010 José Monteiro (DEI / IST) Parallel and Distributed

More information

Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions

Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions Mitra Nasri Chair of Real-time Systems, Technische Universität Kaiserslautern, Germany nasri@eit.uni-kl.de

More information

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering TIMING ANALYSIS Overview Circuits do not respond instantaneously to input changes

More information