Overview: Synchronous Computations
1 Overview: Synchronous Computations
- barriers: linear, tree-based and butterfly
- degrees of synchronization
- synchronous example 1: Jacobi iterations (serial and parallel code, performance analysis)
- synchronous example 2: heat distribution (serial and parallel code; comparison of block and strip partitioning methods)
- safety
- ghost points (synchronous example 3)
- advection (Assignment 1)
Ref: Chapter 6, Wilkinson and Allen. (COMP4300/8300 L12: Synchronous Computations)
2 Barriers
- barrier: a point at which all processes must wait until all other processes have reached that point, e.g. MPI_Barrier(MPI_Comm comm)
- mutual exclusion: a barrier that prevents other processes from entering the following region if another process is already in that region; common in shared memory parallel programs
- barriers are necessary for some MPI-2 operations
- both are possible sources of overhead
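For concreteness, a minimal MPI program using the barrier (a sketch, not from the slides; assumes a working MPI installation):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("process %d: before barrier\n", rank);
        /* no process returns from the barrier until every process has entered it */
        MPI_Barrier(MPI_COMM_WORLD);
        printf("process %d: after barrier\n", rank);
        MPI_Finalize();
        return 0;
    }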
3 Barrier
[Figure: processes 0 to n-1 compute (active), then wait at the barrier until the last process arrives.]
4 Counter-Based or Linear Barriers
[Figure: processes 0 to n-1 each call Barrier(); a counter C is incremented on each arrival and checked against n.]
- one process counts the arrival of the other processes
- when all processes have arrived, they are each sent a release message
5 Implementation
- arrival phase: each process sends a message to the central counter (the master)
- departure phase: each process receives a release message from the master

    Master:
        for (i = 0; i < p-1; i++) recv(any);   /* arrival phase   */
        for (i = 0; i < p-1; i++) send(i);     /* departure phase */

    Slave processes:
        Barrier:
            send(master);
            recv(master);

- implementations must handle possible time delays, e.g. two barriers in quick succession
- cost is O(p)
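A hedged MPI rendering of this master/slave scheme (a sketch under the slide's assumptions: process 0 plays the master, zero-byte messages carry the signals, and distinct tags separate arrival from release):

    #include <mpi.h>

    void linear_barrier(MPI_Comm comm) {
        int rank, p;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);
        if (rank == 0) {
            /* arrival phase: count the other p-1 processes in */
            for (int i = 0; i < p - 1; i++)
                MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, 0, comm, MPI_STATUS_IGNORE);
            /* departure phase: release everyone */
            for (int i = 1; i < p; i++)
                MPI_Send(NULL, 0, MPI_BYTE, i, 1, comm);
        } else {
            MPI_Send(NULL, 0, MPI_BYTE, 0, 0, comm);                    /* I have arrived */
            MPI_Recv(NULL, 0, MPI_BYTE, 0, 1, comm, MPI_STATUS_IGNORE); /* wait for release */
        }
    }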
6 Tree-Based Barriers
[Figure: arrival at the barrier combines up a tree; departure from the barrier fans back down it.]
- note: a broadcast alone does not ensure synchronization
- cost is 2 lg p steps, i.e. O(lg p)
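One way to realise this in MPI (a sketch with ranks arranged as an implicit binary tree; not necessarily the slide's exact algorithm):

    #include <mpi.h>

    void tree_barrier(MPI_Comm comm) {
        int rank, p;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);
        int parent = (rank - 1) / 2;
        int left = 2 * rank + 1, right = 2 * rank + 2;
        /* arrival phase: wait for both subtrees, then report to parent */
        if (left  < p) MPI_Recv(NULL, 0, MPI_BYTE, left,  0, comm, MPI_STATUS_IGNORE);
        if (right < p) MPI_Recv(NULL, 0, MPI_BYTE, right, 0, comm, MPI_STATUS_IGNORE);
        if (rank != 0) {
            MPI_Send(NULL, 0, MPI_BYTE, parent, 0, comm);
            /* departure phase: wait to be released by the parent */
            MPI_Recv(NULL, 0, MPI_BYTE, parent, 1, comm, MPI_STATUS_IGNORE);
        }
        /* release both subtrees */
        if (left  < p) MPI_Send(NULL, 0, MPI_BYTE, left,  1, comm);
        if (right < p) MPI_Send(NULL, 0, MPI_BYTE, right, 1, comm);
    }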
7 Butterfly Barrier (Butterfly/Omega Network)
[Figure: processes synchronize pairwise in a 1st, 2nd and 3rd stage over time, following the butterfly/omega network pattern.]
- cost is 2 lg p, i.e. O(lg p)
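A sketch of the butterfly pattern, assuming p is a power of two (each stage is a pairwise exchange with the process whose rank differs in one bit):

    #include <mpi.h>

    void butterfly_barrier(MPI_Comm comm) {
        int rank, p;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);
        for (int s = 1; s < p; s <<= 1) {
            int partner = rank ^ s;   /* partner at this stage */
            /* zero-byte exchange: both sides block until the other arrives */
            MPI_Sendrecv(NULL, 0, MPI_BYTE, partner, 0,
                         NULL, 0, MPI_BYTE, partner, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }

After the lg p stages, every process has synchronized, directly or transitively, with every other.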
8 Degrees of Synchronization
- ranges from fully to loosely synchronous; the more synchronous your computation, the more potential overhead
- SIMD: synchronized at the instruction level
  - provides ease of programming (one program)
  - well suited for data decomposition
  - applicable to many numerical problems
- the forall statement was introduced to specify data parallel operations (a shared-memory analogue is sketched below):

    forall (i = 0; i < n; i++) {
        /* data parallel work */
    }
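In modern shared-memory C the same data-parallel intent is usually written as an OpenMP parallel loop; a minimal analogue (my addition, not the slide's SIMD model):

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* data parallel work on element i */
    }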
9 Synchronous Example: Jacobi Iterations
- the Jacobi iteration solves a system of linear equations iteratively:

  $a_{0,0} x_0 + a_{0,1} x_1 + a_{0,2} x_2 + \cdots + a_{0,n-1} x_{n-1} = b_0$
  $a_{1,0} x_0 + a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n-1} x_{n-1} = b_1$
  $a_{2,0} x_0 + a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n-1} x_{n-1} = b_2$
  $\vdots$
  $a_{n-1,0} x_0 + a_{n-1,1} x_1 + a_{n-1,2} x_2 + \cdots + a_{n-1,n-1} x_{n-1} = b_{n-1}$

- where there are n equations and n unknowns ($x_0, x_1, x_2, \ldots, x_{n-1}$)
10 Jacobi Iterations
- consider equation i as: $a_{i,0} x_0 + a_{i,1} x_1 + a_{i,2} x_2 + \cdots + a_{i,n-1} x_{n-1} = b_i$
- which we can re-cast as:
  $x_i = \frac{1}{a_{i,i}} \left[ b_i - (a_{i,0} x_0 + a_{i,1} x_1 + \cdots + a_{i,i-1} x_{i-1} + a_{i,i+1} x_{i+1} + \cdots + a_{i,n-1} x_{n-1}) \right]$
  i.e. $x_i = \frac{1}{a_{i,i}} \left[ b_i - \sum_{j \ne i} a_{i,j} x_j \right]$
- strategy: guess $x$, then iterate and hope it converges!
- converges if the matrix is diagonally dominant: $\sum_{j \ne i} |a_{i,j}| < |a_{i,i}|$
- terminate when convergence is achieved: $\|x^t - x^{t-1}\| <$ error tolerance
11 Sequential Jacobi Code
ignoring convergence testing (a sketch of the test follows below):

    for (i = 0; i < n; i++)
        x[i] = b[i];                    /* initial guess */
    for (iter = 0; iter < max_iter; iter++) {
        for (i = 0; i < n; i++) {
            sum = -a[i][i] * x[i];      /* cancels the j == i term added below */
            for (j = 0; j < n; j++)
                sum = sum + a[i][j] * x[j];
            new_x[i] = (b[i] - sum) / a[i][i];
        }
        for (i = 0; i < n; i++)
            x[i] = new_x[i];
    }
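Where the code ignores convergence testing, the termination test from slide 10 could be added along these lines (a sketch; the tolerance parameter tol and the choice of the infinity norm are assumptions):

    #include <math.h>

    /* returns 1 once max_i |new_x[i] - x[i]| drops below tol */
    int converged(const double *x, const double *new_x, int n, double tol) {
        double diff = 0.0;
        for (int i = 0; i < n; i++) {
            double d = fabs(new_x[i] - x[i]);
            if (d > diff) diff = d;   /* infinity norm of x^t - x^(t-1) */
        }
        return diff < tol;
    }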
12 Parallel Jacobi Code
ignoring convergence testing and assuming parallelisation over n processes (the code below runs on process i):

    x[i] = b[i];
    for (iter = 0; iter < max_iter; iter++) {
        sum = -a[i][i] * x[i];
        for (j = 0; j < n; j++)
            sum = sum + a[i][j] * x[j];
        new_x[i] = (b[i] - sum) / a[i][i];
        broadcast_gather(&new_x[i], new_x);
        global_barrier();
        for (j = 0; j < n; j++)   /* j, not i: i names this process */
            x[j] = new_x[j];
    }

broadcast_gather() sends the local new_x[i] to all processes and collects their new values.
13 Broadcast Gather
[Figure: each of processes 0 to n-1 places its value in its send buffer; broadcast_gather() fills every slot of every process's receive buffer.]
Can be (simplistically) implemented as:

    for (j = 0; j < n; j++) send(&new_x[i], /* to process   */ j);
    for (j = 0; j < n; j++) recv(&new_x[j], /* from process */ j);

Question: do we really need the barrier as well as this?
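In MPI the hypothetical broadcast_gather() corresponds to an all-gather. A minimal sketch, assuming process i keeps its unknown at index i of new_x (MPI_IN_PLACE takes each contribution from the caller's own slot and avoids the send/receive buffer aliasing that MPI forbids):

    MPI_Allgather(MPI_IN_PLACE, 1, MPI_DOUBLE,
                  new_x, 1, MPI_DOUBLE, MPI_COMM_WORLD);

On the slide's question: a process cannot leave the all-gather before its receive buffer holds every contribution, so for this loop the extra barrier is arguably redundant.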
14 Partitioning
- normally the number of processes is much less than the number of data items
- block partitioning: allocate groups of consecutive unknowns to processes
- cyclic partitioning: allocate unknowns in a round-robin fashion
- analysis, for $\tau$ iterations and $n/p$ unknowns per process:
  - computation decreases with p: $t_{comp} = \tau (2n+4)(n/p)\, t_f$
  - communication increases with p: $t_{comm} = \tau\, p\,(t_s + (n/p)\, t_w) = \tau\,(p\, t_s + n\, t_w)$
  - the total has an overall minimum: $t_{tot} = \tau\,((2n+4)(n/p)\, t_f + p\, t_s + n\, t_w)$
- question: can we do an all-gather faster than $p\, t_s + n\, t_w$?
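The overall minimum can be located by differentiating with respect to p (a short derivation, not on the slide):

    $\frac{d\,t_{tot}}{dp} = \tau \left( t_s - \frac{(2n+4)\,n\,t_f}{p^2} \right) = 0
     \;\Rightarrow\;
     p_{opt} = \sqrt{\frac{(2n+4)\,n\,t_f}{t_s}} \approx n \sqrt{\frac{2\,t_f}{t_s}}$ for large $n$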
15 Parallel Jacobi Iteration Time
[Plot: execution time vs. number of processors p, for $t_s = 10^5 t_f$, $t_w = 50 t_f$, $n = 1000$; computation falls with p, communication rises with p, and the overall curve has a minimum in between.]
16 Locally Synchronous Example: Heat Distribution Problem
- consider a metal sheet with a fixed temperature along the sides but unknown temperatures in the middle: find the temperatures in the middle
- finite difference approximation to the Laplace equation:
  $\frac{\partial^2 T(x,y)}{\partial x^2} + \frac{\partial^2 T(x,y)}{\partial y^2} = 0$
  $\frac{T(x+\delta x, y) - 2T(x,y) + T(x-\delta x, y)}{\delta x^2} + \frac{T(x, y+\delta y) - 2T(x,y) + T(x, y-\delta y)}{\delta y^2} = 0$
- assuming an even grid (i.e. $\delta x = \delta y$) of $n \times n$ points (denoted $h_{i,j}$), the temperature at any point is the average of the surrounding points:
  $h_{i,j} = \frac{h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}}{4}$
- the problem is very similar to the Game of Life: what happens in a cell depends upon its neighbours
17 Array Ordering
[Figure: the grid points stored row by row as $x_1, x_2, \ldots, x_{k^2}$, with k points per row; the neighbours of $x_i$ are $x_{i-1}$, $x_{i+1}$, $x_{i-k}$ and $x_{i+k}$.]
- we will solve iteratively: $x_i = \frac{x_{i-1} + x_{i+1} + x_{i-k} + x_{i+k}}{4}$
- but this problem may also be written as a system of linear equations:
  $x_{i-k} + x_{i-1} - 4x_i + x_{i+1} + x_{i+k} = 0$
18 Heat Equation: Sequential Code
- assume a fixed number of iterations and a square mesh
- beware of what happens at the edges! (the loops below update only interior points; the fixed boundary values live in the untouched edge rows and columns)

    for (iter = 0; iter < max_iter; iter++) {
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j] + h[i][j-1] + h[i][j+1]);
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                h[i][j] = g[i][j];
    }
19 Heat Equation: Parallel Code
- one point per process, assuming non-blocking sends:

    for (iter = 0; iter < max_iter; iter++) {
        g = 0.25 * (w + x + y + z);
        send(&g, (i-1, j));
        send(&g, (i+1, j));
        send(&g, (i, j-1));
        send(&g, (i, j+1));
        recv(&w, (i-1, j));
        recv(&x, (i+1, j));
        recv(&y, (i, j-1));
        recv(&z, (i, j+1));
    }

- the sends and receives provide a local barrier: each process synchronizes with the 4 surrounding processes
20 Heat Equation: Partitioning
- normally more than one point per process
- option of either block or strip partitioning
[Figure: block partitioning assigns each of processes 0, 1, ..., p-1 a square sub-block of the grid; strip partitioning assigns each process a contiguous strip.]
21 Block/Strip Communication Comparison
- block partitioning: four edges exchanged ($n^2$ data points, p processes):
  $t_{comm} = 8\left(t_s + \frac{n}{\sqrt{p}}\, t_w\right)$
- strip partitioning: two edges exchanged:
  $t_{comm} = 4\,(t_s + n\, t_w)$
[Figure: a block of $\frac{n}{\sqrt{p}} \times \frac{n}{\sqrt{p}}$ points communicates across its four edges; a strip communicates across its two edges of n points.]
22 Block/Strip Optimum
- block communication is larger than strip communication if:
  $8\left(t_s + \frac{n}{\sqrt{p}}\, t_w\right) > 4\,(t_s + n\, t_w)$
  i.e. if $t_s > n \left(1 - \frac{2}{\sqrt{p}}\right) t_w$
[Plot: in the (p, $t_s$) plane, strip partitioning is best above the curve $t_s = n(1 - 2/\sqrt{p})\, t_w$ and block partitioning is best below it.]
23 Safety and Deadlock
- with all processes sending and then receiving data, the code is unsafe: it relies on local buffering in the send() function
- potential for deadlock (as in Prac 1, Ex 3)!
- alternative #1: re-order the sends and receives, e.g. for strip partitioning:

    if ((myid % 2) == 0) {
        send(&g[1][1], n, (i-1));
        recv(&h[1][0], n, (i-1));
        send(&g[1][n], n, (i+1));
        recv(&h[1][n+1], n, (i+1));
    } else {
        recv(&h[1][0], n, (i-1));
        send(&g[1][1], n, (i-1));
        recv(&h[1][n+1], n, (i+1));
        send(&g[1][n], n, (i+1));
    }
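Another safe option, not on the slide, is MPI_Sendrecv, which couples each send with a receive and avoids deadlock regardless of system buffering. Mirroring the slide's buffers (left and right stand for the assumed neighbour ranks i-1 and i+1):

    MPI_Sendrecv(&g[1][1],   n, MPI_DOUBLE, left,  0,
                 &h[1][0],   n, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&g[1][n],   n, MPI_DOUBLE, right, 0,
                 &h[1][n+1], n, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);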
24 Alternative #2: Asynchronous Communication Using Ghost Points
- assign extra receive buffers for edges where data is exchanged
- typically these are implemented as extra rows and columns in each process's local array (known as a halo)
- can use asynchronous calls (e.g. MPI_Isend())
[Figure: process i and process i+1 each hold a row of ghost points; each copies its boundary data into the neighbour's ghost points.]
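A sketch of the ghost-point exchange with non-blocking calls (assumptions: row strips with m owned rows, ghost rows at indices 0 and m+1, and neighbour ranks up/down, set to MPI_PROC_NULL at the outer boundary so those calls become no-ops):

    MPI_Request req[4];
    /* post receives into the ghost rows */
    MPI_Irecv(&h[0][0],   n, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&h[m+1][0], n, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[1]);
    /* send my edge rows into the neighbours' ghost rows */
    MPI_Isend(&h[1][0],   n, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&h[m][0],   n, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    /* the stencil update may now read ghost rows 0 and m+1 safely */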