Program Performance Metrics
- Shanon Joseph
Transcription
1 Program Performance Metrics
- The parallel run time ($T_{par}$) is the time from the moment the computation starts to the moment the last processor finishes its execution.
- The speedup ($S$) is the ratio of the time needed to solve the problem on a single processor ($T_{seq}$) to the time required to solve the same problem on a parallel system with $p$ processors ($T_{par}$): $S = T_{seq} / T_{par}$. The three variants differ in how $T_{seq}$ is measured:
  - relative: $T_{seq}$ is the execution time of the parallel algorithm running on one of the processors of the parallel computer
  - real: $T_{seq}$ is the execution time of the best-known algorithm running on one of the processors of the parallel computer
  - absolute: $T_{seq}$ is the execution time of the best-known algorithm running on the best-known computer
2 Program Performance Metrics
- The efficiency ($E$) of a parallel program is the ratio of the speedup to the number of processors: $E = S / p$.
- The cost is usually defined as the product of the parallel run time and the number of processors: $p \cdot T_{par}$.
- The scalability of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processors.
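The definitions on slides 1 and 2 can be written as three short functions. This is a minimal sketch; the function names and the timings in the usage example are made up for illustration and do not come from the slides.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """S = T_seq / T_par."""
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, p: int) -> float:
    """E = S / p."""
    return speedup(t_seq, t_par) / p


def cost(t_par: float, p: int) -> float:
    """Cost = p * T_par."""
    return p * t_par


if __name__ == "__main__":
    # Hypothetical measurements: 100 s sequentially, 25 s on 8 processors.
    t_seq, t_par, p = 100.0, 25.0, 8
    print("speedup   :", speedup(t_seq, t_par))
    print("efficiency:", efficiency(t_seq, t_par, p))
    print("cost      :", cost(t_par, p))
```

Whether the result is a relative, real, or absolute speedup depends only on how the value passed as `t_seq` was measured.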
3 Communication costs in static interconnection networks
Principal parameters
- startup time ($t_s$)
- per-hop time ($t_h$)
- per-word transfer time ($t_w$)
Routing techniques
- store-and-forward routing
- cut-through routing
4 Communication cost depends on the routing strategy
- Store-and-forward routing: the message is passed from processor to processor, and each intermediate processor stores it in its local memory until the whole message has been received. For a message of $m$ words that crosses $l$ links: $t_{comm} = t_s + (m t_w + t_h)\, l$
- Cut-through routing: the message is divided into parts that are forwarded between processors without waiting for the whole message: $t_{comm} = t_s + l\, t_h + m\, t_w$
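The difference between the two routing strategies is easiest to see with numbers. The sketch below evaluates both cost models; the function names and the parameter values are illustrative assumptions, not measurements.

```python
def t_store_and_forward(t_s: float, t_h: float, t_w: float, m: int, l: int) -> float:
    """t_comm = t_s + (m * t_w + t_h) * l: the whole message is re-sent on every hop."""
    return t_s + (m * t_w + t_h) * l


def t_cut_through(t_s: float, t_h: float, t_w: float, m: int, l: int) -> float:
    """t_comm = t_s + l * t_h + m * t_w: only the per-hop time grows with the distance."""
    return t_s + l * t_h + m * t_w


if __name__ == "__main__":
    # Hypothetical parameters: startup 10, per-hop 1, per-word 2 (arbitrary time units).
    t_s, t_h, t_w = 10.0, 1.0, 2.0
    m, l = 100, 4  # message length in words, number of links crossed
    print("store-and-forward:", t_store_and_forward(t_s, t_h, t_w, m, l))
    print("cut-through      :", t_cut_through(t_s, t_h, t_w, m, l))
```

With a single link (`l = 1`) the two models give the same time; the gap widens as the path gets longer or the message gets larger.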
5 Basic communication operations
- Simple message transfer between two processors
- One-to-all broadcast
- All-to-all broadcast
- One-to-all personalized communication
- All-to-all personalized communication
- Circular shift
6 [Figure: the basic communication operations on processors 0, 1, ..., p-1, each paired with its reverse operation: one-to-all broadcast / single-node accumulation; all-to-all broadcast / multinode accumulation; one-to-all personalized / single-node gather; all-to-all personalized / multinode gather.]
7 One-to-all broadcast - SF
[Figure: a) in a ring with an even number of processors; b) in a ring with an odd number of processors.]
$T_{one\_to\_all\_b} = (t_s + t_w m) \lceil p/2 \rceil$
8 One-to-all broadcast - SF in a mesh with wraparound
$T_{one\_to\_all\_b} = 2\,(t_s + t_w m) \lceil \sqrt{p}/2 \rceil$
9 One-to-all broadcast - SF in a hypercube
[Figure: three-dimensional hypercube with nodes 0 to 7 labelled (000) to (111).]
$T_{one\_to\_all\_b} = (t_s + t_w m) \log p$
10 One-to-all broadcast - SF
procedure ONE_TO_ALL_BC(d, my_id, X);
begin
  mask := 2^d - 1;
  for i := d-1 downto 0 do
  begin
    mask := mask XOR 2^i;
    if (my_id AND mask) = 0 then
      if (my_id AND 2^i) = 0 then
      begin
        msg_destination := my_id XOR 2^i;
        send X to msg_destination;
      endif
      else
      begin
        msg_source := my_id XOR 2^i;
        receive X from msg_source;
      endelse;
  endfor;
end ONE_TO_ALL_BC
Code of the one-to-all broadcast operation in a d-dimensional hypercube (the processor with label 0 broadcasts its message X).
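The procedure can be checked by simulating all processors in one sequential program. The sketch below is such a simulation under simplifying assumptions: the function name, the list `holds` that stands in for the local memories, and the returned schedule of (source, destination) pairs are illustrative and not part of the slides; real message passing is replaced by list assignments.

```python
def one_to_all_broadcast(d: int, message):
    """Simulate the hypercube broadcast from node 0 on 2**d nodes.

    Returns (holds, schedule): holds[k] is the message stored at node k,
    schedule[j] is the list of (source, destination) pairs used in step j.
    """
    p = 1 << d
    holds = [None] * p
    holds[0] = message            # node 0 is the source
    schedule = []
    mask = p - 1                  # all d bits set
    for i in range(d - 1, -1, -1):
        bit = 1 << i
        mask ^= bit               # clear bit i of the mask
        step = []
        for my_id in range(p):
            # Only nodes whose lower i bits are zero take part in this step;
            # those with bit i equal to zero already hold the message and send it.
            if (my_id & mask) == 0 and (my_id & bit) == 0:
                step.append((my_id, my_id ^ bit))
        for src, dest in step:
            holds[dest] = holds[src]
        schedule.append(step)
    return holds, schedule


if __name__ == "__main__":
    holds, schedule = one_to_all_broadcast(3, "X")
    for number, step in enumerate(schedule, start=1):
        print(f"step {number}: {step}")
    print("all nodes hold X:", all(h == "X" for h in holds))
```

The number of steps equals d = log p, which is where the log p factor in the hypercube broadcast time comes from.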
11 One-to-all broadcast - CT in a ring
$T_{onetoallbc} = t_s \log p + t_w m \log p + t_h (p - 1)$
12 One-to-all broadcast - CT in a mesh with wraparound
$T_{onetoallbc} = (t_s + t_w m) \log p + 2\, t_h (\sqrt{p} - 1)$
13 One-to-all broadcast - CT in a balanced binary tree
$T_{onetoallbc} = (t_s + t_w m + t_h (\log p + 1)) \log p$
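To compare the three cut-through broadcast times, the formulas for the ring, the mesh with wraparound, and the balanced binary tree can be evaluated side by side. The sketch assumes the standard forms $(t_s + t_w m)\log p + t_h(p-1)$, $(t_s + t_w m)\log p + 2 t_h(\sqrt{p}-1)$ and $(t_s + t_w m + t_h(\log p + 1))\log p$, with base-2 logarithms; the function names and parameter values are illustrative.

```python
import math


def ct_broadcast_ring(t_s, t_h, t_w, m, p):
    """(t_s + t_w*m) * log p + t_h * (p - 1)."""
    return (t_s + t_w * m) * math.log2(p) + t_h * (p - 1)


def ct_broadcast_mesh(t_s, t_h, t_w, m, p):
    """(t_s + t_w*m) * log p + 2 * t_h * (sqrt(p) - 1), for a sqrt(p) x sqrt(p) mesh."""
    return (t_s + t_w * m) * math.log2(p) + 2 * t_h * (math.sqrt(p) - 1)


def ct_broadcast_tree(t_s, t_h, t_w, m, p):
    """(t_s + t_w*m + t_h * (log p + 1)) * log p."""
    lg = math.log2(p)
    return (t_s + t_w * m + t_h * (lg + 1)) * lg


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units.
    t_s, t_h, t_w, m, p = 10.0, 1.0, 2.0, 5, 16
    print("ring:", ct_broadcast_ring(t_s, t_h, t_w, m, p))
    print("mesh:", ct_broadcast_mesh(t_s, t_h, t_w, m, p))
    print("tree:", ct_broadcast_tree(t_s, t_h, t_w, m, p))
```

All three share the $(t_s + t_w m)\log p$ term; they differ only in the per-hop term, which reflects how far messages travel in each topology.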
14 All-to-all broadcast - SF
[Figure: all-to-all broadcast on an eight-node ring, shown for communication steps 1, 2, ..., 7. Each link is labelled with the step number and the message being forwarded; each node is labelled with the messages it has collected so far.]
15 All-to-all broadcast - SF
procedure ALL_TO_ALL_BC_RING(my_id, my_msg, p, result);
begin
  left := (my_id - 1) mod p;
  right := (my_id + 1) mod p;
  result := my_msg;
  msg := result;
  for i := 1 to p-1 do
  begin
    send msg to right;
    receive msg from left;
    result := result ∪ msg;
  endfor;
end ALL_TO_ALL_BC_RING;
$T_{alltoallbc} = (t_s + t_w m)(p - 1)$
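The ring procedure can be simulated sequentially to confirm that p-1 steps are enough. In the sketch below the function name and the list `in_flight`, which holds the message each node forwards in the current step, are illustrative assumptions; simultaneous sends are modelled by building the list of received messages before any node updates its state.

```python
def all_to_all_broadcast_ring(messages):
    """Simulate all-to-all broadcast on a ring; messages[k] is the message of node k.

    Returns a list of sets: result[k] contains every message collected by node k.
    """
    p = len(messages)
    result = [{msg} for msg in messages]   # result := my_msg
    in_flight = list(messages)             # msg := result
    for _ in range(p - 1):
        # Every node sends msg to its right neighbour and receives from its left one.
        received = [in_flight[(k - 1) % p] for k in range(p)]
        for k in range(p):
            result[k].add(received[k])     # result := result ∪ msg
        in_flight = received               # the received message is forwarded next
    return result


if __name__ == "__main__":
    collected = all_to_all_broadcast_ring(["m0", "m1", "m2", "m3", "m4"])
    for node, msgs in enumerate(collected):
        print(node, sorted(msgs))
```

Each of the p-1 steps transfers one message of size m over one link, which matches the $(t_s + t_w m)(p - 1)$ running time.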
16 All-to-all broadcast - SF
[Figure: all-to-all broadcast on a 4 x 4 mesh with nodes 0 to 15. After the first phase every node holds the messages of its own row: (0..3), (4..7), (8..11) or (12..15).]
17 All-to-all broadcast - SF
procedure ALL_TO_ALL_BC_MESH(my_id, my_msg, p, result);
begin
  (* first phase: communication along rows *)
  left := {...}; right := {...};
  result := my_msg;
  msg := result;
  for i := 1 to sqrt(p)-1 do
  begin
    send msg to right;
    receive msg from left;
    result := result ∪ msg;
  endfor;
  (* second phase: communication along columns *)
  left := {...}; right := {...};
  msg := result;
  for i := 1 to sqrt(p)-1 do
  begin
    send msg to right;
    receive msg from left;
    result := result ∪ msg;
  endfor;
end ALL_TO_ALL_BC_MESH;
$T_{alltoallbc} = 2\, t_s (\sqrt{p} - 1) + t_w m (p - 1)$
18 All-to-all broadcast - SF
[Figure: all-to-all broadcast on an eight-node hypercube. a) Initial distribution of messages; b) distribution before the second step; c) distribution before the third step; d) final distribution of messages.]
$T_{alltoallbc} = t_s \log p + t_w m (p - 1)$
19 All-to-all broadcast with reduction - SF
procedure ALL_TO_ALL_BC_HCUBE(my_id, my_msg, d, result);
begin
  result := my_msg;
  for i := 0 to d-1 do
  begin
    partner := my_id XOR 2^i;
    send result to partner;
    receive msg from partner;
    result := result ∪ msg;
  endfor;
end ALL_TO_ALL_BC_HCUBE;
With reduction, the received message is combined with the local result by an associative operator instead of being appended to it, so the message size stays $m$ in every step:
$T_{alltoallbc} = (t_s + t_w m) \log p$
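The hypercube exchange pattern is the same with and without reduction; only the combining operation changes. The sketch below simulates it sequentially with a caller-supplied `combine` function. The function name and the parameter `combine` are illustrative assumptions; the loop is written to cover all d dimensions, i = 0, ..., d-1.

```python
def hypercube_exchange(d, values, combine):
    """Simulate d pairwise exchange steps on a 2**d-node hypercube.

    values[k] is the initial value of node k; combine(a, b) merges the local
    result with the value received from the partner. Returns the final values.
    """
    p = 1 << d
    if len(values) != p:
        raise ValueError("expected exactly 2**d values")
    result = list(values)
    for i in range(d):
        bit = 1 << i
        # partner := my_id XOR 2^i; all pairs exchange their results simultaneously.
        result = [combine(result[k], result[k ^ bit]) for k in range(p)]
    return result


if __name__ == "__main__":
    # Without reduction: sets of messages are merged, so the data doubles every step.
    gathered = hypercube_exchange(3, [{k} for k in range(8)], lambda a, b: a | b)
    print(sorted(gathered[5]))
    # With reduction: values are added, so the message size stays constant.
    sums = hypercube_exchange(3, list(range(8)), lambda a, b: a + b)
    print(sums)
```

In the set version the amount of data sent doubles from step to step, which gives the $t_w m (p - 1)$ term; in the sum version each step sends a single value, which gives $(t_s + t_w m)\log p$.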
20 One-to-all personalized - SF
[Figure: one-to-all personalized communication on an eight-node ring, shown for communication steps 1, 2, ..., 7.]
$T_{onetoall\,pers} = (t_s + t_w m)(p - 1)$
21 One-to-all personalized - SF
[Figure: one-to-all personalized communication on a 4 x 4 mesh with nodes 0 to 15; the messages are first split into the groups (0..3), (4..7), (8..11) and (12..15).]
$T_{onetoall\,pers} = 2\, t_s (\sqrt{p} - 1) + t_w m (p - 1)$
22 All-to-all personalized - SF
[Figure: all-to-all personalized communication on a six-node ring with nodes 0 to 5, communication steps 1 and 2; the messages are labelled with pairs (i,j).]
23 All-to-all personalized - SF
[Figure, continued: communication steps 3 to 5.]
24 All-to-all personalized - SF
[Figure: all-to-all personalized communication on a 3 x 3 mesh with nodes 0 to 8; distribution of the messages (i,j) in the first phase.]
25 All-to-all personalized - SF
[Figure, continued: distribution of the messages (i,j) in the second phase.]
$T_{alltoall\,pers} = (2\, t_s + t_w m p)(\sqrt{p} - 1)$
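The store-and-forward times for personalized communication can be evaluated numerically. The sketch assumes the forms $(t_s + t_w m)(p-1)$ for one-to-all personalized communication on a ring, $2 t_s(\sqrt{p}-1) + t_w m (p-1)$ for one-to-all personalized communication on a mesh, and $(2 t_s + t_w m p)(\sqrt{p}-1)$ for all-to-all personalized communication on a mesh; the function names and parameter values are illustrative.

```python
import math


def one_to_all_personalized_ring(t_s, t_w, m, p):
    """(t_s + t_w*m) * (p - 1)."""
    return (t_s + t_w * m) * (p - 1)


def one_to_all_personalized_mesh(t_s, t_w, m, p):
    """2*t_s*(sqrt(p) - 1) + t_w*m*(p - 1), for a sqrt(p) x sqrt(p) mesh."""
    return 2 * t_s * (math.sqrt(p) - 1) + t_w * m * (p - 1)


def all_to_all_personalized_mesh(t_s, t_w, m, p):
    """(2*t_s + t_w*m*p) * (sqrt(p) - 1), for a sqrt(p) x sqrt(p) mesh."""
    return (2 * t_s + t_w * m * p) * (math.sqrt(p) - 1)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units.
    t_s, t_w, m = 10.0, 2.0, 5
    print("one-to-all, ring, p=8 :", one_to_all_personalized_ring(t_s, t_w, m, 8))
    print("one-to-all, mesh, p=16:", one_to_all_personalized_mesh(t_s, t_w, m, 16))
    print("all-to-all, mesh, p=9 :", all_to_all_personalized_mesh(t_s, t_w, m, 9))
```

The per-hop time $t_h$ does not appear in these store-and-forward formulas, so it is not a parameter of the functions.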
More informationImpression Store: Compressive Sensing-based Storage for. Big Data Analytics
Impression Store: Compressive Sensing-based Storage for Big Data Analytics Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda & Zheng Zhang Microsoft Research The Curse of O(N) in
More informationParallel Algorithms. A. Legrand. Arnaud Legrand, CNRS, University of Grenoble. LIG laboratory, October 2, / 235
Parallel Arnaud Legrand, CNRS, University of Grenoble LIG laboratory, arnaud.legrand@imag.fr October 2, 2011 1 / 235 Outline Parallel Part I Models Part II Communications on a Ring Part III Speedup and
More informationOn Detecting Multiple Faults in Baseline Interconnection Networks
On Detecting Multiple Faults in Baseline Interconnection Networks SHUN-SHII LIN 1 AND SHAN-TAI CHEN 2 1 National Taiwan Normal University, Taipei, Taiwan, ROC 2 Chung Cheng Institute of Technology, Tao-Yuan,
More informationINTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University. Data Set - SSN's from UTSA Class
Dr. Thomas E. Hicks Data Abstractions Homework - Hashing -1 - INTRODUCTION TO HASHING Dr. Thomas Hicks Trinity University Data Set - SSN's from UTSA Class 467 13 3881 498 66 2055 450 27 3804 456 49 5261
More informationCoding for loss tolerant systems
Coding for loss tolerant systems Workshop APRETAF, 22 janvier 2009 Mathieu Cunche, Vincent Roca INRIA, équipe Planète INRIA Rhône-Alpes Mathieu Cunche, Vincent Roca The erasure channel Erasure codes Reed-Solomon
More informationFast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined Bus System
IEEE TRASACTIOS O PARALLEL AD DISTRIBUTED SYSTEMS, VOL. 9, O. 8, AUGUST 1998 705 Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined
More informationNAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR. Sp ' 00
NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR Sp ' 00 May 3 OPEN BOOK exam (students are permitted to bring in textbooks, handwritten notes, lecture notes
More informationCS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University
CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution
More information