Models: Amdahl's Law, PRAM, α-β (Tal Ben-Nun)
1 Models: Amdahl's Law, PRAM, α-β. Tal Ben-Nun. Design of Parallel and High-Performance Computing, Fall 2017
2 DPHPC Overview (course roadmap figure, including cache coherency and memory models).
3 Speedup. An application runs in time T_1 on one processor and in time T_p on p processors. Measured speedup on p processors: S_p = T_1 / T_p. Can we bound/estimate the speedup without running the application?
4 Scaling Models: Intuition. Consider the following application:

    auto A = InitializeArray(random_seed, important_info);   // Sequential
    for (int i = 0; i < 10000; ++i)                           // Parallel
      for (int j = 0; j < 10000; ++j)
        B[i][j] = Compute(A[i][j]);
    SequentialDecisionProcess(B);                             // Sequential
    for (int i = 0; i < 10000; ++i)                           // Parallel
      for (int j = 0; j < 10000; ++j)
        C[i][j] = Compute(B[i][j]);
    WriteResults(C);                                          // Sequential
5 Amdahl's Law. Given f ∈ [0, 1], the sequential fraction of an application:

    T_p ≥ (1 - f) · T_1 / p + f · T_1

(Gene Amdahl)
6 Scaling Models: Intuition (revisited). The same application, annotated with each part's share of the sequential runtime T_1:

    auto A = InitializeArray(random_seed, important_info);   // Sequential
    for (int i = 0; i < 10000; ++i)                           // Parallel: (1 - f) · T_1 / 2
      for (int j = 0; j < 10000; ++j)
        B[i][j] = Compute(A[i][j]);
    SequentialDecisionProcess(B);                             // Sequential
    for (int i = 0; i < 10000; ++i)                           // Parallel: (1 - f) · T_1 / 2
      for (int j = 0; j < 10000; ++j)
        C[i][j] = Compute(B[i][j]);
    WriteResults(C);                                          // Sequential

The sequential parts together account for f · T_1, the parallel parts for (1 - f) · T_1.
7 Amdahl's Law. Given f ∈ [0, 1], the sequential fraction of an application:

    T_p ≥ (1 - f) · T_1 / p + f · T_1

Speedup can be modeled as:

    S_p = T_1 / T_p ≤ 1 / ((1 - f) / p + f)

And efficiency:

    E_p = S_p / p ≤ 1 / ((1 - f) + f · p)

(Gene Amdahl)
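A minimal sketch of these two bounds in code (not from the slides; the sample values of f and p are illustrative):

    #include <cstdio>

    // Amdahl's law: upper bound on speedup for sequential fraction f on p processors.
    double amdahl_speedup(double f, int p) { return 1.0 / ((1.0 - f) / p + f); }

    // Efficiency: speedup divided by the number of processors.
    double amdahl_efficiency(double f, int p) { return amdahl_speedup(f, p) / p; }

    int main() {
      const double f = 1.0 / 15.0;  // example sequential fraction
      for (int p : {1, 2, 4, 8, 16, 64, 256})
        std::printf("p = %3d  S_p <= %6.2f  E_p <= %5.2f\n",
                    p, amdahl_speedup(f, p), amdahl_efficiency(f, p));
    }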
8 Amdahl's Law: Runtime. (Plot of runtime vs. #processors/cores: the sequential part f·T_1 stays constant, while the parallel part shrinks as (1-f)·T_1/2, (1-f)·T_1/3, (1-f)·T_1/4, (1-f)·T_1/5, ...)
9 Amdahl's Law: Speedup. (Plot of speedup vs. number of cores, compared against linear speedup, for different sequential fractions such as f = 1/15.)
10 Amdahl's Law. As p → ∞: T_p → f · T_1, S_p → 1/f, and E_p → 0 (if f ≠ 0).
11 Question. Is Amdahl's law pessimistic or optimistic?
12 Amdahl's Law is Pessimistic. It fixes the problem size, but more processors are typically used to solve larger problems. Thus f is not a constant fraction but f(n), and likewise T_1 becomes T_1(n).
13 Case Study: Weather Forecasting. Grid-point climate/atmospheric models (GFS, GME, COSMO). The denser the grid, the more accurate the prediction becomes. Important for anticipating natural disasters!
14 Gustafson's Law. States:

    S_n = f(n) + (1 - f(n)) · n  →  n   (if f(n) → 0)

Larger problems can be solved (with more resources) in the same time. (John Gustafson)
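A small sketch contrasting the two laws (not from the slides; the value of f and the helper names are illustrative):

    #include <cstdio>

    // Amdahl (fixed problem size): S_p <= 1 / ((1 - f) / p + f)
    double amdahl(double f, int p) { return 1.0 / ((1.0 - f) / p + f); }

    // Gustafson (problem size scaled with p): S_p = f + (1 - f) * p
    double gustafson(double f, int p) { return f + (1.0 - f) * p; }

    int main() {
      const double f = 0.05;  // example sequential fraction
      for (int p : {1, 16, 256, 4096})
        std::printf("p = %5d  Amdahl <= %9.2f  Gustafson = %9.2f\n",
                    p, amdahl(f, p), gustafson(f, p));
    }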
15 Strong and Weak Scaling. Strong scaling: behavior of S_p(n) for fixed n as p → ∞ (fixed problem size as p increases). Weak scaling: behavior of S_p(n) as n → ∞ and p → ∞ together (increasing problem size as p increases).
16 Pessimism/Optimism Revisited. Could both models be optimistic?
17 Runtime With Parallel Overhead. (Plot of runtime vs. #processors/cores: the parallel overhead increases with the number of processors.)
18 Other Issue: Load Balancing. Amdahl's Law assumes that the computation divides evenly. Processor waiting time is another form of overhead. (Diagram per task: working time from task start to task completion, followed by waiting time.)
19 Runtime with Parallel Overhead and Work Imbalance. (Plot of runtime vs. #processors/cores, highlighting the time lost due to work imbalance.)
20 Real speedup graphs often look like this. (Plot of speedup vs. number of processors, compared against linear speedup.)
21 Additional Resources. Assumption in both laws: increasing p does not change other resources (e.g., cache). It is thus possible to achieve super-linear speedup. Example 1: the aggregate cache size grows with p, so a larger working set fits in cache. Example 2: probabilistic convergent algorithms.
22 Amdahl's Law: Summary. Strong, simple models for scaling, but they ignore the overhead incurred by parallelization and load imbalance. In reality:

    S_p(n) = T_1(n) / (T_p(n) + A_p(n))

where A_p(n) accounts for the parallelization overhead. Programs are not always that simple, and f does not always exist.
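A small sketch of such a refined model with a hypothetical overhead term that grows linearly with p (the overhead shape and all values are assumptions for illustration only):

    #include <cstdio>

    // Hypothetical model: Amdahl-style parallel time plus an overhead A_p that grows with p.
    double speedup_with_overhead(double T1, double f, int p, double overhead_per_proc) {
      double Tp = (1.0 - f) * T1 / p + f * T1;   // idealized parallel time
      double Ap = overhead_per_proc * p;          // assumed overhead, linear in p
      return T1 / (Tp + Ap);
    }

    int main() {
      const double T1 = 1000.0, f = 0.02, overhead = 0.5;
      for (int p : {1, 4, 16, 64, 256, 1024})
        std::printf("p = %4d  S_p ~ %6.2f\n", p, speedup_with_overhead(T1, f, p, overhead));
    }

With these parameters the speedup rises, peaks, and then falls as the overhead dominates, matching the shape of the measured speedup curves above.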
23 PRAM Model. Parallel Random Access Machine: an abstract model for shared-memory parallelism. Any processor can access the shared memory in unit time. What about access conflicts? Exclusive read, exclusive write (EREW); concurrent read, exclusive write (CREW); concurrent read, concurrent write (CRCW). (Diagram: processors Proc. 1, Proc. 2, ..., Proc. p connected to a shared memory.)
24 PRAM Model: Program Representation. Programs are represented as DAGs (directed acyclic graphs). Nodes: unit-time operations. Edges: dependencies. We can now define W(n) = number of nodes (total work) and D(n) = longest path from an input to an output (depth). Average parallelism: W(n) / D(n). (Figure: example DAG from inputs to outputs.)
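A sketch of how W(n), D(n), and the average parallelism could be computed for a DAG given as adjacency lists (the representation and the example graph are illustrative, not from the slides):

    #include <cstdio>
    #include <vector>
    using std::vector;

    // Depth = longest path (counted in nodes) from any source to any sink of the DAG.
    // Assumes 'succ' is an adjacency list of a DAG with nodes numbered 0..n-1.
    int dag_depth(const vector<vector<int>>& succ) {
      int n = (int)succ.size();
      vector<int> indeg(n, 0), level(n, 1);
      for (int u = 0; u < n; ++u)
        for (int v : succ[u]) ++indeg[v];
      vector<int> frontier;
      for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) frontier.push_back(u);          // input nodes
      int depth = 0;
      while (!frontier.empty()) {
        vector<int> next;
        for (int u : frontier) {
          if (level[u] > depth) depth = level[u];
          for (int v : succ[u]) {
            if (level[u] + 1 > level[v]) level[v] = level[u] + 1;
            if (--indeg[v] == 0) next.push_back(v);
          }
        }
        frontier.swap(next);
      }
      return depth;
    }

    int main() {
      // Example: a binary reduction tree over 4 inputs (7 nodes, depth 3).
      vector<vector<int>> succ = {{4}, {4}, {5}, {5}, {6}, {6}, {}};
      int W = (int)succ.size(), D = dag_depth(succ);
      std::printf("W = %d, D = %d, average parallelism = %.2f\n", W, D, (double)W / D);
    }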
25 PRAM Model: Advantages. A powerful tool for analyzing parallel algorithms. No need to manage communication and synchronization; focus on the algorithmic aspects. The time-unit and processor abstraction is robust w.r.t. architectures.
26 PRAM Model: Examples. Consider reduction: x_1 + x_2 + ... + x_n.
Sequential: W(n) = Θ(n), D(n) = Θ(n), parallelism = O(1).
Binary tree: W(n) = Θ(n), D(n) = Θ(log n), parallelism = O(n / log n).
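A minimal sketch of the two variants (the inner loop of the tree version is the part a PRAM would execute in parallel; the code itself is illustrative):

    #include <cstdio>
    #include <vector>
    using std::vector;

    // Sequential reduction: a chain of dependent additions, W(n) = Θ(n), D(n) = Θ(n).
    double reduce_seq(const vector<double>& x) {
      double s = 0.0;
      for (double v : x) s += v;
      return s;
    }

    // Tree reduction: pairwise sums level by level, W(n) = Θ(n), D(n) = Θ(log n).
    double reduce_tree(vector<double> x) {
      while (x.size() > 1) {
        vector<double> next((x.size() + 1) / 2);
        for (size_t i = 0; i < x.size() / 2; ++i)       // independent pair sums
          next[i] = x[2 * i] + x[2 * i + 1];
        if (x.size() % 2 == 1) next.back() = x.back();  // carry an odd element upward
        x.swap(next);
      }
      return x.empty() ? 0.0 : x[0];
    }

    int main() {
      vector<double> x = {1, 2, 3, 4, 5, 6, 7, 8};
      std::printf("seq = %.1f, tree = %.1f\n", reduce_seq(x), reduce_tree(x));
    }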
27 PRAM Model: Examples. Merge sort:

    List MergeSort(List L) {
      if (L.length() == 1) return L;
      auto L1 = MergeSort(L.sublist(0, L.length() / 2));
      auto L2 = MergeSort(L.sublist(L.length() / 2, L.length()));
      return Merge(L1, L2);
    }

W(n) = Θ(n log n). The two recursive calls can run in parallel, but each merge is sequential, so D(n) = D(n/2) + Θ(n) = Θ(n). Average parallelism: Θ(log n). (Animation: u/morolin)
28 PRAM Model: Examples. Scan (prefix sum). Input: L = (x_1, ..., x_n). Output: (0, x_1, x_1 + x_2, ..., x_1 + ... + x_{n-1}). For simplicity, assume n is a power of two.

    List Scan(List L) {
      if (L.length() == 1) return List { 0 };
      List sums(L.length() / 2);
      for (int i = 0; i < L.length() / 2; ++i)   // pairwise sums (in parallel)
        sums[i] = L[2 * i] + L[2 * i + 1];
      List evens = Scan(sums);                   // recursive scan of the pair sums
      List odds(ceil(L.length() / 2.0));
      for (int i = 0; i < odds.length(); ++i)    // odd positions (in parallel)
        odds[i] = evens[i] + L[2 * i];
      return Interleave(evens, odds);
    }

(Images: The PRAM Model and Algorithms, Prof. Robert van Engelen)
29 PRAM Model: Examples. Scan (prefix sum), continued. With the Scan code from the previous slide:

    W(n) = W(n/2) + Θ(n) = Θ(n)
    D(n) = D(n/2) + Θ(1) = Θ(log n)

Parallelism: O(n / log n).
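A self-contained sketch of the same recursive exclusive scan using std::vector (assumes the input length is a power of two; names and values are illustrative):

    #include <cstdio>
    #include <vector>
    using std::vector;

    // Recursive exclusive prefix sum, mirroring the Scan pseudocode above.
    vector<long> Scan(const vector<long>& L) {
      if (L.size() == 1) return {0};
      vector<long> sums(L.size() / 2);
      for (size_t i = 0; i < sums.size(); ++i)     // pairwise sums (parallel on a PRAM)
        sums[i] = L[2 * i] + L[2 * i + 1];
      vector<long> evens = Scan(sums);             // prefix sums of the pairs
      vector<long> result(L.size());
      for (size_t i = 0; i < evens.size(); ++i) {  // interleave even and odd positions
        result[2 * i] = evens[i];
        result[2 * i + 1] = evens[i] + L[2 * i];
      }
      return result;
    }

    int main() {
      vector<long> L = {3, 1, 4, 1, 5, 9, 2, 6};
      for (long v : Scan(L)) std::printf("%ld ", v);   // prints: 0 3 4 8 9 14 23 25
      std::printf("\n");
    }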
30 Reasoning in PRAM. Given a DAG with W(n) nodes and D(n) depth: the sequential runtime is T_1(n) = W(n), and the time on infinitely many processors is T_∞(n) = D(n). What is the time T_p(n) on p processors? We know T_p(n) ≥ D(n) and T_p(n) ≥ W(n) / p. Can we bound T_p from above?
31 Brent's Theorem.

    T_p(n) ≤ D(n) + (W(n) - D(n)) / p

Intuition: bound the parallelism available at each level of the DAG. (Figure: DAG from inputs to outputs, partitioned into levels.)
32 Brent's Theorem: T_p breakdown. Partition the DAG into levels: level 1 has s_1 operations and takes ⌈s_1 / p⌉ steps, level 2 has s_2 operations and takes ⌈s_2 / p⌉ steps, ..., level D has s_D operations and takes ⌈s_D / p⌉ steps, with Σ_{i=1}^{D} s_i = W.
33 Brent's Theorem: Proof. Observe: a DAG of depth D can be partitioned into D levels such that there are no dependencies between nodes of the same level. For each level i, let T_p^i be the time for level i on p processors, and let s_i (= T_1^i) denote the number of nodes in level i. Then:

    T_p^i = ⌈s_i / p⌉ ≤ (s_i + p - 1) / p

It follows that:

    T_p(n) ≤ Σ_{i=1}^{D} T_p^i ≤ Σ_{i=1}^{D} (s_i + p - 1) / p
34 Brent's Theorem: Proof (continued).

    T_p(n) ≤ Σ_{i=1}^{D} (s_i + p - 1) / p
           = (1/p) · W(n) + ((p - 1)/p) · D(n)
           = D(n) + (W(n) - D(n)) / p
           ≤ D(n) + W(n) / p

To conclude: T_1 / p = W(n) / p ≤ T_p(n) ≤ W(n) / p + D(n) = T_1 / p + T_∞.
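A tiny sketch of these bounds as code (the function names and example values are illustrative):

    #include <algorithm>
    #include <cstdio>

    // Brent's theorem: max(D, W/p) <= T_p <= D + (W - D) / p.
    double brent_upper(double W, double D, int p) { return D + (W - D) / p; }
    double brent_lower(double W, double D, int p) { return std::max(D, W / p); }

    int main() {
      const double W = 1e6, D = 20;   // e.g., a tree reduction over ~10^6 elements
      for (int p : {1, 16, 256, 4096, 65536})
        std::printf("p = %6d  %10.1f <= T_p <= %10.1f\n",
                    p, brent_lower(W, D, p), brent_upper(W, D, p));
    }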
35 Speedups.

    S_p(n) = T_1(n) / T_p(n) ≤ p
    S_p(n) ≥ W(n) / (D(n) + (W(n) - D(n)) / p) = W(n) · p / (W(n) + (p - 1) · D(n))
    S_∞(n) = W(n) / D(n)

For fixed n, the speedup is limited. For n → ∞, the speedup can be unbounded. Example: tree reduction.
36 Communication Models: α-β Model. Gist: α = latency (cycles), β = bandwidth (units/cycle). A sends a message to B over a link with latency α and bandwidth β. How long does it take to send a message of size n? (Figure: timeline with a latency component α and a transfer component n · (1/β).)
37 α-β Model. Time to send a message of n units:

    T(n) = α + n · (1/β) = α + n / β
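A minimal sketch of the model as code (the parameter values are illustrative):

    #include <cstdio>

    // α-β model: transfer time for a message of n units over a link with
    // latency alpha (cycles) and bandwidth beta (units/cycle).
    double transfer_time(double alpha, double beta, double n) {
      return alpha + n / beta;
    }

    int main() {
      const double alpha = 100.0, beta = 2.0;   // example: 100-cycle latency, 2 units/cycle
      for (double n : {1.0, 64.0, 4096.0, 1e6})
        std::printf("n = %10.0f units  T(n) = %12.1f cycles\n", n, transfer_time(alpha, beta, n));
    }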
38 Little's Law: Primer.
Problem 1: At Tannenbar, on average 2 students arrive and 2 students leave every minute, and each student spends 8 minutes inside. Q: How many students are in Tannenbar at any given time? A: 16.
Problem 2: Your fridge holds on average 150 bottles of Vivi Kola, and you drink (and buy) on average 25 bottles per week. Q: How many weeks does each bottle last? A: 6.
39 Little's Law (queueing theory). In a stationary system (input rate = output rate), the number of customers N in the system is given by N = λ · W, where λ is the arrival rate and W is the average time spent in the system. A simple rule, yet powerful: N is independent of the I/O distribution. Connection: in the α-β model, the arrival rate is β and the time spent in the system is α cycles. (John Little)
40 Little's Law and the α-β Model. β units arrive per cycle and β units leave per cycle; each unit spends α cycles in the communication medium. Hence N = α · β units are concurrently in transit.
41 Example: Memory Systems. Latency × throughput = concurrency. Latency is reduced by half every 9 years; throughput doubles every 3 years. Parallel processing is beneficial! (Diagram: memory, cache, Core 1, Core 2.) Real-world statistics: Intel Core 2 (2006): α = 100 cycles, β = 2 bytes/cycle, so N = 200. Intel Haswell (2014): α = 63 cycles, β = 23 bytes/cycle, so N = 63 · 23 = 1449.
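A tiny sketch of the concurrency calculation for the two systems above (the struct and names are illustrative):

    #include <cstdio>

    // Little's law for memory: bytes that must be in flight = latency (cycles) * bandwidth (bytes/cycle).
    struct MemSystem { const char* name; double alpha_cycles; double beta_bytes_per_cycle; };

    int main() {
      const MemSystem systems[] = {
        {"Intel Core 2 (2006)", 100.0, 2.0},
        {"Intel Haswell (2014)", 63.0, 23.0},
      };
      for (const MemSystem& s : systems) {
        double N = s.alpha_cycles * s.beta_bytes_per_cycle;   // concurrent bytes in transit
        std::printf("%-22s N = %.0f bytes in flight to saturate bandwidth\n", s.name, N);
      }
    }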