EE/CSCI 451: Parallel and Distributed Computation
1 EE/CSCI 451: Parallel and Distributed Computation, Lecture #19, 3/28/2017. Xuehai Qian, University of Southern California
2 Outline. From last class: PRAM, shared memory access, example PRAM algorithms, BSP model. Today: example PRAM algorithms: prefix sum, list ranking, finding maximum, sorting.
3 References. See for example: COMP 633: Parallel Computing, PRAM Algorithms, Siddhartha Chatterjee and Jan Prins, Fall 2009. Introduction to Parallel Computing (Second Edition), Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision, Uzi Vishkin, University of Maryland.
4 Example 2: Prefix Sum. Prefix sum problem (also called parallel prefix or scan). Input: (1) an input sequence x = x_1, ..., x_n of elements from a set T; (2) a binary associative operator ⊕ : T × T → T, i.e., a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c. Output: a sequence s = s_1, ..., s_n such that, for 1 ≤ k ≤ n, s_k = x_1 ⊕ x_2 ⊕ ... ⊕ x_k. Serial solution: s_0 = 0 (identity element); s_{i+1} = s_i ⊕ x_{i+1} for 0 ≤ i < n. Analysis: T_1(n) = O(n), W(n) = O(n).
5 Application of Prefix Sum: Carry-Lookahead Adder. Adding two n-bit binary numbers x_{n−1} ... x_0 and y_{n−1} ... y_0 produces z_n z_{n−1} ... z_0; full adder FA_i computes sum bit z_i from x_i, y_i, and carry input c_i, where c_i = f(x_{i−1}, y_{i−1}, ..., x_0, y_0, c_0). Computing all the carries is a prefix computation.
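To make the connection concrete, here is a small Python sketch (not from the slides; the names g, p, and carry_op are illustrative) showing that the carry bits of an adder are a prefix computation over (generate, propagate) pairs under an associative operator. The serial loop below stands in for the O(log n) parallel scan.

def carry_op(lo, hi):
    # combine a low-order block (g1, p1) with the block above it (g2, p2);
    # this composition of carry behaviors is associative
    g1, p1 = lo
    g2, p2 = hi
    return (g2 | (p2 & g1), p1 & p2)

def carries(x_bits, y_bits, c0=0):
    # bits are least-significant first; (generate, propagate) per position
    gp = [(xb & yb, xb | yb) for xb, yb in zip(x_bits, y_bits)]
    out, acc = [], (c0, 0)
    for t in gp:              # serial prefix; the PRAM scan does this in O(log n)
        acc = carry_op(acc, t)
        out.append(acc[0])    # carry out of this bit position
    return out

print(carries([1, 1, 0], [1, 0, 1]))  # 3 + 5 = 8: carries [1, 1, 1]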
6 Prefix Sum (PRAM Solution) (1)
Algorithm 3: SCAN(x, ⊕)
1. if n = 1 then
2.   s_1 ← x_1
3.   return s
4. endif
5. forall i ∈ 1:n/2 pardo
6.   y_i ← x_{2i−1} ⊕ x_{2i}
7. enddo
8. ⟨z_1, ..., z_{n/2}⟩ ← SCAN(⟨y_1, ..., y_{n/2}⟩, ⊕)   [recursive call on n/2 values]
9. forall i ∈ 1:n pardo   [combine]
10.   if even(i) then s_i ← z_{i/2}
11.   elsif i = 1 then s_1 ← x_1
12.   else s_i ← z_{(i−1)/2} ⊕ x_i
13.   endif
14. enddo
15. return s
(A runnable sketch follows below.)
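A minimal sequential Python sketch of Algorithm 3, assuming n is a power of 2. Each loop or comprehension body simulates one "forall ... pardo" step; on the PRAM all of its iterations run concurrently.

def scan(x, op):
    n = len(x)
    if n == 1:
        return [x[0]]
    # lines 5-7: pairwise combine, y_i = x_{2i-1} (+) x_{2i} (one parallel step)
    y = [op(x[2 * i], x[2 * i + 1]) for i in range(n // 2)]
    z = scan(y, op)                        # line 8: recursive call on n/2 values
    s = [None] * n
    for i in range(n):                     # lines 9-14: one parallel combine step
        if i == 0:                         # 1-indexed position 1 (line 11)
            s[i] = x[0]
        elif i % 2 == 1:                   # 1-indexed even position (line 10)
            s[i] = z[i // 2]
        else:                              # 1-indexed odd position > 1 (line 12)
            s[i] = op(z[i // 2 - 1], x[i])
    return s

print(scan([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))
# [1, 3, 6, 10, 15, 21, 28, 36]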
7 Prefix Sum (PRAM Solution) (2). [Figure: the inputs x_1, ..., x_8 are combined pairwise into y_1, ..., y_4; the recursive call on y returns z_1, ..., z_4; the combine step then produces s_1, ..., s_8.]
8 Prefix Sum (PRAM Solution) (3). [Figure: a worked example on the input x_1, ..., x_8, showing the pairwise sums y_i, the results z_i of the recursive call, and the final prefix sums s_i.]
9 Prefix Sum (PRAM Solution) (4). Theorem: Algorithm 3 correctly computes the prefix sum of sequence x with time complexity O(log n) and work complexity O(n), n = 2^k. Proof (correctness): (1) Base case: k = 0, which is correct by line 2. (2) Assume correctness for inputs of size 2^k, and consider an input of size 2^{k+1}. Then z_j = y_1 ⊕ ... ⊕ y_j = x_1 ⊕ ... ⊕ x_{2j}, and: if i = 2j (line 10), then s_i = z_j = x_1 ⊕ ... ⊕ x_{2j} = x_1 ⊕ ... ⊕ x_i; if i = 1 (line 11), then s_1 = x_1; if i = 2j + 1 (line 12), then s_i = z_j ⊕ x_i = x_1 ⊕ ... ⊕ x_{i−1} ⊕ x_i.
10 Prefix Sum (PRAM Solution) (5). Theorem: Algorithm 3 correctly computes the prefix sum of sequence x with time complexity O(log n) and work complexity O(n), n = 2^k. Proof (resource bounds): T(n) = T(n/2) + O(1) and W(n) = W(n/2) + O(n), hence T(n) = O(log n) and W(n) = O(n).
11 PRAM Processor Index. Each processor knows its index, e.g., through a variable "myindex". The index can be used for accessing various locations in memory: for example, processor P(i) accesses the i-th row of an array, or processor P(i, j) accesses the (i, j)-th entry of a two-dimensional array.
12 Example 3: List Ranking. List ranking problem. Input: a list of pointers x(0), x(1), ..., x(n−1), with x(0) = NULL (end of the list). Note: initially, the i-th processor does not know that its element is the i-th from the end of the list. Output: the rank of each element, denoted count(i), 0 ≤ i ≤ n−1: the number of elements in the list from that element to the end of the list, including the element itself. [Figure: a linked list ending in NULL, with output count(i).]
13 List Ranking PRAM Solution (1)
Algorithm PJ (Pointer Jumping). Given the pointers x(0), x(1), ..., x(n−1) and n processors, the i-th processor is responsible for the i-th pointer x(i):
1. for i = 0 to n − 1 do in parallel
2.   count(i) ← 1
3. while there exists x(i) ≠ NULL
4.   for i = 0 to n − 1 pardo
5.     if x(i) ≠ NULL then
6.       count(i) ← count(i) + count(x(i))
7.       x(i) ← x(x(i))
8.     end if
9.   end for
All PEs execute synchronously; the while loop repeats log_2 n times. (A runnable simulation follows below.)
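A sequential Python simulation of Algorithm PJ (a sketch, assuming the list is given as a successor array with None marking the end). All reads in a pass use the old copies before any writes take effect, mimicking the synchronous PRAM step.

def list_rank(nxt):
    n = len(nxt)
    count = [1] * n                          # lines 1-2
    while any(p is not None for p in nxt):   # line 3: repeats ~log2(n) times
        new_count, new_nxt = count[:], nxt[:]
        for i in range(n):                   # lines 4-9: one synchronous pardo step
            if nxt[i] is not None:
                new_count[i] = count[i] + count[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        count, nxt = new_count, new_nxt
    return count

# list order 3 -> 1 -> 0 -> 2 -> NULL, so the ranks are 2, 3, 1, 4
print(list_rank([2, 0, None, 1]))  # [2, 3, 1, 4]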
14 List Ranking PRAM Solution (2). [Figure: example for n = 8, showing the pointers and count values after step 0 through step 3 (log_2 8 = 3 pointer-jumping steps); after the last step every pointer is NULL.]
15 List Ranking PRAM Solution (3). CREW PRAM. Theorem: Algorithm PJ computes the rank of each element in O(log n) time: T_n(n) = O(log n). This technique is also called pointer jumping.
16 List Ranking PRAM Solution (4). Analysis (with p = n): total parallel time T(n) = O(log n); total work done W(n) = O(n log n), since all n PEs are active over log n cycles.
17 Example 4: Finding Maximum (1). A simple EREW algorithm (Max1). Input: a set of distinct numbers x(0), x(1), ..., x(n−1). Output: the maximum of these numbers. Idea: use a binary tree.
18 Example 4: Finding Maximum (2) (Max1). Given n numbers x(0), x(1), ..., x(n−1) and p = n processors:
1. for i = 0 to log n − 1 do
2.   for j = 0 to n − 1 do in parallel
3.     if j mod 2^{i+1} = 0
4.       x(j) = max(x(j), x(j + 2^i))
5.     endif
6.   endfor
7. endfor
[Figure: the reduction tree over x(0), x(1), x(2), x(3).]
Analysis: for p = n, T(n) = O(log n) and W(n) = O(n log n). Note: we do not need all n processors at each level (this observation can make the algorithm work optimal). A sketch follows below.
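A sequential Python sketch of Max1, assuming n is a power of 2; the inner loop is one EREW parallel step, and the outer loop runs log n times.

import math

def max1(x):
    x = x[:]                                  # work on a copy
    n = len(x)
    for i in range(int(math.log2(n))):        # log n rounds
        for j in range(n):                    # one pardo step on the PRAM
            if j % (2 ** (i + 1)) == 0:       # line 3
                x[j] = max(x[j], x[j + 2 ** i])
    return x[0]

print(max1([3, 7, 1, 8, 2, 6, 5, 4]))  # 8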
19 Example 4: Finding Maximum (3): Work-Optimal Max1. Given n numbers x(0), x(1), ..., x(n−1) and p = n / log n processors, partition the input into n / log n blocks of log n numbers each (a sketch follows below):
1. Assign log n numbers to each processor
2. for i = 0 to p − 1 do in parallel
3.   p(i) = max of assigned numbers
4. endfor
5. for i = 0 to log p − 1 do
6.   for j = 0 to p − 1 do in parallel
7.     if j mod 2^{i+1} = 0
8.       p(j) = max(p(j), p(j + 2^i))
9.     endif
10.   endfor
11. endfor
Note: at the beginning of line 5, there are p numbers left.
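A sketch of the work-optimal scheme in Python, under two stated assumptions: the block size is taken as ceil(log2 n), and the final tree reduction of lines 5-11 is represented by Python's built-in max over the p block maxima (on the PRAM it would be the Max1 tree above).

import math

def max_work_optimal(x):
    n = len(x)
    b = max(1, math.ceil(math.log2(n)))                    # block size ~ log n
    block_max = [max(x[i:i + b]) for i in range(0, n, b)]  # lines 1-4: serial scan per PE
    return max(block_max)    # stands in for the O(log p)-time tree of lines 5-11

print(max_work_optimal(list(range(100))))  # 99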
20 Example 4: Finding Maximum (4): Work-Optimal Max1. With n / log n blocks of size log n and p = n / log n: T(n) = O(log n); W(n) = O(p log n) = O(n). Work optimal.
21 A Faster Algorithm (Max2) (1). An algorithm on the COMMON CRCW model. Common CRCW: several processors can write into the same memory location in a step, but all values written into a location in a cycle must be the same. The algorithm uses n^2 processors and runs in O(1) time. Define Bigger(i, j) = 1 if x(i) ≥ x(j), and 0 otherwise. Note: if x(i) is the maximum element, then row i of Bigger is all 1's, and it is the only row with all 1's.
22 A Faster Algorithm (Max2) (2). Max2 algorithm (if Bigger(i, j) = 0 for some j, then x(i) is not the maximum):
1. for i = 0 to n − 1 pardo
2.   A(i) = 1
3. Each processor P(i, j) pardo
4.   compute Bigger(i, j)   [n^2 operations in parallel]
5.   if Bigger(i, j) = 0 then
6.     A(i) = 0   [concurrent writes possible; all store the same value]
7. end
8. Each processor P(i) pardo
9.   if A(i) = 1, then output x(i)   [exactly one output]
10. end
(A simulation sketch follows below.)
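A sequential Python simulation of Max2. On the COMMON CRCW PRAM, all n^2 comparisons and all the A(i) = 0 writes take O(1) parallel time; here they are simply nested loops.

def max2(x):
    n = len(x)
    A = [1] * n                       # lines 1-2
    for i in range(n):                # lines 3-7: n^2 processors in parallel
        for j in range(n):
            if not x[i] >= x[j]:      # Bigger(i, j) = 0
                A[i] = 0              # concurrent writes all store the same 0
    for i in range(n):                # lines 8-10
        if A[i] == 1:
            return x[i]               # exactly one output for distinct inputs

print(max2([2, 1, 4, 5]))  # 5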
23 A Faster Algorithm (Max2) (3). Example: x(0) = 2, x(1) = 1, x(2) = 4, x(3) = 5. [Figure: the 4 × 4 matrix Bigger(i, j) computed by processors P(i, j), and the resulting array A; only row 3 is all 1's, so A(3) = 1 and x(3) = 5 is output.]
24 A Faster Algorithm (Max2) (4). Analysis: T(n) = O(1), W(n) = O(n^2), p = n^2, on the Common CRCW model.
25 An Improved Algorithm (Max2'). Max2' algorithm (less total work): partition x(0), x(1), ..., x(n−1) into √n blocks of size √n each.
1. Apply Max2 to every block in parallel to get √n output values: O(1) time, p = √n · (√n)^2 = n√n.
2. Apply Max2 to those √n values: O(1) time, p = (√n)^2 = n.
In total, T(n) = O(1) and W(n) = O(n√n). (A sketch follows below.)
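A sketch of Max2' in Python, assuming n is a perfect square and reusing the max2 sketch above: Max2 runs over every √n-sized block in parallel, then once more over the √n block winners.

import math

def max2_prime(x):
    n = len(x)
    b = math.isqrt(n)                                      # block size sqrt(n)
    winners = [max2(x[i:i + b]) for i in range(0, n, b)]   # step 1: parallel on the PRAM
    return max2(winners)                                   # step 2: one more Max2

print(max2_prime([7, 2, 9, 4, 15, 11, 1, 3, 8, 5, 12, 6, 10, 14, 13, 0]))  # 15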
26 Example 5: Sorting. Input: a sequence of distinct numbers X = [x(0), x(1), ..., x(n−1)]. Output: the sequence rearranged in ascending order. A solution: merge sort.
Serial Merge Sort(X):
1. if |X| == 1, return X
2. else
3.   X_1 = Serial Merge Sort(x(0), x(1), ..., x(n/2 − 1))
4.   X_2 = Serial Merge Sort(x(n/2), x(n/2 + 1), ..., x(n − 1))
5.   Merge(X_1, X_2)
6. end if
27 Serial Merge Sort: Analysis. Merging the two sorted halves x(0), ..., x(n/2 − 1) and x(n/2), ..., x(n − 1) takes O(n) time, so T_1(n) = 2 T_1(n/2) + O(n) with T_1(1) = 1, giving T_1(n) = O(n log n).
28 Parallel Merge Sort
Parallel Merge Sort(X):
1. if |X| == 1, return X
2. else pardo
3.   X_1 = Parallel Merge Sort(x(0), x(1), ..., x(n/2 − 1))
4.   X_2 = Parallel Merge Sort(x(n/2), x(n/2 + 1), ..., x(n − 1))
5. end pardo
6. Merge(X_1, X_2)
7. end if
Parallel? p = n? (lowest level of the tree) EREW model? (A sketch of the recursive structure follows below.)
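A sequential Python sketch of the recursive structure. On the PRAM the two recursive calls of lines 3-4 run in parallel; the merge on line 6 is still serial, which is what keeps the parallel time at O(n).

def merge_sort(x):
    if len(x) <= 1:                   # line 1
        return x
    mid = len(x) // 2
    left = merge_sort(x[:mid])        # lines 3-4: on the PRAM these two
    right = merge_sort(x[mid:])       # calls execute in parallel (pardo)
    out, i, j = [], 0, 0              # line 6: serial merge, O(n) per level
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]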
29 Parallel merge sort with serial merge: Analysis. T_∞(n) = T_∞(n/2) + O(n), T_∞(1) = 1, so T_∞(n) = O(n) with p = n. W(n) = 2 W(n/2) + O(n), W(1) = 1, so W(n) = O(n log n). Note: we can also do a parallel merge, i.e., merge two sorted sequences of n/2 elements each into one sorted sequence faster by using more processors (later: bitonic merge network).
30 PRAM to Real Machines. A PRAM algorithm runs in time T_PRAM(n). The network model (a realistic model) has p processors and communication primitives: random access read and random access write. The time in the network model to perform the same algorithm is O(T_acc(n) · T_PRAM(n)), where T_acc(n) is the time to emulate one PRAM memory access on the network: the loss in performance.
31 PRAM Conclusion. A synchronous execution model. A simple model, useful for analysis and for thinking parallel: time, # of processors, total work. From PRAM to Pthreads? Shared variables? Asynchronous execution.
32 Backup Slides
33 Another Algorithm (Max3) (1)
Max3(x): n = |x|; if n = 1, return x(1)
1. for i = 0 to √n − 1 pardo
2.   m(i) = Max3(x(i√n), ..., x(i√n + √n − 1))   [recursive call on √n values]
3. end for
4. Maximum = Max2(m)   [Max2 on √n values: O(1) time, W = O((√n)^2) = O(n)]
Analysis: T(n) = T(√n) + O(1) = O(log log n); W(n) = √n · W(√n) + O(n) = O(n log log n). Note: total number of processors used is p = O(n). (A sketch follows below.)
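A Python sketch of the doubly logarithmic recursion, assuming n is of the form 2^(2^k) and reusing the max2 sketch above as the O(1)-time CRCW combine.

import math

def max3(x):
    n = len(x)
    if n <= 2:
        return max(x)                                   # base case
    b = math.isqrt(n)                                   # sqrt(n) blocks of size sqrt(n)
    m = [max3(x[i:i + b]) for i in range(0, n, b)]      # line 2: recursive calls (pardo)
    return max2(m)                                      # line 4: O(1) CRCW combine

print(max3([7, 2, 9, 4, 15, 11, 1, 3, 8, 5, 12, 6, 10, 14, 13, 0]))  # 15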
34 Another Algorithm (Max3) (2). Analysis. Assume n = 2^{2^k}. The recursion forms a tree with a variable number of children at each level: the root has 2^{2^{k−1}} = √n children; each node at the i-th level has 2^{2^{k−i−1}} children, for 0 ≤ i ≤ k − 1; each node at level k has two children.
35 Another Algorithm (Max3) (3). Example: n = 2^{2^2} = 16, k = 2. Level 0: # of children = 2^{2^1} = 4. Level 1: # of children = 2^{2^0} = 2. Level 2: # of children = 2. [Figure: the recursion tree over n = 16 leaves.]
36 Another Algorithm (Max3) (4). Summary: T(n) = O(log log n), W(n) = O(n log log n), p = n. Not work optimal.
37 Work-Optimal Algorithm (Max4) (1). Partition x(0), x(1), ... into blocks of size log log n; there are n / log log n blocks.
Max4 algorithm:
1. Use 1 processor per block to find the max of each block serially; n / log log n elements are left. Total # of processors = n / log log n.
2. Run Max3 on the resulting array of size n / log log n.
(A sketch follows below.)
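A Python sketch of Max4, assuming n ≥ 4, with the block size taken as ~log log n and the stage-2 call reusing the max3 sketch above.

import math

def max4(x):
    n = len(x)
    b = max(1, int(math.log2(math.log2(n))))               # block size ~ loglog n
    block_max = [max(x[i:i + b]) for i in range(0, n, b)]  # step 1: serial scan per block
    return max3(block_max)                                 # step 2: Max3 on n/loglog n values

print(max4(list(range(256))))  # 255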
38 Work-Optimal Algorithm (Max4) (2). Analysis. Step 1: use 1 processor per block to find the maximum of each block: p = n / log log n, time O(log log n), work O(n). Step 2: run Max3 on the resulting array: the problem size is n / log log n, so p = n / log log n, time O(log log (n / log log n)) = O(log log n), and work O((n / log log n) · log log(n / log log n)) = O(n).
39 Work-Optimal Algorithm (Max4) (3). To conclude: T(n) = O(log log n), W(n) = O(n), p = n / log log n. Note: for Max3, W(n) = O(n log log n).
40 Accelerated Cascading (1). The Max4 idea is called accelerated cascading: combine a fast, work-optimal algorithm with a faster but not work-optimal one. Start with the (work) optimal algorithm to reduce the problem size, then use the faster, (work) suboptimal algorithm on the smaller problem: n inputs → fast stage → smaller problem → faster stage → solution.
41 Accelerated Cascading (2). In the Max4 algorithm:
Stage 1 (fast, work optimal): start with the binary tree algorithm on input size n; each level up the tree reduces the input size by a factor of 2, so in log log log n levels the input size is reduced to n / 2^{log log log n} = n / log log n. W(n) = O(n) and T(n) = O(log log log n).
Stage 2 (faster, suboptimal): switch to the faster algorithm with n' = n / log log n inputs and n / log log n processors. W(n) = O(n' log log n') = O(n) and T(n) = O(log log n') = O(log log n).
42 Max Finding Summary (1). Input: n distinct numbers.
EREW model: Max1, p = n / log n, T(n) = O(log n), W(n) = O(n).
Common CRCW model:
  Algorithm | p             | T(n)          | W(n)
  Max2      | n^2           | O(1)          | O(n^2)
  Max2'     | n√n           | O(1)          | O(n√n)
  Max3      | n             | O(log log n)  | O(n log log n)
  Max4      | n / log log n | O(log log n)  | O(n)
43 Max3 Details (1). Assume n = 2^{2^k}. The recursion forms a tree with a variable number of children at each level: the root has 2^{2^{k−1}} = √n children; each node at the i-th level has 2^{2^{k−i−1}} children, for 0 ≤ i ≤ k − 1; each node at level k has two children.
44 Max3 Details (2). Example: n = 2^{2^2} = 16, k = 2. The root has 2^{2^{k−1}} = 2^{2^1} = 4 children; each node at the i-th level has 2^{2^{k−i−1}} children for 0 ≤ i ≤ k − 1; each node at level 2 has two children. Level 0: # of children = 4; level 1: # of children = 2; level 2: # of children = 2. [Figure: the tree over n = 16 leaves.]
45 Max3 Details (3). Some properties: the depth of the tree is k; since n = 2^{2^k}, k = O(log log n). The number of nodes at the i-th level is 2^{2^k − 2^{k−i}}, for 0 ≤ i < k (this can be proved by induction).
46 Max3 Details (4). The Algorithm. Proceed bottom up, level by level: at every level, compute the maximum of all the children of each internal node with the O(1)-time algorithm (Max2). Time complexity = O(log log n) [the depth of the tree].
47 Max3 Details (5). Total Work. Max2 (the O(1)-time algorithm) needs O(m^2) processors for m elements. A node at the i-th level has 2^{2^{k−i−1}} children, so the total work for each node at the i-th level is O((2^{2^{k−i−1}})^2) = O(2^{2^{k−i}}). With 2^{2^k − 2^{k−i}} nodes at the i-th level, the total work for the i-th level is O(2^{2^{k−i}}) · 2^{2^k − 2^{k−i}} = O(2^{2^k}) = O(n). Over the O(log log n) levels, the total work is W(n) = O(n log log n) (suboptimal). Total # of processors p = n (the amount of work at the i-th level).