Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms

Size: px
Start display at page:

Download "Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms"

Transcription

1 Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms Jason Ansel Yee Lok Wong Cy Chan Marek Olszewski Alan Edelman Saman Amarasinghe MIT - CSAIL April 4, 2011 Jason Ansel (MIT) PetaBricks April 4, / 30

2 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Autotuner 5 Results 6 Conclusions Jason Ansel (MIT) PetaBricks April 4, / 30

3 A motivating example How would you write a fast sorting algorithm? Jason Ansel (MIT) PetaBricks April 4, / 30

4 A motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Jason Ansel (MIT) PetaBricks April 4, / 30

5 A motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Jason Ansel (MIT) PetaBricks April 4, / 30

6 A motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Poly-algorithms Jason Ansel (MIT) PetaBricks April 4, / 30

7 std::stable sort /usr/include/c++/4.5.2/bits/stl algo.h lines Jason Ansel (MIT) PetaBricks April 4, / 30

8 std::stable sort /usr/include/c++/4.5.2/bits/stl algo.h lines Jason Ansel (MIT) PetaBricks April 4, / 30

9 std::sort /usr/include/c++/4.5.2/bits/stl algo.h lines Jason Ansel (MIT) PetaBricks April 4, / 30

10 std::sort /usr/include/c++/4.5.2/bits/stl algo.h lines Why 16? Why 15? Jason Ansel (MIT) PetaBricks April 4, / 30

11 std::sort /usr/include/c++/4.5.2/bits/stl algo.h lines Why 16? Why 15? Dates back to at least 2000 (Jun 2000 SGI release) Still in current C++ STL shipped with GCC 10+ years of of S threshold = 16 Jason Ansel (MIT) PetaBricks April 4, / 30

12 Is 15 the right number? The best cutoff (CO) changes Depends on competing costs: Cost of computation (< operator, call overhead, etc) Cost of communication (swaps) Cache behavior (misses, prefetcher, locality) Jason Ansel (MIT) PetaBricks April 4, / 30

13 Is 15 the right number? The best cutoff (CO) changes Depends on competing costs: Cost of computation (< operator, call overhead, etc) Cost of communication (swaps) Cache behavior (misses, prefetcher, locality) Sorting doubles with std::stable sort: CO 200 optimal on a Phenom 905e (15% speedup over CO = 15) CO 400 optimal on a Opteron 6168 (15% speedup over CO = 15) CO 500 optimal on a Xeon E5320 (34% speedup over CO = 15) CO 700 optimal on a Xeon X5460 (25% speedup over CO = 15) Compiler s hands are tied, it is stuck with 15 Jason Ansel (MIT) PetaBricks April 4, / 30

14 Back to our motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Poly-algorithms Jason Ansel (MIT) PetaBricks April 4, / 30

15 Back to our motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Poly-algorithms Answer It depends! Jason Ansel (MIT) PetaBricks April 4, / 30

16 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort Jason Ansel (MIT) PetaBricks April 4, / 30

17 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort On a Sun Fire T200 Niagara (8 cores) 1 16-way merge sort below way merge sort below way merge sort below way parallel merge sort Jason Ansel (MIT) PetaBricks April 4, / 30

18 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort On a Sun Fire T200 Niagara (8 cores) 1 16-way merge sort below way merge sort below way merge sort below way parallel merge sort 235% slowdown running Niagara algorithm on the Xeon 8% slowdown running Xeon algorithm on the Niagara Jason Ansel (MIT) PetaBricks April 4, / 30

19 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort On a Sun Fire T200 Niagara (8 cores) 1 16-way merge sort below way merge sort below way merge sort below way parallel merge sort 235% slowdown running Niagara algorithm on the Xeon 8% slowdown running Xeon algorithm on the Niagara Need a way to express these algorithmic choices to enable autotuning Jason Ansel (MIT) PetaBricks April 4, / 30

20 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Autotuner 5 Results 6 Conclusions Jason Ansel (MIT) PetaBricks April 4, / 30

21 Algorithmic choices Language e i t h e r { I n s e r t i o n S o r t ( out, i n ) ; } or { QuickSort ( out, i n ) ; } or { MergeSort ( out, i n ) ; } or { R a d i x S o r t ( out, i n ) ; } Jason Ansel (MIT) PetaBricks April 4, / 30

22 Algorithmic choices Language e i t h e r { I n s e r t i o n S o r t ( out, i n ) ; } or { QuickSort ( out, i n ) ; } or { MergeSort ( out, i n ) ; } or { R a d i x S o r t ( out, i n ) ; } Representation Decision tree synthesized by our evolutionary algorithm (EA) Jason Ansel (MIT) PetaBricks April 4, / 30

23 The PetaBricks language Choices expressed in the language High level algorithmic choices Dependency-based synthesized outer control flow Parallelization strategy Programs automatically adapt to their environment Tuned using our bottom-up evaluation algorithm Offline autotuner or always-on online autotuner Jason Ansel (MIT) PetaBricks April 4, / 30

24 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Autotuner 5 Results 6 Conclusions Jason Ansel (MIT) PetaBricks April 4, / 30

25 Variable accuracy algorithms Many problems don t have a single correct answer Jason Ansel (MIT) PetaBricks April 4, / 30

26 Variable accuracy algorithms Many problems don t have a single correct answer Soft computing Approximation algorithms for NP-hard problems DSP algorithms Different grid resolutions Data precisions Iterative algorithms Choosing convergence criteria Jason Ansel (MIT) PetaBricks April 4, / 30

27 Variable accuracy example Example... f o r ( i n t i = 0 ; i < 100; ++i ) { S O R I t e r a t i o n ( tmp ) ; }... Jason Ansel (MIT) PetaBricks April 4, / 30

28 Variable accuracy example Example... f o r ( i n t i = 0 ; i < 100; ++i ) { S O R I t e r a t i o n ( tmp ) ; }... Competing objectives of performance and accuracy Must maximize performance while meeting accuracy targets Jason Ansel (MIT) PetaBricks April 4, / 30

29 Accuracy metrics and for enough loops Language a c c u r a c y m e t r i c MyRMSError Jason Ansel (MIT) PetaBricks April 4, / 30

30 Accuracy metrics and for enough loops Language a c c u r a c y m e t r i c... f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } MyRMSError Jason Ansel (MIT) PetaBricks April 4, / 30

31 Accuracy metrics and for enough loops Language a c c u r a c y m e t r i c... f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } MyRMSError Representation Function from problem size to number of iterations synthesized by our EA Jason Ansel (MIT) PetaBricks April 4, / 30

32 Accuracy variables Language a c c u r a c y m e t r i c MyRMSError a c c u r a c y v a r i a b l e k... f o r ( i n t i =0; i <k ; ++i ) { S O R I t e r a t i o n ( tmp ) ; } Jason Ansel (MIT) PetaBricks April 4, / 30

33 Accuracy variables Language a c c u r a c y m e t r i c MyRMSError a c c u r a c y v a r i a b l e k... f o r ( i n t i =0; i <k ; ++i ) { S O R I t e r a t i o n ( tmp ) ; } Representation Function from problem size to k synthesized by our EA Jason Ansel (MIT) PetaBricks April 4, / 30

34 Variable accuracy and algorithmic choices Language a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Jason Ansel (MIT) PetaBricks April 4, / 30

35 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Autotuner 5 Results 6 Conclusions Jason Ansel (MIT) PetaBricks April 4, / 30

36 Traditional evolution algorithm Initial population???? Cost = 0 Jason Ansel (MIT) PetaBricks April 4, / 30

37 Traditional evolution algorithm Initial population 72.7s??? Cost = 72.7 Jason Ansel (MIT) PetaBricks April 4, / 30

38 Traditional evolution algorithm Initial population 72.7s 10.5s?? Cost = 83.2 Jason Ansel (MIT) PetaBricks April 4, / 30

39 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s? Cost = 87.3 Jason Ansel (MIT) PetaBricks April 4, / 30

40 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Jason Ansel (MIT) PetaBricks April 4, / 30

41 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2???? Cost = 0 Jason Ansel (MIT) PetaBricks April 4, / 30

42 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Jason Ansel (MIT) PetaBricks April 4, / 30

43 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3???? Cost = 0 Jason Ansel (MIT) PetaBricks April 4, / 30

44 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Jason Ansel (MIT) PetaBricks April 4, / 30

45 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4???? Cost = 0 Jason Ansel (MIT) PetaBricks April 4, / 30

46 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2 Jason Ansel (MIT) PetaBricks April 4, / 30

47 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2 Cost of autotuning front-loaded in initial (unfit) population We could speed up tuning if we start with a faster initial population Jason Ansel (MIT) PetaBricks April 4, / 30

48 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2 Cost of autotuning front-loaded in initial (unfit) population We could speed up tuning if we start with a faster initial population Key insight Smaller input sizes can be used to form better initial population Jason Ansel (MIT) PetaBricks April 4, / 30

49 Bottom-up evolutionary algorithm Train on input size 64 Jason Ansel (MIT) PetaBricks April 4, / 30

50 Bottom-up evolutionary algorithm Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks April 4, / 30

51 Bottom-up evolutionary algorithm Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks April 4, / 30

52 Bottom-up evolutionary algorithm Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks April 4, / 30

53 Bottom-up evolutionary algorithm Train on input size 2, to form initial population for: Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks April 4, / 30

54 Bottom-up evolutionary algorithm Train on input size 1, to form initial population for: Train on input size 2, to form initial population for: Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks April 4, / 30

55 Bottom-up evolutionary algorithm Train on input size 1, to form initial population for: Train on input size 2, to form initial population for: Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Naturally exploits optimal substructure of problems Jason Ansel (MIT) PetaBricks April 4, / 30

56 Variable accuracy autotuner Jason Ansel (MIT) PetaBricks April 4, / 30

57 Variable accuracy autotuner Jason Ansel (MIT) PetaBricks April 4, / 30

58 Variable accuracy autotuner Partition accuracy space into discrete levels Jason Ansel (MIT) PetaBricks April 4, / 30

59 Variable accuracy autotuner Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks April 4, / 30

60 Variable accuracy autotuner Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks April 4, / 30

61 Variable accuracy autotuner Generation i Generation i + 1 Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks April 4, / 30

62 Variable accuracy autotuner Generation i Generation i + 1 Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks April 4, / 30

63 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Autotuner 5 Results 6 Conclusions Jason Ansel (MIT) PetaBricks April 4, / 30

64 Changing accuracy requirements Speedup (x) Accuracy Level 2.0 Accuracy Level 1.5 Accuracy Level 1.0 Accuracy Level 0.8 Accuracy Level 0.6 Accuracy Level Input Size Image Compression Jason Ansel (MIT) PetaBricks April 4, / 30

65 Changing accuracy requirements Speedup (x) Accuracy Level 0.95 Accuracy Level 0.75 Accuracy Level 0.50 Accuracy Level 0.20 Accuracy Level 0.10 Accuracy Level 0.05 Speedup (x) Accuracy Level 1.01 Accuracy Level 1.1 Accuracy Level 1.2 Accuracy Level 1.3 Accuracy Level 1.4 Speedup (x) Accuracy Level 2.0 Accuracy Level 1.5 Accuracy Level 1.0 Accuracy Level 0.8 Accuracy Level 0.6 Accuracy Level Input Size Clustering e+06 Input Size Bin Packing Input Size Image Compression Speedup (x) Accuracy Level 10 9 Accuracy Level 10 7 Accuracy Level 10 5 Accuracy Level 10 3 Accuracy Level 10 1 Speedup (x) Accuracy Level 10 9 Accuracy Level 10 7 Accuracy Level 10 5 Accuracy Level 10 3 Accuracy Level 10 1 Speedup (x) Accuracy Level 3.0 Accuracy Level 2.0 Accuracy Level 1.5 Accuracy Level 1.0 Accuracy Level 0.5 Accuracy Level e Input Size 3D Helmholtz Input Size 2D Poisson Input Size Preconditioner Jason Ansel (MIT) PetaBricks April 4, / 30

66 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Jason Ansel (MIT) PetaBricks April 4, / 30

67 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Time SOR Ite ra tio n Jason Ansel (MIT) PetaBricks April 4, / 30

68 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Grid Size Time SOR Iteration Jason Ansel (MIT) PetaBricks April 4, / 30

69 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Grid Size Time SOR Iteration Jason Ansel (MIT) PetaBricks April 4, / 30

70 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Grid Size Time SOR Iteration Direct Solve Jason Ansel (MIT) PetaBricks April 4, / 30

71 Autotuned V-cycle shapes Grid Size Jason Ansel (MIT) PetaBricks April 4, / 30

72 Autotuned V-cycle shapes Grid Size Jason Ansel (MIT) PetaBricks April 4, / 30

73 Autotuned V-cycle shapes Grid Size Jason Ansel (MIT) PetaBricks April 4, / 30

74 Autotuned V-cycle shapes Grid Size Grid Size Jason Ansel (MIT) PetaBricks April 4, / 30

75 Autotuned bin packing algorithms Jason Ansel (MIT) PetaBricks April 4, / 30

76 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Autotuner 5 Results 6 Conclusions Jason Ansel (MIT) PetaBricks April 4, / 30

77 Conclusions Motivating goal of PetaBricks Make programs future-proof by allowing them to adapt to their environment. We can do better than hard coded constants! Jason Ansel (MIT) PetaBricks April 4, / 30

78 Thanks! Questions? Jason Ansel (MIT) PetaBricks April 4, / 30

PetaBricks: Variable Accuracy and Online Learning

PetaBricks: Variable Accuracy and Online Learning PetaBricks: Variable Accuracy and Online Learning Jason Ansel MIT - CSAIL May 4, 2011 Jason Ansel (MIT) PetaBricks May 4, 2011 1 / 40 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable

More information

An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems

An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems Jason Ansel Maciej Pacula Saman Amarasinghe Una-May O Reilly MIT - CSAIL July 14, 2011 Jason Ansel (MIT) PetaBricks July

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Spring 2017-2018 Outline 1 Sorting Algorithms (contd.) Outline Sorting Algorithms (contd.) 1 Sorting Algorithms (contd.) Analysis of Quicksort Time to sort array of length

More information

Sorting. CMPS 2200 Fall Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk

Sorting. CMPS 2200 Fall Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk CMPS 2200 Fall 2017 Sorting Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk 11/17/17 CMPS 2200 Intro. to Algorithms 1 How fast can we sort? All the sorting algorithms

More information

Insertion Sort. We take the first element: 34 is sorted

Insertion Sort. We take the first element: 34 is sorted Insertion Sort Idea: by Ex Given the following sequence to be sorted 34 8 64 51 32 21 When the elements 1, p are sorted, then the next element, p+1 is inserted within these to the right place after some

More information

Linear Selection and Linear Sorting

Linear Selection and Linear Sorting Analysis of Algorithms Linear Selection and Linear Sorting If more than one question appears correct, choose the more specific answer, unless otherwise instructed. Concept: linear selection 1. Suppose

More information

Algorithms. Algorithms 2.2 MERGESORT. mergesort bottom-up mergesort sorting complexity divide-and-conquer ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 2.2 MERGESORT. mergesort bottom-up mergesort sorting complexity divide-and-conquer ROBERT SEDGEWICK KEVIN WAYNE Algorithms ROBERT SEDGEWICK KEVIN WAYNE 2.2 MERGESORT Algorithms F O U R T H E D I T I O N mergesort bottom-up mergesort sorting complexity divide-and-conquer ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu

More information

Sorting algorithms. Sorting algorithms

Sorting algorithms. Sorting algorithms Properties of sorting algorithms A sorting algorithm is Comparison based If it works by pairwise key comparisons. In place If only a constant number of elements of the input array are ever stored outside

More information

Analytical Modeling of Parallel Systems

Analytical Modeling of Parallel Systems Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort Sorting Algorithms We have already seen: Selection-sort Insertion-sort Heap-sort We will see: Bubble-sort Merge-sort Quick-sort We will show that: O(n log n) is optimal for comparison based sorting. Bubble-Sort

More information

Sorting. Chapter 11. CSE 2011 Prof. J. Elder Last Updated: :11 AM

Sorting. Chapter 11. CSE 2011 Prof. J. Elder Last Updated: :11 AM Sorting Chapter 11-1 - Sorting Ø We have seen the advantage of sorted data representations for a number of applications q Sparse vectors q Maps q Dictionaries Ø Here we consider the problem of how to efficiently

More information

Sorting. CS 3343 Fall Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk

Sorting. CS 3343 Fall Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk CS 3343 Fall 2010 Sorting Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk 10/7/10 CS 3343 Analysis of Algorithms 1 How fast can we sort? All the sorting algorithms we

More information

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February)

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February) Algorithms and Data Structures 016 Week 5 solutions (Tues 9th - Fri 1th February) 1. Draw the decision tree (under the assumption of all-distinct inputs) Quicksort for n = 3. answer: (of course you should

More information

Minisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu

Minisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu Minisymposia 9 and 34: Avoiding Communication in Linear Algebra Jim Demmel UC Berkeley bebop.cs.berkeley.edu Motivation (1) Increasing parallelism to exploit From Top500 to multicores in your laptop Exponentially

More information

Lecture 1: Asymptotics, Recurrences, Elementary Sorting

Lecture 1: Asymptotics, Recurrences, Elementary Sorting Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running

More information

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and

More information

Sorting and Searching. Tony Wong

Sorting and Searching. Tony Wong Tony Wong 2017-03-04 Sorting Sorting is the reordering of array elements into a specific order (usually ascending) For example, array a[0..8] consists of the final exam scores of the n = 9 students in

More information

ITEC2620 Introduction to Data Structures

ITEC2620 Introduction to Data Structures ITEC2620 Introduction to Data Structures Lecture 6a Complexity Analysis Recursive Algorithms Complexity Analysis Determine how the processing time of an algorithm grows with input size What if the algorithm

More information

Fundamental Algorithms

Fundamental Algorithms Chapter 2: Sorting, Winter 2018/19 1 Fundamental Algorithms Chapter 2: Sorting Jan Křetínský Winter 2018/19 Chapter 2: Sorting, Winter 2018/19 2 Part I Simple Sorts Chapter 2: Sorting, Winter 2018/19 3

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 2: Sorting Harald Räcke Winter 2015/16 Chapter 2: Sorting, Winter 2015/16 1 Part I Simple Sorts Chapter 2: Sorting, Winter 2015/16 2 The Sorting Problem Definition Sorting

More information

Chapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 5 Divide and Conquer CLRS 4.3 Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve

More information

Central Algorithmic Techniques. Iterative Algorithms

Central Algorithmic Techniques. Iterative Algorithms Central Algorithmic Techniques Iterative Algorithms Code Representation of an Algorithm class InsertionSortAlgorithm extends SortAlgorithm { void sort(int a[]) throws Exception { for (int i = 1; i < a.length;

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Parallel Transposition of Sparse Data Structures

Parallel Transposition of Sparse Data Structures Parallel Transposition of Sparse Data Structures Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng Department of Computer Science, Virginia Tech Niels Bohr Institute, University of Copenhagen Scientific Computing

More information

Energy Consumption Evaluation for Krylov Methods on a Cluster of GPU Accelerators

Energy Consumption Evaluation for Krylov Methods on a Cluster of GPU Accelerators PARIS-SACLAY, FRANCE Energy Consumption Evaluation for Krylov Methods on a Cluster of GPU Accelerators Serge G. Petiton a and Langshi Chen b April the 6 th, 2016 a Université de Lille, Sciences et Technologies

More information

Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109. Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101

Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109. Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101 Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109 Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101 http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ March 30, 2016 Sorting (or Ordering)

More information

Data selection. Lower complexity bound for sorting

Data selection. Lower complexity bound for sorting Data selection. Lower complexity bound for sorting Lecturer: Georgy Gimel farb COMPSCI 220 Algorithms and Data Structures 1 / 12 1 Data selection: Quickselect 2 Lower complexity bound for sorting 3 The

More information

A point p is said to be dominated by point q if p.x=q.x&p.y=q.y 2 true. In RAM computation model, RAM stands for Random Access Model.

A point p is said to be dominated by point q if p.x=q.x&p.y=q.y 2 true. In RAM computation model, RAM stands for Random Access Model. In analysis the upper bound means the function grows asymptotically no faster than its largest term. 1 true A point p is said to be dominated by point q if p.x=q.x&p.y=q.y 2 true In RAM computation model,

More information

Fast Sorting and Selection. A Lower Bound for Worst Case

Fast Sorting and Selection. A Lower Bound for Worst Case Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 0 Fast Sorting and Selection USGS NEIC. Public domain government image. A Lower Bound

More information

COL 730: Parallel Programming

COL 730: Parallel Programming COL 730: Parallel Programming PARALLEL SORTING Bitonic Merge and Sort Bitonic sequence: {a 0, a 1,, a n-1 }: A sequence with a monotonically increasing part and a monotonically decreasing part For some

More information

Sorting DS 2017/2018

Sorting DS 2017/2018 Sorting DS 2017/2018 Content Sorting based on comparisons Bubble sort Insertion sort Selection sort Merge sort Quick sort Counting sort Distribution sort FII, UAIC Lecture 8 DS 2017/2018 2 / 44 The sorting

More information

Bin Sort. Sorting integers in Range [1,...,n] Add all elements to table and then

Bin Sort. Sorting integers in Range [1,...,n] Add all elements to table and then Sorting1 Bin Sort Sorting integers in Range [1,...,n] Add all elements to table and then Retrieve in order 1,2,3,...,n Stable Sorting Method (repeated elements will end up in their original order) Numbers

More information

! Insert. ! Remove largest. ! Copy. ! Create. ! Destroy. ! Test if empty. ! Fraud detection: isolate $$ transactions.

! Insert. ! Remove largest. ! Copy. ! Create. ! Destroy. ! Test if empty. ! Fraud detection: isolate $$ transactions. Priority Queues Priority Queues Data. Items that can be compared. Basic operations.! Insert.! Remove largest. defining ops! Copy.! Create.! Destroy.! Test if empty. generic ops Reference: Chapter 6, Algorithms

More information

Divide and Conquer Algorithms. CSE 101: Design and Analysis of Algorithms Lecture 14

Divide and Conquer Algorithms. CSE 101: Design and Analysis of Algorithms Lecture 14 Divide and Conquer Algorithms CSE 101: Design and Analysis of Algorithms Lecture 14 CSE 101: Design and analysis of algorithms Divide and conquer algorithms Reading: Sections 2.3 and 2.4 Homework 6 will

More information

Longest Common Prefixes

Longest Common Prefixes Longest Common Prefixes The standard ordering for strings is the lexicographical order. It is induced by an order over the alphabet. We will use the same symbols (,

More information

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Anoop Bhagyanath and Klaus Schneider Embedded Systems Chair University of Kaiserslautern

More information

Lecture 4. Quicksort

Lecture 4. Quicksort Lecture 4. Quicksort T. H. Cormen, C. E. Leiserson and R. L. Rivest Introduction to Algorithms, 3rd Edition, MIT Press, 2009 Sungkyunkwan University Hyunseung Choo choo@skku.edu Copyright 2000-2018 Networking

More information

Highly-scalable branch and bound for maximum monomial agreement

Highly-scalable branch and bound for maximum monomial agreement Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program

More information

Space Complexity of Algorithms

Space Complexity of Algorithms Space Complexity of Algorithms So far we have considered only the time necessary for a computation Sometimes the size of the memory necessary for the computation is more critical. The amount of memory

More information

NEW RESEARCH on THEORY and PRACTICE of SORTING and SEARCHING. R. Sedgewick Princeton University. J. Bentley Bell Laboratories

NEW RESEARCH on THEORY and PRACTICE of SORTING and SEARCHING. R. Sedgewick Princeton University. J. Bentley Bell Laboratories NEW RESEARCH on THEORY and PRACTICE of SORTING and SEARCHING R. Sedgewick Princeton University J. Bentley Bell Laboratories Context Layers of abstraction in modern computing Applications Programming Environment

More information

Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning

Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning Robert B. Gramacy University of Chicago Booth School of Business faculty.chicagobooth.edu/robert.gramacy

More information

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 MA008 p.1/37 MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 Dr. Markus Hagenbuchner markus@uow.edu.au. MA008 p.2/37 Exercise 1 (from LN 2) Asymptotic Notation When constants appear in exponents

More information

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Raj Parihar Advisor: Prof. Michael C. Huang March 22, 2013 Raj Parihar Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel

More information

Outline. 1 Merging. 2 Merge Sort. 3 Complexity of Sorting. 4 Merge Sort and Other Sorts 2 / 10

Outline. 1 Merging. 2 Merge Sort. 3 Complexity of Sorting. 4 Merge Sort and Other Sorts 2 / 10 Merge Sort 1 / 10 Outline 1 Merging 2 Merge Sort 3 Complexity of Sorting 4 Merge Sort and Other Sorts 2 / 10 Merging Merge sort is based on a simple operation known as merging: combining two ordered arrays

More information

Algorithms and Their Complexity

Algorithms and Their Complexity CSCE 222 Discrete Structures for Computing David Kebo Houngninou Algorithms and Their Complexity Chapter 3 Algorithm An algorithm is a finite sequence of steps that solves a problem. Computational complexity

More information

Lecture 2: MergeSort. CS 341: Algorithms. Thu, Jan 10 th 2019

Lecture 2: MergeSort. CS 341: Algorithms. Thu, Jan 10 th 2019 Lecture 2: MergeSort CS 341: Algorithms Thu, Jan 10 th 2019 Outline For Today 1. Example 1: Sorting-Merge Sort-Divide & Conquer 2 Sorting u Input: An array of integers in arbitrary order 10 2 37 5 9 55

More information

Computer Algorithms CISC4080 CIS, Fordham Univ. Outline. Last class. Instructor: X. Zhang Lecture 2

Computer Algorithms CISC4080 CIS, Fordham Univ. Outline. Last class. Instructor: X. Zhang Lecture 2 Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2 Outline Introduction to algorithm analysis: fibonacci seq calculation counting number of computer steps recursive formula

More information

Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2

Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2 Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2 Outline Introduction to algorithm analysis: fibonacci seq calculation counting number of computer steps recursive formula

More information

CSC 8301 Design & Analysis of Algorithms: Lower Bounds

CSC 8301 Design & Analysis of Algorithms: Lower Bounds CSC 8301 Design & Analysis of Algorithms: Lower Bounds Professor Henry Carter Fall 2016 Recap Iterative improvement algorithms take a feasible solution and iteratively improve it until optimized Simplex

More information

K-Lists. Anindya Sen and Corrie Scalisi. June 11, University of California, Santa Cruz. Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25

K-Lists. Anindya Sen and Corrie Scalisi. June 11, University of California, Santa Cruz. Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25 K-Lists Anindya Sen and Corrie Scalisi University of California, Santa Cruz June 11, 2007 Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25 Outline 1 Introduction 2 Experts 3 Noise-Free Case Deterministic

More information

W vs. QCD Jet Tagging at the Large Hadron Collider

W vs. QCD Jet Tagging at the Large Hadron Collider W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)

More information

Special Nodes for Interface

Special Nodes for Interface fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster

More information

Divide-Conquer-Glue Algorithms

Divide-Conquer-Glue Algorithms Divide-Conquer-Glue Algorithms Mergesort and Counting Inversions Tyler Moore CSE 3353, SMU, Dallas, TX Lecture 10 Divide-and-conquer. Divide up problem into several subproblems. Solve each subproblem recursively.

More information

Searching. Sorting. Lambdas

Searching. Sorting. Lambdas .. s Babes-Bolyai University arthur@cs.ubbcluj.ro Overview 1 2 3 Feedback for the course You can write feedback at academicinfo.ubbcluj.ro It is both important as well as anonymous Write both what you

More information

University of New Mexico Department of Computer Science. Midterm Examination. CS 361 Data Structures and Algorithms Spring, 2003

University of New Mexico Department of Computer Science. Midterm Examination. CS 361 Data Structures and Algorithms Spring, 2003 University of New Mexico Department of Computer Science Midterm Examination CS 361 Data Structures and Algorithms Spring, 2003 Name: Email: Print your name and email, neatly in the space provided above;

More information

5. DIVIDE AND CONQUER I

5. DIVIDE AND CONQUER I 5. DIVIDE AND CONQUER I mergesort counting inversions closest pair of points randomized quicksort median and selection Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Review Of Topics. Review: Induction

Review Of Topics. Review: Induction Review Of Topics Asymptotic notation Solving recurrences Sorting algorithms Insertion sort Merge sort Heap sort Quick sort Counting sort Radix sort Medians/order statistics Randomized algorithm Worst-case

More information

Application of Wavelets to N body Particle In Cell Simulations

Application of Wavelets to N body Particle In Cell Simulations Application of Wavelets to N body Particle In Cell Simulations Balša Terzić (NIU) Northern Illinois University, Department of Physics April 7, 2006 Motivation Gaining insight into the dynamics of multi

More information

Bin packing and scheduling

Bin packing and scheduling Sanders/van Stee: Approximations- und Online-Algorithmen 1 Bin packing and scheduling Overview Bin packing: problem definition Simple 2-approximation (Next Fit) Better than 3/2 is not possible Asymptotic

More information

Chapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 5 Divide and Conquer CLRS 4.3 Slides by Kevin Wayne. Copyright 25 Pearson-Addison Wesley. All rights reserved. Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve

More information

Making Fast Buffer Insertion Even Faster via Approximation Techniques

Making Fast Buffer Insertion Even Faster via Approximation Techniques Making Fast Buffer Insertion Even Faster via Approximation Techniques Zhuo Li, C. N. Sze, Jiang Hu and Weiping Shi Department of Electrical Engineering Texas A&M University Charles J. Alpert IBM Austin

More information

Analysis of Algorithms

Analysis of Algorithms Analysis of Algorithms Section 4.3 Prof. Nathan Wodarz Math 209 - Fall 2008 Contents 1 Analysis of Algorithms 2 1.1 Analysis of Algorithms....................... 2 2 Complexity Analysis 4 2.1 Notation

More information

Lecture 14: Nov. 11 & 13

Lecture 14: Nov. 11 & 13 CIS 2168 Data Structures Fall 2014 Lecturer: Anwar Mamat Lecture 14: Nov. 11 & 13 Disclaimer: These notes may be distributed outside this class only with the permission of the Instructor. 14.1 Sorting

More information

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples Data Structures Introduction Andres Mendez-Vazquez December 3, 2015 1 / 53 Outline 1 What the Course is About? Data Manipulation Examples 2 What is a Good Algorithm? Sorting Example A Naive Algorithm Counting

More information

Algorithms. Jordi Planes. Escola Politècnica Superior Universitat de Lleida

Algorithms. Jordi Planes. Escola Politècnica Superior Universitat de Lleida Algorithms Jordi Planes Escola Politècnica Superior Universitat de Lleida 2016 Syllabus What s been done Formal specification Computational Cost Transformation recursion iteration Divide and conquer Sorting

More information

Applied C Fri

Applied C Fri Applied C++11 2013-01-25 Fri Outline Introduction Auto-Type Inference Lambda Functions Threading Compiling C++11 C++11 (formerly known as C++0x) is the most recent version of the standard of the C++ Approved

More information

Week 5: Quicksort, Lower bound, Greedy

Week 5: Quicksort, Lower bound, Greedy Week 5: Quicksort, Lower bound, Greedy Agenda: Quicksort: Average case Lower bound for sorting Greedy method 1 Week 5: Quicksort Recall Quicksort: The ideas: Pick one key Compare to others: partition into

More information

A framework for detailed multiphase cloud modeling on HPC systems

A framework for detailed multiphase cloud modeling on HPC systems Center for Information Services and High Performance Computing (ZIH) A framework for detailed multiphase cloud modeling on HPC systems ParCo 2009, 3. September 2009, ENS Lyon, France Matthias Lieber a,

More information

Divide-and-conquer: Order Statistics. Curs: Fall 2017

Divide-and-conquer: Order Statistics. Curs: Fall 2017 Divide-and-conquer: Order Statistics Curs: Fall 2017 The divide-and-conquer strategy. 1. Break the problem into smaller subproblems, 2. recursively solve each problem, 3. appropriately combine their answers.

More information

CSC 421: Algorithm Design & Analysis. Spring 2018

CSC 421: Algorithm Design & Analysis. Spring 2018 CSC 421: Algorithm Design & Analysis Spring 2018 Complexity & Computability complexity theory tractability, decidability P vs. NP, Turing machines NP-complete, reductions approximation algorithms, genetic

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility

Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Tao Jiang, Ming Li, Brendan Lucier September 26, 2005 Abstract In this paper we study the Kolmogorov Complexity of a

More information

IS 709/809: Computational Methods in IS Research Fall Exam Review

IS 709/809: Computational Methods in IS Research Fall Exam Review IS 709/809: Computational Methods in IS Research Fall 2017 Exam Review Nirmalya Roy Department of Information Systems University of Maryland Baltimore County www.umbc.edu Exam When: Tuesday (11/28) 7:10pm

More information

Integer Sorting on the word-ram

Integer Sorting on the word-ram Integer Sorting on the word-rm Uri Zwick Tel viv University May 2015 Last updated: June 30, 2015 Integer sorting Memory is composed of w-bit words. rithmetical, logical and shift operations on w-bit words

More information

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Design and Analysis of Algorithms Sohail Aslam January 2004 2 Contents 1 Introduction 7 1.1 Origin of word: Algorithm.................................. 7 1.2 Algorithm: Informal Definition................................

More information

Auto-Tuning Complex Array Layouts for GPUs - Supplemental Material

Auto-Tuning Complex Array Layouts for GPUs - Supplemental Material BIN COUNT EGPGV,. This is the author version of the work. It is posted here by permission of Eurographics for your personal use. Not for redistribution. The definitive version is available at http://diglib.eg.org/.

More information

CS Data Structures and Algorithm Analysis

CS Data Structures and Algorithm Analysis CS 483 - Data Structures and Algorithm Analysis Lecture II: Chapter 2 R. Paul Wiegand George Mason University, Department of Computer Science February 1, 2006 Outline 1 Analysis Framework 2 Asymptotic

More information

Outline. 1 Introduction. 3 Quicksort. 4 Analysis. 5 References. Idea. 1 Choose an element x and reorder the array as follows:

Outline. 1 Introduction. 3 Quicksort. 4 Analysis. 5 References. Idea. 1 Choose an element x and reorder the array as follows: Outline Computer Science 331 Quicksort Mike Jacobson Department of Computer Science University of Calgary Lecture #28 1 Introduction 2 Randomized 3 Quicksort Deterministic Quicksort Randomized Quicksort

More information

Weighted Activity Selection

Weighted Activity Selection Weighted Activity Selection Problem This problem is a generalization of the activity selection problem that we solvd with a greedy algorithm. Given a set of activities A = {[l, r ], [l, r ],..., [l n,

More information

CS 361, Lecture 14. Outline

CS 361, Lecture 14. Outline Bucket Sort CS 361, Lecture 14 Jared Saia University of New Mexico Bucket sort assumes that the input is drawn from a uniform distribution over the range [0, 1) Basic idea is to divide the interval [0,

More information

On Partial Sorting. Conrado Martínez. Univ. Politècnica de Catalunya, Spain

On Partial Sorting. Conrado Martínez. Univ. Politècnica de Catalunya, Spain Univ. Politècnica de Catalunya, Spain 10th Seminar on the Analysis of Algorithms MSRI, Berkeley, U.S.A. June 2004 1 Introduction 2 3 Introduction Partial sorting: Given an array A of n elements and a value

More information

COMP Analysis of Algorithms & Data Structures

COMP Analysis of Algorithms & Data Structures COMP 3170 - Analysis of Algorithms & Data Structures Shahin Kamali Lecture 4 - Jan. 10, 2018 CLRS 1.1, 1.2, 2.2, 3.1, 4.3, 4.5 University of Manitoba Picture is from the cover of the textbook CLRS. 1 /

More information

Multipole-Based Preconditioners for Sparse Linear Systems.

Multipole-Based Preconditioners for Sparse Linear Systems. Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation. Overview Summary of Contributions Generalized Stokes Problem Solenoidal

More information

UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement

UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement Wuxi Li, Meng Li, Jiajun Wang, and David Z. Pan University of Texas at Austin wuxili@utexas.edu November 14, 2017 UT DA Wuxi Li

More information

Constraint-based Subspace Clustering

Constraint-based Subspace Clustering Constraint-based Subspace Clustering Elisa Fromont 1, Adriana Prado 2 and Céline Robardet 1 1 Université de Lyon, France 2 Universiteit Antwerpen, Belgium Thursday, April 30 Traditional Clustering Partitions

More information

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45 Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 45 Please do not distribute or host these slides without prior permission. Mike

More information

CS 577 Introduction to Algorithms: Strassen s Algorithm and the Master Theorem

CS 577 Introduction to Algorithms: Strassen s Algorithm and the Master Theorem CS 577 Introduction to Algorithms: Jin-Yi Cai University of Wisconsin Madison In the last class, we described InsertionSort and showed that its worst-case running time is Θ(n 2 ). Check Figure 2.2 for

More information

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February

More information

Quicksort. Recurrence analysis Quicksort introduction. November 17, 2017 Hassan Khosravi / Geoffrey Tien 1

Quicksort. Recurrence analysis Quicksort introduction. November 17, 2017 Hassan Khosravi / Geoffrey Tien 1 Quicksort Recurrence analysis Quicksort introduction November 17, 2017 Hassan Khosravi / Geoffrey Tien 1 Announcement Slight adjustment to lab 5 schedule (Monday sections A, B, E) Next week (Nov.20) regular

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Applying Machine Learning for Gravitational-wave Burst Data Analysis

Applying Machine Learning for Gravitational-wave Burst Data Analysis Applying Machine Learning for Gravitational-wave Burst Data Analysis Junwei Cao LIGO Scientific Collaboration Research Group Research Institute of Information Technology Tsinghua University June 29, 2016

More information

Scientific Data Mining: Why is it Difficult? Sapphire: using data mining techniques to address the data overload problem

Scientific Data Mining: Why is it Difficult? Sapphire: using data mining techniques to address the data overload problem Scientific Data Mining: Why is it Difficult? Chandrika Kamath June 25, 2008 MMDS 2008: Workshop on Algorithms for Modern Massive Data Sets LLNL-PRES-404920: This work performed under the auspices of the

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 3 Real-Time Scheduling Dr. Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Model-based Software Development Generator Lustre programs Esterel programs

More information

Divide and conquer. Philip II of Macedon

Divide and conquer. Philip II of Macedon Divide and conquer Philip II of Macedon Divide and conquer 1) Divide your problem into subproblems 2) Solve the subproblems recursively, that is, run the same algorithm on the subproblems (when the subproblems

More information

CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms

CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms Prof. Gregory Provan Department of Computer Science University College Cork 1 Lecture Outline CS 4407, Algorithms Growth Functions

More information

CS 161 Summer 2009 Homework #2 Sample Solutions

CS 161 Summer 2009 Homework #2 Sample Solutions CS 161 Summer 2009 Homework #2 Sample Solutions Regrade Policy: If you believe an error has been made in the grading of your homework, you may resubmit it for a regrade. If the error consists of more than

More information

COMP Analysis of Algorithms & Data Structures

COMP Analysis of Algorithms & Data Structures COMP 3170 - Analysis of Algorithms & Data Structures Shahin Kamali Lecture 4 - Jan. 14, 2019 CLRS 1.1, 1.2, 2.2, 3.1, 4.3, 4.5 University of Manitoba Picture is from the cover of the textbook CLRS. COMP

More information

2.2 Asymptotic Order of Growth. definitions and notation (2.2) examples (2.4) properties (2.2)

2.2 Asymptotic Order of Growth. definitions and notation (2.2) examples (2.4) properties (2.2) 2.2 Asymptotic Order of Growth definitions and notation (2.2) examples (2.4) properties (2.2) Asymptotic Order of Growth Upper bounds. T(n) is O(f(n)) if there exist constants c > 0 and n 0 0 such that

More information