PetaBricks: Variable Accuracy and Online Learning

Size: px
Start display at page:

Download "PetaBricks: Variable Accuracy and Online Learning"

Transcription

1 PetaBricks: Variable Accuracy and Online Learning Jason Ansel MIT - CSAIL May 4, 2011 Jason Ansel (MIT) PetaBricks May 4, / 40

2 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

3 A motivating example How would you write a fast sorting algorithm? Jason Ansel (MIT) PetaBricks May 4, / 40

4 A motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Jason Ansel (MIT) PetaBricks May 4, / 40

5 A motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Jason Ansel (MIT) PetaBricks May 4, / 40

6 A motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Poly-algorithms Jason Ansel (MIT) PetaBricks May 4, / 40

7 std::stable sort /usr/include/c++/4.5.2/bits/stl algo.h lines Jason Ansel (MIT) PetaBricks May 4, / 40

8 std::stable sort /usr/include/c++/4.5.2/bits/stl algo.h lines Jason Ansel (MIT) PetaBricks May 4, / 40

9 std::sort /usr/include/c++/4.5.2/bits/stl algo.h lines Jason Ansel (MIT) PetaBricks May 4, / 40

10 std::sort /usr/include/c++/4.5.2/bits/stl algo.h lines Why 16? Why 15? Jason Ansel (MIT) PetaBricks May 4, / 40

11 std::sort /usr/include/c++/4.5.2/bits/stl algo.h lines Why 16? Why 15? Dates back to at least 2000 (Jun 2000 SGI release) Still in current C++ STL shipped with GCC 10+ years of of S threshold = 16 Jason Ansel (MIT) PetaBricks May 4, / 40

12 Is 15 the right number? The best cutoff (CO) changes Depends on competing costs: Cost of computation (< operator, call overhead, etc) Cost of communication (swaps) Cache behavior (misses, prefetcher, locality) Jason Ansel (MIT) PetaBricks May 4, / 40

13 Is 15 the right number? The best cutoff (CO) changes Depends on competing costs: Cost of computation (< operator, call overhead, etc) Cost of communication (swaps) Cache behavior (misses, prefetcher, locality) Sorting doubles with std::stable sort: CO 200 optimal on a Phenom 905e (15% speedup over CO = 15) CO 400 optimal on a Opteron 6168 (15% speedup over CO = 15) CO 500 optimal on a Xeon E5320 (34% speedup over CO = 15) CO 700 optimal on a Xeon X5460 (25% speedup over CO = 15) Compiler s hands are tied, it is stuck with 15 Jason Ansel (MIT) PetaBricks May 4, / 40

14 Back to our motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Poly-algorithms Jason Ansel (MIT) PetaBricks May 4, / 40

15 Back to our motivating example How would you write a fast sorting algorithm? Insertion sort Quick sort Merge sort Radix sort Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting Sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort? Poly-algorithms Answer It depends! Jason Ansel (MIT) PetaBricks May 4, / 40

16 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort Jason Ansel (MIT) PetaBricks May 4, / 40

17 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort On a Sun Fire T200 Niagara (8 cores) 1 16-way merge sort below way merge sort below way merge sort below way parallel merge sort Jason Ansel (MIT) PetaBricks May 4, / 40

18 Autotuned parallel sorting algorithms On a Xeon E7340 (2 4 cores) 1 Insertion sort below Quick sort below way parallel merge sort On a Sun Fire T200 Niagara (8 cores) 1 16-way merge sort below way merge sort below way merge sort below way parallel merge sort 235% slowdown running Niagara algorithm on the Xeon 8% slowdown running Xeon algorithm on the Niagara Need a way to express algorithmic choices to enable autotuning Jason Ansel (MIT) PetaBricks May 4, / 40

19 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

20 Algorithmic choices Language e i t h e r { I n s e r t i o n S o r t ( out, i n ) ; } or { QuickSort ( out, i n ) ; } or { MergeSort ( out, i n ) ; } or { R a d i x S o r t ( out, i n ) ; } Jason Ansel (MIT) PetaBricks May 4, / 40

21 Algorithmic choices Language e i t h e r { I n s e r t i o n S o r t ( out, i n ) ; } or { QuickSort ( out, i n ) ; } or { MergeSort ( out, i n ) ; } or { R a d i x S o r t ( out, i n ) ; } Representation Decision tree synthesized by our evolutionary algorithm (EA) Jason Ansel (MIT) PetaBricks May 4, / 40

22 The PetaBricks language Choices expressed in the language High level algorithmic choices Dependency-based synthesized outer control flow Parallelization strategy Programs automatically adapt to their environment Tuned using our bottom-up evaluation algorithm Offline autotuner or always-on online autotuner Jason Ansel (MIT) PetaBricks May 4, / 40

23 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

24 Variable accuracy algorithms Many problems don t have a single correct answer Jason Ansel (MIT) PetaBricks May 4, / 40

25 Variable accuracy algorithms Many problems don t have a single correct answer Soft computing Approximation algorithms for NP-hard problems DSP algorithms Different grid resolutions Data precisions Iterative algorithms Choosing convergence criteria Jason Ansel (MIT) PetaBricks May 4, / 40

26 Variable accuracy example Example... f o r ( i n t i = 0 ; i < 100; ++i ) { S O R I t e r a t i o n ( tmp ) ; }... Jason Ansel (MIT) PetaBricks May 4, / 40

27 Variable accuracy example Example... f o r ( i n t i = 0 ; i < 100; ++i ) { S O R I t e r a t i o n ( tmp ) ; }... Competing objectives of performance and accuracy Must maximize performance while meeting accuracy targets Jason Ansel (MIT) PetaBricks May 4, / 40

28 Accuracy metrics and for enough loops Language a c c u r a c y m e t r i c MyRMSError Jason Ansel (MIT) PetaBricks May 4, / 40

29 Accuracy metrics and for enough loops Language a c c u r a c y m e t r i c... f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } MyRMSError Jason Ansel (MIT) PetaBricks May 4, / 40

30 Accuracy metrics and for enough loops Language a c c u r a c y m e t r i c... f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } MyRMSError Representation Function from problem size to number of iterations synthesized by our EA Jason Ansel (MIT) PetaBricks May 4, / 40

31 Accuracy variables Language a c c u r a c y m e t r i c MyRMSError a c c u r a c y v a r i a b l e k... f o r ( i n t i =0; i <k ; ++i ) { S O R I t e r a t i o n ( tmp ) ; } Jason Ansel (MIT) PetaBricks May 4, / 40

32 Accuracy variables Language a c c u r a c y m e t r i c MyRMSError a c c u r a c y v a r i a b l e k... f o r ( i n t i =0; i <k ; ++i ) { S O R I t e r a t i o n ( tmp ) ; } Representation Function from problem size to k synthesized by our EA Jason Ansel (MIT) PetaBricks May 4, / 40

33 Variable accuracy and algorithmic choices Language a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Jason Ansel (MIT) PetaBricks May 4, / 40

34 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

35 Traditional evolution algorithm Initial population???? Cost = 0 Jason Ansel (MIT) PetaBricks May 4, / 40

36 Traditional evolution algorithm Initial population 72.7s??? Cost = 72.7 Jason Ansel (MIT) PetaBricks May 4, / 40

37 Traditional evolution algorithm Initial population 72.7s 10.5s?? Cost = 83.2 Jason Ansel (MIT) PetaBricks May 4, / 40

38 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s? Cost = 87.3 Jason Ansel (MIT) PetaBricks May 4, / 40

39 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Jason Ansel (MIT) PetaBricks May 4, / 40

40 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2???? Cost = 0 Jason Ansel (MIT) PetaBricks May 4, / 40

41 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Jason Ansel (MIT) PetaBricks May 4, / 40

42 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3???? Cost = 0 Jason Ansel (MIT) PetaBricks May 4, / 40

43 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Jason Ansel (MIT) PetaBricks May 4, / 40

44 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4???? Cost = 0 Jason Ansel (MIT) PetaBricks May 4, / 40

45 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2 Jason Ansel (MIT) PetaBricks May 4, / 40

46 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2 Cost of autotuning front-loaded in initial (unfit) population We could speed up tuning if we start with a faster initial population Jason Ansel (MIT) PetaBricks May 4, / 40

47 Traditional evolution algorithm Initial population 72.7s 10.5s 4.1s 31.2s Cost = Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1 Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0 Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2 Cost of autotuning front-loaded in initial (unfit) population We could speed up tuning if we start with a faster initial population Key insight Smaller input sizes can be used to form better initial population Jason Ansel (MIT) PetaBricks May 4, / 40

48 Bottom-up evolutionary algorithm Train on input size 64 Jason Ansel (MIT) PetaBricks May 4, / 40

49 Bottom-up evolutionary algorithm Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks May 4, / 40

50 Bottom-up evolutionary algorithm Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks May 4, / 40

51 Bottom-up evolutionary algorithm Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks May 4, / 40

52 Bottom-up evolutionary algorithm Train on input size 2, to form initial population for: Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks May 4, / 40

53 Bottom-up evolutionary algorithm Train on input size 1, to form initial population for: Train on input size 2, to form initial population for: Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Jason Ansel (MIT) PetaBricks May 4, / 40

54 Bottom-up evolutionary algorithm Train on input size 1, to form initial population for: Train on input size 2, to form initial population for: Train on input size 8, to form initial population for: Train on input size 16, to form initial population for: Train on input size 32, to form initial population for: Train on input size 64 Naturally exploits optimal substructure of problems Jason Ansel (MIT) PetaBricks May 4, / 40

55 Variable accuracy autotuner Jason Ansel (MIT) PetaBricks May 4, / 40

56 Variable accuracy autotuner Jason Ansel (MIT) PetaBricks May 4, / 40

57 Variable accuracy autotuner Partition accuracy space into discrete levels Jason Ansel (MIT) PetaBricks May 4, / 40

58 Variable accuracy autotuner Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks May 4, / 40

59 Variable accuracy autotuner Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks May 4, / 40

60 Variable accuracy autotuner Generation i Generation i + 1 Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks May 4, / 40

61 Variable accuracy autotuner Generation i Generation i + 1 Partition accuracy space into discrete levels Prune population to have a fixed number of representatives from each level Jason Ansel (MIT) PetaBricks May 4, / 40

62 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

63 Changing accuracy requirements Speedup (x) Accuracy Level 2.0 Accuracy Level 1.5 Accuracy Level 1.0 Accuracy Level 0.8 Accuracy Level 0.6 Accuracy Level Input Size Image Compression Jason Ansel (MIT) PetaBricks May 4, / 40

64 Changing accuracy requirements Speedup (x) Accuracy Level 0.95 Accuracy Level 0.75 Accuracy Level 0.50 Accuracy Level 0.20 Accuracy Level 0.10 Accuracy Level 0.05 Speedup (x) Accuracy Level 1.01 Accuracy Level 1.1 Accuracy Level 1.2 Accuracy Level 1.3 Accuracy Level 1.4 Speedup (x) Accuracy Level 2.0 Accuracy Level 1.5 Accuracy Level 1.0 Accuracy Level 0.8 Accuracy Level 0.6 Accuracy Level Input Size Clustering e+06 Input Size Bin Packing Input Size Image Compression Speedup (x) Accuracy Level 10 9 Accuracy Level 10 7 Accuracy Level 10 5 Accuracy Level 10 3 Accuracy Level 10 1 Speedup (x) Accuracy Level 10 9 Accuracy Level 10 7 Accuracy Level 10 5 Accuracy Level 10 3 Accuracy Level 10 1 Speedup (x) Accuracy Level 3.0 Accuracy Level 2.0 Accuracy Level 1.5 Accuracy Level 1.0 Accuracy Level 0.5 Accuracy Level e Input Size 3D Helmholtz Input Size 2D Poisson Input Size Preconditioner Jason Ansel (MIT) PetaBricks May 4, / 40

65 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Jason Ansel (MIT) PetaBricks May 4, / 40

66 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Time SOR Ite ra tio n Jason Ansel (MIT) PetaBricks May 4, / 40

67 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Grid Size Time SOR Iteration Jason Ansel (MIT) PetaBricks May 4, / 40

68 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Grid Size Time SOR Iteration Jason Ansel (MIT) PetaBricks May 4, / 40

69 Multigrid choice space Pseudo code a c c u r a c y m e t r i c MyRMSError... e i t h e r { f o r e n o u g h { S O R I t e r a t i o n ( tmp ) ; } } or { M u l t i g r i d ( tmp ) ; } or { D i r e c t S o l v e ( tmp ) ; } Grid Size Time SOR Iteration Direct Solve Jason Ansel (MIT) PetaBricks May 4, / 40

70 Autotuned V-cycle shapes Grid Size Jason Ansel (MIT) PetaBricks May 4, / 40

71 Autotuned V-cycle shapes Grid Size Jason Ansel (MIT) PetaBricks May 4, / 40

72 Autotuned V-cycle shapes Grid Size Jason Ansel (MIT) PetaBricks May 4, / 40

73 Autotuned V-cycle shapes Grid Size Grid Size Jason Ansel (MIT) PetaBricks May 4, / 40

74 Autotuned bin packing algorithms Jason Ansel (MIT) PetaBricks May 4, / 40

75 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

76 Challenges for online learning Search space is difficult to model High-dimensional Non-linear Irregular Complex dependencies Dangerous configurations exist Exponential algorithms Infinite loops Poor quality of service Jason Ansel (MIT) PetaBricks May 4, / 40

77 SiblingRivalry (online autotuner) Processor Jason Ansel (MIT) PetaBricks May 4, / 40

78 SiblingRivalry (online autotuner) Processor Safe Configuration Experimental Configuration Jason Ansel (MIT) PetaBricks May 4, / 40

79 SiblingRivalry (online autotuner) Split available resources in half Process identical requests on both halfs Race two candidate configurations (safe and experimental) and terminate slower algorithm Initial slowdown (from duplicating the request) can be overcome by autotuner Surprisingly, reduces average power consumption per request Jason Ansel (MIT) PetaBricks May 4, / 40

80 Learning technique Maintain population of candidate algorithms Each candidate must be pareto-optimal in 3D objective space: Performance Quality of service Confidence Pick safe and experimental configurations from population Mutate the experimental configuration Add the new configuration to the population if it wins the race Jason Ansel (MIT) PetaBricks May 4, / 40

81 Adaptive operator selection Extension of bandit-based differential evolution [DaCosta et al.] Deterministically choses mutation operators Requires only relative performance information Considers trade-off between exploitation and exploration 2 log k arg max AUC i + C n k i n i Jason Ansel (MIT) PetaBricks May 4, / 40

82 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

83 Experimental setup Offline Training Development machine (N cores) Deploy tuned application Baseline No Change Production machine (M cores) Run using M threads SiblingRivalry Online Training Production machine (M cores) Race M/2 threads vs M/2 threads Jason Ansel (MIT) PetaBricks May 4, / 40

84 Experimental setup Offline Training Development machine (N cores) Deploy tuned application Baseline No Change Production machine (M cores) Run using M threads SiblingRivalry Online Training Production machine (M cores) Race M/2 threads vs M/2 threads Compare throughput of both techniques Offline Training Machine 1 Deploy tuned application Production Machine 1 Migrate Production Machine 2 Migrate Production Machine 3 1 hour 10 minutes 10 minutes 10 minutes Jason Ansel (MIT) PetaBricks May 4, / 40

85 SiblingRivalry: throughput x 6.9x 9.7x offline:xeon8, online:amd48 offline:amd48, online:xeon32 Normalized throughput Sort Poisson Matrix Multiply LU Factorization Image Compression Helmholtz Clustering Bin Packing GeoMean Jason Ansel (MIT) PetaBricks May 4, / 40

86 SiblingRivalry: energy usage (on AMD48) Energy per request (joules) Baseline SiblingRivalry 0 Sort Poisson Matrix Multiply LU Factorization Image Compression Helmholtz Clustering Bin Packing 3.9 Mean Benchmark Jason Ansel (MIT) PetaBricks May 4, / 40

87 Outline 1 Motivating Example 2 PetaBricks Language Overview 3 Variable Accuracy 4 Offline Autotuner 5 Results: Variable Accuracy 6 Online Autotuner: SiblingRivalry 7 Results: Sibling Rivalry 8 Conclusions Jason Ansel (MIT) PetaBricks May 4, / 40

88 Conclusions Motivating goal of PetaBricks Make programs future-proof by allowing them to adapt to their environment. We can do better than hard coded constants! Jason Ansel (MIT) PetaBricks May 4, / 40

89 Thanks! Questions? Jason Ansel (MIT) PetaBricks May 4, / 40

Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms

Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms Jason Ansel Yee Lok Wong Cy Chan Marek Olszewski Alan Edelman Saman Amarasinghe MIT - CSAIL April 4, 2011 Jason Ansel (MIT) PetaBricks

More information

An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems

An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems Jason Ansel Maciej Pacula Saman Amarasinghe Una-May O Reilly MIT - CSAIL July 14, 2011 Jason Ansel (MIT) PetaBricks July

More information

Analytical Modeling of Parallel Systems

Analytical Modeling of Parallel Systems Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Spring 2017-2018 Outline 1 Sorting Algorithms (contd.) Outline Sorting Algorithms (contd.) 1 Sorting Algorithms (contd.) Analysis of Quicksort Time to sort array of length

More information

Sorting. CMPS 2200 Fall Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk

Sorting. CMPS 2200 Fall Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk CMPS 2200 Fall 2017 Sorting Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk 11/17/17 CMPS 2200 Intro. to Algorithms 1 How fast can we sort? All the sorting algorithms

More information

Minisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu

Minisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu Minisymposia 9 and 34: Avoiding Communication in Linear Algebra Jim Demmel UC Berkeley bebop.cs.berkeley.edu Motivation (1) Increasing parallelism to exploit From Top500 to multicores in your laptop Exponentially

More information

Lecture 1: Asymptotics, Recurrences, Elementary Sorting

Lecture 1: Asymptotics, Recurrences, Elementary Sorting Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running

More information

Highly-scalable branch and bound for maximum monomial agreement

Highly-scalable branch and bound for maximum monomial agreement Highly-scalable branch and bound for maximum monomial agreement Jonathan Eckstein (Rutgers) William Hart Cynthia A. Phillips Sandia National Laboratories Sandia National Laboratories is a multi-program

More information

Sorting. Chapter 11. CSE 2011 Prof. J. Elder Last Updated: :11 AM

Sorting. Chapter 11. CSE 2011 Prof. J. Elder Last Updated: :11 AM Sorting Chapter 11-1 - Sorting Ø We have seen the advantage of sorted data representations for a number of applications q Sparse vectors q Maps q Dictionaries Ø Here we consider the problem of how to efficiently

More information

ITEC2620 Introduction to Data Structures

ITEC2620 Introduction to Data Structures ITEC2620 Introduction to Data Structures Lecture 6a Complexity Analysis Recursive Algorithms Complexity Analysis Determine how the processing time of an algorithm grows with input size What if the algorithm

More information

Linear Selection and Linear Sorting

Linear Selection and Linear Sorting Analysis of Algorithms Linear Selection and Linear Sorting If more than one question appears correct, choose the more specific answer, unless otherwise instructed. Concept: linear selection 1. Suppose

More information

Sorting. CS 3343 Fall Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk

Sorting. CS 3343 Fall Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk CS 3343 Fall 2010 Sorting Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk 10/7/10 CS 3343 Analysis of Algorithms 1 How fast can we sort? All the sorting algorithms we

More information

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC Hybrid static/dynamic scheduling for already optimized dense matrix factorization Simplice Donfack, Laura Grigori, INRIA, France Bill Gropp, Vivek Kale UIUC, USA Joint Laboratory for Petascale Computing,

More information

Putting the Bayes update to sleep

Putting the Bayes update to sleep Putting the Bayes update to sleep Manfred Warmuth UCSC AMS seminar 4-13-15 Joint work with Wouter M. Koolen, Dmitry Adamskiy, Olivier Bousquet Menu How adding one line of code to the multiplicative update

More information

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Raj Parihar Advisor: Prof. Michael C. Huang March 22, 2013 Raj Parihar Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

More information

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort Sorting Algorithms We have already seen: Selection-sort Insertion-sort Heap-sort We will see: Bubble-sort Merge-sort Quick-sort We will show that: O(n log n) is optimal for comparison based sorting. Bubble-Sort

More information

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and

More information

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2019 CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2019 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis

More information

A point p is said to be dominated by point q if p.x=q.x&p.y=q.y 2 true. In RAM computation model, RAM stands for Random Access Model.

A point p is said to be dominated by point q if p.x=q.x&p.y=q.y 2 true. In RAM computation model, RAM stands for Random Access Model. In analysis the upper bound means the function grows asymptotically no faster than its largest term. 1 true A point p is said to be dominated by point q if p.x=q.x&p.y=q.y 2 true In RAM computation model,

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback 9/3/3 Admin Assignment 3: - how did it go? - do the experiments help? Assignment 4 IMBALANCED DATA Course feedback David Kauchak CS 45 Fall 3 Phishing 9/3/3 Setup Imbalanced data. for hour, google collects

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Algorithms. Algorithms 2.2 MERGESORT. mergesort bottom-up mergesort sorting complexity divide-and-conquer ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 2.2 MERGESORT. mergesort bottom-up mergesort sorting complexity divide-and-conquer ROBERT SEDGEWICK KEVIN WAYNE Algorithms ROBERT SEDGEWICK KEVIN WAYNE 2.2 MERGESORT Algorithms F O U R T H E D I T I O N mergesort bottom-up mergesort sorting complexity divide-and-conquer ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu

More information

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2018

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2018 CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2018 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis

More information

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2018

CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis. Ruth Anderson Winter 2018 CSE332: Data Structures & Parallelism Lecture 2: Algorithm Analysis Ruth Anderson Winter 2018 Today Algorithm Analysis What do we care about? How to compare two algorithms Analyzing Code Asymptotic Analysis

More information

Evaluation and Validation

Evaluation and Validation Evaluation and Validation Jian-Jia Chen (Slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Germany Springer, 2010 2016 年 01 月 05 日 These slides use Microsoft clip arts. Microsoft copyright

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109. Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101

Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109. Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101 Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109 Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101 http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ March 30, 2016 Sorting (or Ordering)

More information

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February)

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February) Algorithms and Data Structures 016 Week 5 solutions (Tues 9th - Fri 1th February) 1. Draw the decision tree (under the assumption of all-distinct inputs) Quicksort for n = 3. answer: (of course you should

More information

Parallel Transposition of Sparse Data Structures

Parallel Transposition of Sparse Data Structures Parallel Transposition of Sparse Data Structures Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng Department of Computer Science, Virginia Tech Niels Bohr Institute, University of Copenhagen Scientific Computing

More information

Decision Trees: Overfitting

Decision Trees: Overfitting Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9

More information

Divide and Conquer Algorithms. CSE 101: Design and Analysis of Algorithms Lecture 14

Divide and Conquer Algorithms. CSE 101: Design and Analysis of Algorithms Lecture 14 Divide and Conquer Algorithms CSE 101: Design and Analysis of Algorithms Lecture 14 CSE 101: Design and analysis of algorithms Divide and conquer algorithms Reading: Sections 2.3 and 2.4 Homework 6 will

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

Insertion Sort. We take the first element: 34 is sorted

Insertion Sort. We take the first element: 34 is sorted Insertion Sort Idea: by Ex Given the following sequence to be sorted 34 8 64 51 32 21 When the elements 1, p are sorted, then the next element, p+1 is inserted within these to the right place after some

More information

Fundamental Algorithms

Fundamental Algorithms Chapter 2: Sorting, Winter 2018/19 1 Fundamental Algorithms Chapter 2: Sorting Jan Křetínský Winter 2018/19 Chapter 2: Sorting, Winter 2018/19 2 Part I Simple Sorts Chapter 2: Sorting, Winter 2018/19 3

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 2: Sorting Harald Räcke Winter 2015/16 Chapter 2: Sorting, Winter 2015/16 1 Part I Simple Sorts Chapter 2: Sorting, Winter 2015/16 2 The Sorting Problem Definition Sorting

More information

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic

More information

Chapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 5. Divide and Conquer CLRS 4.3. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 5 Divide and Conquer CLRS 4.3 Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45 Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 45 Please do not distribute or host these slides without prior permission. Mike

More information

K-Lists. Anindya Sen and Corrie Scalisi. June 11, University of California, Santa Cruz. Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25

K-Lists. Anindya Sen and Corrie Scalisi. June 11, University of California, Santa Cruz. Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25 K-Lists Anindya Sen and Corrie Scalisi University of California, Santa Cruz June 11, 2007 Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25 Outline 1 Introduction 2 Experts 3 Noise-Free Case Deterministic

More information

IS 709/809: Computational Methods in IS Research Fall Exam Review

IS 709/809: Computational Methods in IS Research Fall Exam Review IS 709/809: Computational Methods in IS Research Fall 2017 Exam Review Nirmalya Roy Department of Information Systems University of Maryland Baltimore County www.umbc.edu Exam When: Tuesday (11/28) 7:10pm

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

Sorting algorithms. Sorting algorithms

Sorting algorithms. Sorting algorithms Properties of sorting algorithms A sorting algorithm is Comparison based If it works by pairwise key comparisons. In place If only a constant number of elements of the input array are ever stored outside

More information

Central Algorithmic Techniques. Iterative Algorithms

Central Algorithmic Techniques. Iterative Algorithms Central Algorithmic Techniques Iterative Algorithms Code Representation of an Algorithm class InsertionSortAlgorithm extends SortAlgorithm { void sort(int a[]) throws Exception { for (int i = 1; i < a.length;

More information

COL 730: Parallel Programming

COL 730: Parallel Programming COL 730: Parallel Programming PARALLEL SORTING Bitonic Merge and Sort Bitonic sequence: {a 0, a 1,, a n-1 }: A sequence with a monotonically increasing part and a monotonically decreasing part For some

More information

Computer Algorithms CISC4080 CIS, Fordham Univ. Outline. Last class. Instructor: X. Zhang Lecture 2

Computer Algorithms CISC4080 CIS, Fordham Univ. Outline. Last class. Instructor: X. Zhang Lecture 2 Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2 Outline Introduction to algorithm analysis: fibonacci seq calculation counting number of computer steps recursive formula

More information

Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2

Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2 Computer Algorithms CISC4080 CIS, Fordham Univ. Instructor: X. Zhang Lecture 2 Outline Introduction to algorithm analysis: fibonacci seq calculation counting number of computer steps recursive formula

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Sorting and Searching. Tony Wong

Sorting and Searching. Tony Wong Tony Wong 2017-03-04 Sorting Sorting is the reordering of array elements into a specific order (usually ascending) For example, array a[0..8] consists of the final exam scores of the n = 9 students in

More information

Lecture 22: Multithreaded Algorithms CSCI Algorithms I. Andrew Rosenberg

Lecture 22: Multithreaded Algorithms CSCI Algorithms I. Andrew Rosenberg Lecture 22: Multithreaded Algorithms CSCI 700 - Algorithms I Andrew Rosenberg Last Time Open Addressing Hashing Today Multithreading Two Styles of Threading Shared Memory Every thread can access the same

More information

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur March 19, 2018 Modular Exponentiation Public key Cryptography March 19, 2018 Branch Prediction Attacks 2 / 54 Modular Exponentiation

More information

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples Data Structures Introduction Andres Mendez-Vazquez December 3, 2015 1 / 53 Outline 1 What the Course is About? Data Manipulation Examples 2 What is a Good Algorithm? Sorting Example A Naive Algorithm Counting

More information

Auto-Tuning Complex Array Layouts for GPUs - Supplemental Material

Auto-Tuning Complex Array Layouts for GPUs - Supplemental Material BIN COUNT EGPGV,. This is the author version of the work. It is posted here by permission of Eurographics for your personal use. Not for redistribution. The definitive version is available at http://diglib.eg.org/.

More information

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013 Lecture 3 Real-Time Scheduling Daniel Kästner AbsInt GmbH 203 Model-based Software Development 2 SCADE Suite Application Model in SCADE (data flow + SSM) System Model (tasks, interrupts, buses, ) SymTA/S

More information

CS 543 Page 1 John E. Boon, Jr.

CS 543 Page 1 John E. Boon, Jr. CS 543 Machine Learning Spring 2010 Lecture 05 Evaluating Hypotheses I. Overview A. Given observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over

More information

Integer Programming Part II

Integer Programming Part II Be the first in your neighborhood to master this delightful little algorithm. Integer Programming Part II The Branch and Bound Algorithm learn about fathoming, bounding, branching, pruning, and much more!

More information

Problem-Solving via Search Lecture 3

Problem-Solving via Search Lecture 3 Lecture 3 What is a search problem? How do search algorithms work and how do we evaluate their performance? 1 Agenda An example task Problem formulation Infrastructure for search algorithms Complexity

More information

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance

More information

Large-Scale Behavioral Targeting

Large-Scale Behavioral Targeting Large-Scale Behavioral Targeting Ye Chen, Dmitry Pavlov, John Canny ebay, Yandex, UC Berkeley (This work was conducted at Yahoo! Labs.) June 30, 2009 Chen et al. (KDD 09) Large-Scale Behavioral Targeting

More information

Simultaneously Optimizing Heat Load and Trip Rates in the CEBAF Linacs

Simultaneously Optimizing Heat Load and Trip Rates in the CEBAF Linacs Simultaneously Optimizing Heat Load and Trip Rates in the CEBAF Linacs Balša Terzić, Alicia Hofler & Geoff Krafft Center for Advanced Studies of Accelerators (CASA), Jefferson Lab BTeam Meeting, November

More information

CSCE 222 Discrete Structures for Computing

CSCE 222 Discrete Structures for Computing CSCE 222 Discrete Structures for Computing Algorithms Dr. Philip C. Ritchey Introduction An algorithm is a finite sequence of precise instructions for performing a computation or for solving a problem.

More information

csci 210: Data Structures Program Analysis

csci 210: Data Structures Program Analysis csci 210: Data Structures Program Analysis Summary Topics commonly used functions analysis of algorithms experimental asymptotic notation asymptotic analysis big-o big-omega big-theta READING: GT textbook

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Gaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford

Gaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford Gaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford Carol Hsin Abstract The objective of this project is to return expected electricity

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement

UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement Wuxi Li, Meng Li, Jiajun Wang, and David Z. Pan University of Texas at Austin wuxili@utexas.edu November 14, 2017 UT DA Wuxi Li

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel

More information

Lecture 14. Clustering, K-means, and EM

Lecture 14. Clustering, K-means, and EM Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.

More information

Decision trees COMS 4771

Decision trees COMS 4771 Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).

More information

W vs. QCD Jet Tagging at the Large Hadron Collider

W vs. QCD Jet Tagging at the Large Hadron Collider W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)

More information

Special Nodes for Interface

Special Nodes for Interface fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster

More information

Applying Machine Learning for Gravitational-wave Burst Data Analysis

Applying Machine Learning for Gravitational-wave Burst Data Analysis Applying Machine Learning for Gravitational-wave Burst Data Analysis Junwei Cao LIGO Scientific Collaboration Research Group Research Institute of Information Technology Tsinghua University June 29, 2016

More information

the tree till a class assignment is reached

the tree till a class assignment is reached Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal

More information

CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms

CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms CS 4407 Algorithms Lecture 2: Iterative and Divide and Conquer Algorithms Prof. Gregory Provan Department of Computer Science University College Cork 1 Lecture Outline CS 4407, Algorithms Growth Functions

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Multipole-Based Preconditioners for Sparse Linear Systems.

Multipole-Based Preconditioners for Sparse Linear Systems. Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation. Overview Summary of Contributions Generalized Stokes Problem Solenoidal

More information

1 A Support Vector Machine without Support Vectors

1 A Support Vector Machine without Support Vectors CS/CNS/EE 53 Advanced Topics in Machine Learning Problem Set 1 Handed out: 15 Jan 010 Due: 01 Feb 010 1 A Support Vector Machine without Support Vectors In this question, you ll be implementing an online

More information

csci 210: Data Structures Program Analysis

csci 210: Data Structures Program Analysis csci 210: Data Structures Program Analysis 1 Summary Summary analysis of algorithms asymptotic analysis big-o big-omega big-theta asymptotic notation commonly used functions discrete math refresher READING:

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours

More information

Algorithms and Their Complexity

Algorithms and Their Complexity CSCE 222 Discrete Structures for Computing David Kebo Houngninou Algorithms and Their Complexity Chapter 3 Algorithm An algorithm is a finite sequence of steps that solves a problem. Computational complexity

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 3 Real-Time Scheduling Dr. Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Model-based Software Development Generator Lustre programs Esterel programs

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 21: Introduction to NP-Completeness 12/9/2015 Daniel Bauer Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways

More information

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Anomaly Detection for the CERN Large Hadron Collider injection magnets Anomaly Detection for the CERN Large Hadron Collider injection magnets Armin Halilovic KU Leuven - Department of Computer Science In cooperation with CERN 2018-07-27 0 Outline 1 Context 2 Data 3 Preprocessing

More information

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 MA008 p.1/37 MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 Dr. Markus Hagenbuchner markus@uow.edu.au. MA008 p.2/37 Exercise 1 (from LN 2) Asymptotic Notation When constants appear in exponents

More information

Lecture 17: Iterative Methods and Sparse Linear Algebra

Lecture 17: Iterative Methods and Sparse Linear Algebra Lecture 17: Iterative Methods and Sparse Linear Algebra David Bindel 25 Mar 2014 Logistics HW 3 extended to Wednesday after break HW 4 should come out Monday after break Still need project description

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

CSCI 3110 Assignment 6 Solutions

CSCI 3110 Assignment 6 Solutions CSCI 3110 Assignment 6 Solutions December 5, 2012 2.4 (4 pts) Suppose you are choosing between the following three algorithms: 1. Algorithm A solves problems by dividing them into five subproblems of half

More information

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer

More information

Fast Sorting and Selection. A Lower Bound for Worst Case

Fast Sorting and Selection. A Lower Bound for Worst Case Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 0 Fast Sorting and Selection USGS NEIC. Public domain government image. A Lower Bound

More information

Fast Algorithms for Segmented Regression

Fast Algorithms for Segmented Regression Fast Algorithms for Segmented Regression Jayadev Acharya 1 Ilias Diakonikolas 2 Jerry Li 1 Ludwig Schmidt 1 1 MIT 2 USC June 21, 2016 1 / 21 Statistical vs computational tradeoffs? General Motivating Question

More information

3. Algorithms. What matters? How fast do we solve the problem? How much computer resource do we need?

3. Algorithms. What matters? How fast do we solve the problem? How much computer resource do we need? 3. Algorithms We will study algorithms to solve many different types of problems such as finding the largest of a sequence of numbers sorting a sequence of numbers finding the shortest path between two

More information

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Anoop Bhagyanath and Klaus Schneider Embedded Systems Chair University of Kaiserslautern

More information

Multiprocessor Scheduling I: Partitioned Scheduling. LS 12, TU Dortmund

Multiprocessor Scheduling I: Partitioned Scheduling. LS 12, TU Dortmund Multiprocessor Scheduling I: Partitioned Scheduling Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 22/23, June, 2015 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 47 Outline Introduction to Multiprocessor

More information

Weighted Activity Selection

Weighted Activity Selection Weighted Activity Selection Problem This problem is a generalization of the activity selection problem that we solvd with a greedy algorithm. Given a set of activities A = {[l, r ], [l, r ],..., [l n,

More information

5. DIVIDE AND CONQUER I

5. DIVIDE AND CONQUER I 5. DIVIDE AND CONQUER I mergesort counting inversions closest pair of points randomized quicksort median and selection Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley http://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling Scheduling I Today! Introduction to scheduling! Classical algorithms Next Time! Advanced topics on scheduling Scheduling out there! You are the manager of a supermarket (ok, things don t always turn out

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

Deduction and Induction. Stephan Schulz. The Inference Engine. Machine Learning. A Match Made in Heaven

Deduction and Induction. Stephan Schulz. The Inference Engine. Machine Learning. A Match Made in Heaven Deduction and Induction A Match Made in Heaven Stephan Schulz The Inference Engine Machine Learning Deduction and Induction A Match Made in Heaven Stephan Schulz or a Deal with the Devil? The Inference

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline

More information