Welcome to MCS 572. content and organization expectations of the course. definition and classification

MCS 572 Lecture 1: Introduction to Supercomputing
Jan Verschelde, 22 August 2016

Welcome to MCS 572
1. About the Course: content and organization; expectations of the course
2. Supercomputing: definition and classification
3. Measuring Performance: speedup and efficiency; Amdahl's Law; Gustafson's Law; quality up

Introduction to Supercomputing (MCS 572), Welcome to MCS 572, L-1, 22 August 2016

Catalog Description

Introduction to supercomputing on vector and parallel processors; architectural comparisons, parallel algorithms, vectorization techniques, parallelization techniques, actual implementation on real machines.

Prerequisites:
1. a working knowledge of C/C++ (MCS 360) and scientific software (MCS 507);
2. familiarity with algorithms at the level of introductory numerical analysis (MCS 471).

MCS 572 is one of the courses on the computational science prelim.

Content of the Course

Two recommended textbooks:
Barry Wilkinson and Michael Allen: Parallel Programming. Techniques and Applications Using Networked Workstations and Parallel Computers. 2nd Edition. Prentice-Hall.
David B. Kirk and Wen-mei W. Hwu: Programming Massively Parallel Processors. A Hands-on Approach. 2nd Edition. Elsevier.

Parallel programming goals:
1. design and analysis of parallel programs;
2. implementation using MPI, OpenMP, and threads;
3. application to scientific problems.

Organization and Expectations

Three programming parts of the course:
1. using the Message-Passing Interface (MPI) for clusters;
2. for shared memory: pthreads and OpenMP;
3. programming Graphics Processing Units (GPUs) using CUDA (Compute Unified Device Architecture) of NVIDIA.

Activities throughout the semester: several homework collections, a midterm exam (which could be take-home), and three computer projects.

The first two computer projects will be on prescribed topics and may be solved in pairs. The third project must be done individually and could form the basis for a project presentation at the end.

what is a supercomputer?

Supercomputing = use of a supercomputer (also called high performance computing).

Definition. A supercomputer is a computing system (hardware, system & application software) that provides close to the best currently achievable sustained performance on demanding computational problems.

Classification at www.top500.org.

A flop is a floating point operation. Performance is often measured in number of flops per second. If two flops can be done per clock cycle, then a processor at 3 GHz can theoretically perform 6 billion flops (6 gigaflops) per second.

All computers in the top 10 achieve more than 1 petaflop per second.
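The peak-performance arithmetic above can be checked with a minimal sketch. The function name `peak_flops` and its `cores` parameter are assumptions introduced for illustration; they do not come from the lecture.

```python
def peak_flops(clock_hz, flops_per_cycle, cores=1):
    """Theoretical peak performance in flops per second.

    Hypothetical helper: multiplies the clock rate by the number of
    flops per cycle and by the number of cores (default: one).
    """
    return clock_hz * flops_per_cycle * cores


# The example from the text: 2 flops per cycle at 3 GHz.
peak = peak_flops(3.0e9, 2)
print(f"{peak / 1e9:.1f} gigaflops per second")
```

This is a theoretical bound only; sustained performance on real problems is typically lower.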

[Figure: TOP500 top 10 of November 2011]

[Figure: TOP500 top 10 of November 2013]

[Figure: TOP500 top 10 of June 2016]

system terms and architectures

core: for a CPU, a unit capable of executing a thread; for a GPU, a streaming multiprocessor.
$R_{\max}$: maximal performance achieved on the LINPACK benchmark (solving a dense linear system) for problem size $N_{\max}$, measured in Gflop/s.
$R_{\rm peak}$: theoretical peak performance, measured in Gflop/s.
Power: total power consumed by the system.

Types of architectures, using
1. commodity leading-edge microprocessors running at their maximal clock and power limits;
2. special processor chips running at less than maximal power to achieve high physical packaging densities;
3. a mix of chip types and accelerators (GPUs).

speedup and efficiency

By p we denote the number of processors.

$$\text{Speedup } S(p) = \frac{\text{sequential execution time}}{\text{parallel execution time}}.$$

Another measure for parallel performance:

$$\text{Efficiency } E(p) = \frac{\text{speedup}}{\#\text{processors}} = \frac{S(p)}{p} \times 100\%.$$

In the best case, we hope for $S(p) = p$ and $E(p) = 100\%$. If $E = 50\%$, then on average the processors are idle for half of the time.
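As a small illustration of these two definitions, the sketch below computes speedup and efficiency from measured wall clock times. The timings (120 s sequential, 20 s on 8 processors) are made-up values, not measurements from the course.

```python
def speedup(sequential_time, parallel_time):
    """S(p): sequential execution time over parallel execution time."""
    return sequential_time / parallel_time


def efficiency(sequential_time, parallel_time, p):
    """E(p) = S(p)/p, returned as a fraction (multiply by 100 for %)."""
    return speedup(sequential_time, parallel_time) / p


# Hypothetical timings in seconds, chosen only for illustration.
t_seq, t_par, p = 120.0, 20.0, 8
s = speedup(t_seq, t_par)
e = efficiency(t_seq, t_par, p)
print(f"S({p}) = {s:.2f}, E({p}) = {e:.0%}")
```

With these assumed timings the speedup is 6 on 8 processors, so the efficiency is 75%: on average, each processor is idle for a quarter of the time.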

superlinear speedup

While we hope for $S(p) = p$, we may achieve $S(p) > p$.

Example. Sequential search in an unsorted list. A parallel search by p processors divides the list evenly into p sublists.

[Figure: for p = 3, the list is divided into three sublists.]

The sequential search time depends on the position of the item in the list. The parallel search time depends on its position in the sublist. The speedup is huge if the item sought is the first element of the last sublist.
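The search example can be imitated by counting comparisons instead of measuring time. This is an idealized sketch: it assumes p divides the list length, that all processors start at once, and that the search stops as soon as one processor finds the item. The function names are hypothetical.

```python
def sequential_steps(items, target):
    """Number of comparisons a linear search needs to find target."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return len(items)  # target absent: every item was compared


def parallel_steps(items, target, p):
    """Idealized parallel search: each of p processors scans one sublist.

    Assumes p divides len(items) and that the search ends when any
    processor finds the target, so the cost is the position of the
    target within its own sublist.
    """
    size = len(items) // p
    sublists = [items[i * size:(i + 1) * size] for i in range(p)]
    found = [sequential_steps(sub, target) for sub in sublists if target in sub]
    return min(found) if found else size


items = list(range(300))
p = 3
target = 200  # first element of the last sublist
s = sequential_steps(items, target) / parallel_steps(items, target, p)
print(f"speedup with p = {p}: {s:.0f}")
```

Here the sequential search needs 201 comparisons and the parallel search needs one, so the speedup far exceeds p = 3. For a target at the front of the first sublist, the same model gives no speedup at all.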

predicting speedup

Consider a job that takes time t on one processor. Let R be the fraction of t that must be done sequentially, $R \in [0, 1]$.

[Figure: on p = 1 processor, the job takes time $Rt + (1-R)t$; on p = 4 processors, the sequential part still takes $Rt$ and the parallel part takes $(1-R)t/p$.]

Speedup on p processors:

$$S(p) \leq \frac{t}{Rt + \frac{(1-R)t}{p}} = \frac{1}{R + \frac{1-R}{p}} \leq \frac{1}{R}.$$

Amdahl's Law

Theorem (Amdahl (1967)). Let R be the fraction of the operations which cannot be done in parallel. The speedup with p processors is bounded by

$$\frac{1}{R + \frac{1-R}{p}}.$$

Corollary. $S(p) \leq \frac{1}{R}$ as $p \to \infty$.

Example. Suppose 90% of the operations in an algorithm can be executed in parallel, so $R = 1/10$. What is the best speedup with 8 processors? What is the best speedup with an unlimited number of processors?

$$p = 8: \quad \frac{1}{\frac{1}{10} + \left(1 - \frac{1}{10}\right)\frac{1}{8}} = \frac{80}{17} \approx 4.7$$

$$p = \infty: \quad \frac{1}{1/10} = 10$$
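The bound in Amdahl's Law is easy to evaluate numerically. The sketch below, with hypothetical function names, reproduces the example where 90% of the operations can run in parallel (R = 1/10).

```python
def amdahl_speedup(R, p):
    """Upper bound on the speedup with p processors,
    where R is the fraction of operations that must run sequentially."""
    return 1.0 / (R + (1.0 - R) / p)


def amdahl_limit(R):
    """Limit of the bound as the number of processors goes to infinity."""
    return 1.0 / R


R = 0.1  # 90% of the operations can be executed in parallel
print(f"p = 8: {amdahl_speedup(R, 8):.2f}")
print(f"p unlimited: {amdahl_limit(R):.2f}")
for p in (16, 64, 1024):
    print(f"p = {p}: {amdahl_speedup(R, p):.2f}")
```

The loop at the end shows how slowly the bound approaches 1/R: even with many more processors, the sequential fraction dominates.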

scaled speedup

Consider a job that took time t on p processors. Let s be the fraction of t that is done sequentially.

[Figure: on p = 4 processors, the job takes time $st + (1-s)t$; on one processor, the same work takes time $st + p(1-s)t$.]

Scaled speedup:

$$S_s(p) \leq \frac{st + p(1-s)t}{t} = s + p(1-s) = p + (1-p)s.$$

Gustafson's Law

The problem size scales with the number of processors!

Theorem (Gustafson's Law (1988)). If s is the fraction of serial operations in a parallel program run on p processors, then the scaled speedup is bounded by $p + (1-p)s$.

Example. Suppose benchmarking reveals that 5% of the time on a 64-processor machine is spent on one single processor (e.g., the root node working while all other processors are idle). Compute the scaled speedup.

$$p = 64, \; s = 0.05: \quad S_s(p) \leq 64 + (1 - 64)(0.05) = 64 - 3.15 = 60.85$$
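The scaled speedup bound can be evaluated the same way. The sketch below uses a hypothetical function name and the values of the example (p = 64, s = 0.05).

```python
def scaled_speedup(s, p):
    """Gustafson's bound p + (1 - p) s, where s is the serial fraction
    of the time measured on the p-processor run."""
    return p + (1 - p) * s


print(f"scaled speedup: {scaled_speedup(0.05, 64):.2f}")
```

Note that s here is measured on the parallel run, whereas R in Amdahl's Law is a fraction of the sequential run, so the two bounds are not directly comparable for the same numerical fraction.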

quality up

More processing power often leads to better results:
1. finer granularity of a grid, e.g., discretization of space and/or time in a differential equation;
2. greater confidence of estimates, e.g., an enlarged number of samples in a simulation;
3. computing with larger numbers (multiprecision arithmetic), e.g., to solve an ill-conditioned linear system.

$$\text{quality up } Q(p) = \frac{\text{quality on } p \text{ processors}}{\text{quality on 1 processor}}$$

Q(p) measures the improvement in quality using p processors, keeping the computational time fixed.

summary and recommended reading

We defined supercomputing, speedup, and efficiency. Gustafson's Law reevaluates Amdahl's Law.

Available to UIC via the ACM digital library:
Jeannette M. Wing: Computational thinking. Communications of the ACM 49(3):33-35.
Peter M. Kogge and Timothy J. Dysart: Using the TOP500 to trace and project technology and architecture trends. In SC'11 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. ACM.
John L. Gustafson: Reevaluating Amdahl's Law. Communications of the ACM 31(5), 1988.

Exercises

Homework will be collected at a date to be announced.

1. How many processors whose clock speed runs at 3.0 GHz does one need to build a supercomputer which achieves a theoretical peak performance of at least 4 teraflops? Justify your answer.
2. Suppose we have a program where 2% of the operations must be executed sequentially. According to Amdahl's Law, what is the maximum speedup which can be achieved using 64 processors? Assuming we have an unlimited number of processors, what is the maximal speedup possible?
3. Benchmarking of a program running on a 64-processor machine shows that 2% of the operations are done sequentially, i.e., that 2% of the time only one single processor is working while the rest are idle. Use Gustafson's Law to compute the scaled speedup.
4. Visit http://extreme.uic.edu/hardware-specs/ and estimate the theoretical peak performance.
