High-Performance Scientific Computing

High-Performance Scientific Computing

Instructor: Randy LeVeque
TA: Grady Lemoine
Applied Mathematics 483/583, Spring 2011
http://www.amath.washington.edu/~rjl/am583

World's fastest computers (http://top500.org):
- Roadrunner (Los Alamos): 122,400 cores
- Jaguar (Oak Ridge): 224,162 cores

Outline of today's lecture:
- Goals of this course, and strategy for getting there
- Mechanics of homeworks
- Computer/software requirements
- Brief overview of computational science and its challenges

Overview

High Performance Computing (HPC) generally means heavy-duty computing on clusters or supercomputers with hundreds to millions of cores. Our focus is more modest, but we will cover much background material that is:
- essential to know if you eventually want to do HPC,
- extremely useful for any scientific computing project, even on a laptop.

We focus on scientific computing, as opposed to other computationally demanding domains for which somewhat different tools might be best.

Focus and Topics

- Efficiently using single-processor and multi-core computers
- Basic computer architecture, e.g. floating point arithmetic, cache hierarchies, pipelining
- Using Unix (or Linux, Mac OS X)
- Language issues, e.g. compiled vs. interpreted, object oriented, etc.
- Specific languages: Python, Fortran 90/95
- Parallel computing with OpenMP, MPI, IPython
- Efficient programming, as well as minimizing run time
- Version control with Mercurial (hg), Makefiles, Python scripting, debuggers

Strategy

So much material, so little time...
- Concentrate on the basics and simple motivating examples.
- Get enough hands-on experience to be comfortable experimenting further and learning much more on your own.
- Learn what's out there, to help select what's best for your needs.
- Teach many things by example as we go along.

Lecture notes

HTML and PDF versions at (green = link in the PDF file):
http://www.amath.washington.edu/~rjl/am583

Written using Sphinx, a Python-based system for writing documentation. Learn by example!! The source for each file can be seen by clicking on "Show Source" in the right-hand menu. Source files are in the class hg repository. You can clone the repository and run Sphinx yourself to make a local version:

$ hg clone http://bitbucket.org/.../uwamath583s11
$ cd uwamath583s11/sphinx
$ make html
$ firefox _build/html/index.html

Lecture slides

Slides from lectures will be linked from the Slides section of the class notes, generally in 3 forms, including one with space for taking notes. With luck they will be posted at least 2 hours before class, if you want to print them and bring them along. Note: the slides will contain things not in the notes, and lectures will also include hands-on demos not on the slides.

Prerequisites

Some programming experience in some language, e.g., Matlab, C, Java. You should be comfortable:
- editing a file containing a program and executing it,
- using basic structures like loops, if-then-else, and input-output,
- writing subroutines or functions in some language.

You are not expected to know Python or Fortran.

Some basic knowledge of linear algebra, e.g.:
- what vectors and matrices are and how to multiply them,
- how to go about solving a linear system of equations.

Some comfort level for learning new software, and willingness to dive into lots of new things.

Homeworks

There will be 6 homeworks, plus a take-home final exam. Submission is electronic, via Mercurial (in order to get experience using Mercurial!). Homework assignments will be in the notes. Main goal: introduce many topics and get some hands-on experience with each.
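As a rough self-check of the linear algebra prerequisite, here is a minimal Python sketch (it assumes NumPy is available; knowing NumPy itself is not expected):

# Self-check of the linear algebra prerequisite (a sketch;
# assumes NumPy is installed -- you are not expected to know it yet).
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])              # a 2 x 2 matrix
v = np.array([1., 2.])                # a vector

print(np.dot(A, v))                   # matrix-vector product: [ 4.  7.]
x = np.linalg.solve(A, v)             # solve the linear system A x = v
print(np.allclose(np.dot(A, x), v))   # True: x really solves the system

If you can predict what these lines print, the linear algebra prerequisite is covered.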

Homework #1

Homework #1 is in the notes. Tasks:
- Make sure you have a computer that you can use with Unix (e.g. Linux or Mac OS X), Python 2.5 or higher, and Mercurial. See the next slide.
- Use Mercurial (hg) to clone the class repository and set up your own repository.
- Copy a Python script from one to the other and run it, putting the output in a second file.
- Commit these files and push them to your repository for us to see.

Computer/Software requirements

You will need access to a computer with a number of things on it; see the section of the notes on Downloading and Installing Software. Note: Unix is often required for scientific computing. Windows: many tools we'll use can be used with Windows, but learning Unix is part of this class. Options:
- Install everything you'll need on your own computer.
- Install VirtualBox and use the Virtual Machine (VM) created for this class.
- Use a Linux machine in the Applied Mathematics department (via ssh).

TA and Office Hours

TA: Grady Lemoine. See the Class Catalyst Page for contact info and updated hours. Office hours in Guggenheim 406: Monday, Tuesday, Friday 1:30-2:30. There is also a Discussion Board on the Class Catalyst Page; feel free to post (and answer!) questions about getting things to work.

Survey

Please take the survey found on the Class Catalyst Page, as soon as possible, to let us know about your background and computing plans.

Computational Science (and Engineering)

Often called the third pillar of science, complementing the traditional pillars of theory and experiment. Direct numerical simulation of complex physics / biology / chemistry is possible. It typically requires solving very large systems of mathematical equations, where the unknowns represent values of some physical quantities, e.g.:
- (x, y, z) locations and velocities of individual atoms in a molecular dynamics simulation,
- (x, y, z) locations and velocities of individual stars in a cosmology simulation, e.g. galaxy formation,
- density, pressure, and velocities of a fluid at billions of points in a fluid dynamics simulation,
- stress, strain, and velocity of a solid at billions of points in a solid mechanics simulation.

Note: a 1000 x 1000 x 1000 grid has 1 billion grid points, so you need 8 gigabytes to store one variable at all grid points.
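That 8-gigabyte figure is just grid points times bytes per double-precision number; a quick sketch of the arithmetic:

# Memory needed to store one double-precision variable on a
# 1000 x 1000 x 1000 grid (a back-of-the-envelope sketch).
n = 1000
grid_points = n**3                  # 10**9 points
bytes_per_double = 8
gigabytes = grid_points * bytes_per_double / 1e9
print(gigabytes)                    # 8.0 -- one variable already fills 8 GB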

Computational Science and Engineering

A few examples of large-scale problems, for motivation. Currently, that often means tera-scale or peta-scale. Next comes exa-scale.

How fast are computers?

- Kilo = thousand (10^3)
- Mega = million (10^6)
- Giga = billion (10^9)
- Tera = trillion (10^12)
- Peta = 10^15
- Exa = 10^18

Processor speeds are usually measured in gigahertz these days. Hertz means machine cycles per second, and one operation may take a few cycles, so a 1 GHz processor can do more than 100,000,000 (10^8) operations per second. Exascale is a billion times more than gigascale (more speed and/or more data).

Combustion

Goal: developing more fuel-efficient and cleaner combustion processes for petroleum and alternative fuels. Sample computation at Oak Ridge National Laboratory, on ethylene combustion (simple!):
- more than 1 billion grid points, with Δx = Δy = Δz = 15 microns,
- 4.5 million processor hours on Jaguar's 31,000 cores,
- generated more than 120 terabytes of data.
http://www.scidacreview.org/0902/html/news1.html
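For a sense of scale, a sketch converting those processor hours into wall-clock time (assuming all 31,000 cores were busy for the whole run, which the source does not state):

# Rough wall-clock time for the combustion run (a sketch; assumes all
# 31,000 cores were busy for the entire computation).
processor_hours = 4.5e6
cores = 31000
wall_clock_hours = processor_hours / cores
print(wall_clock_hours / 24)        # about 6 days of continuous computing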

Milky Way's dark matter halo

Goal: understand the nature of the universe. Sample computation at Oak Ridge National Laboratory: 1.1 billion particles of dark matter, simulated for 13.7 billion years, using more than 1 million processor hours on Jaguar (3000 cores).
http://www.scidacreview.org/0901/html/bt.html

How long does it take to solve a linear system?

Solving an n x n linear system Ax = b requires about n^3/3 flops (using Gaussian elimination for a dense matrix). On a 100 Mflop/s system:

    n        flops           time
    10       3.3 x 10^2      0.0000033 seconds
    100      3.3 x 10^5      0.0033 seconds
    1000     3.3 x 10^8      3.3 seconds
    10^4     3.3 x 10^11     3333 seconds = 55.5 minutes
    10^5     3.3 x 10^14     3333333 seconds = 925 hours
    10^6     3.3 x 10^17     925000 hours = 105 years

This assumes data transfer is not a problem! It is a problem: data movement is often the bottleneck, not compute speed. A 10^6 x 10^6 matrix has 10^12 elements = 8 terabytes.
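The table is easy to regenerate; a minimal Python sketch:

# Regenerate the timing table: dense Gaussian elimination costs about
# n**3/3 flops; assume a machine sustaining 100 Mflop/s.
rate = 100e6                        # flops per second
for n in [10, 100, 1000, 10**4, 10**5, 10**6]:
    flops = n**3 / 3.0
    print("n = %8d: %9.2e flops, %12.4g seconds" % (n, flops, flops / rate))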

Moore's Law

Technology trends: microprocessor capacity. Moore's Law: 2x transistors per chip every 1.5 years. Microprocessors have become smaller, denser, and more powerful. Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months. (Slide source: Jack Dongarra, CS267 Lecture 1.)

Increasing speed

Moore's Law: processor speed doubles every 18 months. That is a factor of 1024 in 15 years. Going forward: the number of cores doubles every 18 months instead.

[Plot of the top500 list over time: top curve is the total computing power of the top 500 computers, middle curve is the #1 computer, bottom curve is the #500 computer. http://www.top500.org]

Limits: How fast can a serial computer be?

Consider a sequential machine with 1 Tflop/s performance and 1 Tbyte of memory:
- Data must travel some distance, r, to get from memory to CPU.
- To get 1 data element per cycle, data must make that trip 10^12 times per second. At the speed of light, c = 3 x 10^8 m/s, this means r < c/10^12 = 0.3 mm.
- Now put 1 Tbyte of storage in a 0.3 mm x 0.3 mm area: each bit occupies about 1 square Angstrom, the size of a small atom.

No choice but parallelism. (Slide source: Kathy Yelick, CS267 Lecture 1.)
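The numbers in that argument are quick to verify; a sketch:

# Verify the speed-of-light argument (a sketch).
c = 3e8                         # speed of light, m/s
trips_per_second = 1e12         # one memory->CPU trip per cycle at 1 Tflop/s
r = c / trips_per_second        # farthest the data can travel per trip
print(r * 1000, "mm")           # 0.3 mm

bits = 8e12                     # 1 Tbyte of storage
area_per_bit = r**2 / bits      # square meters per bit
print(area_per_bit / 1e-20)     # about 1 square Angstrom per bit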

Take-away messages

- Massively parallel machines are needed for petascale or exascale computing (millions or billions of cores).
- But also: all machines are going to be multicore soon, with lots of cores.
- If you want to continue benefiting from hardware improvements, you need to know something about parallel computing.