Data Intensive Computing meets High Performance Computing

Kathy Yelick
Associate Laboratory Director for Computing Sciences, Lawrence Berkeley National Laboratory
Professor of Electrical Engineering and Computer Sciences, UC Berkeley

National Energy Research Scientific Computing Facility
- Department of Energy Office of Science (unclassified) facility
  - 4000 users, 500 projects
  - From 48 states; 65% from universities
  - 1400 refereed publications per year
- Systems designed for science
  - 1.3 PF Hopper system (Cray XE6): 4th fastest computer in US, 8th in world
  - 0.5 PF in Franklin (Cray XT4), Carver (IBM iDataPlex) and other clusters

Science is Increasingly Data Intensive
- Existing ability to generate science data is already challenging our ability to store, analyze, and archive it.
- Some observational devices grow in capability with Moore's Law.
- Data sets are growing exponentially.
- Petabyte (10^15 byte) data sets are common:
  - Climate: next IPCC estimates 10s of PBs
  - Genome: JGI alone will have about 1 PB this year and double each year
  - Particle physics: LHC projects 16 PB / yr
  - Astrophysics: LSST, others, estimate 5 PB / yr
- Petascale HPC simulations on today's systems lead to petascale datasets
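The "about 1 PB this year and double each year" figure quoted for JGI can be turned into a back-of-envelope projection. The sketch below is only an illustration: it assumes the doubling rate holds unchanged, which the slide does not claim beyond the near term, and the function names `projected_petabytes` and `years_to_reach` are hypothetical.

```python
import math


def projected_petabytes(start_pb: float, years: int, growth_factor: float = 2.0) -> float:
    """Archive size after `years`, assuming constant multiplicative growth per year."""
    return start_pb * growth_factor ** years


def years_to_reach(start_pb: float, target_pb: float, growth_factor: float = 2.0) -> int:
    """Smallest whole number of years until the archive reaches `target_pb`."""
    return math.ceil(math.log(target_pb / start_pb, growth_factor))


# Assumption: 1 PB today, doubling yearly (the JGI figure from the slide).
for year in range(6):
    print(f"year {year}: {projected_petabytes(1.0, year):.0f} PB")

# Years until a 1 PB archive reaches 1 EB (1000 PB) at that rate.
print("years to 1 EB:", years_to_reach(1.0, 1000.0))
```

Under that assumption a 1 PB archive crosses the exabyte mark in about a decade, which is one way to read the later claim that petabyte data sets feed directly into exascale requirements.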

Scientific Data is Growing Exponentially
- Scientific data is stored in tape archives
- Repacking is essential to keep up with data and technology growth
- The abilities to store, transfer, and analyze data are the limitations, e.g., DOE runs its own network

Growth in Data Outstrips Growth in Computing
- Goal: grow storage, transfer, and analysis capability for DOE facilities
- Data generation exceeds storage and process rate
- Needs energy-efficient computing, memory, and I/O
[Chart: Projected Rates. Increase over 2010 (scale 0 to 60) for sequencers, detectors, processors, and memory, 2010 to 2015]

Science in the Data
- 2006 Nobel Prize on anisotropy of the Cosmic Microwave Background
  - Shows an image of the universe at 400,000 years
- 2011 Nobel Prize on accelerating expansion of the universe
  - Measured by supernovae as standard candles
  - Simulations combined with observational data
- 2011 discovery of the youngest nearby supernova: first-of-a-kind images
  - Used machine learning to eliminate 90% of the manual image search
  - Discovered 8,700 new astrophysical transients, including supernovae, novae, active galaxies, and quasars, and three new classes of objects
[Images: Cosmic Microwave Background; youngest nearby supernova discovered; expansion of the universe]

Emerging Challenges
- Data Provenance
  - Lab notebook for digital data
  - Captures critical parameters, analysis chain, annotations, etc.
  - Provides reproducibility and verifiability
- Life-Cycle Management
  - Observational data gains value
- Data Curation
  - What piece of data will be important in 2060?
[Chart: value versus time for observational data and model data]
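The "lab notebook for digital data" idea can be pictured as a small record that travels with each dataset. The sketch below is a minimal illustration, not a description of any NERSC or DOE system: the class `ProvenanceRecord`, its field names, and the example dataset and tool names are all hypothetical, chosen to mirror the slide's "critical parameters, analysis chain, annotations".

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical lab-notebook entry attached to one dataset."""
    dataset_id: str
    parameters: dict                                    # critical parameters of the run
    analysis_chain: list = field(default_factory=list)  # ordered processing steps
    annotations: list = field(default_factory=list)     # free-text notes

    def add_step(self, tool: str, version: str) -> None:
        """Append one processing step to the analysis chain."""
        self.analysis_chain.append({"tool": tool, "version": version})

    def fingerprint(self) -> str:
        """Stable SHA-256 hash of the record, usable to check that two runs match."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Illustrative values only; none of these names come from the slides.
record = ProvenanceRecord("survey-night-001", {"exposure_s": 60, "filter": "r"})
record.add_step("calibrate", "1.2")
record.add_step("detect-transients", "0.9")
record.annotations.append("Reprocessed after flat-field update")
print(record.fingerprint())
```

Hashing the serialized record is one simple route to the verifiability the slide asks for: two analyses with identical parameters and steps yield the same fingerprint, and any change to either produces a different one.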

Develop and Provide Science Gateway Infrastructure
- 30+ projects use the NERSC Global Filesystem (NGF) -> web gateway
  - Gauge Connection: QCD data in HPSS
  - PyDap: interactive subselection of 20th Century Reanalysis climate data
- Computational Research Software for Science
  - Deep Sky: web interface to analyze astronomical data; steer observations
  - Distributed systems and cloud computing
  - Workflows: Carbon Capture (CCSI), Environment (ASCEM), Neutrinos (Daya Bay), Soil, Water
  - Data management algorithms and visualization

Facilities Require Exascale Computing
- Domains: Astronomy, Particle Physics, Chemistry and Materials, Genomics, Fusion
- Petascale to Exascale
  - Petabyte data sets today, many growing exponentially
  - Processing grows super-linearly
  - Need to move entire DOE workload to Exascale

The Scientific Exploration Process
[Diagram: data flow between simulation, experiment/observation, and analysis sites]
- Simulation site: exascale simulation machine + analysis, parallel storage, archive
  - Perform some data analysis on the exascale machine (e.g., in situ pattern identification)
- Experiment/observation site: processing machine, (parallel) storage, archive
- Analysis sites: analysis machines with shared storage
  - Reduce and prepare data for further exploratory analysis (data mining)
- Need to reduce EBs and PBs of data, and move only TBs to simulation sites
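The point of doing "some data analysis on the exascale machine" is that only a small reduced product has to leave the site. The sketch below illustrates that pattern with a streaming summary. It is a toy under stated assumptions: the names `ChunkSummary` and `summarize_in_situ` are hypothetical, and a real in situ analysis would compute domain-specific features rather than a count, sum, and maximum.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ChunkSummary:
    """Reduced product that is cheap to move between sites."""
    count: int
    total: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def summarize_in_situ(chunks: Iterable[Iterable[float]]) -> ChunkSummary:
    """Reduce a stream of data chunks without ever holding the full dataset."""
    count, total, maximum = 0, 0.0, float("-inf")
    for chunk in chunks:          # each chunk stands in for one simulation output block
        for value in chunk:
            count += 1
            total += value
            maximum = max(maximum, value)
    return ChunkSummary(count, total, maximum)


def simulated_output(n_chunks: int, chunk_size: int):
    """Stand-in generator for simulation output; values are illustrative."""
    for i in range(n_chunks):
        yield [float(i * chunk_size + j) for j in range(chunk_size)]


summary = summarize_in_situ(simulated_output(n_chunks=1000, chunk_size=100))
print(summary.count, summary.mean, summary.maximum)
```

Because the chunks come from a generator, memory use stays at one chunk plus three numbers regardless of how much data streams through, which is the property that makes shipping the summary practical where shipping the raw output is not.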

Summary
- Computing for simulation has been considered the 3rd pillar of science
- Data analysis is quickly becoming a 4th
- Challenges include:
  - Size of data
  - Technology: storage, networking, computing
  - Mathematics: discovering information in massive data
  - Data management, provenance, curation