Data science and engineering for local weather forecasts. Nikhil R Podduturi Data {Scientist, Engineer} November, 2016


Agenda
- About MeteoGroup
- Introduction to weather data
- Problem description
- Data science and weather forecasting
- Engineering
- Verification
- Results
- Questions

How many of you check weather forecasts frequently?

Weather data

Data volume: 1.5 TB/day

Types of data
- Observations:
  - WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft)
  - MeteoGroup measurement network
- Satellite data
- Radar data
- User data
- Numerical weather prediction model data

Numerical weather prediction models
- Complex and multidimensional data
- 5 NWP models from different providers
- Data size per day: 0.5 TB

Data science and weather forecasting

Outcome
- Took 24 hours to produce a 24-hour forecast
- Grid interval: 736 km
- Poor results

MeteoGroup forecasting system

Data flow (from the slide diagram)
- Training: 3 years of NWP data + 3 years of observation data -> machine learning model -> trained model
- Forecasting: daily NWP data -> trained model -> forecasts

Implementation
- Written in Pascal
- Runs on an in-house high-performance computing cluster

Limitations
- Hard to maintain
- Not very transparent
- Scalability

Problem description

Next generation forecasting system
- Cloud-based solution
- Transparent
- Scalable
- Improve forecasting accuracy

Baseline model
Pipeline: NWP data -> Downscale to location -> Interpolate missing values -> Linear model
Outcome:
- Very fast
- Poor accuracy
- Multicollinearity
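The baseline pipeline can be sketched in a few NumPy functions. This is a minimal illustration, not MeteoGroup's code: the slides do not say which downscaling or interpolation method was used, so bilinear interpolation of a gridded field and linear gap filling are assumptions, and all function names are hypothetical. Grid coordinates are assumed to be sorted in ascending order.

```python
import numpy as np

def downscale_bilinear(grid, lats, lons, lat, lon):
    """Bilinearly interpolate a 2-D NWP field to one location (assumed method)."""
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    top = (1 - tx) * grid[i, j] + tx * grid[i, j + 1]
    bottom = (1 - tx) * grid[i + 1, j] + tx * grid[i + 1, j + 1]
    return (1 - ty) * top + ty * bottom

def fill_missing(series):
    """Linearly interpolate NaN gaps in a 1-D time series."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    ok = ~np.isnan(series)
    return np.interp(idx, idx[ok], series[ok])

def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_n]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients to a 2-D feature matrix."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]
```

In use, each NWP field would be downscaled to the station location, the resulting feature series gap-filled, and the linear model fitted against the station's observations.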

Iteration 1
- Address multicollinearity using feature selection
- Scale the features
Pipeline: NWP data -> Downscale to location -> Interpolate missing values -> Scale features -> Feature selection -> Linear model
Outcome:
- Improved accuracy
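The two steps added in this iteration might look like the sketch below. The slides do not name the feature selection method, so the correlation filter here is a stand-in chosen because it targets multicollinearity directly; the function names and the 0.95 threshold are assumptions.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

def drop_correlated(X, threshold=0.95):
    """Greedily keep columns whose |correlation| with every kept column is below threshold.

    Returns the indices of the kept columns.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for col in range(X.shape[1]):
        if all(corr[col, k] < threshold for k in kept):
            kept.append(col)
    return kept
```

Features that are near-duplicates of one another (for example the same variable from two NWP models) are the usual source of multicollinearity, and a filter like this keeps one representative of each group.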

Iteration 2
- Model selection between linear and non-linear models
- Advanced feature selection
Pipeline: NWP data -> Downscale to location -> Interpolate missing values -> Scale features -> Advanced feature selection -> Model selection (linear and non-linear models)
Outcome:
- On par with existing forecasting system
- Slow training
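Model selection between a linear and a non-linear candidate can be sketched as a comparison of validation error. The slides do not say which non-linear models were tried; k-nearest-neighbour regression is used here only because it fits in a few lines of NumPy, and all names are hypothetical.

```python
import numpy as np

def fit_linear(X_train, y_train):
    """Fit ordinary least squares and return a prediction function."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return lambda X: coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def fit_knn(X_train, y_train, k=3):
    """Return a k-nearest-neighbour regressor (stand-in non-linear model)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)

    def predict(X):
        X = np.asarray(X, dtype=float)
        dist = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y_train[nearest].mean(axis=1)

    return predict

def select_model(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate and return (best name, best predictor, all validation MAEs)."""
    fitted, scores = {}, {}
    for name, fit in candidates.items():
        fitted[name] = fit(X_train, y_train)
        scores[name] = float(np.mean(np.abs(fitted[name](X_val) - y_val)))
    best = min(scores, key=scores.get)
    return best, fitted[best], scores
```

Because one model is selected per location, attribute and hour, fitting several candidates for each multiplies the training cost, which is consistent with the slow training noted above.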

Engineering to scale the product

Model engineering (Scikit-learn, NumPy, Keras with TensorFlow)
Good:
- Python ML ecosystem
- Familiarity among the team
- Test-driven and agile development
- Fail fast
Bad:
- Not scalable

47000 * 15 * 360 model runs
- 47000 locations
- 15 weather attributes (e.g. temperature, wind)
- 360 hours
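The size of the workload follows directly from the three factors. The snippet below works out the product and shows one way to enumerate the runs lazily; the attribute names and integer location identifiers are placeholders, not values from the talk.

```python
from itertools import islice, product

N_LOCATIONS = 47_000
N_ATTRIBUTES = 15
N_HOURS = 360  # 360 hours is 15 days

# One model run per (location, attribute, hour) combination.
total_runs = N_LOCATIONS * N_ATTRIBUTES * N_HOURS  # 253,800,000

def task_keys(locations, attributes, hours):
    """Lazily enumerate (location, attribute, hour) keys without building a list."""
    return product(locations, attributes, hours)

# Placeholder identifiers, for illustration only.
first_keys = list(islice(task_keys(range(N_LOCATIONS), ["temperature", "wind"], range(N_HOURS)), 3))
```

At roughly a quarter of a billion independent runs, the work is easy to parallelise but far too large for a single machine, which motivates the scheduler and cluster described next.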

Scaling with Apache Airflow

Apache Airflow
- By Airbnb
- Apache project since early 2016
- Directed Acyclic Graph (DAG)
- Components:
  - UI
  - Scheduler
  - Executor(s)

Apache Airflow DAG
- Hooks (connections)
- Operators (tasks)
- Schedule
- Dependencies
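The core idea of a DAG of tasks with dependencies can be illustrated with the Python standard library alone. This sketch deliberately does not use Airflow's API; it only shows how declaring dependencies determines a valid execution order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "fetch_nwp": set(),
    "downscale": {"fetch_nwp"},
    "train_model": {"downscale"},
    "forecast": {"train_model"},
    "verify": {"forecast"},
}

def run_dag(dependencies, operators):
    """Run each task's callable only after all of its upstream tasks have run."""
    order = list(TopologicalSorter(dependencies).static_order())
    for task in order:
        operators[task]()
    return order

# Stand-in operators that just record that they were called.
log = []
operators = {name: (lambda n=name: log.append(n)) for name in dependencies}
order = run_dag(dependencies, operators)
```

A scheduler such as Airflow adds what this sketch leaves out: running the graph on a schedule, dispatching independent tasks to executors in parallel, retrying failures, and exposing state in a UI.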

Airflow and Mesos
(Architecture diagram; labels: continuous integration, deploy, Airflow scheduler, Mesos cluster, persist, AWS S3)

Verification

Model improvement cycle
Deploy DAG -> Verify model -> Improve DAG -> (repeat)

Forecast verification
(Diagram; labels: forecast engine, AWS S3 with models, JSON-LD)

Verification metrics
- Mean absolute error
- Root mean squared error
- Mean error
- Heidke skill score
- Equitable threat score
- Probability density functions
- Error percentiles
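The scalar metrics in this list have standard definitions, sketched below in plain Python. The continuous scores compare forecast and observed values directly; the Heidke skill score and equitable threat score are computed from a 2x2 contingency table for a yes/no event. Defining the event as "value at or above a threshold" is an assumption, as are the function names.

```python
import math

def mean_error(forecast, observed):
    """Bias: positive when the forecast is too high on average."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)

def mean_absolute_error(forecast, observed):
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)

def root_mean_squared_error(forecast, observed):
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast))

def contingency(forecast, observed, threshold):
    """Count (hits, false alarms, misses, correct negatives) for value >= threshold."""
    a = b = c = d = 0
    for f, o in zip(forecast, observed):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            a += 1
        elif f_yes:
            b += 1
        elif o_yes:
            c += 1
        else:
            d += 1
    return a, b, c, d

def heidke_skill_score(a, b, c, d):
    """Fraction correct relative to random chance; 1 is perfect, 0 is no skill."""
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2 * (a * d - b * c) / denom if denom else 0.0

def equitable_threat_score(a, b, c, d):
    """Threat score with the hits expected by chance removed."""
    n = a + b + c + d
    hits_random = (a + b) * (a + c) / n
    denom = a + b + c - hits_random
    return (a - hits_random) / denom if denom else 0.0
```

Mean error shows systematic bias, while MAE and RMSE measure typical error size, with RMSE penalising large misses more heavily. The categorical scores suit event-type forecasts where the question is whether a threshold was crossed.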

Figure: Mean absolute error for different models (temperature)

Figure: Probability distribution function for multiple models (temperature)

Figure: Percentile graphs for each model (temperature)
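The quantities behind distribution and percentile plots like these can be computed from the forecast errors with NumPy. This is a sketch under assumptions: the slides do not state which percentiles or bin counts were plotted, so the defaults below are placeholders, and the function names are hypothetical.

```python
import numpy as np

def error_percentiles(forecast, observed, percentiles=(5, 25, 50, 75, 95)):
    """Return {percentile: error value} for the forecast-minus-observed errors."""
    errors = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    values = np.percentile(errors, percentiles)
    return {p: float(v) for p, v in zip(percentiles, values)}

def error_density(forecast, observed, bins=20):
    """Histogram estimate of the error probability density; returns (density, bin edges)."""
    errors = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    density, edges = np.histogram(errors, bins=bins, density=True)
    return density, edges
```

Computing these per model and overlaying the results gives a comparison that a single score such as MAE hides, for example whether one model has a narrower error distribution but a larger bias.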

For a demo, please stop by the MG booth.

Results
- Cloud-based solution: AWS S3, EC2, ElastiCache
- Transparent: verification microservice
- Scalable: Mesos cluster; training time reduced from a month to approximately 5 hours
- Improve forecasting accuracy: on par or better

Improvements
- Hyperlocal
- AWS Lambda integration
- Iterate for more accuracy

Questions?

We are hiring!