Using the EartH2Observe data portal to analyse drought indicators Lesson 4: Using Python Notebook to access and process data Preface In this fourth lesson you will again work with the Water Cycle Integrator (WCI) portal that has been developed as a part of the EartH2Obseve project. This provides access to a wide range of the data developed within the project, including the different variables of the global water resources reanalysis, and remotely sensed datasets such as satellite soil moisture and precipitation. In this lesson, we will not access the portal through the Web interface, but rather access the data files directly. This provides us access to the actual data, allowing data manipulation, plotting etc. It also allows us to download the data only for our area of interest, rather than for the whole globe. To achieve this we will primarily use the Open Data Access Protocol (OpenDAP), also referred to as a THREDDS server, to connect to the underlying NetCDF formatted files. There are also other ways to access the data on the server, including e.g. a Web Mapping Service. The following exercises will be covered: Installing and starting the Python Jupyter Notebook scripts Accessing the EartH2Observe data portal through the catalogue and finding variables of interest Retrieving and mapping precipitation and actual evaporation data for an area of interest Exploring the propagation of drought across Southern Africa using selected drought indicators Exploring the propagation of drought as a point location using selected drought indicators Table of Contents Preface... 1 Installing Python scripts and notebook... 2 Starting Python Jupyter Notebook... 3 Accessing the EartH2Observe data catalogue and identifying variables of interest... 4 Exercise 1 Mapping precipitation and actual evaporation... 7 Exercise 2 Exploring the propagation of drought using selected drought indicators... 10 Exercise 3 Exploring the propagation of drought using at a specified location... 12 1
Installing Python scripts and notebook In this first step we will install a set of Python Notebooks and supporting scripts that we will use to access and process the data from the EarH2Observe data portal. We will not cover the actual installation of Python Anaconda itself, as we assume that this has already been done prior to starting this exercise. If you have NOT installed Python Anaconda and the required packages, please carefully follow the instructions in the document Prerequisites Earth2observe Session. Before continuing please ensure that all required steps have been taken, and you have run the tests described in the document to ensure the installation was successful and fully functional. To install the required scripts on your laptop for this exercise please follow the following steps: Create a directory that you would like to work in. We recommend that the name of the directory and the path does not contain any spaces. In this tutorial the name of the directory we have created is: d:\earth2observe_session Copy the scripts in the zip file E2O_Drought_Scripts.zip and required for the exercise to the directory just created. After copying the scripts, the content of the directory should be: Three Exercises developed as Python Notebook: E2O_Session_Exercise01.ipynb E2O_Session_Exercise02.ipynb E2O_Session_Exercise03.ipynb The same three exercises developed as Python Script: E2O_Session_Exercise01.py E2O_Session_Exercise02.py E2O_Session_Exercise03.py Two supporting script containing the necessary routines drought_indicators.py e2o_waternet_utils.py 2
Starting Python Jupyter Notebook Python Jupyter Notebook, which is used to run the Python Scripts is started from the command prompt. It is recommended to open the Anaconda Command in the main Anaconda Program menu (see left figure below) This will open the command prompt. When open, navigate to the directory where you have installed the scripts. Enter Jupyter Notebook to start (see right figure above). This should open a web browser containing the notebook, with the required files displayed. Note that it is also possible to start the Notebook through the Jupyter Notebook menu item. This will then require you to navigate to the correct directory using the folder options. 3
Accessing the EartH2Observe data catalogue and identifying variables of interest In this exercise we will use the same data portal that we used to view the EartH2Observe data in a web browser, but will now access the data directly. Access to the data is provided through the OpenDAP protocol using a service called THREDDS. This is a common protocol that can be used to access data hosted on remote servers, and is used not only by the EartH2Observe project, but also by several other large modelling and data systems. A good example is the daily WRF meteorological forecast from NOAA NCEP in the US. All data is stored using the NetCDF file format, using the Climate Forcing convention (CF), which is the WMO standard. This means that the scripts in this exercise can be used to connect not only to the EartH2Observe portal, but also to any other data portal that follows the WMO standards. An important step in retrieving data from the portal is how to find the appropriate name of the file to access, as well as the name of the variable (e.g. Precip, or Runoff). Open a new web browser and enter the name and address of the data catalogue https://wci.earth2observe.eu/thredds This will open a web page providing options for which main Dataset to choose. Click on the EartH2Observe folder to open the datasets belonging to the project. Under the EartH2Observe dataset, there are three folders available; Earth Observation Datasets In situ datasets Model Datasets In the exercises described below we will work only with the Model Datasets. 4
Within the model datasets there are again several options. These reflect the two versions of the global Water Resources Re Analysis developed within the project, the first with a resolution of 0.5 degrees, and the second with a resolution of 0.25 degrees. There is also a collection of various model datasets. Select the Model Datasets Water Resources Re Analysis V1 (using forcing V0) option. Scroll down to the data from European Centre for Medium Range Weather Forecasting (ECWMF). Then identify the ECMWF Water Resources Re Analysis V1 Monthly (aggregated) option. This provides access to all the output variables of the HTESSEL CaMa model developed by ECMWF. 5
When clicking on the ECMWF Water Resources Re Analysis V1 Monthly (aggregated) a web page will open that provides links to various access protocols. In this exercise we will use only the OpenDAP option. Click option 1. OPENDAP /thredds/dodc/ecmwf/wrf1 monthly agg.nc. This will open a new web page that provides all the information on the content of the file and the URL to access it. This URL can be used in a Python script to access the data, as well as the available variables. Scroll down to identify the available variables in this file. 6
Exercise 1 Mapping precipitation and actual evaporation In this first exercise we will run a Python Notebook script that retrieves data from the EartH2Observe data portal and creates maps to plot the data across the Southern African Region. In this script we will connect to data files of HTESSEL CaMa model that was developed by the European Centre for Medium Range Weather Forecasting (ECMWF). For a given month in the data period 1980 to 2014 we will extract maps of precipitation (which is an input variable to the model) and actual evaporation (which is one of the model output variables). The maps will also be saved to a PNG file for later use. To open the Python Jupyter Notebook for Exercise 1, click on the E2O_Session_Exercise01.ipynb. This will open the file to run in a separate tab in your web browser. The script is built up out of a number of cells that contain Python code. Starting from the top, each cell should be run in turn. This is similar to running a Python script line by line. There are three different types of cell in the notebook. Comment cell (or Markdown). This may contain explanatory text, with no executable code. A cell containing executable code. This is preceded by In []: A cell containing the results of executable code (such as logs, or output figures). This is preceded by Out []: To run the code in a cell, select the cell and click on the forward button in the menu (run cell, select below). This will execute the code in that cell, and the cursor will move on to the next cell. Alternatively you can select the Shift + Enter buttons to execute a cell. There are several other options that can be used to execute code, which you can explore later if you like. 7
Select the first cell and run the code (note that this will produce no output). When code is running an asterisk to the left of the script indicates it is running. Sequentially run cells in the script until you reach the point at which the name of the data file that will be read is printed as output. Question: What is the name of the data file that will be read? Continue running the script and reviewing the outputs. You will see that currently the date for which the map of precipitation is to be displayed is February 1992. The rainfall during the 1991/1992 wet season was notably low, leading to severe drought conditions across Southern Africa. This date can be changed if desired to display data from another month/year. You will see the following message once the data for the required date has been retrieved. Question: What is the size of the grid that will be retrieved? Given this resolution and the bounding box, what is the spatial resolution of the data 0.5 degrees or 0.25 degrees? Run the next cell to plot a map of the precipitation across Southern Africa for the selected date. 8
Question: Where is the main rainfall zone across Southern Africa for February 1992? Is this where you would expect it to be for the time of year? In the next step we will map the actual evaporation for the same month. Question: What can you say about the pattern of the actual evaporation? Is this what you would expect? How does this compare to the pattern of the rainfall? Would you expect the potential evaporation to follow a similar pattern? 9
Exercise 2 Exploring the propagation of drought using selected drought indicators In this second exercise we will run a Python Notebook script to map drought indicators across the Southern African Region through which we can explore the propagation of drought. In the script we will explore indicators related to the different types of drought, including meteorological, agricultural, and hydrological drought. This script can be opened by clicking on the script E2O_Session_Exercise02.ipynb. This will open the file to run in a separate tab in your web browser. We will again use data files from the HTESSEL CaMa model developed by the European Centre for Medium Range Weather Forecasting (ECMWF). In this case we will plot a sequence of maps for 6 months, which allows us to explore how drought propagates across the region in space and time. The exercise has been set up to explore the 1991/1992 drought period, but other drought periods can be explored by changing the dates as required. As in the previous exercise, the maps will also be saved to a PNG file for later use. You can run the script until the maps showing the precipitation across the region are shown. Please take note of the years/months for which data is retrieved and displayed. Note also the file names for the PNG files to which data is written and how these are linked to the dates displayed as well as the data source. This is for ease of use when later using the data. The output maps as shown below should be shown. If desired the PNG file that has been written with the maps can be opened to see more detail. Question: These maps show the propagation of precipitation across the region from November 1991 to April 1992. What can you say about the spatial extent of precipitation? Could you say if the precipitation for this 1991/1992 wet season is lower than normal? 10
Continue running the script. In the next steps maps for selected drought indicators can be displayed. The first set of maps to be displayed are those for the Standardised Precipitation Index (SPI), showing the precipitation anomaly month by month. Again the associated PNG file can be opened to view the maps in more detail. Question: Can you identify the month at which the drought appears to be the most severe? You can plot maps for other relevant drought indicators by going back to Step 11. Amend the name of the file to access as well as the variable name (view the comment cell above Step 11 for appropriate file names and variables. When changing the variable name to plot, please remember to also amend the name of the PNG files to which the data is exported. The example below shows the setup for plotting the ETDI indicator. Please use this approach to make plots for SPI3, SPI6, ETDI, and SRI1 Question: When viewing the maps for the different indicators, what can you say about the propagation of drought through the different drought types; Meteorological, Agricultural; Hydrological? You can also explore other drought periods, such as 1983/1984; 2003/2004, or the anomalously wet period of 1999/2000. 11
Exercise 3 Exploring the propagation of drought at a specified location In this third exercise we will run a Python Notebook script to plot the evolution of drought indicators at selected geographic locations in the Southern African Region, again to explore the propagation of drought. We will explore indicators related to the different types of drought, including meteorological, agricultural, and hydrological drought. Rather than downloading the drought indicators from the platform, in this script we will actually calculate the indicators using the model inputs (Precipitation), as well as outputs (e.g. Actual Evaporation, Runoff, Soil Moisture, Groundwater levels). This script can be opened by clicking on the script E2O_Session_Exercise03.ipynb. This will open the file to run in a separate tab in your web browser. In this exercise we will explore outputs from the PCR GLOBWB model, a global hydrological model developed by the University of Utrecht. This model also contains a representation of groundwater (though in the version we are using this is not a full groundwater model. Execute the script by selecting the first cell that contains code, executing this and then proceeding to the next cell as in the previous exercises. Proceed until the cell where the data point to be displayed is defined. This data point is defined as a longitude latitude coordinate pair. Note that you can amend these coordinates to reflect the coordinates of your location of interest. Outputs for the model will be displayed at the closest cell location in the model. Proceed with the execution of the script. In the next step, data for all available months will be downloaded from the EartH2Observe data portal. 12
Within Python this data can be easily exported to a CSV formatted file if desired. Export the data and view in the CSV file. Note that this represents the rainfall rate (mm/day). To find the rainfall totals per month this value will need to be multiplied by the number of days in the month. Following this, the routine to calculate the value of the SPI n indicator can be executed. In this script the indicator values for SPI 1, SPI 3, and SPI 6 are calculated. With the indicators calculated these can be plotted. The first plot we will make is for the full period, from 1980 through 2014. This shows SPI 1 and SPI 3 We can also zoom in to be able to see the two indices in more detail. To do so, set the begin and end dates and execute the cell. 13
Question: Can you identify drought years in this zoomed in period? How do the indicators for SPI 1 and SPI 3 compare in terms of timing? Execute the subsequent cells to retrieve data for the parameters Evap (Actual Evaporation) and PotEvap (Potential Evaporation). These are the required input variables for calculating the Evapotranspiration Deficit Index (ETDI), an agricultural drought indicator. Once the calculation has completed, the evolution of the ETDI indicator can be plotted in time. In the figure below it is also compared against the evolution of the SPI 3 Question: How well do SPI 3 and ETDI correlate? Is this expected? What could be the reason of small deviations between the two (both on the high side and on the low side)? In the final step, the evolution of SRI, SGI and SPI 6 is explored. Question: How well do SPI 6 and SRI correlate? What about the values for the groundwater storage how variable is this in time? 14