An environmental modelling and information service for health analytics
Oliver Schmitz (1), Derek Karssenberg (1), Kor de Jong (1), Harm de Raaff (2)
(1) Department of Physical Geography, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
(2) Sound Web Development B.V., The Netherlands
24 August 2016
Research on air quality and health
Motivation
Personal exposure estimation combines geographical and health research.
[Figure labels: Time; Space-time paths; Risk factor concentration 1; Risk factor concentration 2; Spatial extent of study area. Source: Richardson et al., 2013, Science]
Research on air quality and health
Estimating personal exposure
Health = f(Personal Exposure, ...)
The Personal Exposure encompasses the totality of human environmental (i.e. non-genetic) exposures, complementing the genome, e.g.:
- Air pollution exposure
- Water pollution exposure
- Climate
- Infections
Main research objectives are often:
- To identify f using cohort studies
- To use f to estimate health impacts on a population
Research on air quality and health
Challenges
Calculating personal exposure:
- High spatial detail and large coverage of environmental information
- Detailed information regarding space-time paths of individuals
- Significant computing power and storage
Accessibility of the information:
- Information can only be generated by geoscientists
- Not accessible to doctors and health researchers
- Privacy issues
Research on air quality and health
Required calculations

Health databases (cohorts):
id  age  sex
1   12   m
2   11   m
3   45   f
4   4    f

Personal exposure of individuals (from the location of individuals and modelled environmental attributes):
id  e1    e2
1   12.4  32.5
2   11.7  1.8
3   0.9   2.8
4   0.45  1.9

Cohorts enriched with personal exposure:
id  age  sex  e1    e2
1   12   m    12.4  32.5
2   11   m    11.7  1.8
3   45   f    0.9   2.8
4   4    f    0.45  1.9

Statistical analysis:
[Figure: health (axis 0 to 30) plotted against personal exposure (axis 0 to 8)]
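The enrichment step on this slide amounts to joining the cohort table with the exposure table on the individual's id. A minimal sketch in plain Python, using the example values from the slide; the function name `enrich_cohort` and the dictionary layout are assumptions made for illustration, not the service's actual implementation:

```python
def enrich_cohort(cohort_rows, exposure_by_id):
    """Return new cohort rows with exposure columns appended, matched on 'id'.

    Individuals without an exposure record keep only their original columns.
    """
    enriched = []
    for row in cohort_rows:
        exposure = exposure_by_id.get(row["id"], {})
        enriched.append({**row, **exposure})
    return enriched


# Example values taken from the slide
cohort = [
    {"id": 1, "age": 12, "sex": "m"},
    {"id": 2, "age": 11, "sex": "m"},
    {"id": 3, "age": 45, "sex": "f"},
    {"id": 4, "age": 4, "sex": "f"},
]
exposure = {
    1: {"e1": 12.4, "e2": 32.5},
    2: {"e1": 11.7, "e2": 1.8},
    3: {"e1": 0.9, "e2": 2.8},
    4: {"e1": 0.45, "e2": 1.9},
}

enriched = enrich_cohort(cohort, exposure)
for row in enriched:
    print(row)
```

The enriched rows can then be passed to whatever statistical package is used to estimate f.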
Environmental modelling and information service
Web service enriching cohorts with exposure values
- Users: medical doctors, health researchers
- Workflow: upload cohort; download cohort enriched with exposure values
- Client: web browser
- Back end: environmental data archive, exposure calculation
Environmental modelling and information service
System architecture
[Diagram; labels as extracted: Cohort | Portal | Cohort with personal exposure | Feature Environment Service | Environmental information archive | Environmental Models]
Geocomputation
Forward modelling
Expressing environmental processes in space and time:
Z_t = f(Z_{t-1})
- Z_t: set of stochastic variables representing the state at time t
- f: set of functions representing the process change over one time step
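The update rule Z_t = f(Z_{t-1}) can be sketched as a simple time loop. The following is a minimal illustration in plain Python, not PCRaster code; the process function `decay_with_noise`, its decay rate and its noise level are hypothetical and stand in for whatever process description a real model would use:

```python
import random


def run_forward(initial_state, update, n_steps, rng):
    """Iterate Z_t = f(Z_{t-1}) and return the list of states Z_0 .. Z_n."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(update(states[-1], rng))
    return states


def decay_with_noise(state, rng):
    # Hypothetical process: 10 % decay per time step plus Gaussian noise,
    # applied independently to every cell of the state.
    return [0.9 * value + rng.gauss(0.0, 0.1) for value in state]


rng = random.Random(42)  # fixed seed so one realisation is reproducible
trajectory = run_forward([10.0, 20.0, 30.0], decay_with_noise, n_steps=5, rng=rng)
print(trajectory[-1])
```

Because the state is stochastic, repeating the run with different seeds yields different realisations of the same model, which is the usual starting point for an uncertainty analysis.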
Geocomputation
The PCRaster modelling platform
- Research and software development on spatio-temporal multi-paradigm modelling
- PCRaster Python module
- More than 200 operations for spatio-temporal modelling on field-based data
- Framework for uncertainty analysis
- Seamless interaction with NumPy and GDAL
- Developed at Utrecht University since the 1990s, open source since 2013
Geocomputation
PCRaster software stack
LUE data model: phenomenon encapsulating agents and fields
Geocomputation
Air pollution modelling
Using land use regression models from the ESCAPE project (e.g., Beelen et al., 2013), e.g. for PM2.5, PM10, NO2 or NOx
[Figure labels: Air pollution map (average over a year); Regression intercept; Regression parameters; Predictor maps]
Geocomputation
Air pollution modelling
Calculating PM10 values:
PM10 = 23.71 + 2.15E-8 * trafficload + 6.67E-6 * popdens + 0.02 * roadlength
- PM10: concentration (microgram/m3)
- trafficload: number of vehicles within 500 m
- popdens: number of people living within 5 km
- roadlength: total road length within 50 m (m)
Geocomputation
Air pollution modelling
Calculating PM10 values:
PM10 = 23.71 + 2.15E-8 * trafficload + 6.67E-6 * popdens + 0.02 * roadlength
The PM10 air pollution model in Python: (code listing from the slide not preserved)
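A minimal sketch of how the regression above could be evaluated per raster cell. It uses plain Python rather than the PCRaster Python module so that it runs without extra dependencies; the coefficients come from the slide, while the function name `pm10` and the predictor values for the three example cells are hypothetical:

```python
def pm10(trafficload, popdens, roadlength):
    """PM10 concentration (microgram/m3) from the land use regression on the slide.

    trafficload: number of vehicles within 500 m
    popdens:     number of people living within 5 km
    roadlength:  total road length within 50 m (m)
    """
    return 23.71 + 2.15e-8 * trafficload + 6.67e-6 * popdens + 0.02 * roadlength


# Hypothetical predictor values for three raster cells
cells = [
    {"trafficload": 0.0, "popdens": 0.0, "roadlength": 0.0},
    {"trafficload": 1.0e6, "popdens": 1.0e5, "roadlength": 50.0},
    {"trafficload": 5.0e6, "popdens": 4.0e5, "roadlength": 120.0},
]

concentrations = [pm10(**cell) for cell in cells]
print(concentrations)
```

With no traffic, population or roads nearby, the model returns the regression intercept of 23.71; each predictor then adds its weighted contribution on top of that background value.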
Geocomputation
Spatially distributed air pollution concentration maps
National coverage at 5 m resolution
[Figure: NO2 concentration map; scale bar 50 m]
Geocomputation
Mapping over time
1 hour time step, 5 m resolution, petabytes of data
Calculated from a land use regression model with temporally variable regression coefficients
[Figure: NO2 air pollution, Utrecht city (hourly averages, summer, microgram/m3); map with concentration scale 0 to 50 and 1 km scale bar; time series of concentration (0 to 50) against time (0 to 24 hours)]
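To illustrate what temporally variable regression coefficients mean in practice, the sketch below evaluates a linear land use regression once per hour for a single cell. All numbers are hypothetical: the diurnal intercept, the single `roadlength` predictor and its weight are chosen for illustration only and are not the coefficients used in the study.

```python
import math


def hourly_concentration(predictors, coefficients_by_hour):
    """Evaluate a linear regression once per hour for one raster cell.

    coefficients_by_hour: list of (intercept, {predictor_name: weight}) tuples,
    one entry per hour of the day.
    """
    series = []
    for intercept, weights in coefficients_by_hour:
        value = intercept + sum(weights[name] * predictors[name] for name in weights)
        series.append(value)
    return series


# Hypothetical coefficients: the intercept follows a smooth 24-hour cycle,
# the road length weight is constant over the day.
coefficients_by_hour = [
    (20.0 + 5.0 * math.sin(2.0 * math.pi * hour / 24.0), {"roadlength": 0.02})
    for hour in range(24)
]

no2 = hourly_concentration({"roadlength": 100.0}, coefficients_by_hour)
daily_mean = sum(no2) / len(no2)
print(round(daily_mean, 3))
```

Repeating this for every cell and every hour is what drives the data volume mentioned on the slide.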
Geocomputation
Field-network interactions
Aggregated concentrations as edge weights
Python modules used: gdal, networkx, ogr, pcraster
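A minimal sketch of the idea of using aggregated field values as edge weights. The slide lists networkx for the network part; to stay self-contained, this example swaps in a small Dijkstra search from the standard library instead. The raster values, the node names and the mapping from edges to raster cells are all hypothetical:

```python
import heapq


def edge_weight(field, cells):
    """Mean field value over the raster cells an edge passes through."""
    values = [field[row][col] for row, col in cells]
    return sum(values) / len(values)


def least_exposure_path(edges, source, target):
    """Dijkstra search over an undirected graph given as {(u, v): weight}."""
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in adjacency.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical 3 x 3 concentration field (microgram/m3)
field = [
    [10.0, 10.0, 10.0],
    [40.0, 25.0, 10.0],
    [10.0, 10.0, 10.0],
]
# Hypothetical network: each edge lists the (row, col) cells it crosses
edge_cells = {
    ("A", "B"): [(0, 0), (0, 1), (0, 2)],
    ("B", "D"): [(0, 2), (1, 2), (2, 2)],
    ("A", "C"): [(0, 0), (1, 0), (2, 0)],
    ("C", "D"): [(2, 0), (2, 1), (2, 2)],
}
edges = {edge: edge_weight(field, cells) for edge, cells in edge_cells.items()}

cost, path = least_exposure_path(edges, "A", "D")
print(cost, path)
```

In this toy network the route via B avoids the high-concentration cell on the edge between A and C, so it accumulates the lower total weight.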
Geocomputation
Activity-based modelling
Using space-time path information to calculate personal exposure.
[Figure: PM10 concentration (microgram/m3), axis range 24 to 38, against distance travelled (km), 0 to 10]
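One common way to turn a space-time path into a single exposure value is a time-weighted mean of the concentrations encountered along the path. The sketch below assumes that approach; the slide does not state which aggregation the authors use, and the activity durations and concentrations are hypothetical:

```python
def time_weighted_exposure(path):
    """Time-weighted mean concentration along a space-time path.

    path: list of (duration_hours, concentration) tuples, one per activity.
    """
    total_time = sum(duration for duration, _ in path)
    if total_time == 0:
        raise ValueError("path has zero total duration")
    weighted_sum = sum(duration * concentration for duration, concentration in path)
    return weighted_sum / total_time


# Hypothetical day: (hours spent, PM10 concentration in microgram/m3)
day = [
    (14.0, 24.0),  # at home
    (2.0, 38.0),   # commuting along busy roads
    (8.0, 30.0),   # at work
]

exposure = time_weighted_exposure(day)
print(round(exposure, 2))
```

The short commute has the highest concentration but contributes little to the daily mean because of its small share of the total time.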
Provisioning of model results and data
Challenges:
- Fast response time
- Different locations of computation and storage
- Different types of requests (data, aggregation in space and/or time)
- Different workloads (cohort size)
- Different end user devices (web portal, Trusted Third Parties, mobile devices, ...)
No monolithic system but a microservices architecture:
- Individual services with dedicated tasks
- Straightforward roll-out to a distributed system
- Used: Flask + extensions, Docker, gunicorn, requests, ...
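A minimal sketch of what one such dedicated service could look like. The slide names Flask; to stay dependency-free this example uses a plain WSGI callable instead, which gunicorn can serve in the same way. The URL scheme, the in-memory `ARCHIVE` and the cell identifiers are hypothetical stand-ins, not the project's actual interface:

```python
import json
from statistics import mean

# Hypothetical in-memory stand-in for the environmental data archive:
# pollutant -> cell id -> time series of concentrations
ARCHIVE = {"no2": {"cell_a": [20.0, 30.0, 40.0], "cell_b": [10.0, 10.0, 10.0]}}


def aggregate(pollutant, cell):
    """Temporal mean of the stored series; raises KeyError for unknown keys."""
    return mean(ARCHIVE[pollutant][cell])


def app(environ, start_response):
    """WSGI callable: GET /exposure/<pollutant>/<cell> returns the temporal mean."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 3 and parts[0] == "exposure":
        try:
            body = {"pollutant": parts[1], "cell": parts[2],
                    "mean": aggregate(parts[1], parts[2])}
            status = "200 OK"
        except KeyError:
            body, status = {"error": "unknown pollutant or cell"}, "404 Not Found"
    else:
        body, status = {"error": "not found"}, "404 Not Found"
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

If this were saved as a module named `service.py` (a hypothetical name), it could be started with `gunicorn service:app`. Keeping each service this small is what makes it straightforward to place computation and storage on different machines.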
Provisioning of model results and data
Microservices scheme
Provisioning of model results and data
Access through web portal
Download cohort enriched with personal exposures
Summary
- High-resolution spatio-temporal datasets as the basis for exposure analysis on the entire population
- Field-network operations or spatial aggregation on different scales as proxies for space-time paths of individuals
- Aggregated data accessible via services
Challenges:
- Computational: due to detailed space-time discretisation
- Implementation: it's not a model, it's a modelling platform
- Scientific: analysis and validation of spatio-temporal health data
PCRaster
- Open source spatio-temporal modelling software
- Available for Linux, Windows, (OS X)
- Field-based & agent-based modelling
- Parallel execution and I/O on supercomputers (work in progress)
- http://www.pcraster.eu
- https://github.com/pcraster/
Healthy Urban Living
- http://www.uu.nl/en/research/sustainability-healthy-urban-living
- Healthy urban living project team: Utrecht University & partners
Contact: Oliver Schmitz, Utrecht University, Netherlands, o.schmitz@uu.nl
References
- Beelen et al., 2013. Development of NO2 and NOx land use regression models for estimating air pollution exposure in 36 study areas in Europe - The ESCAPE project. Atmospheric Environment, 72, 10-23.
- Eeftens et al., 2012. Development of land use regression models for PM2.5, PM2.5 absorbance, PM10 and PMcoarse in 20 European study areas; Results of the ESCAPE project. Environmental Science and Technology, 46, 11195-11205.
- Karssenberg et al., 2010. A software framework for construction of process-based stochastic spatio-temporal models and data assimilation. Environmental Modelling & Software, 25, 489-502.
Acknowledgements: Dr. Rob Beelen, Prof. Dr. Bert Brunekreef, Prof. Dr. Rick Grobbee, Harm de Raaff (MSc), Peter Hessels (MSc), Prof. Dr. Gerard Hoek, Folkert Jan de Groot (MSc), Kor de Jong (MSc), Prof. Dr. Martin Dijst, Dr. Derek Karssenberg, Dr. Oliver Schmitz, Dr. Maciek Strak, Ivan Soenario (MSc), Dr. Ilonca Vaartjes