Statistical methods for evaluating air pollution and temperature effects on human health
Ho Kim, School of Public Health, Seoul National University
ISES-ISEE 2010 Workshop, August 28, 2010, COEX, Seoul, Korea
| # | Hour | Description | File names |
|---|------|-------------|------------|
| 1 | 8:30-9:30 | Introduction to Air pollution Epidemiology | 1Intro.pdf |
| 2 | 9:30-10:00 | Introduction to R (1) | 2introR.pdf, 2introR.r, prerequirement.r |
|   | 10:00-10:30 | Coffee break (R install and/or lab) | |
| 3 | 10:30-11:30 | Introduction to R (2): Exploring data, Introduction to R data, reading Korea data, Descriptive statistics, Box plot, Summary, Time-series plot for air pollution, temperature, and mortality | 3expl.pdf, 3expl.r |
| 4 | 11:30-12:30 | Time-series analysis (1) | 4times1.pdf |
|   | 12:30-2:00 | Lunch | |
| 5 | 2:00-3:00 | Statistical analysis: GAM/GLM | 5times1.pdf, 5times1.r |
|   | 3:00-3:30 | Coffee break | |
| 6 | 3:30-4:00 | Time-series analysis (2) | 6times2.pdf |
| 7 | 4:00-5:00 | Multisite studies, Threshold model | 7times2.pdf, 7times2.5 |
| 8 | 5:00-6:00 | Case-crossover analysis - Matching variables - Stratum | 8cc1.pdf, 8cc2.pdf, 8cc2.r |
Introduction and Basic Theories
Outline
- Introduction to Environmental Epidemiology (Air Pollution and Temperature)
- Time-series analysis
- Lag structure
- Multi-pollutant models
- Sensitivity check (df, lag, ...)
- Meta analysis
- Case-crossover analysis
- Non-linear problems (threshold model)
Air pollution epidemiology (Environmental epidemiology)
Assessing Air Pollution Effects on Human Health
- Exposure: Air pollution
- Health Outcome
- Confounders (weather, time trend, seasonality, etc.)
Environmental epidemiology
Assessing Air Pollution & Meteorological Effects on Human Health
- Exposure: Air pollution, weather
- Health Outcome
- Confounders (time trend, seasonality, etc.)
Environmental epidemiology
Assessing Air Pollution & Meteorological Effects on Human Health
- Weather: mean, max, and min temperature; humidity
- Air pollution: PM10, ozone, SO2, etc.
- Health outcome: number of daily events (death, hospitalization)
- Confounders (time trend, seasonality)
[Figure: Daily Mortality Number in Korea, 1994-2003; x-axis: Time, y-axis: Number]
Environmental Data
Time and Space
Time series analysis
Reduce spatial complexity: calculate the daily mean PM10 value over 27 monitoring stations

| Date | temp | PM10 | Humidity | SO2 | Ozone |
|------|------|------|----------|-----|-------|
| 2010/1/1 | 0.10 | 25 | 60 | | |
| 2010/1/2 | 1.20 | 30 | 67 | | |
| ... | | | | | |

[Figure: Daily Mortality Number in Korea, 1994-2003; x-axis: Time, y-axis: Number]
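The spatial reduction described above (one daily value averaged over all monitoring stations) can be sketched as follows. This is a minimal illustration, not the workshop's own code: the station ids, the readings, and the function name `daily_station_mean` are all made up, and missing readings are simply skipped.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (date, station id, PM10). All values are invented.
readings = [
    ("2010-01-01", "S01", 24.0),
    ("2010-01-01", "S02", 26.0),
    ("2010-01-01", "S03", None),   # a station with a missing reading
    ("2010-01-02", "S01", 28.0),
    ("2010-01-02", "S02", 30.0),
    ("2010-01-02", "S03", 32.0),
]

def daily_station_mean(rows):
    """Average the available station readings for each date."""
    by_date = defaultdict(list)
    for date, _station, value in rows:
        if value is not None:          # skip missing readings
            by_date[date].append(value)
    return {date: mean(values) for date, values in sorted(by_date.items())}

daily_pm10 = daily_station_mean(readings)
print(daily_pm10)
```

The result is one row per day, which is the shape a time-series analysis needs. Skipping missing values is only one possible choice; a station that is often missing can bias the daily mean.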
Spatial analysis Modeling spatial dependencies between geographical sites Park et al. Locating major PM10 source areas in Seoul using multivariate receptor modeling, Environmental and Ecological Statistics 2004;11:9-19
Health Data
Several problems:
- Accuracy of cause of death (disease)
- Morbidity data are not easy to collect and analyze
- National Health Insurance data are excellent sources of health information, BUT
[Figure: Number of daily hospitalizations in Seoul, by quarter, 1999-2003; x-axis: Time, y-axis: Number]
Health Data
Several problems:
- Accuracy of cause of death (disease)
- Morbidity data are not easy to collect and analyze
[Figure: Number of daily hospitalizations in Seoul, by quarter, 1999-2003, with Mondays and with Sundays and holidays marked; x-axis: Time, y-axis: Number]
Adjustment for day of the week is essential!
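To see why the day of the week matters, one can group daily counts by day type, treating holidays like Sundays as the figure annotation suggests. The sketch below uses an invented two-week series; the names `day_type` and `day_type_means` and the holiday handling are assumptions for illustration only.

```python
from datetime import date, timedelta
from statistics import mean

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def day_type(d, holidays=frozenset()):
    """Label a date by weekday, merging public holidays with Sundays."""
    if d in holidays or d.weekday() == 6:
        return "Sun/holiday"
    return DAY_NAMES[d.weekday()]

def day_type_means(counts_by_date, holidays=frozenset()):
    """Mean daily count for each day type."""
    groups = {}
    for d, count in counts_by_date.items():
        groups.setdefault(day_type(d, holidays), []).append(count)
    return {label: mean(values) for label, values in groups.items()}

# Invented series: admissions peak on Mondays and drop on Sundays.
start = date(2001, 1, 1)  # a Monday
weekly_pattern = [1500, 1200, 1150, 1100, 1100, 900, 400]
counts = {start + timedelta(days=i): weekly_pattern[i % 7] for i in range(14)}

means = day_type_means(counts)
overall = mean(counts.values())
# Ratio of each day type to the overall mean: a crude day-of-week effect.
ratios = {label: m / overall for label, m in means.items()}
print(ratios)
```

In a regression model the same labels would enter as indicator variables, so that the pollution effect is estimated within day type rather than across it.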
Association between air pollution and health outcomes
- Time-series plots
[Figure: Daily Mortality Number in Korea, 1994-2003; x-axis: Time, y-axis: Number]
- Correlations: Pearson correlation and Spearman's rank correlation (0: no association)
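The two correlation measures named above can be computed by hand as a check on one's understanding. This is a sketch with invented numbers; Spearman's coefficient is taken here as the Pearson correlation of average ranks, which is one standard definition.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented data: deaths rise with PM10, but not linearly (one extreme day).
pm10 = [20, 35, 50, 65, 80, 200]
deaths = [600, 610, 615, 630, 640, 650]
print(pearson(pm10, deaths), spearman(pm10, deaths))
```

Because the relationship is monotone but not linear, the rank correlation reaches 1 while the Pearson correlation stays below 1.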
Q: Air pollution and health outcome: is it possible to have a positive association?
A: YES
[Figure: series labeled A and H]
Q: Air pollution and health outcome: is it possible to have a positive association?
A: YES
[Figure: series labeled A and H]
We want to remove 1) the long-term time trend and 2) the seasonal trend.
After controlling for those confounding factors, we want to compare day-to-day variations between air pollution and health outcomes.
Q: Air pollution and health outcome: is it possible to have a positive association?
[Figure: series labeled A and H, long-term trend removed]
We want to remove 1) the long-term time trend and 2) the seasonal trend.
After controlling for those confounding factors, we want to compare day-to-day variations between air pollution and health outcomes.
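A small simulation shows how a shared long-term trend alone can produce a positive association. The two series below are invented: both decline over time, but their day-to-day noise is independent. A straight-line trend removed by least squares stands in for the smoother a real analysis would use, so this is a sketch of the idea rather than the recommended method.

```python
import random
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def linear_detrend(series):
    """Residuals from a least-squares line fitted against the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 365
# Simulated series with a common downward trend and independent noise.
pollution = [100 - 0.2 * t + rng.gauss(0, 3) for t in range(n)]
health = [50 - 0.1 * t + rng.gauss(0, 1.5) for t in range(n)]

raw_r = pearson(pollution, health)
detrended_r = pearson(linear_detrend(pollution), linear_detrend(health))
print(f"raw: {raw_r:.2f}, detrended: {detrended_r:.2f}")
```

The raw correlation is driven almost entirely by the trend. Once the trend is removed, only the day-to-day variation is compared, which is the quantity of interest.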
Nonparametric Smoothing: lowess
Smoothing: consider an X-Y plot and draw a regression line that requires no parametric assumptions.
- The regression line is not linear
- The regression line is totally dependent on the data
Two components of smoothing:
- Kernel function: how to calculate the weighted mean
- Bandwidth: width of the window (span); determines the smoothness of the regression line (wider -> smoother)
Nonparametric Smoothing: lowess Uniform Kernel
Nonparametric Smoothing: lowess Triangular Kernel
Nonparametric Smoothing: lowess Normal Kernel
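The three kernels differ only in how they weight neighbouring points. The sketch below computes a kernel-weighted mean at a single point for each kernel. It illustrates the weighting idea only: full lowess fits a weighted local regression rather than a weighted mean, and the data and function names here are invented.

```python
from math import exp

def uniform(u):
    """Equal weight inside the window, zero outside."""
    return 1.0 if abs(u) <= 1 else 0.0

def triangular(u):
    """Weight falls linearly to zero at the window edge."""
    return max(0.0, 1.0 - abs(u))

def normal(u):
    """Gaussian weight; never exactly zero."""
    return exp(-0.5 * u * u)

def kernel_smooth(x, y, x0, bandwidth, kernel):
    """Kernel-weighted mean of y around x0."""
    weights = [kernel((xi - x0) / bandwidth) for xi in x]
    total = sum(weights)
    if total == 0:
        raise ValueError("no points fall inside the window")
    return sum(w * yi for w, yi in zip(weights, y)) / total

# Invented data with a single peak at x = 2.
x = [0, 1, 2, 3, 4]
y = [10, 12, 20, 12, 10]

print(kernel_smooth(x, y, 2, 1, uniform))      # plain mean of x = 1, 2, 3
print(kernel_smooth(x, y, 2, 2, triangular))   # nearer points count more
print(kernel_smooth(x, y, 2, 1, normal))
```

The uniform kernel is a moving average. The triangular and normal kernels give more weight to points close to the target, so the peak is flattened less.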
Nonparametric Smoothing: lowess
Default lowess line: span = 0.5 (50% of the data is used to calculate each point)
Linear assumptions at both ends
Nonparametric Smoothing: lowess Lowess line : Span=0.2
Nonparametric Smoothing: lowess Lowess line : Span=0.1 Caution: Linear assumptions at the ends
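The effect of the span can be reproduced with a simplified local linear fit: for each target point, take the nearest fraction `span` of the data, weight it with a tricube kernel, and fit a straight line. This sketch omits the robustness iterations of full lowess, and the 1.001 window inflation is an arbitrary choice made here so that the farthest point keeps a small positive weight.

```python
from math import ceil

def local_linear(x, y, x0, span):
    """Tricube-weighted straight-line fit to the span*n points nearest x0."""
    n = len(x)
    k = max(2, ceil(span * n))
    nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
    # Slightly inflate the window so the farthest point is not weighted zero.
    h = max(abs(x[i] - x0) for i in nearest) * 1.001
    if h == 0:
        return sum(y[i] for i in nearest) / k
    w = [(1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in nearest]
    sw = sum(w)
    mx = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (x[i] - mx) ** 2 for wi, i in zip(w, nearest))
    if sxx == 0:
        return my
    sxy = sum(wi * (x[i] - mx) * (y[i] - my) for wi, i in zip(w, nearest))
    return my + (sxy / sxx) * (x0 - mx)

# Invented data: flat series with one spike at x = 4.
x = list(range(10))
y = [0, 0, 0, 0, 10, 0, 0, 0, 0, 0]

narrow = local_linear(x, y, 4, span=0.2)   # follows the spike closely
wide = local_linear(x, y, 4, span=0.8)     # averages the spike away
print(narrow, wide)
```

A small span tracks local features, including noise; a large span gives a smoother line. Near the ends of the data the window is one-sided, which is why the fit there rests on the local linear assumption.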
Natural cubic spline, df=4
Piecewise cubic polynomials whose values and derivatives match where the pieces join (the knots)
## Types of degrees of freedom in ns (PM10 in Seoul, 2000-2007)
Choosing df is not an easy task
Typically 4 (seasons) times the number of years
We are interested in the robustness of the estimator of the air pollution effect -> sensitivity analysis
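The rule of thumb is simple arithmetic, and a sensitivity analysis just repeats the model fit over a range of df values. In the sketch below the study period 2000-2007 and the alternative values of df per year (2, 6, 8) are illustrative assumptions; `time_df` is a hypothetical helper.

```python
def time_df(start_year, end_year, df_per_year=4):
    """Degrees of freedom for the time smoother: df per year times years."""
    years = end_year - start_year + 1
    return df_per_year * years

# Baseline choice: 4 df per year over an assumed 8-year study period.
base_df = time_df(2000, 2007)

# Candidate values for a sensitivity analysis (the grid is an arbitrary example).
sensitivity_grid = {k: time_df(2000, 2007, k) for k in (2, 4, 6, 8)}
print(base_df, sensitivity_grid)
```

If the estimated pollution effect is stable across the grid, the conclusion does not hinge on the choice of df.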
temperature-health study
IPCC 2007b
Global Average Surface Temperature IPCC 2007b
Surface Temperature Anomalies IPCC 2007b
Health Impacts of Climate Change McMichael et al. 2003a
Direction and Magnitude of Climate Change Health Impacts IPCC 2007a
Total CO2 Emissions (UNEP 2009)
Health Burden of Climate Change Impacts
Deaths from malaria and dengue fever, diarrhoea, malnutrition, flooding, and (in OECD countries) heatwaves
Vulnerable Groups Subgroup analysis Interactions in a model
The 2003 European heat wave killed more than 35,000 people
These conditions are forecast to be common by 2040
Image courtesy of NASA Earth Observatory
Daily mortality in Paris during summer 2003 Source: Institut de Veille Sanitaire, France
Daily mortality in London, summer 2003
[Figure: observed deaths, baseline deaths, and mean temperature by date, 1 June to 1 September 2003; axes: Daily mortality, Mean temperature, Date]
Source: Hajat
Why was Paris so badly affected?
- Temperature extremes: high minimum temperature
- Poor meteorological forecast
- Institutional failures: hospital and care home staff on holiday
- No health surveillance
- No previous experience/knowledge: no public health measures
Source: Hajat
Time-series analysis
Daily mortality in London: 1993-2006
[Figure: daily mortality against mean temperature; x-axis: Mean temperature, y-axis: Daily mortality]
Source: Hajat
Daily mortality in London: 1993-2006
[Figure: daily mortality against mean temperature, with the heat threshold and heat slope marked; x-axis: Mean temperature, y-axis: Daily mortality]
Source: Hajat
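The heat threshold and heat slope correspond to a "hockey-stick" model: mortality is flat below a threshold temperature and rises linearly above it. The sketch below fits that shape by ordinary least squares for each candidate threshold and keeps the best one. It is a simplification under stated assumptions: the data are noise-free and invented, and real analyses of daily counts would typically use Poisson regression with confounder adjustment.

```python
def fit_hinge(temps, deaths, threshold):
    """Least-squares fit of deaths = a + b * max(temp - threshold, 0).

    Returns (a, b, rss), where rss is the residual sum of squares.
    """
    h = [max(t - threshold, 0.0) for t in temps]
    n = len(h)
    mh, md = sum(h) / n, sum(deaths) / n
    shh = sum((v - mh) ** 2 for v in h)
    if shh == 0:
        raise ValueError("no observations above the threshold")
    b = sum((v - mh) * (d - md) for v, d in zip(h, deaths)) / shh
    a = md - b * mh
    rss = sum((d - (a + b * v)) ** 2 for v, d in zip(h, deaths))
    return a, b, rss

def best_threshold(temps, deaths, candidates):
    """Grid search: the candidate threshold with the smallest rss."""
    fits = [(fit_hinge(temps, deaths, c), c) for c in candidates]
    (a, b, _rss), c = min(fits, key=lambda f: f[0][2])
    return c, a, b

# Invented, noise-free data: baseline 150 deaths, +5 per degree above 20.
temps = list(range(-5, 31))
deaths = [150 + 5 * max(t - 20, 0) for t in temps]

threshold, baseline, slope = best_threshold(temps, deaths, range(10, 28))
print(threshold, baseline, slope)
```

With noisy data the residual sum of squares is often flat near its minimum, so the estimated threshold carries real uncertainty and is worth checking across several candidate values.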
FC Curriero et al. AJE 2002
Ho Kim, Jong-Sik Ha, Jeongim Park. High temperature, heat index, and mortality in six major cities in South Korea. Archives of Environmental and Occupational Health 2006;61(6):265-270.
Curriero et al. (2002) AJE
Vulnerable population groups
- The elderly
- Infants and children
- People with chronic diseases
- People taking certain medications
- People whose socioeconomic status may make them more vulnerable
- People in certain occupations
Source: Heat-related Action Plans, WHO EUROPE http://www.euro.who.int/informationsources/publications/catalogue/20080522_1