Measuring Uncertainty in Spatial Data via Bayesian Melding
Matt Falk, Queensland University of Technology (QUT), m.falk@qut.edu.au
Joint work with Robert Denham (NRW) and Kerrie Mengersen (QUT)
Data Driven and Physically-based Models for Characterization of Processes in Hydrology, Hydraulics, Oceanography and Climate Change, Tuesday 22 Jan 2008
Research project: Measuring and Presenting Uncertainty in Complex Natural Resource Monitoring Programs, funded by an Australian Research Council grant
- Focus on whole-of-catchment water quality modelling
- I am looking at one element of the modelling (RUSLE) and trying to characterize the uncertainty within this element
- The aim is then to apply the approach to the whole model
Outline of the Presentation
- Aims of my research
- Definition of Uncertainty
- Measuring Uncertainty - Bayesian Melding
- Bayesian Melding applied to the Universal Soil Loss Equation
Aims of my research
- Devise methods to measure uncertainty in complex natural resource modelling, with an emphasis on water quality
- Incorporate spatial image data into uncertainty models
- Ensure the methodology is statistically sound
- Provide uncertainty estimates to assist decision and policy makers
- Present the measured uncertainty
Definition of Uncertainty
Uncertainty is "the inability to determine the true state of affairs of a system" (Haimes, Risk Modeling, Assessment, and Management, 2004, p. 237).
Components of uncertainty:
- Variability: inherent heterogeneity of the process
  - Temporal
  - Spatial
  - Individual: all other sources
- Incomplete knowledge
  - Model uncertainty: arising from the choice of the particular model used
  - Parameter uncertainty: lack of knowledge about empirical quantities in the model
  - Decision uncertainty: modelling choices that reflect decision-maker judgement
Definition of Uncertainty
In the context of natural resource models, we choose not to allocate uncertainty to its different components because:
- It's difficult to say whether variability or incomplete knowledge is causing the uncertainty, especially when the true value may not be available
- It doesn't matter, since we're interested in predictive uncertainty rather than the components of uncertainty
- Once we find the total uncertainty, we can then identify which inputs are the main contributors
Measuring Uncertainty - Bayesian Melding
Background
- Stems from the Bayesian Synthesis approach (Raftery et al., JASA, 1995), which Wolpert showed to be unsatisfactory
- Revised by Poole and Raftery (JASA, 2000) to give Bayesian Melding
- Motivated by work for the International Whaling Commission
- Takes account of all uncertainty information regarding a model's inputs and outputs, and places the analysis on a sound statistical basis
So we have four sources of information:
1. Knowledge about inputs: prior distribution of inputs $q_1(\theta)$
2. Data about inputs: likelihood of inputs $L_1(\theta)$
3. Knowledge about outputs: prior distribution of outputs $q_2(\phi)$
4. Data about outputs: likelihood of outputs $L_2(\phi)$
Bayesian Melding - Theory
Bayesian Melding then combines these sources of information.
- $M$ is a model that maps inputs $\theta$ to an output $\phi$, i.e. $\phi = M(\theta)$
- $M$ and $q_1(\theta)$ together induce a prior on the output, $q_1^*(\phi)$, which is estimated by simulation and nonparametric kernel density estimation
- There are now two priors on the output, $q_2(\phi)$ and $q_1^*(\phi)$, which are pooled logarithmically with weight $0 \le \alpha \le 1$: $q^{[\phi]}(\phi) \propto q_1^*(\phi)^{\alpha}\, q_2(\phi)^{1-\alpha}$
- A pooled prior on the inputs, $q^{[\theta]}(\theta)$, is found by inverting $q^{[\phi]}(\phi)$ (complicated when $M$ is non-invertible)
- Samples are drawn from the Bayesian Melding posterior $\pi^{[\theta]}(\theta) \propto q^{[\theta]}(\theta)\, L_1(\theta)\, L_2(M(\theta))$ using the Sampling Importance Resampling (SIR) algorithm
- Inference about $\phi$ uses the distribution of $\phi = M(\theta)$ over a Monte Carlo sample from this posterior
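For reference, Poole and Raftery (2000) give the pooled input prior explicitly. The expression is exact when $M$ is invertible, and it is the form they adopt when $M$ is not. Substituting it into the posterior shows where the SIR importance weights come from. The draws $\theta_i$ are taken from $q_1(\theta)$, so that factor cancels in the ratio $\pi^{[\theta]}/q_1$:

$$
q^{[\theta]}(\theta) \propto q_1(\theta)\left(\frac{q_2(M(\theta))}{q_1^*(M(\theta))}\right)^{1-\alpha}
\quad\Longrightarrow\quad
\frac{\pi^{[\theta]}(\theta)}{q_1(\theta)} \propto \left(\frac{q_2(M(\theta))}{q_1^*(M(\theta))}\right)^{1-\alpha} L_1(\theta)\, L_2(M(\theta))
$$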
Simulating the Posterior Distribution
The posterior distribution of $\theta$, $\pi^{[\theta]}(\theta)$, is simulated using a modified SIR algorithm. For each pixel:
1. From the prior $q_1(\theta)$, draw $k$ sample values $\{\theta_1, \ldots, \theta_k\}$.
2. For each sampled $\theta_i$, obtain $\phi_i = M(\theta_i)$.
3. Estimate $q_1^*(\phi)$, the resulting induced distribution of $\phi$, using nonparametric density estimation.
4. Compute the importance sampling weights
$$w_i = \left(\frac{q_2(M(\theta_i))}{q_1^*(M(\theta_i))}\right)^{1-\alpha} L_1(\theta_i)\, L_2(M(\theta_i)) \qquad (1)$$
5. Draw a sample of $l$ values from the discrete distribution with values $\theta_i$ and probabilities proportional to $w_i$.
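The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work:
- The helper names `gaussian_kde` and `bayesian_melding_sir` are hypothetical.
- The default pooling weight α = 0.5 is chosen only for illustration.
- The density estimate in step 3 uses a Gaussian kernel with a normal-reference bandwidth; the slides do not specify either.

```python
import math
import random
import statistics


def gaussian_kde(samples):
    """Gaussian kernel density estimate of 1-D `samples` (hypothetical helper).

    Bandwidth: normal-reference rule h = 1.06 * sd * n^(-1/5), an assumption.
    """
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


def bayesian_melding_sir(model, sample_prior, q2, k, l, alpha=0.5,
                         lik_inputs=None, lik_output=None, rng=random):
    """Sampling Importance Resampling for the melding posterior (sketch)."""
    # Step 1: draw k values theta_1..theta_k from the input prior q1(theta).
    thetas = [sample_prior(rng) for _ in range(k)]
    # Step 2: run the model on each draw, phi_i = M(theta_i).
    phis = [model(t) for t in thetas]
    # Step 3: estimate the induced output prior q1*(phi) by kernel density estimation.
    q1_star = gaussian_kde(phis)
    # Step 4: importance weights from equation (1); a missing likelihood counts as 1.
    weights = []
    for t, p in zip(thetas, phis):
        w = (q2(p) / q1_star(p)) ** (1 - alpha)
        if lik_inputs is not None:
            w *= lik_inputs(t)
        if lik_output is not None:
            w *= lik_output(p)
        weights.append(w)
    # Step 5: resample l values with probabilities proportional to the weights.
    return rng.choices(thetas, weights=weights, k=l)
```

As a toy usage example, take:
- model `lambda t: 2 * t`
- prior draws `lambda r: r.gauss(1.0, 1.0)`
- output prior `statistics.NormalDist(2.0, 0.5).pdf`

The resampled θ values then centre near 1 with a much smaller spread than the prior, because the narrow output prior dominates the pooled prior at α = 0.5.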
The Revised Universal Soil Loss Equation
RUSLE (Renard et al., US Dept. of Agriculture, 1997) calculates hillslope erosion:
$$A = R \cdot K \cdot L \cdot S \cdot C \cdot P \qquad (2)$$
where
- A = mean annual soil loss (t/ha/yr)
- R = rainfall erosivity factor
- K = soil erodibility factor
- L = hillslope length factor
- S = hillslope steepness factor
- C = ground cover factor
- P = supporting practice factor, assumed to be 1 due to lack of information
Bayesian Melding is appropriate for quantifying uncertainty in the USLE because we have expert knowledge of the uncertainty in both its inputs and its output.
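Equation (2) is a pixel-wise product, so applying it to co-registered factor grids is straightforward. This is a minimal sketch with two assumptions: each factor is supplied as a list of rows of the same shape, and the function names are hypothetical.

```python
def rusle_soil_loss(R, K, L, S, C, P=1.0):
    """Mean annual soil loss A (t/ha/yr) = R * K * L * S * C * P, equation (2).

    P defaults to 1, as assumed in the absence of supporting-practice data.
    """
    return R * K * L * S * C * P


def rusle_grid(R, K, L, S, C):
    """Apply equation (2) pixel by pixel to five equally shaped grids (lists of rows)."""
    return [
        [rusle_soil_loss(r, k, l, s, c) for r, k, l, s, c in zip(*rows)]
        for rows in zip(R, K, L, S, C)
    ]
```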
RUSLE - Case Study
Area near Emerald (Central Queensland), approx. 14 sq km
[Figure: maps of the R, K, L, S and C Factors and of soil loss (A) over the study area]
Bayesian Melding applied to RUSLE - Differences
- Inputs and outputs are spatial GIS images, which makes things a little different
- There are no data for either the inputs or the output, so there are no likelihoods and the weights reduce to
$$w_i = \left(\frac{q_2(M(\theta_i))}{q_1^*(M(\theta_i))}\right)^{1-\alpha}$$
- All available uncertainty information is conveyed through the prior distributions on the inputs and output
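Without likelihoods, the weights depend only on the two output densities evaluated at the model runs. The sketch below normalises them on the log scale, which avoids underflow when the ratios are very small; the helper name is hypothetical. The formula itself shows two limiting cases:
- α = 1 makes every weight equal, so the output prior has no influence.
- α = 0 uses the full ratio $q_2/q_1^*$.

```python
import math


def melding_weights(q2_vals, q1_star_vals, alpha):
    """Normalised weights w_i ∝ (q2(phi_i) / q1*(phi_i))^(1 - alpha), no likelihoods.

    q2_vals[i] and q1_star_vals[i] are the output prior and induced prior
    densities evaluated at phi_i = M(theta_i); q1* values must be positive.
    """
    n = len(q2_vals)
    if alpha == 1:
        # (ratio)^0 = 1 for every draw: the output prior carries no weight.
        return [1.0 / n] * n
    log_w = [
        (1 - alpha) * (math.log(a) - math.log(b)) if a > 0 else -math.inf
        for a, b in zip(q2_vals, q1_star_vals)
    ]
    top = max(log_w)
    if top == -math.inf:
        raise ValueError("all weights are zero")
    w = [math.exp(x - top) for x in log_w]  # largest weight becomes 1
    total = sum(w)
    return [x / total for x in w]
```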
Bayesian Melding - Application to the USLE
Prior Specification for Rainfall (R) Factor
- The R Factor is the average annual sum of individual storm erosion index values, $EI_{30}$, where $E$ is the total storm kinetic energy per unit area and $I_{30}$ is the maximum 30-minute rainfall intensity
- Estimated using an equation by Yu and Rosewell (Aust. J. of Soil Res., 1996)
- Estimated and actual R Factors compared
[Figure: R Factors from the model vs from pluviograph data]
- Prior: $R_i \sim \mathrm{Gamma}(r_i^2/se_r,\ r_i/se_r)$, where $r_i$ is the mean value from the given surface for pixel $i$ and $se_r$ is the standard error from the fitted linear model
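The R Factor prior can be sampled with Python's standard library, assuming the second Gamma parameter is a rate (so that the prior mean equals $r_i$). Note that `random.gammavariate` takes a shape and a scale rather than a rate. The rate reading is consistent with the one-pixel example later in the talk, where $3838.56 / 2.2646 \approx 1695$ falls inside the plotted R range. The function names and the example inputs in the test are hypothetical.

```python
import random


def r_factor_gamma_params(r_i, se_r):
    """Shape and rate of R_i ~ Gamma(r_i^2 / se_r, r_i / se_r); mean = shape / rate = r_i."""
    return r_i ** 2 / se_r, r_i / se_r


def sample_r_factor(r_i, se_r, n, rng=random):
    """Draw n values of the R Factor for one pixel from its Gamma prior."""
    shape, rate = r_factor_gamma_params(r_i, se_r)
    # random.gammavariate is parameterised by (shape, scale), and scale = 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]
```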
Bayesian Melding - Application to the USLE
Prior Specification for Soil Erodibility (K) Factor
- The K Factor is the soil loss rate for a specific soil on a clean-tilled fallow plot 22.13 metres long on a 9% slope
- It is not feasible to gather enough data for each soil type, so there is a large amount of uncertainty
- The study area contains one soil type, so the prior is generated by fitting a scaled beta distribution to all K Factors: $K \sim 0.13 \times \mathrm{Beta}(7.8428, 11.4833)$
[Figure: histogram of K Factors with the fitted beta density]
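The slides do not say how the beta distribution was fitted. One simple possibility, shown purely as an assumption, is the method of moments applied to K/0.13. This treats 0.13 as the known upper bound and 0 as the lower bound. The helper name is hypothetical.

```python
import random
import statistics


def fit_scaled_beta(values, upper):
    """Method-of-moments fit of values ~ upper * Beta(a, b), lower bound 0 (hypothetical)."""
    x = [v / upper for v in values]
    m = statistics.fmean(x)
    v = statistics.variance(x, m)
    if not 0 < m < 1 or v >= m * (1 - m):
        raise ValueError("moments are not consistent with a beta distribution")
    common = m * (1 - m) / v - 1  # equals a + b for a beta distribution
    return m * common, (1 - m) * common


if __name__ == "__main__":
    # Synthetic check: recover the parameters from draws of the quoted prior itself.
    rng = random.Random(0)
    sample = [0.13 * rng.betavariate(7.8428, 11.4833) for _ in range(20000)]
    print(fit_scaled_beta(sample, 0.13))
```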
Bayesian Melding - Application to the USLE
Prior Specification for Slope Length (L) and Slope Steepness (S) Factors
- The L Factor is the ratio of soil loss from a particular field slope length to that from a slope 22.13 metres long, with all other conditions identical
- The S Factor is the ratio of soil loss from a particular field slope gradient to that from a slope with a gradient of 9%, with all other conditions identical
- L and S Factors are calculated from a Digital Elevation Model (DEM) using the raster calculator in ArcGIS
- The coarse DEM was compared with a high-resolution DEM; a linear model was fitted and its standard error recorded
Bayesian Melding - Application to the USLE
Prior Specification for Slope Length (L) and Slope Steepness (S) Factors
- The original DEM is resampled many times, assuming pixel $i$ is drawn from $N(x_i, se)$, and a new surface is fitted each time
- L and S Factors are calculated for the new DEMs and compared with the original L and S Factors
- The factors are binned and a function is fitted to the 95% confidence interval; beta distributions are then fitted with the same mean and 95% confidence interval
[Figure: observed vs mean simulated S Factor and L Factor]
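The DEM resampling step can be sketched for a single pixel. This is a simplified illustration under explicit assumptions:
- A 3×3 elevation window stands in for the DEM.
- Slope comes from central differences rather than a refitted surface.
- S uses the RUSLE steepness equations of McCool et al. (1987), which may differ from the ArcGIS raster-calculator expressions used in the study.

All function names, the cell size and `se` are hypothetical.

```python
import math
import random
import statistics


def s_factor(slope_rad):
    """RUSLE slope steepness factor (McCool et al., 1987); assumed, not taken from ArcGIS."""
    if math.tan(slope_rad) < 0.09:  # gradient below 9 %
        return 10.8 * math.sin(slope_rad) + 0.03
    return 16.8 * math.sin(slope_rad) - 0.50


def centre_slope(dem, cell):
    """Slope angle (radians) at the centre of a 3x3 window, by central differences."""
    dz_dx = (dem[1][2] - dem[1][0]) / (2 * cell)
    dz_dy = (dem[2][1] - dem[0][1]) / (2 * cell)
    return math.atan(math.hypot(dz_dx, dz_dy))


def perturbed_s_factor(dem, cell, se, n=500, rng=random):
    """Mean and empirical 95% interval of S when every pixel is drawn from N(x_i, se)."""
    draws = []
    for _ in range(n):
        noisy = [[rng.gauss(z, se) for z in row] for row in dem]
        draws.append(s_factor(centre_slope(noisy, cell)))
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5 %, 5 %, ..., 97.5 %
    return statistics.fmean(draws), cuts[0], cuts[-1]
```

Repeating this over many pixels gives per-pixel means and 95% intervals that could then be binned as described above.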
Bayesian Melding - Application to the USLE
Prior Specification for Cover (C) Factor
- The C Factor is the ratio of soil loss from an area with a specified cover to that from an otherwise identical area under tilled continuous fallow
- The Bare Ground Index is generated from satellite imagery and used in the calculation of the C Factor
- Beta distributions are fitted with the same mean and 95% confidence interval (red line)
[Figure: observed vs mean bare ground index]
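Matching a beta distribution exactly to a given mean and 95% interval generally requires solving numerically for its quantiles. The sketch below uses a cruder normal approximation instead: it treats the interval as mean ± 1.96 sd and then applies the method of moments. This is an assumption for illustration, not necessarily how the study fitted its priors. The `bounds` argument allows scaled forms such as the $1 + 3.5 \times \mathrm{Beta}(\cdot)$ prior for L (bounds 1 and 4.5). The function name is hypothetical.

```python
def beta_from_mean_interval(mean, lower, upper, bounds=(0.0, 1.0)):
    """Approximate Beta(a, b) on `bounds` with the given mean and 95% interval.

    Simplification (assumption): the interval is treated as mean +/- 1.96 sd,
    after which a and b follow from the method of moments.
    """
    lo_b, hi_b = bounds
    width = hi_b - lo_b
    m = (mean - lo_b) / width                 # mean on the unit interval
    sd = (upper - lower) / (2 * 1.96) / width  # implied sd on the unit interval
    v = sd ** 2
    if not 0 < m < 1 or v >= m * (1 - m):
        raise ValueError("interval too wide for a beta with this mean")
    common = m * (1 - m) / v - 1              # equals a + b
    return m * common, (1 - m) * common
```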
Bayesian Melding - Application to the USLE
Prior Specification for the output (mean annual soil loss, A)
- Lu et al. (Aust. J. Soil Res., 2003) report a standard error of 3.84 t/ha/yr when comparing modelled with measured soil loss
- $A \sim N(A_i, 3.84)$ for pixel $i$ (standard deviation 3.84), truncated at 0 because soil loss cannot be negative
- For example, $A_{33} = 4.0167$
[Figure: truncated normal prior density for A at this pixel]
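The truncated normal prior on A can be written directly with Python's `statistics.NormalDist`. Dividing by the probability mass above zero keeps it a proper density. This is a minimal sketch with a hypothetical function name; the test uses the example pixel value $A_{33} = 4.0167$.

```python
from statistics import NormalDist


def soil_loss_prior_pdf(a, centre, sd=3.84):
    """Density of N(centre, sd) truncated to a >= 0, the output prior q2 for one pixel."""
    if a < 0:
        return 0.0  # soil loss cannot be negative
    base = NormalDist(centre, sd)
    # Divide by P(A >= 0) under the untruncated normal so the density integrates to 1.
    return base.pdf(a) / (1.0 - base.cdf(0.0))
```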
Bayesian Melding - Application to the USLE
Example results for one pixel: histograms of the posterior samples, with solid lines showing the premodel (prior) distributions
- R Factor: $R \sim \mathrm{Gamma}(3838.56, 2.2646)$
- K Factor: $K \sim 0.13 \times \mathrm{Beta}(7.8428, 11.4833)$
- L Factor: $L \sim 1 + 3.5 \times \mathrm{Beta}(0.4952, 3.1781)$
- S Factor: $S \sim 3.4 \times \mathrm{Beta}(9.0560, 30.4785)$
- C Factor: $C \sim \mathrm{Beta}(1.2527, 1.3529)$
- Soil loss: $A \sim N(6.8843, 3.84)$
[Figure: posterior histograms for the R, K, L, S and C Factors and for A]
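Putting the pieces together for the example pixel, the sketch below:
1. draws the five inputs from the premodel distributions listed above,
2. computes A with P = 1,
3. estimates the induced prior on A by kernel density estimation,
4. weights each draw by $(q_2/q_1^*)^{1-\alpha}$ and resamples.

Several choices are assumptions rather than details given in the talk:
- the inputs are drawn independently;
- the Gamma parameters are read as shape and rate;
- α = 0.5;
- the kernel is Gaussian with a normal-reference bandwidth;
- the function names are hypothetical.

It illustrates the mechanics and is not expected to reproduce the plotted histograms.

```python
import math
import random
import statistics
from statistics import NormalDist


def draw_inputs(rng):
    """One draw of (R, K, L, S, C) from the example pixel's premodel distributions."""
    R = rng.gammavariate(3838.56, 1.0 / 2.2646)  # assumed Gamma(shape, rate): scale = 1/rate
    K = 0.13 * rng.betavariate(7.8428, 11.4833)
    L = 1.0 + 3.5 * rng.betavariate(0.4952, 3.1781)
    S = 3.4 * rng.betavariate(9.0560, 30.4785)
    C = rng.betavariate(1.2527, 1.3529)
    return R, K, L, S, C


def kde(samples):
    """Gaussian kernel density estimate with a normal-reference bandwidth."""
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    c = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def meld_pixel(k=1000, l=1000, alpha=0.5, a_centre=6.8843, a_sd=3.84, seed=0):
    """Return posterior samples of the inputs and of A for one pixel (no likelihoods)."""
    rng = random.Random(seed)
    thetas = [draw_inputs(rng) for _ in range(k)]
    a_vals = [math.prod(t) for t in thetas]  # A = R K L S C, with P = 1
    q1_star = kde(a_vals)                    # induced prior on A
    q2 = NormalDist(a_centre, a_sd)
    # q2's truncation constant is omitted: A >= 0 here, so it cancels on resampling.
    weights = [(q2.pdf(a) / q1_star(a)) ** (1 - alpha) for a in a_vals]
    idx = rng.choices(range(k), weights=weights, k=l)
    return [thetas[i] for i in idx], [a_vals[i] for i in idx]
```

For `post_theta, post_a = meld_pixel()`, the value `statistics.stdev(post_a)` is the posterior standard deviation of the output. This is the per-pixel uncertainty measure used for the uncertainty maps.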
Bayesian Melding - Application to the USLE
Uncertainty Map: uncertainty is measured as the standard deviation of the Bayesian Melding posterior distribution of the output
[Figure: uncertainty map of soil loss over the study area]
Bayesian Melding - Application to the USLE
Input Factor Uncertainty Maps
[Figure: uncertainty maps for the R, K, L, S and C Factors]
Bayesian Melding - Application to the USLE
Comparison to analysis completed without a prior on the output
[Figure: uncertainty map with a prior on the output vs uncertainty map with no prior on the output]
Acknowledgments
Thanks to Robert Denham, Kerrie Mengersen and all at the NRW Remote Sensing Centre