Introduction to Spatial Data and Models


Sudipto Banerjee¹ and Andrew O. Finley²
¹ Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.
² Department of Forestry & Department of Geography, Michigan State University, Lansing, Michigan, U.S.A.
March 3, 2010

Researchers in diverse areas such as climatology, ecology, environmental health, and real estate marketing are increasingly faced with the task of analyzing data that are:
- highly multivariate, with many important predictors and response variables,
- geographically referenced, and often presented as maps, and
- temporally correlated, as in longitudinal or other time series structures.
This motivates hierarchical modeling and data analysis for complex spatial (and spatiotemporal) data sets.

Types of spatial data
- Point-referenced data, where $Y(s)$ is a random vector at a location $s \in \mathbb{R}^r$, where $s$ varies continuously over $D$, a fixed subset of $\mathbb{R}^r$ that contains an $r$-dimensional rectangle of positive volume;
- areal data, where $D$ is again a fixed subset (of regular or irregular shape), but now partitioned into a finite number of areal units with well-defined boundaries;
- point pattern data, where now $D$ is itself random; its index set gives the locations of random events that are the spatial point pattern. $Y(s)$ itself can simply equal 1 for all $s \in D$ (indicating occurrence of the event), or possibly give some additional covariate information (producing a marked point pattern process).

Exploration of spatial data
The first step in analyzing the data. First Law of Geography: data = mean + error.
- Mean: first-order behavior.
- Error: second-order behavior (covariance function).
EDA tools examine both first- and second-order behavior. Preliminary displays range from simple plots of the locations to surface displays.

Deterministic surface interpolation
[Figure: scallops sampling sites.]
A spatial surface is observed at a finite set of locations $S = \{s_1, s_2, \ldots, s_n\}$.
- Tessellate the spatial domain (usually with the data locations as vertices).
- Fit an interpolating polynomial: $f(s) = \sum_i w_i(S; s) f(s_i)$.
- Interpolate at a new location $s_0$ by reading off $f(s_0)$.
Issues: sensitivity to the tessellation, choices of multivariate interpolators, numerical error analysis.
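To make the interpolation formula concrete, here is a minimal sketch in Python. The slides describe tessellation-based weights; inverse-distance weights are used below only as one simple, hypothetical choice of the weights $w_i(S; s)$, not the method the slides use.

```python
# Minimal sketch of deterministic surface interpolation of the form
# f(s) = sum_i w_i(S; s) f(s_i); inverse-distance weights stand in as one
# simple (hypothetical) choice of the weight functions.
import numpy as np

def idw_interpolate(locs, vals, s0, power=2.0):
    """Interpolate vals observed at locs (n x 2) to a new site s0 (2,)."""
    d = np.linalg.norm(locs - s0, axis=1)
    if np.any(d == 0):                     # s0 coincides with a data location
        return vals[np.argmin(d)]
    w = 1.0 / d**power                     # weights w_i(S; s0)
    return np.sum(w * vals) / np.sum(w)

# toy example: interpolate a noiseless surface observed at 5 locations
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
vals = locs.sum(axis=1)                    # f(s) = x + y at the data sites
print(idw_interpolate(locs, vals, np.array([0.5, 0.5])))
```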

Scallops data: image and contour plots
[Figures: drop-line scatter plot, surface plot of the log scallop catch, and image-contour plot over longitude and latitude; point patterns formed by the sampling locations and a shrub-density surface over Eastings and Northings.]

[Figure: plot arrangements of the observation locations in N-S versus E-W UTM coordinates.]

Elements of point-level modelling
Point-level modelling refers to modelling of spatial data collected at locations referenced by coordinates (e.g., lat-long, Easting-Northing).
Fundamental concept: the data come from a spatial process $\{Y(s) : s \in D\}$, where $D$ is a fixed subset of Euclidean space.
Example: $Y(s)$ is a pollutant level at site $s$.
- Conceptually: the pollutant level exists at all possible sites.
- Practically: the data will be a partial realization of the spatial process, observed at $\{s_1, \ldots, s_n\}$.

Statistical objectives: inference about the process $Y(s)$; prediction at new locations.

Stationary Gaussian processes
Suppose our spatial process has a mean $\mu(s) = E(Y(s))$, and that the variance of $Y(s)$ exists for all $s \in D$.
- Strong stationarity: for any given set of $n$ sites and any displacement $h$, the distribution of $(Y(s_1), \ldots, Y(s_n))$ is the same as that of $(Y(s_1 + h), \ldots, Y(s_n + h))$.
- Weak stationarity: constant mean $\mu(s) = \mu$, and $\mathrm{Cov}(Y(s), Y(s+h)) = C(h)$: the covariance depends only upon the displacement (or separation) vector $h$.
- Strong stationarity implies weak stationarity.
The process is Gaussian if $Y = (Y(s_1), \ldots, Y(s_n))$ has a multivariate normal distribution. For Gaussian processes, strong and weak stationarity are equivalent.

Variograms
Suppose we assume $E[Y(s+h) - Y(s)] = 0$ and define
$$E[Y(s+h) - Y(s)]^2 = \mathrm{Var}(Y(s+h) - Y(s)) = 2\gamma(h).$$
This is sensible if the left-hand side depends only upon $h$; then we say the process is intrinsically stationary. $\gamma(h)$ is called the semivariogram and $2\gamma(h)$ is called the variogram.
Note that intrinsic stationarity defines only the first and second moments of the differences $Y(s+h) - Y(s)$. It says nothing about the joint distribution of a collection of variables $Y(s_1), \ldots, Y(s_n)$, and thus provides no likelihood.

Intrinsic stationarity and ergodicity
Relationship between $\gamma(h)$ and $C(h)$:
$$2\gamma(h) = \mathrm{Var}(Y(s+h)) + \mathrm{Var}(Y(s)) - 2\,\mathrm{Cov}(Y(s+h), Y(s)) = C(0) + C(0) - 2C(h) = 2[C(0) - C(h)].$$
It is easy to recover $\gamma$ from $C$. The converse needs the additional assumption of ergodicity: $\lim_{\|u\| \to \infty} C(u) = 0$. Then $\lim_{\|u\| \to \infty} \gamma(u) = C(0)$, and we can recover $C$ from $\gamma$ as long as this limit exists: $C(h) = \lim_{\|u\| \to \infty} \gamma(u) - \gamma(h)$.

Isotropy
When $\gamma(h)$ or $C(h)$ depends upon the separation vector only through its length $\|h\|$, we say that the process is isotropic; in that case we write $\gamma(\|h\|)$ or $C(\|h\|)$. Otherwise we say that the process is anisotropic. If the process is intrinsically stationary and isotropic, it is also called homogeneous.
Isotropic processes are popular because of their simplicity, interpretability, and because a number of relatively simple parametric forms are available as candidates for $C$ (and $\gamma$). Denoting $\|h\|$ by $t$ for notational simplicity, the next two tables provide a few examples.

Some common isotropic variograms
- Linear: $\gamma(t) = \tau^2 + \sigma^2 t$ if $t > 0$; $0$ otherwise.
- Spherical: $\gamma(t) = \tau^2 + \sigma^2$ if $t \ge 1/\phi$; $\tau^2 + \sigma^2\left[\tfrac{3}{2}\phi t - \tfrac{1}{2}(\phi t)^3\right]$ if $0 < t \le 1/\phi$; $0$ otherwise.
- Exponential: $\gamma(t) = \tau^2 + \sigma^2(1 - \exp(-\phi t))$ if $t > 0$; $0$ otherwise.
- Powered exponential: $\gamma(t) = \tau^2 + \sigma^2(1 - \exp(-|\phi t|^p))$ if $t > 0$; $0$ otherwise.
- Matérn at $\nu = 3/2$: $\gamma(t) = \tau^2 + \sigma^2\left[1 - (1 + \phi t)e^{-\phi t}\right]$ if $t > 0$; $0$ otherwise.

Example: the spherical variogram
$$\gamma(t) = \begin{cases} \tau^2 + \sigma^2 & \text{if } t \ge 1/\phi \\ \tau^2 + \sigma^2\left[\tfrac{3}{2}\phi t - \tfrac{1}{2}(\phi t)^3\right] & \text{if } 0 < t \le 1/\phi \\ 0 & \text{if } t = 0. \end{cases}$$
While $\gamma(0) = 0$ by definition, $\gamma(0^+) \equiv \lim_{t \to 0^+} \gamma(t) = \tau^2$; this quantity is the nugget.

$\lim_{t \to \infty} \gamma(t) = \tau^2 + \sigma^2$; this asymptotic value of the semivariogram is called the sill. (The sill minus the nugget, $\sigma^2$ in this case, is called the partial sill.) Finally, the value $t = 1/\phi$ at which $\gamma(t)$ first reaches its ultimate level (the sill) is called the range, $R \equiv 1/\phi$.
[Figure: spherical semivariogram with $a_0 = 0.2$, $a_1 = 1$, $R = 2$.]

Some common isotropic covariograms
- Linear: $C(t)$ does not exist.
- Spherical: $C(t) = 0$ if $t \ge 1/\phi$; $\sigma^2\left[1 - \tfrac{3}{2}\phi t + \tfrac{1}{2}(\phi t)^3\right]$ if $0 < t \le 1/\phi$; $\tau^2 + \sigma^2$ if $t = 0$.
- Exponential: $C(t) = \sigma^2 \exp(-\phi t)$ if $t > 0$; $\tau^2 + \sigma^2$ if $t = 0$.
- Powered exponential: $C(t) = \sigma^2 \exp(-|\phi t|^p)$ if $t > 0$; $\tau^2 + \sigma^2$ if $t = 0$.
- Matérn at $\nu = 3/2$: $C(t) = \sigma^2(1 + \phi t)\exp(-\phi t)$ if $t > 0$; $\tau^2 + \sigma^2$ if $t = 0$.

Notes on the exponential model
$$C(t) = \begin{cases} \tau^2 + \sigma^2 & \text{if } t = 0 \\ \sigma^2 \exp(-\phi t) & \text{if } t > 0. \end{cases}$$
We define the effective range, $t_0$, as the distance at which the correlation has dropped to only 0.05. Setting $\exp(-\phi t_0)$ equal to this value we obtain $t_0 \approx 3/\phi$, since $\log(0.05) \approx -3$.
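A minimal sketch of the exponential, spherical, and Matérn ($\nu = 3/2$) semivariograms from the tables above, together with the effective-range calculation for the exponential model; the parameter values below are arbitrary illustrations.

```python
# Semivariograms from the tables above; gamma(t) = tau2 + sigma2 - C(t) for t > 0.
import numpy as np

def gamma_exponential(t, tau2, sigma2, phi):
    return np.where(t > 0, tau2 + sigma2 * (1.0 - np.exp(-phi * t)), 0.0)

def gamma_spherical(t, tau2, sigma2, phi):
    g = tau2 + sigma2 * (1.5 * phi * t - 0.5 * (phi * t) ** 3)
    g = np.where(t >= 1.0 / phi, tau2 + sigma2, g)   # sill reached at the range 1/phi
    return np.where(t > 0, g, 0.0)

def gamma_matern32(t, tau2, sigma2, phi):
    return np.where(t > 0, tau2 + sigma2 * (1.0 - (1.0 + phi * t) * np.exp(-phi * t)), 0.0)

t = np.array([0.0, 0.25, 0.5, 1.0, 2.0])
print(gamma_exponential(t, 0.2, 1.0, 2.0))
print(gamma_spherical(t, 0.2, 1.0, 2.0))
print(gamma_matern32(t, 0.2, 1.0, 2.0))

# effective range of the exponential model: correlation drops to 0.05 near t0 = 3/phi
phi = 2.0
t0 = -np.log(0.05) / phi
print(t0, np.exp(-phi * t0))   # about 1.5 and 0.05
```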

Finally, the form of $C(t)$ shows why the nugget $\tau^2$ is often viewed as a "nonspatial effect variance", and the partial sill $\sigma^2$ is viewed as a "spatial effect variance".

The Matérn correlation function
Much of statistical modelling is carried out through correlation functions rather than variograms. The Matérn is a very versatile family:
$$C(t) = \begin{cases} \dfrac{\sigma^2}{2^{\nu-1}\Gamma(\nu)} \left(2\sqrt{\nu}\, t \phi\right)^{\nu} K_{\nu}\!\left(2\sqrt{\nu}\, t \phi\right) & \text{if } t > 0 \\ \tau^2 + \sigma^2 & \text{if } t = 0, \end{cases}$$
where $K_{\nu}$ is the modified Bessel function of order $\nu$ (computationally tractable) and $\nu$ is a smoothness parameter (a fractal) controlling process smoothness.

Variogram model fitting
How do we select a variogram? Can the data really distinguish between variograms?
Empirical variogram:
$$\hat{\gamma}(t) = \frac{1}{2|N(t)|} \sum_{(s_i, s_j) \in N(t)} \left(Y(s_i) - Y(s_j)\right)^2,$$
where $N(t)$ is the set of pairs such that $\|s_i - s_j\| = t$, and $|N(t)|$ is the number of pairs in $N(t)$.

In practice we grid up the $t$-space into intervals $I_1 = (0, t_1)$, $I_2 = (t_1, t_2)$, and so forth, up to $I_K = (t_{K-1}, t_K)$. Representing the $t$ values in each interval by its midpoint, we define
$$N(t_k) = \{(s_i, s_j) : \|s_i - s_j\| \in I_k\}, \quad k = 1, \ldots, K.$$
[Figures: empirical semivariogram of the scallops data; parametric semivariograms (exponential, Gaussian, Cauchy, spherical, Bessel-J0) and Bessel mixtures with random weights and random $\phi$'s fitted to the same data.]
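A minimal sketch of the binned empirical semivariogram just described; the bin edges and the toy data are arbitrary.

```python
# Binned empirical semivariogram: average of (Y(s_i) - Y(s_j))^2 / 2 over all
# pairs whose inter-site distance falls in each bin I_k.
import numpy as np

def empirical_semivariogram(locs, y, bins):
    """locs: (n, 2) coordinates, y: (n,) responses, bins: increasing bin edges."""
    n = len(y)
    i, j = np.triu_indices(n, k=1)                       # all pairs (s_i, s_j)
    d = np.linalg.norm(locs[i] - locs[j], axis=1)
    sq = 0.5 * (y[i] - y[j]) ** 2
    mid, gam = [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (d > lo) & (d <= hi)
        if in_bin.any():
            mid.append(0.5 * (lo + hi))                  # represent the bin by its midpoint
            gam.append(sq[in_bin].mean())
    return np.array(mid), np.array(gam)

rng = np.random.default_rng(1)
locs = rng.uniform(size=(200, 2))
y = rng.normal(size=200)                                 # pure-nugget toy data
print(empirical_semivariogram(locs, y, np.linspace(0, 1, 11)))
```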

Principles of Bayesian Inference
Sudipto Banerjee¹ and Andrew O. Finley²
¹ Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.
² Department of Forestry & Department of Geography, Michigan State University, Lansing, Michigan, U.S.A.
March 3, 2010

Bayesian principles
Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters as random, and thus having distributions (just like the data). We can therefore think about unknowns for which no reliable frequentist experiment exists, e.g. $\theta$ = proportion of US men with untreated prostate cancer.
A Bayesian writes down a prior guess for the parameter(s) $\theta$, say $p(\theta)$, and combines this with the information provided by the observed data $y$ to obtain the posterior distribution of $\theta$, which we denote by $p(\theta \mid y)$. All statistical inferences (point and interval estimates, hypothesis tests) then follow from posterior summaries. For example, the posterior means/medians/modes offer point estimates of $\theta$, while the quantiles yield credible intervals.
The key to Bayesian inference is "learning" or "updating" of prior beliefs: posterior information is at least as great as prior information.
Is the classical approach wrong? That may be a controversial statement, but it certainly is fair to say that the classical approach is limited in scope. The Bayesian approach expands the class of models and easily handles repeated measures, unbalanced or missing data, nonhomogeneous variances, multivariate data, and many other settings that are precluded (or much more complicated) in classical settings.

Basics of Bayesian inference
We start with a model (likelihood) $f(y \mid \theta)$ for the observed data $y = (y_1, \ldots, y_n)$ given unknown parameters $\theta$ (perhaps a collection of several parameters). Add a prior distribution $p(\theta \mid \lambda)$, where $\lambda$ is a vector of hyperparameters. The posterior distribution of $\theta$ is given by
$$p(\theta \mid y, \lambda) = \frac{p(\theta \mid \lambda)\, f(y \mid \theta)}{p(y \mid \lambda)} = \frac{p(\theta \mid \lambda)\, f(y \mid \theta)}{\int f(y \mid \theta)\, p(\theta \mid \lambda)\, d\theta}.$$
We refer to this formula as Bayes' Theorem.
Calculations (numerical and algebraic) are usually required only up to a proportionality constant, so we write the posterior as $p(\theta \mid y, \lambda) \propto p(\theta \mid \lambda)\, f(y \mid \theta)$. If $\lambda$ is known or fixed, the above represents the desired posterior. If, however, $\lambda$ is unknown, we assign a prior $p(\lambda)$ and seek $p(\theta, \lambda \mid y) \propto p(\lambda)\, p(\theta \mid \lambda)\, f(y \mid \theta)$. The proportionality constant does not depend upon $\theta$ or $\lambda$: $p(y) = \int\!\!\int p(\lambda)\, p(\theta \mid \lambda)\, f(y \mid \theta)\, d\lambda\, d\theta$. The above represents a joint posterior from a hierarchical model. The marginal posterior distribution for $\theta$ is $p(\theta \mid y) \propto \int p(\lambda)\, p(\theta \mid \lambda)\, f(y \mid \theta)\, d\lambda$.

Bayesian inference: point estimation
Point estimation is easy: simply choose an appropriate distribution summary: the posterior mean, median or mode.
- Mode: sometimes easy to compute (no integration, simply optimization), but it often misrepresents the "middle" of the distribution, especially for one-tailed distributions.
- Mean: easy to compute, but it has the opposite effect of the mode: it "chases" tails.
- Median: probably the best compromise in being robust to tail behaviour, although it may be awkward to compute as it needs to solve $\int_{-\infty}^{\theta_{\mathrm{median}}} p(\theta \mid y)\, d\theta = \tfrac{1}{2}$.

Bayesian inference: interval estimation
The most popular method of inference in practical Bayesian modelling is interval estimation using credible sets. A $100(1-\alpha)\%$ credible set $C$ for $\theta$ is a set that satisfies
$$P(\theta \in C \mid y) = \int_C p(\theta \mid y)\, d\theta \ge 1 - \alpha.$$
The most popular credible set is the simple equal-tail interval estimate $(q_L, q_U)$ such that
$$\int_{-\infty}^{q_L} p(\theta \mid y)\, d\theta = \frac{\alpha}{2} = \int_{q_U}^{\infty} p(\theta \mid y)\, d\theta.$$
Then clearly $P(\theta \in (q_L, q_U) \mid y) = 1 - \alpha$. This interval is relatively easy to compute and has a direct interpretation: the probability that $\theta$ lies between $q_L$ and $q_U$ is $1 - \alpha$. The frequentist interpretation is extremely convoluted.

A simple example: normal data and normal priors
Consider a single data point $y$ from a normal distribution, $y \sim N(\theta, \sigma^2)$, with $\sigma$ known, so that
$$f(y \mid \theta) = N(y \mid \theta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2\sigma^2}(y - \theta)^2\right).$$
Take the prior $\theta \sim N(\mu, \tau^2)$, i.e. $p(\theta) = N(\theta \mid \mu, \tau^2)$, with $\mu$ and $\tau^2$ known. The posterior distribution of $\theta$ is
$$p(\theta \mid y) \propto N(\theta \mid \mu, \tau^2)\, N(y \mid \theta, \sigma^2) = N\!\left(\theta \,\Big|\, \frac{\frac{1}{\tau^2}\mu + \frac{1}{\sigma^2}y}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}},\; \frac{1}{\frac{1}{\sigma^2} + \frac{1}{\tau^2}}\right) = N\!\left(\theta \,\Big|\, \frac{\sigma^2}{\sigma^2 + \tau^2}\mu + \frac{\tau^2}{\sigma^2 + \tau^2}y,\; \frac{\sigma^2\tau^2}{\sigma^2 + \tau^2}\right).$$
Interpretation: the posterior mean is a weighted mean of the prior mean and the data point; the direct estimate is shrunk towards the prior.
What if we had $n$ observations instead of one in the earlier set-up? Say $y = (y_1, \ldots, y_n)$ iid with $y_i \sim N(\theta, \sigma^2)$. Then $\bar{y}$ is a sufficient statistic for $\theta$, and $\bar{y} \sim N(\theta, \sigma^2/n)$. The posterior distribution of $\theta$ is
$$p(\theta \mid y) \propto N(\theta \mid \mu, \tau^2)\, N\!\left(\bar{y} \,\Big|\, \theta, \frac{\sigma^2}{n}\right) = N\!\left(\theta \,\Big|\, \frac{\sigma^2}{\sigma^2 + n\tau^2}\mu + \frac{n\tau^2}{\sigma^2 + n\tau^2}\bar{y},\; \frac{\sigma^2\tau^2}{\sigma^2 + n\tau^2}\right).$$

Another simple example: the Beta-Binomial model
Let $Y$ be the number of successes in $n$ independent trials, so
$$P(Y = y \mid \theta) = f(y \mid \theta) = \binom{n}{y} \theta^y (1 - \theta)^{n - y}.$$
Prior: $p(\theta) = \mathrm{Beta}(\theta \mid a, b)$, i.e. $p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}$, with prior mean $\mu = a/(a+b)$ and variance $ab/((a+b)^2(a+b+1))$. The posterior distribution of $\theta$ is $p(\theta \mid y) = \mathrm{Beta}(\theta \mid a + y, b + n - y)$.

Sampling-based inference
We will compute the posterior distribution $p(\theta \mid y)$ by drawing samples from it. This replaces numerical integration (quadrature) by Monte Carlo integration. One important advantage: we only need to know $p(\theta \mid y)$ up to the proportionality constant.
Suppose $\theta = (\theta_1, \theta_2)$ and we know how to sample from the marginal posterior distribution $p(\theta_2 \mid y)$ and the conditional distribution $p(\theta_1 \mid \theta_2, y)$. How do we draw samples from the joint distribution $p(\theta_1, \theta_2 \mid y)$? We do this in two stages using composition sampling:
- first draw $\theta_2^{(j)} \sim p(\theta_2 \mid y)$, $j = 1, \ldots, M$;
- next draw $\theta_1^{(j)} \sim p(\theta_1 \mid \theta_2^{(j)}, y)$.
This sampling scheme produces exact samples $\{\theta_1^{(j)}, \theta_2^{(j)}\}_{j=1}^{M}$ from the posterior distribution $p(\theta_1, \theta_2 \mid y)$. Gelfand and Smith (JASA, 1990) demonstrated automatic marginalization: $\{\theta_1^{(j)}\}_{j=1}^{M}$ are samples from $p(\theta_1 \mid y)$ and (of course!) $\{\theta_2^{(j)}\}_{j=1}^{M}$ are samples from $p(\theta_2 \mid y)$. In effect, composition sampling has performed the integration
$$p(\theta_1 \mid y) = \int p(\theta_1 \mid \theta_2, y)\, p(\theta_2 \mid y)\, d\theta_2.$$
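As a quick numeric check of the normal-normal updates above, here is a minimal sketch with $n$ observations; the prior and data values are arbitrary.

```python
# Normal-normal update with n observations: the posterior of theta is
# N( (sigma2*mu + n*tau2*ybar)/(sigma2 + n*tau2), sigma2*tau2/(sigma2 + n*tau2) ).
import numpy as np

rng = np.random.default_rng(0)
mu, tau2 = 0.0, 4.0                      # prior mean and variance
sigma2, n = 1.0, 25                      # known data variance, sample size
y = rng.normal(2.0, np.sqrt(sigma2), size=n)
ybar = y.mean()

post_var = sigma2 * tau2 / (sigma2 + n * tau2)
post_mean = (sigma2 * mu + n * tau2 * ybar) / (sigma2 + n * tau2)
print(post_mean, post_var)               # shrinks ybar slightly toward the prior mean mu
```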

Bayesian predictions
Suppose we want to predict new observations, say $\tilde{y}$, based upon the observed data $y$. We specify a joint probability model $p(\tilde{y}, y, \theta)$, which defines the conditional predictive distribution
$$p(\tilde{y} \mid y, \theta) = \frac{p(\tilde{y}, y, \theta)}{p(y, \theta)}.$$
Bayesian predictions follow from the posterior predictive distribution, which averages $\theta$ out of the conditional predictive distribution with respect to the posterior:
$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid y, \theta)\, p(\theta \mid y)\, d\theta.$$
This can be evaluated using composition sampling:
- first obtain $\theta^{(j)} \sim p(\theta \mid y)$, $j = 1, \ldots, M$;
- for $j = 1, \ldots, M$ sample $\tilde{y}^{(j)} \sim p(\tilde{y} \mid y, \theta^{(j)})$.
The $\{\tilde{y}^{(j)}\}_{j=1}^{M}$ are samples from the posterior predictive distribution $p(\tilde{y} \mid y)$.

Some remarks on sampling-based inference
- Direct Monte Carlo: some algorithms (e.g. composition sampling) generate independent samples exactly from the posterior distribution. In these situations there are no convergence problems or issues; sampling is called exact.
- Markov chain Monte Carlo (MCMC): in general, exact sampling may not be possible or feasible. MCMC is a far more versatile set of algorithms that can be invoked to fit more general models. Note: anywhere direct Monte Carlo applies, MCMC will provide excellent results too.
- Convergence issues: there is no free lunch! The power of MCMC comes at a cost. The initial samples do not necessarily come from the desired posterior distribution; rather, they need to converge to the true posterior distribution. Therefore one needs to assess convergence, discard output before convergence, and retain only post-convergence samples. The time to convergence is called burn-in.
- Diagnosing convergence: usually a few parallel chains are run from rather different starting points. The sample values are plotted (trace-plots) for each of the chains. The time for the chains to mix together is taken as the time to convergence.
Good news: all this is automated in WinBUGS. So, as users, we need only specify good Bayesian models and implement them in WinBUGS.

Bayesian Linear Models
Sudipto Banerjee¹ and Andrew O. Finley²
¹ Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.
² Department of Forestry & Department of Geography, Michigan State University, Lansing, Michigan, U.S.A.
March 3, 2010

Linear regression models: a Bayesian perspective
Ingredients of a linear model include an $n \times 1$ response vector $y = (y_1, \ldots, y_n)^T$ and an $n \times p$ design matrix (e.g. including regressors) $X = [x_1, \ldots, x_p]$, assumed to have been observed without error. The linear model:
$$y = X\beta + \epsilon; \quad \epsilon \sim N(0, \sigma^2 I).$$
The linear model is the most fundamental of all serious statistical models, encompassing:
- ANOVA: $y$ is continuous, the $x_i$'s are categorical;
- REGRESSION: $y$ is continuous, the $x_i$'s are continuous;
- ANCOVA: $y$ is continuous, some $x_i$'s are continuous, some categorical.
Unknown parameters include the regression parameters $\beta$ and the variance $\sigma^2$. We assume $X$ is observed without error and all inference is conditional on $X$.
The classical unbiased estimates of the regression parameters $\beta$ and $\sigma^2$ are
$$\hat{\beta} = (X^T X)^{-1} X^T y; \quad \hat{\sigma}^2 = \frac{1}{n - p}\, (y - X\hat{\beta})^T (y - X\hat{\beta}).$$
The above estimate of $\beta$ is also a least-squares estimate. The predicted value of $y$ is given by $\hat{y} = X\hat{\beta} = P_X y$, where $P_X = X(X^T X)^{-1} X^T$. $P_X$ is called the projector of $X$: it projects any vector onto the space spanned by the columns of $X$. The residual sum of squares is estimated as $\hat{e}^T\hat{e} = (y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T (I - P_X) y$.

Bayesian regression with flat reference priors
For a Bayesian analysis, we will need to specify priors for the unknown regression parameters $\beta$ and the variance $\sigma^2$. Consider independent flat priors on $\beta$ and $\log \sigma^2$:
$$p(\beta) \propto 1; \quad p(\log(\sigma^2)) \propto 1, \quad \text{or equivalently} \quad p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.$$
Neither of these distributions is a valid probability distribution (they do not integrate to any finite number). So why are we even discussing them? It turns out that even if the priors are improper (that is what we call them), as long as the resulting posterior distributions are valid we can still conduct legitimate statistical inference with them.

Marginal and conditional distributions
With a flat prior on $\beta$ we obtain, after some algebra, the conditional posterior distribution
$$p(\beta \mid \sigma^2, y) = N\!\left(\beta \mid (X^T X)^{-1} X^T y,\; \sigma^2 (X^T X)^{-1}\right).$$
The conditional posterior distribution of $\beta$ would have been the desired posterior distribution had $\sigma^2$ been known. Since that is not the case, we need to obtain the marginal posterior distribution by integrating out $\sigma^2$:
$$p(\beta \mid y) = \int p(\beta \mid \sigma^2, y)\, p(\sigma^2 \mid y)\, d\sigma^2.$$
Can we solve this integration using composition sampling? Yes, if we can generate samples from $p(\sigma^2 \mid y)$. So we need the marginal posterior distribution of $\sigma^2$. With the choice of the flat prior we obtain
$$p(\sigma^2 \mid y) \propto (\sigma^2)^{-((n-p)/2 + 1)} \exp\!\left(-\frac{(n-p)s^2}{2\sigma^2}\right) = IG\!\left(\sigma^2 \,\Big|\, \frac{n-p}{2},\; \frac{(n-p)s^2}{2}\right),$$
where $s^2 = \hat{\sigma}^2 = \frac{1}{n-p} y^T (I - P_X) y$. This is known as an inverted Gamma distribution (also called a scaled inverse chi-square distribution). In other words, $[(n-p)s^2/\sigma^2 \mid y] \sim \chi^2_{n-p}$ (with $n-p$ degrees of freedom). Note the striking similarity with the classical result: the sampling distribution of $\hat{\sigma}^2$ is also characterized by $(n-p)s^2/\sigma^2$ following a chi-square distribution.

Composition sampling for linear regression
Now we are ready to carry out composition sampling from $p(\beta, \sigma^2 \mid y)$ as follows:
- Draw $M$ samples from $p(\sigma^2 \mid y)$: $\sigma^{2(j)} \sim IG\!\left(\frac{n-p}{2}, \frac{(n-p)s^2}{2}\right)$, $j = 1, \ldots, M$.
- For $j = 1, \ldots, M$, draw from $p(\beta \mid \sigma^{2(j)}, y)$: $\beta^{(j)} \sim N\!\left((X^T X)^{-1} X^T y,\; \sigma^{2(j)} (X^T X)^{-1}\right)$.
The resulting samples $\{\beta^{(j)}, \sigma^{2(j)}\}_{j=1}^{M}$ represent $M$ samples from $p(\beta, \sigma^2 \mid y)$, and $\{\beta^{(j)}\}_{j=1}^{M}$ are samples from the marginal posterior distribution $p(\beta \mid y)$. This is a multivariate t density:
$$p(\beta \mid y) = \frac{\Gamma(n/2)}{(\pi(n-p))^{p/2}\, \Gamma((n-p)/2)\, \left|s^2 (X^T X)^{-1}\right|^{1/2}} \left[1 + \frac{(\beta - \hat{\beta})^T (X^T X)(\beta - \hat{\beta})}{(n-p)s^2}\right]^{-n/2}.$$
The marginal distribution of each individual regression parameter $\beta_j$ is a univariate $t_{n-p}$ distribution. In fact,
$$\frac{\beta_j - \hat{\beta}_j}{s\sqrt{(X^T X)^{-1}_{jj}}} \sim t_{n-p}.$$
The 95% credible intervals for each $\beta_j$ are constructed from the quantiles of the t-distribution. They exactly coincide with the 95% classical confidence intervals, but the interpretation is direct: the probability of $\beta_j$ falling in that interval, given the observed data, is 0.95.
Note: an intercept-only linear model reduces to the simple univariate $N(\bar{y} \mid \mu, \sigma^2/n)$ likelihood, for which the marginal posterior of $\mu$ satisfies $\frac{\mu - \bar{y}}{s/\sqrt{n}} \sim t_{n-1}$.

Bayesian predictions from the linear model
Suppose we have observed the new predictors $\tilde{X}$, and we wish to predict the outcome $\tilde{y}$. We specify the joint distribution of $(y, \tilde{y})$ to be normal:
$$\begin{pmatrix} y \\ \tilde{y} \end{pmatrix} \sim N\!\left(\begin{pmatrix} X \\ \tilde{X} \end{pmatrix}\beta,\; \sigma^2 I\right).$$
Note $p(\tilde{y} \mid y, \beta, \sigma^2) = p(\tilde{y} \mid \beta, \sigma^2) = N(\tilde{y} \mid \tilde{X}\beta, \sigma^2 I)$. The posterior predictive distribution is
$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid y, \beta, \sigma^2)\, p(\beta, \sigma^2 \mid y)\, d\beta\, d\sigma^2 = \int p(\tilde{y} \mid \beta, \sigma^2)\, p(\beta, \sigma^2 \mid y)\, d\beta\, d\sigma^2.$$
By now we are comfortable evaluating such integrals:
- first obtain $(\beta^{(j)}, \sigma^{2(j)}) \sim p(\beta, \sigma^2 \mid y)$, $j = 1, \ldots, M$;
- next draw $\tilde{y}^{(j)} \sim N(\tilde{X}\beta^{(j)}, \sigma^{2(j)} I)$.

The Gibbs sampler
Suppose $\theta = (\theta_1, \theta_2)$ and we seek the posterior distribution $p(\theta_1, \theta_2 \mid y)$. For many interesting hierarchical models, we have access to the full conditional distributions $p(\theta_1 \mid \theta_2, y)$ and $p(\theta_2 \mid \theta_1, y)$. The Gibbs sampler proposes the following sampling scheme. Set starting values $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)})$. For $j = 1, \ldots, M$:
- draw $\theta_1^{(j)} \sim p(\theta_1 \mid \theta_2^{(j-1)}, y)$;
- draw $\theta_2^{(j)} \sim p(\theta_2 \mid \theta_1^{(j)}, y)$.
This constructs a Markov chain and, after an initial burn-in period when the chains are trying to find their way, the above algorithm guarantees that $\{\theta_1^{(j)}, \theta_2^{(j)}\}_{j = M_0 + 1}^{M}$ will be samples from $p(\theta_1, \theta_2 \mid y)$, where $M_0$ is the burn-in period.
More generally, if $\theta = (\theta_1, \ldots, \theta_p)$ are the parameters in our model, we provide a set of initial values $\theta^{(0)} = (\theta_1^{(0)}, \ldots, \theta_p^{(0)})$ and then perform the $j$-th iteration, say for $j = 1, \ldots, M$, by updating successively from the full conditional distributions:
$$\theta_1^{(j)} \sim p(\theta_1 \mid \theta_2^{(j-1)}, \ldots, \theta_p^{(j-1)}, y),$$
$$\theta_2^{(j)} \sim p(\theta_2 \mid \theta_1^{(j)}, \theta_3^{(j-1)}, \ldots, \theta_p^{(j-1)}, y),$$
$$\theta_k^{(j)} \sim p(\theta_k \mid \theta_1^{(j)}, \ldots, \theta_{k-1}^{(j)}, \theta_{k+1}^{(j-1)}, \ldots, \theta_p^{(j-1)}, y) \quad \text{(the generic $k$-th element)},$$
$$\theta_p^{(j)} \sim p(\theta_p \mid \theta_1^{(j)}, \ldots, \theta_{p-1}^{(j)}, y).$$
Example: consider the linear model. Suppose we set $p(\sigma^2) = IG(\sigma^2 \mid a, b)$ and $p(\beta) \propto 1$. The full conditional distributions are
$$p(\beta \mid y, \sigma^2) = N\!\left(\beta \mid (X^T X)^{-1} X^T y,\; \sigma^2 (X^T X)^{-1}\right), \qquad p(\sigma^2 \mid y, \beta) = IG\!\left(\sigma^2 \,\Big|\, a + n/2,\; b + \tfrac{1}{2}(y - X\beta)^T (y - X\beta)\right).$$
Thus, the Gibbs sampler will initialize $(\beta^{(0)}, \sigma^{2(0)})$ and draw, for $j = 1, \ldots, M$:
- $\beta^{(j)} \sim N\!\left((X^T X)^{-1} X^T y,\; \sigma^{2(j-1)} (X^T X)^{-1}\right)$;
- $\sigma^{2(j)} \sim IG\!\left(a + n/2,\; b + \tfrac{1}{2}(y - X\beta^{(j)})^T (y - X\beta^{(j)})\right)$.
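A minimal sketch of this two-block Gibbs sampler, assuming the flat prior on $\beta$ and an $IG(a, b)$ prior on $\sigma^2$ as above; the simulated data and hyperparameter values are arbitrary.

```python
# Two-block Gibbs sampler for the linear model, alternating the two full conditionals above.
import numpy as np

def gibbs_linear_model(y, X, a=2.0, b=1.0, M=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    beta, sigma2 = np.zeros(p), 1.0                      # starting values
    beta_draws, sigma2_draws = [], []
    for _ in range(M):
        # beta | y, sigma2 ~ N( (X'X)^{-1} X'y, sigma2 (X'X)^{-1} )
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # sigma2 | y, beta ~ IG( a + n/2, b + (y - X beta)'(y - X beta)/2 )
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / (b + 0.5 * resid @ resid))
        beta_draws.append(beta)
        sigma2_draws.append(sigma2)
    return np.array(beta_draws), np.array(sigma2_draws)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=100)
b_draws, s2_draws = gibbs_linear_model(y, X)
print(b_draws[500:].mean(axis=0), s2_draws[500:].mean())   # posterior means after burn-in
```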

The Metropolis-Hastings algorithm
In principle, the Gibbs sampler will work for extremely complex hierarchical models. The only issue is sampling from the full conditionals: they may not be amenable to easy sampling when they are not available in closed form. A more general and extremely powerful, and often easier to code, algorithm is the Metropolis-Hastings (MH) algorithm. This algorithm also constructs a Markov chain, but it does not necessarily care about full conditionals.
The Metropolis-Hastings algorithm: start with an initial value $\theta = \theta^{(0)}$. Select a candidate or proposal distribution from which to propose a value of $\theta$ at the $j$-th iteration: $\theta^{*} \sim q(\theta^{(j-1)}, \nu)$; for example, $q(\theta^{(j-1)}, \nu) = N(\theta^{(j-1)}, \nu)$ with $\nu$ fixed. Compute
$$r = \frac{p(\theta^{*} \mid y)\, q(\theta^{(j-1)} \mid \theta^{*}, \nu)}{p(\theta^{(j-1)} \mid y)\, q(\theta^{*} \mid \theta^{(j-1)}, \nu)}.$$
If $r \ge 1$ then set $\theta^{(j)} = \theta^{*}$. If $r < 1$ then draw $U \sim U(0, 1)$: if $U \le r$ then $\theta^{(j)} = \theta^{*}$; otherwise $\theta^{(j)} = \theta^{(j-1)}$. Repeat for $j = 1, \ldots, M$. This yields $\theta^{(1)}, \ldots, \theta^{(M)}$, which, after a burn-in period, will be samples from the true posterior distribution.
It is important to monitor the acceptance rate of the sampler through the iterations. Rough recommendations: about 20% for vector updates and about 40% for scalar updates. This can be controlled by tuning $\nu$. A popular approach: embed Metropolis steps within Gibbs to draw from full conditionals that are not directly accessible.

Example: for the linear model, our parameters are $(\beta, \sigma^2)$. We write $\theta = (\beta, \log(\sigma^2))$ and, at the $j$-th iteration, propose $\theta^{*} \sim N(\theta^{(j-1)}, \Sigma)$. The log transformation on $\sigma^2$ ensures that all components of $\theta$ have support on the entire real line and can take meaningful proposed values from the multivariate normal. But we need to transform our prior to $p(\beta, \log(\sigma^2))$. Let $z = \log(\sigma^2)$ and assume $p(\beta, z) = p(\beta)p(z)$. Let us derive $p(z)$. Remember: we need to adjust for the Jacobian. Then $p(z) = p(\sigma^2)\,|d\sigma^2/dz| = p(e^z)e^z$; the Jacobian here is $e^z = \sigma^2$. Let $p(\beta) \propto 1$ and $p(\sigma^2) = IG(\sigma^2 \mid a, b)$. Then the log-posterior is, up to an additive constant,
$$-(a + n/2 + 1)z + z - e^{-z}\left\{b + \tfrac{1}{2}(y - X\beta)^T (y - X\beta)\right\}.$$
A symmetric proposal distribution, say $q(\theta^{*} \mid \theta^{(j-1)}, \Sigma) = N(\theta^{(j-1)}, \Sigma)$, cancels out in $r$. In practice it is better to compute $\log(r) = \log(p(\theta^{*} \mid y)) - \log(p(\theta^{(j-1)} \mid y))$. For the proposal $N(\theta^{(j-1)}, \Sigma)$, $\Sigma$ is a $d \times d$ variance-covariance matrix with $d = \dim(\theta) = p + 1$. If $\log r \ge 0$ then set $\theta^{(j)} = \theta^{*}$; if $\log r < 0$ then draw $U \sim U(0, 1)$, and if $U \le r$ (i.e. $\log U \le \log r$) then $\theta^{(j)} = \theta^{*}$; otherwise $\theta^{(j)} = \theta^{(j-1)}$. Repeat the above procedure for $j = 1, \ldots, M$ to obtain samples $\theta^{(1)}, \ldots, \theta^{(M)}$.
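A minimal sketch of the random-walk Metropolis sampler just described, on $\theta = (\beta, \log\sigma^2)$ with the Jacobian-adjusted log posterior; the spherical proposal covariance (step$^2 I$) stands in for a tuned $\Sigma$, and the data and hyperparameters are arbitrary.

```python
# Random-walk Metropolis on theta = (beta, z), z = log(sigma2), with
# p(beta) propto 1, p(sigma2) = IG(a, b), and the Jacobian term e^z included.
import numpy as np

def log_post(theta, y, X, a=2.0, b=1.0):
    beta, z = theta[:-1], theta[-1]
    resid = y - X @ beta
    n = len(y)
    # -(a + n/2 + 1) z + z - e^{-z} * (b + resid'resid / 2), up to a constant
    return -(a + n / 2.0 + 1.0) * z + z - np.exp(-z) * (b + 0.5 * resid @ resid)

def metropolis(y, X, M=5000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    theta = np.zeros(p + 1)                              # initial value for (beta, log sigma2)
    draws, accepted = [], 0
    for _ in range(M):
        prop = theta + step * rng.normal(size=p + 1)     # symmetric N(theta, step^2 I) proposal
        log_r = log_post(prop, y, X) - log_post(theta, y, X)
        if np.log(rng.uniform()) <= log_r:
            theta, accepted = prop, accepted + 1
        draws.append(theta.copy())
    return np.array(draws), accepted / M

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=100)
draws, acc = metropolis(y, X)
print(acc, draws[2500:, :2].mean(axis=0), np.exp(draws[2500:, -1]).mean())
```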

Hierarchical Modelling for Univariate Spatial Data
Sudipto Banerjee¹ and Andrew O. Finley²
¹ Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.
² Department of Forestry & Department of Geography, Michigan State University, Lansing, Michigan, U.S.A.
March 3, 2010

[Figure: a spatial domain with point-referenced observation locations.]

Algorithmic modelling: what is a spatial process?
A spatial surface is observed at a finite set of locations $S = \{s_1, s_2, \ldots, s_n\}$, with observations $Y(s_1), Y(s_2), \ldots, Y(s_n)$. A purely algorithmic approach: tessellate the spatial domain (usually with the data locations as vertices), fit an interpolating polynomial $f(s) = \sum_i w_i(S; s) f(s_i)$, and interpolate by reading off $f(s_0)$. Issues: sensitivity to the tessellation, choices of multivariate interpolators, numerical error analysis.

Simple linear model
$$Y(s) = \mu(s) + \epsilon(s),$$
with response $Y(s)$ at location $s$, mean $\mu(s) = x^T(s)\beta$, and error $\epsilon(s) \stackrel{iid}{\sim} N(0, \tau^2)$, so that $\epsilon(s_i)$ and $\epsilon(s_j)$ are uncorrelated for all $i \neq j$.

Sources of variation
Spatial Gaussian processes (GP): say $w(s) \sim GP(0, \sigma^2\rho(\cdot))$ with $\mathrm{Cov}(w(s_1), w(s_2)) = \sigma^2 \rho(\phi; \|s_1 - s_2\|)$. Let $w = [w(s_i)]_{i=1}^{n}$; then
$$w \sim N(0, \sigma^2 R(\phi)), \quad \text{where } R(\phi) = [\rho(\phi; \|s_i - s_j\|)]_{i,j=1}^{n}.$$
Realization of a Gaussian process: changing $\phi$ while holding $\sigma^2 = 1$ changes how quickly the realized surface varies. A correlation model for $R(\phi)$: e.g. exponential decay, $\rho(\phi; t) = \exp(-\phi t)$ for $t > 0$. Other valid models include the Gaussian, spherical, and Matérn. Effective range: $t_0 = -\ln(0.05)/\phi \approx 3/\phi$.
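A minimal sketch of simulating one realization of $w \sim N(0, \sigma^2 R(\phi))$ with the exponential correlation function; the locations, $\sigma^2$, and $\phi$ are arbitrary, and a small jitter is added for numerical stability.

```python
# Simulate one realization of w ~ N(0, sigma2 * R(phi)) at a set of locations,
# using the exponential correlation rho(phi; t) = exp(-phi * t).
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(size=(150, 2))                           # 150 random locations in the unit square
sigma2, phi = 1.0, 3.0
dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
R = np.exp(-phi * dist)                                  # R(phi) = [rho(phi; ||s_i - s_j||)]
L = np.linalg.cholesky(sigma2 * R + 1e-10 * np.eye(len(s)))   # jitter for numerical stability
w = L @ rng.normal(size=len(s))                          # one draw of the spatial random effects
print(w[:5])
```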

The specification $w \sim N(0, \sigma^2_w R(\phi))$ can define complex spatial dependence structures. E.g., an anisotropic Matérn correlation function:
$$\rho(s_i, s_j; \phi) = \frac{1}{\Gamma(\nu)\,2^{\nu-1}} \left(2\sqrt{\nu}\, d_{ij}\right)^{\nu} \kappa_{\nu}\!\left(2\sqrt{\nu}\, d_{ij}\right), \quad \text{where } d_{ij} = \sqrt{(s_i - s_j)^T \Sigma^{-1} (s_i - s_j)}, \; \Sigma = G(\psi)\Lambda^2 G(\psi)^T.$$
Thus $\phi = (\nu, \psi, \Lambda)$.

Simple linear model + random spatial effects
$$Y(s) = \mu(s) + w(s) + \epsilon(s),$$
with response $Y(s)$ at site $s$, mean $\mu(s) = x^T(s)\beta$, spatial random effects $w(s) \sim GP(0, \sigma^2\rho(\phi; \|s_1 - s_2\|))$, and non-spatial variance $\epsilon(s) \stackrel{iid}{\sim} N(0, \tau^2)$.
[Figure: simulated vs. predicted spatial surfaces.]

Hierarchical modelling
First stage:
$$y \mid \beta, w, \tau^2 \sim \prod_{i=1}^{n} N\!\left(Y(s_i) \mid x^T(s_i)\beta + w(s_i),\; \tau^2\right).$$
Second stage:
$$w \mid \sigma^2, \phi \sim N(0, \sigma^2 R(\phi)).$$
Third stage: priors on $\Omega = (\beta, \tau^2, \sigma^2, \phi)$.
Marginalized likelihood:
$$y \mid \Omega \sim N\!\left(X\beta,\; \sigma^2 R(\phi) + \tau^2 I\right).$$
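A minimal sketch of evaluating the marginalized likelihood $y \mid \Omega \sim N(X\beta, \sigma^2 R(\phi) + \tau^2 I)$, assuming an exponential correlation function; this is the density an MCMC sampler for the marginalized model would evaluate repeatedly. The toy data and parameter values are arbitrary.

```python
# Evaluate the marginalized log-likelihood y | Omega ~ N(X beta, sigma2 R(phi) + tau2 I).
import numpy as np
from scipy.stats import multivariate_normal

def marginal_loglik(y, X, s, beta, sigma2, tau2, phi):
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    Sigma = sigma2 * np.exp(-phi * dist) + tau2 * np.eye(len(y))   # exponential R(phi)
    return multivariate_normal(mean=X @ beta, cov=Sigma).logpdf(y)

# toy data generated from the model itself
rng = np.random.default_rng(0)
s = rng.uniform(size=(80, 2))
X = np.column_stack([np.ones(80), rng.normal(size=80)])
dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
w = np.linalg.cholesky(0.5 * np.exp(-3.0 * dist) + 1e-10 * np.eye(80)) @ rng.normal(size=80)
y = X @ np.array([1.0, 2.0]) + w + rng.normal(scale=0.3, size=80)
print(marginal_loglik(y, X, s, np.array([1.0, 2.0]), 0.5, 0.09, 3.0))
```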

Note: the spatial process parametrizes $\Sigma$:
$$y = X\beta + \epsilon, \quad \epsilon \sim N(0, \Sigma), \quad \Sigma = \sigma^2 R(\phi) + \tau^2 I.$$

Bayesian computations
Choice: fit the marginalized model $[y \mid \Omega] \times [\Omega]$ or the conditional model $[y \mid \beta, w, \tau^2] \times [w \mid \sigma^2, \phi] \times [\Omega]$.
- Conditional model: conjugate full conditionals for $\sigma^2$, $\tau^2$ and $w$; easier to program.
- Marginalized model: needs Metropolis or slice sampling for $\sigma^2$, $\tau^2$ and $\phi$; harder to program. But the reduced parameter space gives faster convergence, and $\sigma^2 R(\phi) + \tau^2 I$ is more stable than $\sigma^2 R(\phi)$.
But what about $R^{-1}(\phi)$? Expensive!

Where are the $w$'s? Interest often lies in the spatial surface $w \mid y$.

The $w$'s are recovered from
$$[w \mid y, X] = \int [w \mid \Omega, y, X]\, [\Omega \mid y, X]\, d\Omega$$
using posterior samples:
- obtain $\Omega^{(1)}, \ldots, \Omega^{(G)} \sim [\Omega \mid y, X]$;
- for each $\Omega^{(g)}$, draw $w^{(g)} \sim [w \mid \Omega^{(g)}, y, X]$.
Note: with Gaussian likelihoods $[w \mid \Omega, y, X]$ is also Gaussian. With other likelihoods this may not be easy, and often the conditional updating scheme is preferred.
[Figures: residual plot of the posterior surface $[w(s) \mid y]$ against the mean of $Y$ and over Easting and Northing, with two further views of the same surface.]
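A minimal sketch of one composition draw $w^{(g)} \sim [w \mid \Omega^{(g)}, y, X]$ for the Gaussian case, using standard conditional-normal formulas; the exponential correlation function and the toy inputs are assumptions made here for illustration.

```python
# One composition draw of the spatial effects given Omega. With a Gaussian likelihood,
#   E[w | Omega, y]  = K (K + tau2 I)^{-1} (y - X beta),   K = sigma2 R(phi)
#   Cov[w | Omega, y] = K - K (K + tau2 I)^{-1} K
import numpy as np

def draw_w(y, X, s, beta, sigma2, tau2, phi, rng):
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    K = sigma2 * np.exp(-phi * dist)                     # sigma2 R(phi), exponential correlation
    A = np.linalg.inv(K + tau2 * np.eye(len(y)))
    mean = K @ A @ (y - X @ beta)
    cov = K - K @ A @ K
    cov = 0.5 * (cov + cov.T)                            # symmetrize against round-off
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(0)
s = rng.uniform(size=(50, 2)); X = np.ones((50, 1)); y = rng.normal(size=50)
print(draw_w(y, X, s, np.array([0.0]), 1.0, 0.1, 3.0, rng)[:5])
```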

Spatial prediction
Prediction: a summary of $[Y(\tilde{s}) \mid y]$. Often we need to predict $Y(s)$ at a new set of locations $\{\tilde{s}_0, \ldots, \tilde{s}_m\}$ with associated predictor matrix $\tilde{X}$. We sample from the predictive distribution
$$[\tilde{y} \mid y, X, \tilde{X}] = \int [\tilde{y}, \Omega \mid y, X, \tilde{X}]\, d\Omega = \int [\tilde{y} \mid y, \Omega, X, \tilde{X}]\, [\Omega \mid y, X]\, d\Omega,$$
where $[\tilde{y} \mid y, \Omega, X, \tilde{X}]$ is multivariate normal. Sampling scheme (see the sketch after the Colorado illustration below):
- obtain $\Omega^{(1)}, \ldots, \Omega^{(G)} \sim [\Omega \mid y, X]$;
- for each $\Omega^{(g)}$, draw $\tilde{y}^{(g)} \sim [\tilde{y} \mid y, \Omega^{(g)}, X, \tilde{X}]$.
[Figure: predicted surface over Easting and Northing.]

Colorado data illustration
Modelling temperature: 507 locations in Colorado. Simple spatial regression model:
$$Y(s) = x^T(s)\beta + w(s) + \epsilon(s), \quad w(s) \sim GP(0, \sigma^2\rho(\cdot; \phi, \nu)), \quad \epsilon(s) \stackrel{iid}{\sim} N(0, \tau^2).$$
Posterior medians (2.5%, 97.5%), as transcribed:
- Intercept: 2.827 (2.3, 3.866)
- [Elevation]: -0.426 (-0.527, -0.333)
- Precipitation: 0.037 (0.002, 0.072)
- sigma^2: 0.34 (0.05, .245)
- phi: 7.39E-3 (4.7E-3, 5.2E-3)
- Range: 278.2 (38.8, 476.3)
- tau^2: 0.05 (0.022, 0.092)
[Figures: temperature residual map, elevation map, and residual map with elevation as a covariate, over Easting/Northing and Longitude/Latitude.]
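A minimal sketch of one predictive draw $\tilde{y}^{(g)} \sim [\tilde{y} \mid y, \Omega^{(g)}, X, \tilde{X}]$, obtained by conditioning the joint multivariate normal of $(y, \tilde{y})$ given $\Omega$; the exponential covariance and the toy inputs are assumptions made here for illustration.

```python
# One draw from the multivariate normal [y_tilde | y, Omega, X, X_tilde].
import numpy as np

def predict_draw(y, X, s, X_new, s_new, beta, sigma2, tau2, phi, rng):
    def expcov(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return sigma2 * np.exp(-phi * d)
    C_yy = expcov(s, s) + tau2 * np.eye(len(s))          # Cov(y | Omega)
    C_ny = expcov(s_new, s)                              # Cov(y_tilde, y | Omega)
    C_nn = expcov(s_new, s_new) + tau2 * np.eye(len(s_new))
    A = C_ny @ np.linalg.inv(C_yy)
    mean = X_new @ beta + A @ (y - X @ beta)
    cov = C_nn - A @ C_ny.T
    cov = 0.5 * (cov + cov.T)                            # symmetrize against round-off
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(0)
s = rng.uniform(size=(60, 2)); s_new = rng.uniform(size=(5, 2))
X = np.ones((60, 1)); X_new = np.ones((5, 1)); y = rng.normal(size=60)
print(predict_draw(y, X, s, X_new, s_new, np.array([0.0]), 1.0, 0.1, 3.0, rng))
```

Repeating this draw once per posterior sample $\Omega^{(g)}$ yields the posterior predictive distribution at the new locations.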

Hierarchical Modelling for non-Gaussian Spatial Data
Sudipto Banerjee¹ and Andrew O. Finley²
¹ Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.
² Department of Forestry & Department of Geography, Michigan State University, Lansing, Michigan, U.S.A.
March 3, 2010

Spatial generalized linear models
Often data sets preclude Gaussian modelling: $Y(s)$ may not even be continuous. Example: $Y(s)$ is a binary or count variable:
- species presence or absence at location $s$;
- species abundance from a count at location $s$;
- a continuous forest variable being high or low at location $s$.
We replace the Gaussian likelihood by an exponential family member (Diggle, Tawn and Moyeed, 1998).
First stage: the $Y(s_i)$ are conditionally independent given $\beta$ and $w(s_i)$, so $f(y(s_i) \mid \beta, w(s_i), \gamma)$ equals
$$h(y(s_i), \gamma)\, \exp\!\left(\gamma\left[y(s_i)\eta(s_i) - \psi(\eta(s_i))\right]\right),$$
where $g(E(Y(s_i))) = \eta(s_i) = x^T(s_i)\beta + w(s_i)$ ($g$ is a canonical link function) and $\gamma$ is a dispersion parameter.
Second stage: model $w(s)$ as a Gaussian process, $w \sim N(0, \sigma^2 R(\phi))$.
Third stage: priors and hyperpriors.
Comments:
- There is no process for $Y(s)$, only a valid joint distribution.
- It is not sensible to add a pure error term $\epsilon(s)$.
- We are modelling with spatial random effects; introducing these in the transformed mean encourages the means of spatial variables at proximate locations to be close to each other.
- Marginal spatial dependence is induced between, say, $Y(s)$ and $Y(s')$, but the observed $Y(s)$ and $Y(s')$ need not be close to each other.
- Second-stage spatial modelling is attractive for spatial explanation in the mean; first-stage spatial modelling is more appropriate to encourage proximate observations to be close.

Binary spatial regression: forest/non-forest
We illustrate a non-Gaussian model for point-referenced spatial data. The objective is to make pixel-level predictions of forest/non-forest across the domain. From: Finley, A.O., S. Banerjee, and R.E. McRoberts (2008). A Bayesian approach to quantifying uncertainty in multi-source forest area estimates. Environmental and Ecological Statistics, 15:241-258.

Data: observations are from 500 georeferenced USDA Forest Service Forest Inventory and Analysis (FIA) inventory plots within a 32 km radius circle in MN, USA. The response $Y(s)$ is a binary variable, with
$$Y(s) = \begin{cases} 1 & \text{if the inventory plot is forested} \\ 0 & \text{if the inventory plot is not forested.} \end{cases}$$
Observed covariates include the coinciding pixel values for 3 dates of 30 x 30 m resolution Landsat imagery.
We fit a generalized linear model where
$$Y(s_i) \sim \mathrm{Bernoulli}(p(s_i)), \quad \mathrm{logit}(p(s_i)) = x^T(s_i)\beta + w(s_i).$$
We assume a vague flat prior for $\beta$, a Uniform(3/32, 3/0.5) prior for $\phi$, and an inverse-Gamma(2, 1) prior for $\sigma^2$. The parameters are updated with a Metropolis algorithm using the target log density
$$\ln p(\Omega \mid Y) \propto -\left(\sigma_a + 1 + \tfrac{n}{2}\right)\ln(\sigma^2) - \frac{\sigma_b}{\sigma^2} - \tfrac{1}{2}\ln|R(\phi)| - \frac{1}{2\sigma^2} w^T R(\phi)^{-1} w + \sum_{i=1}^{n}\left[Y(s_i)\left(x^T(s_i)\beta + w(s_i)\right) - \ln\!\left(1 + \exp\!\left(x^T(s_i)\beta + w(s_i)\right)\right)\right] + \ln(\sigma^2) + \ln(\phi - \phi_a) + \ln(\phi_b - \phi).$$

Posterior parameter estimates
Parameter estimates (posterior medians and upper and lower 2.5 percentiles), as transcribed:
- Intercept (theta_0): 82.39 (49.56, 20.46)
- AprilTC1 (theta_1): -0.27 (-0.45, -0.1)
- AprilTC2 (theta_2): 0.7 (0.07, 0.29)
- AprilTC3 (theta_3): -0.24 (-0.43, -0.08)
- JulyTC1 (theta_4): -0.04 (-0.25, 0.7)
- JulyTC2 (theta_5): 0.09 (-0.0, 0.9)
- JulyTC3 (theta_6): 0.0 (-0.5, 0.6)
- OctTC1 (theta_7): -0.43 (-0.68, -0.22)
- OctTC2 (theta_8): -0.03 (-0.9, 0.4)
- OctTC3 (theta_9): -0.26 (-0.46, -0.07)
- sigma^2: .358 (0.39, 2.42)
- phi: 0.0082 (0.00065, 0.0032)
- effective range, -log(0.05)/phi (meters): 644.9 (932.33, 4606.7)
Covariates and proximity to observed FIA plots will contribute to increased precision of prediction.
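A minimal sketch of the Metropolis target log density above for the spatial logistic model, assuming an exponential correlation function for $R(\phi)$; the hyperparameter defaults mirror the priors stated above, and the toy inputs are arbitrary. The $\ln(\sigma^2)$ and $\ln(\phi - \phi_a) + \ln(\phi_b - \phi)$ terms are the Jacobians for sampling $\sigma^2$ and $\phi$ on transformed scales.

```python
# Target log density (up to a constant) for the spatial logistic model above.
import numpy as np

def log_target(y, X, s, beta, w, sigma2, phi, sigma_a=2.0, sigma_b=1.0,
               phi_a=3.0 / 32, phi_b=3.0 / 0.5):
    n = len(y)
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    R = np.exp(-phi * dist)                               # exponential correlation (an assumption)
    eta = X @ beta + w                                    # logit(p(s_i))
    _, logdetR = np.linalg.slogdet(R)
    out = -(sigma_a + 1 + n / 2.0) * np.log(sigma2) - sigma_b / sigma2
    out += -0.5 * logdetR - 0.5 / sigma2 * (w @ np.linalg.solve(R, w))
    out += np.sum(y * eta - np.log1p(np.exp(eta)))        # Bernoulli-logit log likelihood
    out += np.log(sigma2) + np.log(phi - phi_a) + np.log(phi_b - phi)   # Jacobian terms
    return out

rng = np.random.default_rng(0)
s = rng.uniform(0, 10, size=(30, 2)); X = np.ones((30, 1))
w = rng.normal(size=30); y = rng.integers(0, 2, size=30)
print(log_target(y, X, s, np.array([0.0]), w, 1.0, 0.5))
```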

[Figures: median and 97.5%-2.5% range of the pixel-level posterior predictive distributions; CDFs of the holdout areas' posterior predictive distributions of P(Y(A) = 1), with a cut point marked, for 20 x 20 pixel areas classified by visual inspection of the imagery as non-forest, moderately forested, and forested.]