E(x i ) = µ i. 2 d. + sin 1 d θ 2. for d < θ 2 0 for d θ 2

Size: px

Start display at page:

Download "E(x i ) = µ i. 2 d. + sin 1 d θ 2. for d < θ 2 0 for d θ 2"

Oliver Dorsey
5 years ago
Views:

1 1 Gaussian Processes Definition 1.1 A Gaussian process { i } over sites i is defined by its mean function and its covariance function E( i ) = µ i c ij = Cov( i, j ) plus joint normality of the finite dimensional distributions. Hence restricted to the points labelled by 1,..., n is ( 1,..., n ) T and it has a n-variate Gaussian distribution N(µ, Σ), where µ = (µ 1,..., µ n ) T and Σ = (c ij ). Note that the covariance function c ij must be positive definite (ie. any covariance matri created from a finite dimensional set of i s must be positive definite: a T Σa > 0, for any non-ero vector a). Covariance functions The restriction that the function {c ij } be positive definite can make the search for valid covariance functions difficult. Most covariance functions model covariance between sites i and j as a function of between the two sites d ij = dist(i, j), where dist(i, j) is typically Euclidean, or a simple modification of it. Hence c ij = C(d ij ). It is standard to choose from a number of parameteried covariance functions, often called covariograms, listed below: Power family C(d θ, p) = θ 1 ep{ d/θ 2 p }, 0 < p 2 Two notable covariograms in this family are the eponential (p = 1) and the Gaussian (p = 2). Spherical C(d θ) = { [ ( )] θ1 1 2 d π θ2 1 d θ 2 + sin 1 d θ 2 for d < θ 2 0 for d θ 2 For the spherical covariogram, if i and j are separated by a greater than θ 2, i and j are independent. Matérn C(d θ) = θ θ 3 1 Γ(θ 3 ) ( ) θ3 ( ) 2 θ3 d 2 θ3 d K θ3 where θ 2 is a scale parameter and θ 3 is a shape parameter, and K() θ is a modified Bessel function of the third kind of order θ 3 (Abramowit and Stegun 1964, Chapter 9). θ 2 θ 2 1

2 Why positive definite? Consider the power covariogram for large p. This makes the covariogram look like a step function C(d) = I[0 d 1]. So if sites 1,2,3 lie on a line with spacing 1, then Cov( 2 1, 2 ) = Cov( 2, 3 ) = 1 1 = 2 = 3, but C(d = 1) requires that Cov( 1, 3 ) = 0, which is a contradiction. Such a difficulty can occur for any p > 2. Eample: Gaussian random walk Let = ( 0, 1, 2, 3,...) T be a Gaussian process defined on the integers {0, 1, 2, 3,...} such that = ( 0, 1, 2, 3, 4 ) T has density: 0 0, i i 1 N( i 1, 1), for i = 1, 2, 3,... } 4 π() = (2π) 4 2 ep { 1 2 ( i i 1 ) 2, 0 = 0 i=1 ( ) = (2π) 4 2 ep 1 2 = (2π) 4 2 ep { 1 2 T W }, 0 = , 0 = 0 Alternatively, we can compute the means, variances and covariances of using standard covariance formulas since i = i j=1 j where the i s are iid N(0, 1) random variables. Thus: N 0, N(0, Σ) Note that this distribution is degenerate since 0 = 0. So the rank of Σ is only 4. One can check that W is a generalied-inverse of Σ. The covariance function of this process is: Cov( i, j ) = min(i, j) = i j One can imagine defining a random walk on a grid with twice the density using increments with half the variance. The resulting covariogram on the integers would be eactly the same. The figure below shows a random walk on a successively finer grid. 2

3 The limiting process { t } is called Brownian motion. It is defined over t 0, it s continuous, and it is characteried by its mean function E t = 0 and its covariance function Cov( t, s ) = t s. Note that the covariance function cannot be written as a function of here. In fact, Var t = t, so the process { t } isn t even stationary. A process { t } is said to be stationary if the distribution of a finite dimensional restriction of { t } is unchanged if the process is translated over space (ie. ( t1,..., tn ) = d ( t1 +s,..., tn+s) for any s in the case of Brownian motion). The random walk is an eample of an intrinsicly stationary process because the increments are all iid. Such processes can be characteried by their variograms, which give the variance of all pairwise differences. When the variance of any pair ( i, j ) depends only on the d ij between sites i and j we can characterie the process {} with a variogram denoted by 2γ(d ij ). For historical reasons γ(d) is called the semi-variogram, and the variogram is defined to be 2γ(d). For the random walk Var( i j ) = 2γ(d ij ) = i j In the eample here, knowing the variogram doesn t completely specify the distribution of {}. For eample, the random walk process where 0 = 0 has eactly the same variogram as does the process with 0 = 1. However, as long as one of the i s (or more) are known, then the distribution for the remaining components of {} is completely specified. The Variogram Definition 1.2 A Gaussian process { i }, is said to have a variogram if Var( i j ) is a function of d ij between sites i and j. We use 2γ(d ij ) to denote the variogram. 2γ(d ij ) = Var( i j ) If a process { i } has a covariogram, then the two functions are related by γ(d) = C(0) C(d) C(d) = γ( ) γ(h) 3

4 If { i } has a variogram, but the covariogram doesn t eist, we can compute a covariance function by conditioning on the event 0 = 0. 2γ(d ij ) = Var( i j ) = Var( i j 0 = 0) = Var(( i 0 ) ( j 0 ) 0 = 0) = Var( i 0 ) + Var( j 0 )) 2Cov( i 0, j 0 0 = 0) = 2γ(d i0 ) + 2γ(d j0 ) 2Cov( i, j 0 = 0) Thus one gets Cov( i, j 0 = 0) = γ(d i0 ) + γ(d j0 ) γ(d ij ) This is one way of obtaining Σ in the random walk eample. In fact using Σ ij = c γ(d ij ) will give a valid covariance matri provided c is sufficiently large. One eample of a Gaussian process which has a variogram, but not a covariogram is a generalied Brownian motion process: Power variogram: γ(d) d p, 0 < p 2 For the case p = 1, this is general Brownian motion. 2-d Thin plate spline γ(d) d 2 log d Sample realiations: 4

5 C(d) ep{ d/σ}, here σ = 2 eponential covariogram C(d) ep{ (d/σ) 2 }, here σ = 6 random realiation - Gaussian covariogram

6 C(d) ep{ (d/σ) 1. }, here σ = 9 random realiation - C(d) = ep(-d^1.) C(d) 1 2 d π σ 1 d σ + d sin 1 here σ = 9 σ random realiation - spherical

7 γ(d) d/σ, here σ = random realiation - Brownian motion p= γ(d) (d/σ) 1.3, here σ = random realiation - Brownian motion p=

8 γ(d) (d/σ) 1.7, here σ = random realiation - Brownian motion p= γ(d) (d/σ) 2.0, here σ = random realiation - Brownian motion p= Interpolation Given a set of points {1, 2,..., n} on a line or in a plane and their values 1, 2,..., n one can fit an interpolating surface just by choosing a covariogram or variogram model. 1. Assume the surface is a realiation from a Gaussian field with a specified covariance function. 8

9 2. Compute the conditional mean of the process given the observed sites. This is the interpolating surface. Here s the recipe: Suppose the process { 1,..., n } is observed. Now suppose you wish to find the interpolating surface { n+1,..., n+m } at the sites {n + 1,..., n + m}. Assume = ( 1,..., n, n+1,..., n+m ) T N (( µ1 µ 2 ) ( )) Σ11 Σ, 12 Σ 21 Σ 22 where the elements of Σ are determined by the covariogram or variogram model. The conditional distribution of ( n+1,..., n+m ) given 1,..., n is: ( 2 1 ) = (( n+1,..., n+m ) ( 1,..., n ) T ) N ( µ 2 + Σ 21 Σ 1 11 ( 1 µ 1 ), Σ 21 Σ 1 11 Σ 12 ) Note: if we re-epress things in terms of precisions, then for we have = ( 1,..., n, n+1,..., n+m ) T N ( ( µ1 µ 2 ) ( ) W11 W 1 ), 12 W 21 W 22 ( 2 1 ) = (( n+1,..., n+m ) ( 1,..., n ) T ) N ( µ 2 + W22 1 W ) 21( 1 µ 1 ), W22 1. Define the interpolant to be the mean of 2 given the observations 1 : µ 22 1 = µ 2 + Σ 21 Σ 1 11 ( 1 µ 1 ) If { i } really did follow a Guassian process with the specified mean and covariance function, then the standard error of the enterpolant at j would be the jj component of Σ 21 Σ 1 11 Σ 12. We ll look at estimating parameters of the covariogram/variogram later. The conditional mean is the optimal interpolator under mean squared error loss. Some eamples: 9

10 Gaussian C(r), scale = 2 Eponential C(r), scale = 1 Brownian motion C(r), p = 1. scale = Gaussian C(r), scale = 3 0 Eponential C(r), scale = 0 Brownian motion C(r), p = 1. scale = Gaussian C(r), scale = 0 Eponential C(r), scale = 0 Brownian motion C(r), p = 1. scale =

11 Gaussian C(r), scale = 2 Eponential C(r), scale = 1 Brownian motion C(r), p = 1. scale = Gaussian C(r), scale = 3-0 Eponential C(r), scale = - 0 Brownian motion C(r), p = 1. scale = Gaussian C(r), scale = - 0 Eponential C(r), scale = - 0 Brownian motion C(r), p = 1. scale =

12 a realiation mean conditional on =1 points realiation conditional on =1 points realiation conditional on =1 points

13 a realiation mean conditional on =1 points realiation conditional on =1 points 2 realiation conditional on =1 points A closer look at the covariogram There are a number of features of a variogram/covariogram that are worth noting. It is also of use to understand how properties of the covariogram affect the resulting spatial process. range scale nugget 13

14 nugget sill covariance range Eamples: White noise process Effects of the nugget on the conditional distribution Effects of the range on the conditional distribution Effects of γ (0+) on the conditional distribution 4 Estimating the variogram Given an observed set of points 1,..., n at locations 1,..., n one may wish to estimate various properties of the variogram governing their covariance. This can be done graphically using the empirical variogram. Definition 4.1 The empirical variogram is determined by discretiing into a n d bins and then estimating ˆγ(d) = 1 ( i j ) 2 2N d (i,j) d where d is the set of all pairs (i, j) such that the between i and j is within of d and N d is the number of pairs in d. Note this estimate can be somewhat unreliable. What follows is a number of surfaces and their empirical variograms derrived from 1,..., n, which were sampled uniformly from the 400 points making up a lattice. 14

15 Realiations from a Gaussian process with eponential variogram

16 Realiations from a Gaussian process with Gaussian variogram

17 Realiations from a Gaussian process with power (p=1.) variogram What features of γ(d) are important to capture Fitting a variogram model to the empirical variogram 1. Least squares: Choose θ so that (γ(d k θ) ˆγ(d k )) 2 is minimied. 2. Weighted least squares: Choose θ so that w k (γ(d k θ) ˆγ(d k )) 2 k k is minimied. Weights w k may be chosen to be proportional to the number of pairs in each bin N d. This will put more weight on pairs that are closer together. 17

18 3. Maimum likelihood: Suppose y N(µ, Σ(θ)), then we can estimate µ and θ by the values ˆµ and ˆθ that maimie the likelihood: L(µ, θ y) Σ(θ) 1 2 ep { 1 } 2 (y µ)t Σ(θ)(y µ) A variant of maimum likelihood is restricted maimum likelihood (REML) which uses a slightly modified version of the likelihood. 4. Bayesian estimation. Specify prior distributions for µ and θ and use the posterior mean or posterior mode to estimate θ. Anisotropy d = ΛRd. Transform euclidean thru a rotation R(θ) and a stretching/shrinking of the principle aes via Λ. These additional parameters may be absorbed in θ for estimating the variogram parameters. Eample: Piaa Road Superfund site. log dioin concentrations from Pilot Road site ylim lim One can estimate the variogram parameters by eye, using ML or REML, or better yet, a Bayesian approach. Use the interpolation formulas of Section 2 to estimate the concentration at unobserved sites. 18

19 Modeling spatial data y = β + + e where β absorbs standard linear model terms, absorbs the spatial trend, and e is a white noise term. eamples Agricultural field trials. rainfall estimation environmental monitoring imaging Observed data: y = (y 1,..., y n ) T Unobserved spatial trend: = ( 1,..., n ) T Covariates: Some equivalent formulations: y N(β +, σ 2 y I) y = β + + e y N(β, σ2 y I + Σ(θ)) N(0, Σ(θ)) N(0, Σ(θ)) e N(0, σ 2 ei) An eample: Researchers want to know how carbon concentrations in a stream differ on two different sides of a culvert on I-40. Concentrations are measured at 9 upstream locations and 8 downstream locations and are given in the figure below. Is there evidence that the culvert is associated with differences in carbon concentration? I

20 Here we can model the 17 measurements by y N(µ + α i, σ 2 ei + Σ(θ)) where α i, i = 1, 2, denotes the upstream or downstream measurements. So the question can be put in statistical terms, does α 1 = α 2? It turns out it depends on what covariance function is specified. If an eponential covriance function is used, then the inference depends on the specified range: significance by range: dist corr of nearest p-value diff*^ One could rely on maimum likelihood to specify the parameters of the eponential distribution. In fact it fits a covariance that is nearly all nugget effect (ie. no spatial dependence). However the uncertainty about the covariance parameters are unaccounted for.

Spatial Statistics with Image Analysis. Lecture L02. Computer exercise 0 Daily Temperature. Lecture 2. Johan Lindström.

C Stochastic fields Covariance Spatial Statistics with Image Analysis Lecture 2 Johan Lindström November 4, 26 Lecture L2 Johan Lindström - johanl@maths.lth.se FMSN2/MASM2 L /2 C Stochastic fields Covariance