Overview of Spatial Statistics with Applications to fmri

with Applications to fmri School of Mathematics & Statistics Newcastle University April 8 th, 2016

Outline Why spatial statistics? Basic results Nonstationary models Inference for large data sets An example of application

Wise words to begin with The best way to account for spatial dependence is to avoid doing it. (Michael Stein)

Why Spatial Statistics for neurological data? 400 395 390 fmri intensity 385 380 375 370 365 360 20 40 60 80 100 120 140 time (s) Figure: Time series of an fmri scan in one voxel

The big picture Figure: fmri time series for neighboring voxels

Averaging? fmri intensity 400 395 390 385 380 375 370 365 360 20 40 60 80 100 120 140 time (s) 390 385 380 375 370 365 360 20 40 60 80 100 120 140 355 time (s) mean fmri intensity Figure: single voxel vs averaging

However... Figure: fmri time series for neighboring voxels, a discontinuous case

Spatial statistics: a different field Spatial statistics is different from temporal statistics because the questions we want to answer are different. In temporal statistics you want to forecast the future given the past. We want to know the distribution of Y t+k given {Y t,...,y 1 }. In spatial statistics, there is no future.

Some theory Let D R 2, then Y s where s R 2 is a Gaussian random field if {Y s1,...,y sn } is a multivariate (scalar) Gaussian random vector for every {s 1,...,s n } D. Here Y s is a point value, but there is also a theory for areal values (e.g. values for different counties).

Some theory Main assumption Y s = Y = Xβ +ε p f i (s)β i +ε s, or in matrix form i=1 ε N(0,K), Gaussian process. Let us assume, for now, that Xβ = µ1: constant mean.

Some theory, cont d A Gaussian random field is stationary if and only if E(Y s ) = E(Y s+u ) = µ, cov(y s,y s ) = cov(y s+u,y s +u) = K(u), and in particular var(y s ) = var(y s+u ) = K(0). K is called the covariance function.

Covariance function So K(u) is defined for all locations in D R 2, not fixed lags. If K(u) = K( u ) the random field is called isotropic: no preferred direction in R 2. Stationarity/isotropy almost never occurs, but it can be used as a reference for more complex dependence structures.

Valid covariance Not every function K(u) could be a covariance function for a random field. K(u) must be positive definite, i.e. for every {s 1,...,s n } and every {w 1,...,w n } we have that ( n ) n n var w i Y si = w i w i K(s i s i ) 0. i=1 i=1 i =1

Valid covariance, examples Here are some widely used covariance functions squared exponential (not Gaussian): K(u) = σ 2 e ( u /φ)2. σ 2 is the variance and φ is the range. more generally, α-exponential: K(u) = σ 2 e ( u /φ)α for 0 < α 2. spherical ( σ 2 1 3 K(u) = 2 ( u /φ)+ 1 ) 2 ( u /φ)3, u φ, 0, u > φ.

More valid covariances ( Rational quadratic: K(u) = σ 2 1+ u 2 2αφ 2 Matérn: ) α K(u) = σ 2 {2 ν 1 Γ(ν)} 1 ( u /φ) ν K ν ( u /φ). σ 2 is the variance, φ is the range and ν is the smoothness. Special cases 1 ν = 1/2 K(u) = σ 2 e( u /φ, 2 ν = 3/2 K(u) = σ 2 1+ u ) e u /φ, φ 3... 4 ν squared exponential.

Some examples C(u) 0.0 0.2 0.4 0.6 0.8 1.0 nu=0.5 k=1.5 nu=10 spherical 0.0 0.5 1.0 1.5 2.0 u Some examples of covariance functions, all of them are isotropic.

A realization 1 0.8 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 Figure: Realization of an isotropic model

Non-isotropic models A simple generalization: a non-isotropic model, but still stationary: K(u) is not a function of u. Geometrically anisotropic: K(u) = K( Au ), where A is an affine matrix. A is controlled by distortion over the two axis (λ 1,λ 2 ) and rotation θ. Does not require any new theory: just modify the existing covariance functions Applications: Diffusion Tensor Imaging (DTI), water molecules parallel to the fiber tract

A realization Figure: Realization of a geometrically anisotropic model

Nonstationary In many applications, cov(y s,y s ) K(s s ): nonstationary. DTI: water molecules do not move at the same rate everywhere. Possibly the simplest construction (Fuentes, 2001) Y s = n i=1 w i (s)ỹ i s w i : weight function, decaying from a centroid c i. Ỹ i s: iid isotropic/anisotropic random fields.

Nonstationary, mixture model Figure: From an isotropic model to a locally anisotropic model

A realization Figure: From an isotropic model to a locally isotropic model

Nonstationary, convolution model A generalization (Fuentes and Smith, 2002): convolution Y s = w(s u)ỹs θ(u) du R 2 w: kernel function, θ(u) slowly varying function. Ỹ θ(u) s iid isotropic/anisotropic random fields.

Deformed random fields Y s is not isotropic, however we can assume Y f(s) (Sampson and Guttorp, 1992). Example: gravitational lensing in the cosmic microwave background radiation. f(s) : R 2 R 2 is a smooth invertible transformation So In practice two problems: cov(y s,y s ) = K( f(s) f(s ) ) 1 estimate f (from the physics) 2 estimate K (with statistics)

A realization Figure: From an isotropic model to a deformed model

What about interpolation? Why do you need a spatial model? Typical problem in spatial statistics: interpolate on different locations. We want to know the distribution of Y s given {Y s1,y s2,...,y sn }. Kriging. For fmri data, interpolation is not the primary concern: we are measuring all the voxels at the same time. Y = Xβ +ε environmental statistics is focused on interpolation, neuroimaging on inference on β....so why should we care?

Simulation study Simulation for every ROI empirical covariance for ε(t) assume a constant mean across ROI inactive mean 100 simulations, test for activation for different models

Simulation study, cont d Table: False Positives (nominal 5%) for three regions and the mean across ROI. ROI ind iso aniso l-aniso SOG, Right 80 26 28 13 PHG, Right 72 14 12 11 ORBsup, Left 64 35 31 27 mean 78.7 31.9 28.3 26.3

Power curve 1 0.95 0.9 power 0.85 0.8 0.75 glm iso aniso l-aniso 0.7 0.65 0 0.05 0.1 0.15 0.2 β Figure: Power curve.

Inference for spatial processes Now we have a spatial model, which implies a covariance matrix K(θ). We need to do inference! Likelihood L(θ Y) = (2π) n/2 K(θ) 1/2 exp { 1 } 2 (Y Xβ) K(θ) 1 (Y Xβ).

The likelihood and log likelihood functions Loglikelihood: l(θ Y) = lnl(θ Y) = n 2 ln(2π) 1 2 ln K(θ) (Y Xβ) K(θ) 1 (Y Xβ). The real trouble is K(θ). Storing it requires O(n 2 ) space and O(n 3 ) flops for computing the log determinant and matrix inversion.

Computational issues You need to store the matrix first. If you have a 150,000 voxels, the total matrix size would be (8 150,000) 2 /(1024) 4 1.3Tb. We can exploit the structure of K(θ) (symmetric and positive definite) to improve the storage, but This will not solve the problem (650 Gb) Most programming languages do not allow (yet) linear algebra operations with symmetric storage. However... voxels are on a regular grid. Let us assume our model is stationary.

Whittle approximation, the basic idea

Circulant matrix K(θ) = K(0) K(1) K(2) K(3) K(n 1) K(n 1) K(0) K(1) K(2) K(n 2) K(n 2) K(n 1) K(0) K(1) K(n 3) K(n 3) K(n 2) K(n 1) K(0) K(n 4) K(1) K(2) K(3) K(4) K(0) Every row consists of a circular shift of 1 element of the previous row. This type of matrix is called circulant, and it has some very convenient properties.

Whittle approximation In matrix form, K(θ) = D Λ(θ)D. D is the DFT matrix, Λ(θ) is a diagonal matrix with the Fourier coefficients. l(θ Y) = lnl(θ Y) n 2 ln(2π) 1 n 1 lnλ j (θ) 2 n 1 j=0 FFT(Y Xβ) 2 j/λ j (θ). j=0 This is called the Whittle approximation. You don t need to store any large matrix: just compute the FFT of the data. Storage O(n 2 ) O(n), flops O(n 3 ) O(nlogn).

An example of application One healthy control in a clinical study with 15 stroke patients and 12 healthy control subjects. Each session has 48 consecutive scans (every 2 seconds), task: hand grasping Rest Task Rest Task 12 12 12 12 Time (TR) Three sessions, T = 144 time points. 150,000 voxels in a 3D space (2mm 2mm 2mm) for each time. Total of 22 million data points.

Preliminaries Response: Each voxel v belongs to a Region of Interest (ROI) r. The fmri intensity at v is Y v;r (t). Y(t) = {Y v1 ;r v1 (t),...,y vv ;r vv (t)}, Y = {Y(1),...,Y(T)}. Covariates: Rest Task Rest Task 12 12 12 12 Time (TR) I 1 (t): indicator for task. I 2 (t) = 1 I 1 (t): indicator for rest.

Preliminaries Hemodynamic response function h(t) known and common across voxels Figure: fmri intensity (blue), below I 1 (t) and X 1 (t) = (h I 1 )(t) (red)

The model The model is: Y = Xβ +ε, Mean structure ( for every voxel): contribution of X 1 (t) and X 2 (t) (β 1 and β 2 ) intercept session mean time effect

Neuroscience questions Is a voxel active?: is β 1 β 2 0 for some voxel v? Activation This is an hypothesis test, for each voxel. Standard approach: independent voxels linear regression (general linear model). Some more advanced: False Discovery Rate to bound false positives under (positive) dependence. Here: model spatial dependence, to obtain more accurate activation patterns.

A few notes Y = Xβ +ε In my model, β is fixed, all spatial information is in ε. In a classical Bayesian model β is a Gaussian Markov Random Field, and ε simple noise. All spatial information is in β. Combinations are possible: β s and a model for ε.

The spatial model Figure: The three-step spatial model.

Temporal dependence and multiple scales Vector AR(2) in time: ε(t) = Φ 1 ε(t 1)+Φ 2 ε(t 2)+S{ΩH 1 (t)+(i V Ω)H 2 (t)}. Φ i = {φ i;v } i = 1,2 diagonal S = {σ v } diagonal ΩH 1 (t)+(i V Ω)H 2 (t) unscaled innovations Ω: relative contribution of H 1 w.r.t. H 2

Results 600 time (in seconds) 50 100 150 200 250 (a) (b) 50 100 150 200 250 580 580 fmri intensity 560 540 520 550 data fitted 95% pred (c) (d) 560 540 520 580 500 560 450 540 400 520 350 500 300 50 100 150 200 250 50 100 150 200 250 480 Figure: Fit for four random voxels.

The spatial model Figure: Scheme of the model of the spatial part of the model

Modeling local dependence H 1 (v;r) independent for every ROI. Sum of geometrically anisotropic processes H 1 (v;r) = k H i 1(v;r)w i (v), i=1 where H i 1(v;r) are iid Gaussian processes with cov{h i 1(v;r),H i 1(v ;r)} = Matérn with anisotropic distance (in 3D!).

Results 10 5 1 0 loc anisotropic isotropic -1-2 BIC -3-4 -5-6 0 10 20 30 40 50 60 70 80 90 ROI number Figure: BIC for all 90 ROIs for the isotropic model and the locally anisotropic model.

Results Figure: Activation at 0.01% for independent voxels (top) and the model (bottom).

Results Table: Activated voxels (in %). ROI Active Inhibited Total ind Supp Motor Area L 53 14 67 29 Supp Motor Area R 35 11 46 28 mean 25 15 40 23

Conclusions Spatial statistics can help assessing activation, but more work is needed in defining appropriate models for fmri. Inference is hard ( 100,000 voxels), but not impossible: we are on a grid and there is a large literature in environmental statistics. A statistical analysis for a large data set requires a large computer. Where to inject spatial information? β vs ε. Someone s mean function is someone else s covariance function