Overview of Spatial Statistics with Applications to fMRI

School of Mathematics & Statistics, Newcastle University. April 8th, 2016

Outline: Why spatial statistics? Basic results. Nonstationary models. Inference for large data sets. An example of application.

Wise words to begin with: "The best way to account for spatial dependence is to avoid doing it." (Michael Stein)

Why Spatial Statistics for neurological data? Figure: Time series of an fMRI scan in one voxel (fMRI intensity against time in seconds).

The big picture. Figure: fMRI time series for neighboring voxels.

Averaging? Figure: single voxel vs. averaging (fMRI intensity and mean fMRI intensity against time in seconds).

However... Figure: fMRI time series for neighboring voxels, a discontinuous case.

Spatial statistics: a different field. Spatial statistics differs from temporal statistics because the questions we want to answer are different. In temporal statistics we want to forecast the future given the past: we want the distribution of $Y_{t+k}$ given $\{Y_t, \ldots, Y_1\}$. In spatial statistics, there is no future.

Some theory. Let $D \subset \mathbb{R}^2$. Then $Y_s$, $s \in D$, is a Gaussian random field if $\{Y_{s_1}, \ldots, Y_{s_n}\}$ is a multivariate (scalar) Gaussian random vector for every $\{s_1, \ldots, s_n\} \subset D$. Here $Y_s$ is a point value, but there is also a theory for areal values (e.g. values for different counties).

Some theory. Main assumption: $Y_s = \sum_{i=1}^{p} f_i(s)\beta_i + \varepsilon_s$, or in matrix form $Y = X\beta + \varepsilon$, with $\varepsilon \sim N(0, K)$: a Gaussian process. Let us assume, for now, that $X\beta = \mu 1$: constant mean.

Some theory, cont'd. A Gaussian random field is stationary if and only if $E(Y_s) = E(Y_{s+u}) = \mu$ and $\mathrm{cov}(Y_s, Y_{s'}) = \mathrm{cov}(Y_{s+u}, Y_{s'+u}) = K(s' - s)$, and in particular $\mathrm{var}(Y_s) = \mathrm{var}(Y_{s+u}) = K(0)$. $K$ is called the covariance function.

Covariance function. So $K(u)$ is defined for all displacements $u$ between locations in $D \subset \mathbb{R}^2$, not just at fixed lags. If $K(u) = K(\|u\|)$ the random field is called isotropic: no preferred direction in $\mathbb{R}^2$. Stationarity/isotropy almost never occurs in practice, but it can be used as a reference for more complex dependence structures.

Valid covariance. Not every function $K(u)$ can be a covariance function for a random field: $K(u)$ must be positive definite, i.e. for every $\{s_1, \ldots, s_n\}$ and every $\{w_1, \ldots, w_n\}$ we have
$$\mathrm{var}\Big(\sum_{i=1}^n w_i Y_{s_i}\Big) = \sum_{i=1}^n \sum_{i'=1}^n w_i w_{i'} K(s_i - s_{i'}) \ge 0.$$

Valid covariance, examples. Here are some widely used covariance functions:
- squared exponential (not "Gaussian"): $K(u) = \sigma^2 e^{-(\|u\|/\phi)^2}$, where $\sigma^2$ is the variance and $\phi$ is the range;
- more generally, $\alpha$-exponential: $K(u) = \sigma^2 e^{-(\|u\|/\phi)^\alpha}$ for $0 < \alpha \le 2$;
- spherical: $K(u) = \sigma^2\big(1 - \tfrac{3}{2}(\|u\|/\phi) + \tfrac{1}{2}(\|u\|/\phi)^3\big)$ for $\|u\| \le \phi$, and $K(u) = 0$ for $\|u\| > \phi$.

More valid covariances.
- Rational quadratic: $K(u) = \sigma^2\left(1 + \frac{\|u\|^2}{2\alpha\phi^2}\right)^{-\alpha}$.
- Matérn: $K(u) = \sigma^2 \{2^{\nu-1}\Gamma(\nu)\}^{-1} (\|u\|/\phi)^{\nu} K_{\nu}(\|u\|/\phi)$, where $\sigma^2$ is the variance, $\phi$ is the range and $\nu$ is the smoothness.
Special cases of the Matérn:
1. $\nu = 1/2$: $K(u) = \sigma^2 e^{-\|u\|/\phi}$ (the exponential);
2. $\nu = 3/2$: $K(u) = \sigma^2\left(1 + \frac{\|u\|}{\phi}\right) e^{-\|u\|/\phi}$;
3. ...
4. $\nu \to \infty$: the squared exponential.
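
To make the formulas above concrete, here is a minimal Python sketch (my own illustration, not code from the talk) evaluating the squared exponential, spherical, and Matérn covariances at a vector of distances; all parameter values are arbitrary.

```python
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel function K_nu

def sq_exponential(d, sigma2=1.0, phi=0.5):
    return sigma2 * np.exp(-(d / phi) ** 2)

def spherical(d, sigma2=1.0, phi=0.5):
    c = sigma2 * (1.0 - 1.5 * (d / phi) + 0.5 * (d / phi) ** 3)
    return np.where(d <= phi, c, 0.0)

def matern(d, sigma2=1.0, phi=0.5, nu=1.5):
    d = np.asarray(d, dtype=float)
    x = np.where(d > 0, d / phi, 1.0)          # dummy value at d = 0, fixed below
    c = sigma2 * 2.0 ** (1 - nu) / gamma(nu) * x ** nu * kv(nu, x)
    return np.where(d > 0, c, sigma2)          # K(0) = sigma^2

d = np.linspace(0.0, 2.0, 201)
# nu = 1/2 should reproduce the exponential covariance sigma^2 * exp(-d/phi)
print(np.allclose(matern(d, nu=0.5), np.exp(-d / 0.5)))
```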

Some examples. Figure: examples of covariance functions $C(u)$ against distance $u$ (including Matérn with $\nu = 0.5$ and $\nu = 10$, and spherical); all of them are isotropic.

A realization. Figure: Realization of an isotropic model on the unit square.

Non-isotropic models. A simple generalization: a non-isotropic model, but still stationary, where $K(u)$ is not a function of $\|u\|$ alone. Geometrically anisotropic: $K(u) = K(\|Au\|)$, where $A$ is an affine matrix controlled by a distortion along the two axes $(\lambda_1, \lambda_2)$ and a rotation $\theta$. This does not require any new theory: just modify the existing covariance functions. Application: Diffusion Tensor Imaging (DTI), where water molecules diffuse preferentially parallel to the fiber tracts.
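
As a sketch of the geometric anisotropy idea (again my own illustration; the stretch factors, rotation and base covariance below are hypothetical), one only needs to build $A$ from $(\lambda_1, \lambda_2, \theta)$ and feed the transformed distance to an existing isotropic covariance:

```python
import numpy as np

def anisotropy_matrix(lam1, lam2, theta):
    # rotate by theta, then rescale the two principal axes
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([1.0 / lam1, 1.0 / lam2])
    return S @ R.T

def geo_aniso_cov(s1, s2, K_iso, A):
    u = np.asarray(s1, dtype=float) - np.asarray(s2, dtype=float)
    return K_iso(np.linalg.norm(A @ u))

K_iso = lambda d: np.exp(-d / 0.3)                      # any isotropic covariance
A = anisotropy_matrix(lam1=2.0, lam2=0.5, theta=np.pi / 4)
print(geo_aniso_cov([0.0, 0.0], [0.1, 0.1], K_iso, A))
```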

A realization Figure: Realization of a geometrically anisotropic model

Nonstationary. In many applications, $\mathrm{cov}(Y_s, Y_{s'}) \ne K(s - s')$: nonstationary. DTI: water molecules do not move at the same rate everywhere. Possibly the simplest construction (Fuentes, 2001):
$$Y_s = \sum_{i=1}^{n} w_i(s)\, \tilde{Y}^{i}_s,$$
where $w_i$ is a weight function decaying from a centroid $c_i$, and the $\tilde{Y}^{i}_s$ are iid isotropic/anisotropic random fields.
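
A minimal simulation sketch of this weighted-mixture construction (my own choices of grid, Gaussian weight functions, centroids, and an exponential covariance for the component fields; not the talk's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(0.0, 1.0, 20)
locs = np.array([(a, b) for a in x for b in x])          # 400 grid locations

def exp_cov(locs, phi=0.2, sigma2=1.0):
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

centroids = np.array([[0.25, 0.25], [0.75, 0.75]])       # hypothetical centroids c_i
w = np.exp(-np.linalg.norm(locs[:, None, :] - centroids[None, :, :], axis=-1) ** 2
           / (2 * 0.3 ** 2))
w /= w.sum(axis=1, keepdims=True)                        # weights decaying from each c_i

L = np.linalg.cholesky(exp_cov(locs) + 1e-8 * np.eye(len(locs)))
Y = sum(w[:, i] * (L @ rng.standard_normal(len(locs)))   # Y_s = sum_i w_i(s) * Ytilde_i(s)
        for i in range(len(centroids)))
```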

Nonstationary, mixture model Figure: From an isotropic model to a locally anisotropic model

A realization Figure: From an isotropic model to a locally isotropic model

Nonstationary, convolution model. A generalization (Fuentes and Smith, 2002): convolution
$$Y_s = \int_{\mathbb{R}^2} w(s - u)\, \tilde{Y}^{\theta(u)}_s \, du,$$
where $w$ is a kernel function, $\theta(u)$ is a slowly varying function, and the $\tilde{Y}^{\theta(u)}_s$ are iid isotropic/anisotropic random fields.

Deformed random fields. $Y_s$ is not isotropic, but we can assume $Y_{f(s)}$ is (Sampson and Guttorp, 1992). Example: gravitational lensing in the cosmic microwave background radiation. Here $f : \mathbb{R}^2 \to \mathbb{R}^2$ is a smooth invertible transformation, so
$$\mathrm{cov}(Y_s, Y_{s'}) = K(\|f(s) - f(s')\|).$$
In practice there are two problems: 1. estimate $f$ (from the physics); 2. estimate $K$ (with statistics).
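
A sketch of the resulting covariance (the deformation $f$ below is a made-up smooth radial stretch, just to illustrate the formula; in practice $f$ comes from the physics):

```python
import numpy as np

def f(s):
    # toy smooth invertible deformation: a radial stretch of the plane
    s = np.asarray(s, dtype=float)
    return s * (1.0 + 0.5 * np.linalg.norm(s))

def deformed_cov(s1, s2, K_iso):
    # cov(Y_s, Y_s') = K(||f(s) - f(s')||)
    return K_iso(np.linalg.norm(f(s1) - f(s2)))

K_iso = lambda d: np.exp(-(d / 0.5) ** 2)
print(deformed_cov([0.1, 0.2], [0.3, 0.4], K_iso))
```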

A realization Figure: From an isotropic model to a deformed model

What about interpolation? Why do you need a spatial model? The typical problem in spatial statistics is interpolation at new locations: we want the distribution of $Y_{s}$ at an unobserved site given $\{Y_{s_1}, Y_{s_2}, \ldots, Y_{s_n}\}$. This is kriging. For fMRI data, interpolation is not the primary concern: we measure all the voxels at the same time. In the model $Y = X\beta + \varepsilon$, environmental statistics focuses on interpolation, neuroimaging on inference for $\beta$. ...so why should we care?
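
For reference, here is a textbook simple-kriging sketch (generic, not from the talk), computing the conditional mean and variance of $Y$ at a new site given the observed sites; the covariance kernel, locations and data are hypothetical:

```python
import numpy as np

def simple_krige(s_new, s_obs, y_obs, K, mu=0.0):
    # K(a, b): covariance between two locations; mu: known constant mean
    Koo = np.array([[K(a, b) for b in s_obs] for a in s_obs])
    kno = np.array([K(s_new, b) for b in s_obs])
    w = np.linalg.solve(Koo, kno)
    pred = mu + w @ (np.asarray(y_obs) - mu)
    var = K(s_new, s_new) - kno @ w
    return pred, var

K = lambda a, b: np.exp(-np.linalg.norm(np.subtract(a, b)) / 0.3)
s_obs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
y_obs = [1.2, 0.7, -0.3]
print(simple_krige((0.5, 0.5), s_obs, y_obs, K, mu=0.5))
```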

Simulation study. For every ROI: use the empirical covariance for $\varepsilon(t)$; assume a constant (inactive) mean across the ROI; run 100 simulations and test for activation under the different models.

Simulation study, cont'd. Table: False positives (nominal 5%) for three regions and the mean across ROIs.

ROI             ind    iso    aniso   l-aniso
SOG, Right       80     26      28        13
PHG, Right       72     14      12        11
ORBsup, Left     64     35      31        27
mean           78.7   31.9    28.3      26.3

Power curve. Figure: power as a function of $\beta$ for the glm, iso, aniso, and l-aniso models.

Inference for spatial processes. Now we have a spatial model, which implies a covariance matrix $K(\theta)$. We need to do inference! Likelihood:
$$L(\theta \mid Y) = (2\pi)^{-n/2} |K(\theta)|^{-1/2} \exp\left\{-\frac{1}{2} (Y - X\beta)^\top K(\theta)^{-1} (Y - X\beta)\right\}.$$

The likelihood and log-likelihood functions. Log-likelihood:
$$\ell(\theta \mid Y) = \ln L(\theta \mid Y) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|K(\theta)| - \frac{1}{2}(Y - X\beta)^\top K(\theta)^{-1}(Y - X\beta).$$
The real trouble is $K(\theta)$: storing it requires $O(n^2)$ space, and computing the log determinant and the matrix inversion requires $O(n^3)$ flops.
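
A direct dense implementation of this log-likelihood (a sketch, not the talk's code) makes the costs visible: the Cholesky factorization below is the $O(n^3)$ step, and the stored matrix is the $O(n^2)$ storage.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglik(y, X, beta, K):
    # l(theta | Y) with K = K(theta) already built as a dense n x n matrix
    r = y - X @ beta
    c, low = cho_factor(K, lower=True)               # O(n^3)
    logdet = 2.0 * np.sum(np.log(np.diag(c)))        # ln|K| from the Cholesky factor
    quad = r @ cho_solve((c, low), r)                # (Y - Xb)' K^{-1} (Y - Xb)
    n = len(y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```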

Computational issues. You need to store the matrix first. With 150,000 voxels, a dense double-precision covariance matrix takes $8 \times 150{,}000^2$ bytes, roughly 180 GB. We can exploit the structure of $K(\theta)$ (symmetric and positive definite) to reduce the storage, but this only halves the problem (roughly 90 GB), and most programming languages do not (yet) allow linear algebra operations with symmetric storage. However... voxels are on a regular grid. Let us assume our model is stationary.

Whittle approximation, the basic idea

Circulant matrix.
$$K(\theta) = \begin{pmatrix}
K(0) & K(1) & K(2) & K(3) & \cdots & K(n-1) \\
K(n-1) & K(0) & K(1) & K(2) & \cdots & K(n-2) \\
K(n-2) & K(n-1) & K(0) & K(1) & \cdots & K(n-3) \\
K(n-3) & K(n-2) & K(n-1) & K(0) & \cdots & K(n-4) \\
\vdots & & & & \ddots & \vdots \\
K(1) & K(2) & K(3) & K(4) & \cdots & K(0)
\end{pmatrix}$$
Every row is a circular shift by one element of the previous row. This type of matrix is called circulant, and it has some very convenient properties.

Whittle approximation. In matrix form, $K(\theta) = D^{*}\Lambda(\theta)D$, where $D$ is the DFT matrix and $\Lambda(\theta)$ is a diagonal matrix with the Fourier coefficients. Then
$$\ell(\theta \mid Y) = \ln L(\theta \mid Y) \approx -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{j=0}^{n-1} \ln \lambda_j(\theta) - \frac{1}{2}\sum_{j=0}^{n-1} \frac{|\mathrm{FFT}(Y - X\beta)|_j^2}{\lambda_j(\theta)}.$$
This is called the Whittle approximation. You don't need to store any large matrix: just compute the FFT of the data. Storage: $O(n^2) \to O(n)$; flops: $O(n^3) \to O(n\log n)$.
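
A one-dimensional sketch of this approximation on a regular grid (my own minimal version: lam must contain the eigenvalues $\lambda_j(\theta)$ of the circulant covariance, i.e. the FFT of its first row, which is model-specific; the exponential covariance below is just an example):

```python
import numpy as np

def whittle_loglik(resid, lam):
    # resid: Y - X beta on a regular 1D grid; lam: eigenvalues lambda_j(theta)
    n = len(resid)
    I = np.abs(np.fft.fft(resid)) ** 2 / n           # squared Fourier coefficients
    return -0.5 * (n * np.log(2.0 * np.pi) + np.sum(np.log(lam)) + np.sum(I / lam))

# Example: eigenvalues from the first row of a circulant exponential covariance
n = 256
row = np.exp(-np.minimum(np.arange(n), n - np.arange(n)) / 10.0)   # wrapped distances
lam = np.real(np.fft.fft(row))                                     # circulant eigenvalues
resid = np.random.default_rng(2).standard_normal(n)
print(whittle_loglik(resid, lam))
```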

An example of application. One healthy control in a clinical study with 15 stroke patients and 12 healthy control subjects. Each session has 48 consecutive scans (one every 2 seconds); the task is hand grasping, in a block design: Rest, Task, Rest, Task (12 TRs each). Three sessions, so T = 144 time points. 150,000 voxels in a 3D space (2 mm × 2 mm × 2 mm) at each time point, for a total of about 22 million data points.

Preliminaries. Response: each voxel $v$ belongs to a Region of Interest (ROI) $r$. The fMRI intensity at $v$ is $Y_{v;r}(t)$, with $Y(t) = \{Y_{v_1;r_{v_1}}(t), \ldots, Y_{v_V;r_{v_V}}(t)\}$ and $Y = \{Y(1), \ldots, Y(T)\}$. Covariates: the block design (Rest, Task, Rest, Task, 12 TRs each); $I_1(t)$ is the indicator for task, and $I_2(t) = 1 - I_1(t)$ the indicator for rest.

Preliminaries. Hemodynamic response function $h(t)$: known and common across voxels. Figure: fMRI intensity (blue); below, $I_1(t)$ and $X_1(t) = (h * I_1)(t)$ (red).
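
As a sketch of building the convolved regressor $X_1(t) = (h * I_1)(t)$: convolve the task indicator with an HRF. The double-gamma $h$ below is a common textbook choice and an assumption on my part, not necessarily the $h$ used in the talk.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0                                   # seconds per scan (from the slides)
block = [0] * 12 + [1] * 12                # Rest (12 TR), Task (12 TR)
I1 = np.array(block * 2, dtype=float)      # one session: Rest/Task/Rest/Task, 48 scans

t = np.arange(0.0, 30.0, TR)               # HRF support, sampled at the TR
h = gamma.pdf(t, a=6, scale=1.0) - 0.35 * gamma.pdf(t, a=16, scale=1.0)
h /= h.sum()

X1 = np.convolve(I1, h)[: len(I1)]         # task regressor
X2 = np.convolve(1 - I1, h)[: len(I1)]     # rest regressor
```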

The model. The model is $Y = X\beta + \varepsilon$. Mean structure (for every voxel): contribution of $X_1(t)$ and $X_2(t)$ (coefficients $\beta_1$ and $\beta_2$), an intercept, a session mean, and a time effect.

Neuroscience questions. Is a voxel active? That is, is $\beta_1 - \beta_2 \ne 0$ for a given voxel $v$? Activation is a hypothesis test, for each voxel. Standard approach: independent voxel-wise linear regression (the general linear model). Somewhat more advanced: False Discovery Rate to bound false positives under (positive) dependence. Here: model the spatial dependence, to obtain more accurate activation patterns.
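
A sketch of the standard independent-voxel approach just described (mass-univariate OLS plus a t-test on the contrast $\beta_1 - \beta_2$; generic code, not the talk's implementation):

```python
import numpy as np
from scipy import stats

def voxel_contrast_test(y, X, contrast):
    # y: (T,) voxel time series; X: (T, p) design; contrast: (p,) e.g. [1, -1, 0, ...]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * contrast @ XtX_inv @ contrast)
    t = (contrast @ beta) / se
    return t, 2 * stats.t.sf(abs(t), dof)   # t statistic and two-sided p-value
```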

A few notes. $Y = X\beta + \varepsilon$. In my model, $\beta$ is fixed and all the spatial information is in $\varepsilon$. In a classical Bayesian model, $\beta$ is a Gaussian Markov random field and $\varepsilon$ is simple noise, so all the spatial information is in $\beta$. Combinations are possible: spatially varying $\beta_s$ together with a model for $\varepsilon$.

The spatial model Figure: The three-step spatial model.

Temporal dependence and multiple scales. Vector AR(2) in time:
$$\varepsilon(t) = \Phi_1 \varepsilon(t-1) + \Phi_2 \varepsilon(t-2) + S\{\Omega H_1(t) + (I_V - \Omega) H_2(t)\},$$
where $\Phi_i = \{\phi_{i;v}\}$, $i = 1, 2$, and $S = \{\sigma_v\}$ are diagonal; $\Omega H_1(t) + (I_V - \Omega)H_2(t)$ are the unscaled innovations; and $\Omega$ gives the relative contribution of $H_1$ with respect to $H_2$.
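
A small simulation sketch of this temporal structure for a handful of voxels; all parameter values are hypothetical, and the innovations $H_1$, $H_2$ are plain white noise here rather than the spatially structured fields of the actual model:

```python
import numpy as np

rng = np.random.default_rng(1)
V, T = 4, 144
Phi1 = np.diag([0.4, 0.3, 0.5, 0.35])
Phi2 = np.diag([0.2, 0.1, 0.15, 0.1])
S = np.diag([1.0, 1.2, 0.9, 1.1])
Omega = np.diag([0.6, 0.7, 0.5, 0.8])
I_V = np.eye(V)

eps = np.zeros((T, V))
for t in range(2, T):
    H1, H2 = rng.standard_normal(V), rng.standard_normal(V)   # placeholder innovations
    innov = S @ (Omega @ H1 + (I_V - Omega) @ H2)
    eps[t] = Phi1 @ eps[t - 1] + Phi2 @ eps[t - 2] + innov
```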

Results. Figure: fit for four random voxels, showing the data, fitted values, and 95% prediction intervals (fMRI intensity against time in seconds).

The spatial model. Figure: scheme of the spatial part of the model.

Modeling local dependence. $H_1(v;r)$ is independent across ROIs, and within each ROI it is a sum of geometrically anisotropic processes:
$$H_1(v;r) = \sum_{i=1}^{k} H_1^{i}(v;r)\, w_i(v),$$
where the $H_1^{i}(v;r)$ are iid Gaussian processes with $\mathrm{cov}\{H_1^{i}(v;r), H_1^{i}(v';r)\}$ given by a Matérn covariance with anisotropic distance (in 3D!).

Results. Figure: BIC for all 90 ROIs for the isotropic model and the locally anisotropic model.

Results Figure: Activation at 0.01% for independent voxels (top) and the model (bottom).

Results. Table: Activated voxels (in %).

ROI                  Active   Inhibited   Total   ind
Supp Motor Area L        53          14      67    29
Supp Motor Area R        35          11      46    28
mean                     25          15      40    23

Conclusions. Spatial statistics can help in assessing activation, but more work is needed to define appropriate models for fMRI. Inference is hard (about 100,000 voxels), but not impossible: we are on a grid, and there is a large literature in environmental statistics. A statistical analysis of a large data set requires a large computer. Where should we inject the spatial information: $\beta$ vs. $\varepsilon$? Someone's mean function is someone else's covariance function.