Chapter 4 - Spatial processes R packages and software Lecture notes

STK4150 - Intro
Odd Kolbjørnsen and Geir Storvik
February 13, 2017

Last time
  General set up for Kriging type problems
    Introductory example
    Kriging case
    General case
    Change of support
  Spatial moving average models
    Construction
    Correlation function
  Non-Gaussian observations
    Monte Carlo
    Laplace approximation
    INLA

Today
  Today: computations
    Software
    Computer
    R
    INLA
  Next time: lattice models

Software
  There is always existing software that does your job
    The challenge is to figure out how it works
    Even if the software does not do exactly what you want, maybe it is good enough
  Reasons not to make your own computer code
    existing code is tested (fewer bugs)
    existing code is optimized (speed)
    often related to publications
    easier to get others to accept it
  Reasons to make your own computer code
    understand the methodology better
    improve/develop existing methodology
    combining with other techniques
    compete with existing computer code
    because you have too much spare time

Software options
  Wiki: List of spatial analysis software.../wiki/list_of_spatial_analysis_software/
    LuciadLightspeed, Open GeoDa, CrimeStat, SaTScan, SAGA, IDRISI, Biodiverse, ERDAS IMAGINE, TerraLens ++
  ArcGIS: Geoprocessing, visualization (GIS = Geographic information system)
  GeoBUGS: Bayesian analysis (WinBUGS)
  GeoDa: Exploratory Spatial Analysis + STARS: Space-time
  SAS/STAT: spatial analysis (limited)
  SPLUS: SpatialStats
  Matlab: Spatial-statistics toolbox (fast but limited)
  GEOEAS, SGeMS, GSLIB, COHIBA, CRAVA ++
  R

Spatial analysis in R
  CRAN (The Comprehensive R Archive Network)
    Classes for spatial data: sp, spacetime
    Handling spatial data: geosphere
    Reading and writing spatial data: rgdal
    Visualisation
    Point pattern analysis
    Geostatistics: geoR, gstat, spatial
    Disease mapping and areal data analysis: INLA
    Spatial regression: nlme
    Ecological analysis
  install.packages(...)
  update.packages(...)
  library(...)

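Hierarchical and hierarchical Bayesian approach

Hierarchical model

| Variable | Densities | Notation in book |
|---|---|---|
| Data model | $Z \sim p(z \mid Y, \theta)$ | $[Z \mid Y, \theta]$ |
| Process model | $Y \sim p(y \mid \theta)$ | $[Y \mid \theta]$ |
| Parameter | $\theta$ | |

  Simultaneous model: $p(y, z \mid \theta) = p(z \mid y, \theta)\, p(y \mid \theta)$
  Marginal model: $L(\theta) = p(z \mid \theta) = \int_y p(z, y \mid \theta)\, dy$
  Inference: $\hat{\theta} = \arg\max_\theta L(\theta)$

Bayesian approach: include a model on $\theta$

| Variable | Densities | Notation in book |
|---|---|---|
| Data model | $Z \sim p(z \mid Y, \theta)$ | $[Z \mid Y, \theta]$ |
| Process model | $Y \sim p(y \mid \theta)$ | $[Y \mid \theta]$ |
| Parameter model | $\theta \sim p(\theta)$ | $[\theta]$ |

  Simultaneous model: $p(y, z, \theta) = p(z \mid y, \theta)\, p(y \mid \theta)\, p(\theta)$
  Marginal model: $p(z) = \int_\theta \int_y p(z, y \mid \theta)\, p(\theta)\, dy\, d\theta$
  Inference: $\hat{\theta} = \int_\theta \theta\, p(\theta \mid z)\, d\theta$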
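As a small worked illustration of the marginal model, consider a single observation with a Gaussian data model and a Gaussian process model. The scalar setting and the symbol $\mu$ for the process mean are assumptions made here for illustration; they are not part of the slides.

```latex
\begin{align*}
Z \mid Y, \theta &\sim N(Y, \sigma_\varepsilon^2), \qquad
Y \mid \theta \sim N(\mu, \sigma_\delta^2), \\
L(\theta) = p(z \mid \theta)
  &= \int p(z \mid y, \theta)\, p(y \mid \theta)\, dy
   = N\!\left(z;\ \mu,\ \sigma_\delta^2 + \sigma_\varepsilon^2\right).
\end{align*}
```

With $\sigma_\delta = 1$ and $\sigma_\varepsilon = 0.3$, as in the simulated example later in these notes, the marginal variance is $1 + 0.3^2 = 1.09$.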

INLA
  INLA = Integrated nested Laplace approximation (software)
  C-code, R-interface
  Can be installed by
    source("http://www.math.ntnu.no/inla/givemeinla.r")
  Both empirical Bayes and Bayes

Full presentation on the course web page

Example - simulated data
  Assume
    $Y(s) = \beta_0 + \beta_1 x(s) + \sigma_\delta\, \delta(s)$,
    $Z(s_i) = Y(s_i) + \sigma_\varepsilon\, \varepsilon(s_i)$,
    $\{\delta(s)\}$: Matérn correlation
    $\{\varepsilon(s_i)\}$: independent
  Simulations:
    Simulation on a $20 \times 30$ grid
    $\{x(s)\}$: sine curve in horizontal direction
    Parameter values: $(\beta_0, \beta_1) = (2, 0.3)$, $\sigma_\delta = 1$, $\sigma_\varepsilon = 0.3$ and $(\theta_1, \theta_2) = (2, 3)$

Latent models in INLA

Matérn model - setup
  Code on webpage

Process model mean and covariance
  Mean: $20 \times 30 = 600$ elements
  Covariance: $600 \times 600 = 360000$ elements
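The dimensions above follow from stacking the grid values into one vector. A sketch of that vector form for the simulated Gaussian example is given below. The symbols $X$ (design matrix with an intercept column and the covariate $x(s_i)$) and $R(\theta)$ (Matérn correlation matrix between grid points) are notation assumed here and do not appear in the slides.

```latex
\begin{align*}
Y &= \bigl(Y(s_1), \dots, Y(s_{600})\bigr)^T \sim N\!\bigl(X\beta,\ \sigma_\delta^2 R(\theta)\bigr),
  \qquad X \in \mathbb{R}^{600 \times 2},\quad R(\theta) \in \mathbb{R}^{600 \times 600}, \\
Z \mid Y &\sim N\!\bigl(Y,\ \sigma_\varepsilon^2 I\bigr)
  \quad\Longrightarrow\quad
  Z \sim N\!\bigl(X\beta,\ \sigma_\delta^2 R(\theta) + \sigma_\varepsilon^2 I\bigr).
\end{align*}
```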

Matérn model - parametrization

Output INLA: Empirical Bayes vs Bayes

Output INLA: Empirical Bayes

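Matérn model - parametrization

| Source | Model | Range par | Smoothness par |
|---|---|---|---|
| Book | $\{\lVert h \rVert/\theta_1\}^{\theta_2} K_{\theta_2}(\lVert h \rVert/\theta_1)$ | $\theta_1$ | $\theta_2$ |
| geoR | $\{\lVert h \rVert/\phi\}^{\kappa} K_{\kappa}(\lVert h \rVert/\phi)$ | $\phi$ | $\kappa$ |
| INLA | $\{\lVert h \rVert \kappa\}^{\nu} K_{\nu}(\lVert h \rVert \kappa)$ | $1/\kappa$ | $\nu$ |

Note: INLA reports an estimate of another range, defined as $r = \sqrt{8}/\kappa$, corresponding to a distance where the covariance function is approximately zero. An estimate of the range parameter can then be obtained by $1/\kappa = r/\sqrt{8}$.

Note: INLA works with precisions, $\tau_y = \sigma_y^{-2}$, $\tau_z = \sigma_z^{-2}$.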

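Result - INLA - empirical Bayes

| Parameter | True value | Estimate Run 1 |
|---|---|---|
| $\beta_1$ | 2.0 | 1.7991 |
| $\beta_2$ | 0.3 | 0.3385 |
| $\sigma_z$ | 0.3 | $1/\sqrt{13.213} = 0.275$ |
| $\sigma_y$ | 1.0 | $1/\sqrt{1.398} = 0.846$ |
| $\phi_1$ | 2.0 | $4.901/\sqrt{8} = 1.7328$ |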
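The conversions from INLA output to the scale of the true values (precision to standard deviation, and reported range to range parameter) can be sketched as follows. This is a minimal illustration using the Run 1 numbers from the table. The function names are hypothetical helpers, not part of R-INLA, and Python is used only to check the arithmetic.

```python
import math


def precision_to_sd(tau):
    """Convert a precision tau = 1/sigma^2 to a standard deviation sigma."""
    return 1.0 / math.sqrt(tau)


def reported_range_to_range_par(r):
    """Convert the range r = sqrt(8)/kappa reported by INLA to 1/kappa."""
    return r / math.sqrt(8.0)


# Run 1 values as read from the INLA output in the table above.
sigma_z = precision_to_sd(13.213)
sigma_y = precision_to_sd(1.398)
range_par = reported_range_to_range_par(4.901)

print(f"sigma_z = {sigma_z:.3f}")
print(f"sigma_y = {sigma_y:.3f}")
print(f"range parameter = {range_par:.4f}")
```

The printed values can be compared with the Estimate Run 1 column.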

Spatial prediction, Run 1
  [Figure panels: Prior mean; True intensity; Gaussian data; Empirical Bayes and Bayesian prediction]

Result - INLA - empirical Bayes

| Parameter | True value | Estimate Run 1 | Estimate Run 2 |
|---|---|---|---|
| $\beta_1$ | 2.0 | 1.799 | 1.9719 |
| $\beta_2$ | 0.3 | 0.3385 | 0.4741 |
| $\sigma_z$ | 0.3 | $1/\sqrt{13.213} = 0.275$ | $1/\sqrt{13.6917} = 0.2756$ |
| $\sigma_y$ | 1.0 | $1/\sqrt{1.398} = 0.846$ | $1/\sqrt{0.0014} = 26.72$* |
| $\phi_1$ | 2.0 | $4.901/\sqrt{8} = 1.7328$ | $17.44/\sqrt{8} = 6.16$* |

* Commonly seen variance vs range tradeoff: large variance and long range vs on-scale variance and on-scale range.
Always compare your estimates with the data variance. Here: $\mathrm{Var}(Z) = 1.03$; theoretically, $\mathrm{Var}(Z) = 1.09 = 1.0 + 0.3^2$.
This is an argument for a Bayesian approach with an informative prior.
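The advice to compare estimates with the data variance can be sketched as a quick check: the marginal variance implied by the fitted precisions, $1/\tau_y + 1/\tau_z$, is set against the sample variance 1.03 quoted above. The helper name is hypothetical and the precisions are those listed in the table. This is only a rough diagnostic, since a sample variance over a finite grid will tend to understate the marginal variance when the range is long relative to the domain.

```python
def implied_data_variance(tau_y, tau_z):
    """Marginal Var(Z) = sigma_y^2 + sigma_z^2, computed from precisions."""
    return 1.0 / tau_y + 1.0 / tau_z


sample_variance = 1.03  # empirical Var(Z) quoted in the notes

# (tau_y, tau_z) for each run, as listed in the table above.
runs = {"Run 1": (1.398, 13.213), "Run 2": (0.0014, 13.6917)}

implied = {}
for name, (tau_y, tau_z) in runs.items():
    implied[name] = implied_data_variance(tau_y, tau_z)
    ratio = implied[name] / sample_variance
    print(f"{name}: implied Var(Z) = {implied[name]:.2f}, "
          f"ratio to sample variance = {ratio:.1f}")
```

Run 1 gives an implied variance of the same order as the data variance, while Run 2 is off by several orders of magnitude, which is the warning sign the slide points to.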

Spatial prediction, Run 2
  [Figure panels: Prior mean; True intensity; Gaussian data; Empirical Bayes and Bayesian prediction]

Example - simulated data
  Assume
    $Y(s) = \beta_0 + \beta_1 x(s) + \sigma_\delta\, \delta(s)$,
    $Z(s_i) \sim \mathrm{Poisson}(\exp(Y(s_i)))$
    $\{\delta(s)\}$: Matérn correlation
  Simulations:
    Simulation on a $20 \times 30$ grid
    $\{x(s)\}$: sine curve in horizontal direction
    Parameter values: $(\beta_0, \beta_1) = (1, 0.3)$, $\sigma_\delta = 1$, $\sigma_\varepsilon = 0.3$ and $(\theta_1, \theta_2) = (2, 3)$

Spatial prediction, code: Poisson
  Empirical Bayes (note: the constant term changed from 2 to 1, hence y-1)

Parameter prediction, Poisson

Spatial prediction, Poisson
  [Figure panels: True intensity; Poisson data (counts); Estimated intensity]

INLA engine: GMRFLib
  The principle in a GMRF (Gaussian Markov random field) is to invert the relation used in smoothing of independent Gaussian variables:
    $Y = L\varepsilon,\quad \varepsilon \sim N(0, I) \quad\Rightarrow\quad Y \sim N(0, \Sigma),\quad \Sigma = LL^T$
  Select the smoothing kernel to be the Cholesky factor: $Y = L\varepsilon$
  Then flip the equation and use the precision matrix $Q = \Sigma^{-1}$:
    $L^{-1}Y = \varepsilon,\quad \varepsilon \sim N(0, I) \quad\Rightarrow\quad Y \sim N(0, Q^{-1}),\quad Q = L^{-T}L^{-1}$
  The formulations are equivalent, but $L$ and $L^{-1}$ differ greatly. This has a major impact on computations.

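Cholesky versus inverse Cholesky in AR(1)
  Model:
    $Y_1 = \sigma_1 \varepsilon_1, \qquad Y_i = \alpha Y_{i-1} + \sigma \varepsilon_i, \quad i > 1$
  Inverse formulation:
    $\varepsilon_1 = \frac{1}{\sigma_1} Y_1, \qquad \varepsilon_i = \frac{1}{\sigma} Y_i - \frac{\alpha}{\sigma} Y_{i-1}, \quad i > 1$
  Direct formulation:
    $Y_i = \alpha^{i-1} \sigma_1 \varepsilon_1 + \sigma \sum_{j=2}^{i} \alpha^{i-j} \varepsilon_j$
  The inverse formulation can model long-range dependency with a sparse matrix (few non-zero elements).
  Complexity of the inverse formulation, solving $L^{-1}Y = \varepsilon$: $n$
  Complexity of the direct formulation, multiplying $Y = L\varepsilon$: $n^2$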
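The contrast between the two formulations can be sketched numerically by building both factors for a small AR(1) chain and counting their non-zero entries. This is a minimal illustration in plain Python. The function names, the chain length and the parameter values are assumptions chosen for the example, and the matrices are stored densely only to keep the code short.

```python
# AR(1): Y_1 = sigma1 * eps_1,  Y_i = alpha * Y_{i-1} + sigma * eps_i for i > 1.


def direct_factor(n, alpha, sigma, sigma1):
    """Dense lower-triangular L such that Y = L eps (direct formulation)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][0] = alpha ** i * sigma1          # weight of eps_1 in Y_{i+1}
        for j in range(1, i + 1):
            L[i][j] = sigma * alpha ** (i - j)  # weight of eps_{j+1} in Y_{i+1}
    return L


def inverse_factor(n, alpha, sigma, sigma1):
    """Bidiagonal L^{-1} such that eps = L^{-1} Y (inverse formulation)."""
    Linv = [[0.0] * n for _ in range(n)]
    Linv[0][0] = 1.0 / sigma1
    for i in range(1, n):
        Linv[i][i - 1] = -alpha / sigma
        Linv[i][i] = 1.0 / sigma
    return Linv


def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def count_nonzeros(A):
    """Number of non-zero entries in a matrix stored as nested lists."""
    return sum(1 for row in A for a in row if a != 0.0)


n, alpha, sigma, sigma1 = 6, 0.8, 1.0, 1.5
eps = [0.3, -1.2, 0.5, 0.9, -0.4, 1.1]  # fixed "noise" for reproducibility

L = direct_factor(n, alpha, sigma, sigma1)
Linv = inverse_factor(n, alpha, sigma, sigma1)

y = matvec(L, eps)            # direct formulation: Y = L eps
eps_back = matvec(Linv, y)    # inverse formulation recovers eps

print("non-zeros in L:     ", count_nonzeros(L))
print("non-zeros in L^{-1}:", count_nonzeros(Linv))
print("max error recovering eps:",
      max(abs(a - b) for a, b in zip(eps, eps_back)))
```

The direct factor has $n(n+1)/2$ non-zero entries, while the inverse factor has only $2n - 1$, so a sparse representation of $L^{-1}$ touches a number of entries that grows linearly in $n$.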