Parameter Estimation in the Spatio-Temporal Mixed Effects Model: Analysis of Massive Spatio-Temporal Data Sets

Parameter Estimation in the Spatio-Temporal Mixed Effects Model: Analysis of Massive Spatio-Temporal Data Sets
Matthias Katzfuß
Advisor: Dr. Noel Cressie
Department of Statistics, The Ohio State University
September 17, 2010
Matthias Katzfuß (OSU Statistics), STME Parameter Estimation, September 17, 2010

Outline
1 Introduction: The STME Model
2 Parameter Estimation
    EM Estimation
    Bayesian Estimation
3 Application: Analysis of CO2 Data
4 Conclusions

1 Introduction: The STME Model

Notation
- Hidden spatio-temporal process $y_t(s)$ at time $t$ and location $s$
- Measurements: $z_t(s_{i,t}) = y_t(s_{i,t}) + \epsilon_t(s_{i,t})$; $i = 1, \ldots, n_t$; $t = 1, \ldots, T$
- In vector notation: $z_{1:T} := [z_1', \ldots, z_T']'$, where $z_t := [z_t(s_{1,t}), \ldots, z_t(s_{n_t,t})]'$
- Goal: Predict $y_t(s_0)$; $t \in \{1, \ldots, T\}$

Motivating Example: Remote-Sensing Data
Example: Global satellite measurements of CO2
[Figure: maps of the CO2 measurements for Day 1, Day 2, and Day 3; color scale from 350 to 400]
Challenges of global remote-sensing data:
- Massiveness: Need dimension reduction
- Sparseness: Need to take advantage of spatial and temporal correlations
- Nonstationarity: Need a flexible model

Spatio-Temporal Mixed Effects Model (Cressie et al., 2010)
Process model: $y_t(s) = x(s)'\beta_t + b(s)'\eta_t + \gamma_t(s)$
- $x(s)'\beta_t$: large-scale trend
- $b(s) := [b_1(s), \ldots, b_r(s)]'$: vector of known spatial basis functions
- $\eta_t = H\eta_{t-1} + \delta_t$; $t = 1, 2, \ldots$, with $\eta_0 \sim N_r(0, K_0)$ and $\delta_t \sim N_r(0, U)$
- $\gamma_t(s) \sim N(0, \sigma^2_\gamma v_\gamma(s))$: fine-scale variation
Unknown parameters: $\theta := \{\{\beta_t\}, \sigma^2_\gamma, K_0, H, U\}$
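To make the process model concrete, here is a minimal simulation sketch in Python. It draws $\eta_0$, propagates $\eta_t = H\eta_{t-1} + \delta_t$, and adds the trend and fine-scale variation at a fixed set of locations. The function name `simulate_stme`, the Gaussian-bump basis, and all numerical values are assumptions chosen only for illustration; none of them comes from the talk.

```python
import numpy as np

def simulate_stme(X, B, betas, H, K0, U, sigma2_gamma, v_gamma, rng):
    """Simulate y_t(s) = x(s)'beta_t + b(s)'eta_t + gamma_t(s) at n fixed locations.

    X: (n, p) trend covariates; B: (n, r) basis-function matrix;
    betas: (T, p) trend coefficients; H: (r, r) propagator matrix;
    K0, U: (r, r) covariance matrices; v_gamma: (n,) known variance weights.
    """
    T = betas.shape[0]
    n, r = B.shape
    eta = rng.multivariate_normal(np.zeros(r), K0)        # eta_0 ~ N_r(0, K0)
    y = np.empty((T, n))
    for t in range(T):
        delta = rng.multivariate_normal(np.zeros(r), U)   # delta_t ~ N_r(0, U)
        eta = H @ eta + delta                             # eta_t = H eta_{t-1} + delta_t
        gamma = rng.normal(0.0, np.sqrt(sigma2_gamma * v_gamma))  # fine-scale variation
        y[t] = X @ betas[t] + B @ eta + gamma
    return y

# Hypothetical one-dimensional setup (illustrative values only).
rng = np.random.default_rng(0)
n, r, T = 50, 3, 4
s = np.linspace(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), s])                      # intercept plus one covariate
centers = np.array([-0.5, 0.0, 0.5])
B = np.exp(-((s[:, None] - centers[None, :]) / 0.5) ** 2) # stand-in basis functions
betas = np.tile([380.0, 1.0], (T, 1))
y = simulate_stme(X, B, betas, 0.8 * np.eye(r), np.eye(r),
                  0.36 * np.eye(r), 0.1, np.ones(n), rng)
print(y.shape)
```

Because the temporal dependence enters only through the r-dimensional vector $\eta_t$, the cost of propagating the state depends on r rather than on the number of locations.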

Previous Approaches to Massive S-T Data Sets
- Many ad hoc methods used outside the statistics literature (non-optimal, no measures of uncertainty)
- Other statistical spatio-temporal dimension-reduction models are less general (e.g., Nychka et al., 2002)
- STME model: Parameter estimation via binned-method-of-moments (Kang et al., 2010):
    - Many arbitrary choices have to be made
    - Estimates have to be modified to be valid
    - Does not fully exploit temporal dependence in the data

2 Parameter Estimation

2 Parameter Estimation: EM Estimation

Maximum-Likelihood Estimation
Goal: Find $\hat\theta_{ML} = \arg\max_\theta f(z_{1:T} \mid \theta)$, where recall $z_t = X_t\beta_t + B_t\eta_t + \gamma_t + \epsilon_t$
Problem: Likelihood $f(z_{1:T} \mid \theta)$ is quite complicated
Solution: Expectation-maximization algorithm (Dempster et al., 1977)
- Maximization: Complete-data likelihood $f(\eta_{1:T}, \gamma_{1:T} \mid \theta)$ is easy to maximize
- Expectation: $E_\theta( f(\eta_{1:T}, \gamma_{1:T} \mid \theta) \mid z_{1:T} )$ is obtained via FRS (fixed rank smoothing), a rapid sequential updating technique based on the Kalman filter (Kalman, 1960)

EM Estimation (Katzfuss & Cressie, 2010)
The EM algorithm:
Choose initial value $\theta^{[0]}$
For $l = 0, 1, 2, \ldots$ (until convergence):
    1. E-Step: Run FRS with $\theta^{[l]}$ to obtain $E_{\theta^{[l]}}( f(\eta_{1:T}, \gamma_{1:T} \mid \theta) \mid z_{1:T} )$
    2. M-Step: $\theta^{[l+1]} = \arg\max_\theta E_{\theta^{[l]}}( f(\eta_{1:T}, \gamma_{1:T} \mid \theta) \mid z_{1:T} )$
    3. Go back to 1.
Properties of the resulting estimates:
- Parameter estimates guaranteed to be valid
- Here, convergence to a (possibly local) maximum of the likelihood function
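The E-step/M-step loop can be sketched on a simplified linear Gaussian state-space model. This is not the FRS procedure of Katzfuss & Cressie (2010): it is a generic Kalman filter with a Rauch-Tung-Striebel smoother, under the assumptions that the trend is absent, the observation model is $z_t = B\eta_t + e_t$ with a known noise covariance `R`, and only H and U are updated while K0 stays fixed. In the standard EM formulation the expectation is taken of the log of the complete-data likelihood, which is what the closed-form M-step below maximizes. All function names are hypothetical.

```python
import numpy as np

def kalman_smoother(z, B, R, H, U, K0):
    """E-step: smoothed means, covariances, and lag-one covariances of eta_0..eta_T."""
    T, r = z.shape[0], H.shape[0]
    m = np.zeros((T + 1, r)); P = np.zeros((T + 1, r, r)); P[0] = K0
    mp = np.zeros((T + 1, r)); Pp = np.zeros((T + 1, r, r))
    for t in range(1, T + 1):                        # forward Kalman filter
        mp[t] = H @ m[t - 1]
        Pp[t] = H @ P[t - 1] @ H.T + U
        S = B @ Pp[t] @ B.T + R
        K = np.linalg.solve(S, B @ Pp[t]).T          # gain Pp B' S^{-1}
        m[t] = mp[t] + K @ (z[t - 1] - B @ mp[t])    # z[t-1] is the observation at time t
        P[t] = Pp[t] - K @ B @ Pp[t]
    ms, Ps = m.copy(), P.copy()
    C = np.zeros((T + 1, r, r))                      # C[t] = Cov(eta_t, eta_{t-1} | z_{1:T})
    for t in range(T - 1, -1, -1):                   # backward smoother
        J = np.linalg.solve(Pp[t + 1], H @ P[t]).T   # J = P_t H' Pp_{t+1}^{-1}
        ms[t] = m[t] + J @ (ms[t + 1] - mp[t + 1])
        Ps[t] = P[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
        C[t + 1] = Ps[t + 1] @ J.T
    return ms, Ps, C

def m_step(ms, Ps, C):
    """M-step: closed-form updates of H and U from smoothed second moments."""
    T = ms.shape[0] - 1
    S11 = sum(Ps[t] + np.outer(ms[t], ms[t]) for t in range(1, T + 1))
    S10 = sum(C[t] + np.outer(ms[t], ms[t - 1]) for t in range(1, T + 1))
    S00 = sum(Ps[t - 1] + np.outer(ms[t - 1], ms[t - 1]) for t in range(1, T + 1))
    H = np.linalg.solve(S00, S10.T).T                # H = S10 S00^{-1}
    U = (S11 - H @ S10.T) / T
    return H, (U + U.T) / 2                          # symmetrize against round-off

def em(z, B, R, K0, H, U, n_iter=20):
    """Alternate E- and M-steps for a fixed number of iterations."""
    for _ in range(n_iter):
        ms, Ps, C = kalman_smoother(z, B, R, H, U, K0)
        H, U = m_step(ms, Ps, C)
    return H, U
```

Given observations `z` of shape (T, n), a call such as `em(z, B, R, K0, H_init, U_init)` returns updated estimates of H and U. The update of U is a Schur complement of a positive-definite matrix of smoothed second moments, so in exact arithmetic it is a valid covariance matrix at every iteration, which illustrates the validity property noted above.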

2 Parameter Estimation: Bayesian Estimation

Bayesian Inference
- Parameters $\theta$ have a prior distribution
- Obtain posterior distribution of unknowns $y_t(s_0)$ and $\theta$ given the data $z_{1:T}$ using Bayes' Theorem
- In almost all cases, have to approximate posterior by sampling from it
- "Shrinkage": Biased, but more efficient estimators
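The shrinkage trade-off can be illustrated with a toy scalar case that is unrelated to the STME priors. Assume a single observation x ~ N(mu, noise_var) and a prior mu ~ N(0, prior_var). The posterior mean is then a * x with a = prior_var / (prior_var + noise_var), and its frequentist mean squared error is bias squared plus variance. The function names below are hypothetical.

```python
def shrinkage_weight(prior_var, noise_var):
    """Weight on the data in the posterior mean under a N(0, prior_var) prior."""
    return prior_var / (prior_var + noise_var)

def mse_shrunk(mu, prior_var, noise_var):
    """Frequentist MSE of the posterior-mean estimator a * x when the truth is mu."""
    a = shrinkage_weight(prior_var, noise_var)
    bias = (a - 1.0) * mu           # shrinking toward zero biases the estimate
    variance = a**2 * noise_var     # but reduces its variance
    return bias**2 + variance

noise_var = 1.0  # this is also the MSE of the unshrunk estimator x
print(mse_shrunk(mu=1.0, prior_var=1.0, noise_var=noise_var))  # 0.5
print(mse_shrunk(mu=3.0, prior_var=1.0, noise_var=noise_var))  # 2.5
```

With the truth close to the prior mean (mu = 1) the shrunk estimator beats the unshrunk one (0.5 versus 1.0). With the truth far from it (mu = 3) the bias dominates, so the gain depends on how well the prior matches the data.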

Priors and Posteriors
Prior distributions:
- Standard priors on $\{\beta_t\}$ and $\sigma^2_\gamma$
- Covariance matrices $K_0$ and $U$: Multiresolutional Givens-angle prior (Kang & Cressie, 2009)
    - Control extreme eigenvalues
    - Shrink off-diagonal elements toward zero
- Propagator matrix $H$: Shrink off-diagonal elements depending on how far apart the corresponding basis functions are
Posterior distribution:
- Samples of posterior distribution obtained using MCMC

3 Application: Analysis of CO2 Data

The Data
Mid-tropospheric CO2 on May 1-4, 2003, as measured by AIRS ($n_t \approx 14$K)
[Figure: maps of the CO2 measurements for Day 1, Day 2, Day 3, and Day 4; color scale from 350 to 400]

Statistical Analysis
- Trend: $x(s) = [1, \mathrm{lat}(s)]'$
- Make predictions on a hexagonal grid of size 57,065 for each day
- Basis functions: $r = 380$ bisquare functions at 3 spatial resolutions
[Figure: Bisquare function in one dimension; b(s) from 0 to 1 plotted against s from -1 to 1 for Res 1, Res 2, and Res 3]
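As a sketch of what a multiresolution bisquare basis looks like in one dimension, the Python below uses the usual bisquare form, $(1 - (|s - c|/\rho)^2)^2$ for $|s - c| \le \rho$ and zero otherwise, where c is the center and $\rho$ the radius. The centers, radii, and function names are assumptions for illustration; the talk's 380 basis functions are not reproduced here.

```python
import numpy as np

def bisquare(s, center, radius):
    """Bisquare function: (1 - (|s - c| / radius)^2)^2 inside the radius, 0 outside."""
    d = np.abs(np.asarray(s, dtype=float) - center) / radius
    return np.where(d <= 1.0, (1.0 - d**2) ** 2, 0.0)

def basis_matrix(s, resolutions):
    """Stack basis functions over resolutions; each entry is (centers, radius)."""
    cols = [bisquare(s, c, radius) for centers, radius in resolutions for c in centers]
    return np.column_stack(cols)

s = np.linspace(-1.0, 1.0, 201)
resolutions = [  # hypothetical centers and radii, three resolutions
    (np.array([0.0]), 1.0),
    (np.array([-0.5, 0.5]), 0.5),
    (np.array([-0.75, -0.25, 0.25, 0.75]), 0.25),
]
B = basis_matrix(s, resolutions)
print(B.shape)  # (201, 7)
```

Each column of `B` is one basis function evaluated at all locations, so `B` plays the role of the matrix $B_t$ in the data model. Finer resolutions use more centers with smaller radii, which lets the model capture variation at several spatial scales.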

EM Results
[Figures: Predictions using EM; Standard errors using EM]
EM computation time: 16 iterations × one minute each = 16 min total

Bayesian Results
[Figures: Posterior means; Posterior standard deviations]
1,500 MCMC iterations × 15 seconds each = 6.25 hours total
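The two reported computation times can be put side by side with a short worked calculation. The EM figure is the one given on the EM results slide. The ratio in the last line is derived here and is not stated in the talk.

```latex
\begin{align*}
\text{EM:}    \quad & 16 \times 1\ \text{min} = 16\ \text{min} \\
\text{MCMC:}  \quad & 1500 \times 15\ \text{s} = 22{,}500\ \text{s} = 375\ \text{min} = 6.25\ \text{h} \\
\text{Ratio:} \quad & 375\ \text{min} / 16\ \text{min} \approx 23.4
\end{align*}
```

On these figures the Bayesian analysis took roughly 23 times as long as EM estimation.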

Estimates of the Propagator Matrix H
[Figure: image plots of the estimates of H obtained by EM and by Bayesian estimation (panels labeled EM and B); row and column index ticks from 50 to 350; color scale from -1 to 1]

4 Conclusions

Conclusions
STME Model:
- Scalable and flexible technique for analysis of massive, nonstationary spatio-temporal data sets
- Provides uncertainty quantification
- Here, successful use on CO2 satellite data
Parameter estimation:
- EM estimation: Fast, easy
- Bayesian estimation: Better prediction (about 10% for AIRS data), more accurate uncertainty assessment

References
Cressie, N., Shi, T., & Kang, E. L. (2010). Fixed rank filtering for spatio-temporal data. Journal of Computational and Graphical Statistics. Forthcoming.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-38.
Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35-45.
Kang, E. L., & Cressie, N. (2009). Bayesian inference for the spatial random effects model. Department of Statistics Technical Report No. 830. The Ohio State University.
Kang, E. L., Cressie, N., & Shi, T. (2010). Using temporal variability to improve spatial mapping with application to satellite data. Canadian Journal of Statistics. Forthcoming.
Katzfuss, M., & Cressie, N. (2010). Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. Department of Statistics Technical Report No. 840. The Ohio State University.
Nychka, D. W., Wikle, C., & Royle, J. (2002). Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling, 2, 315-331.