Computation fundamentals of discrete GMRF representations of continuous domain spatial models
Finn Lindgren
September, v0.2.2

Abstract

The fundamental formulas and algorithms for Bayesian spatial statistics using continuous domain GMRF models are presented. The construction of the models as approximate SPDE solutions is not covered. The results are applicable to non-spatial GMRF models as well, but the important property of observations being local or non-local is only discussed in the spatial context.

Contents

1 Basic spatial model in GMRF representation
  1.1 Bayesian hierarchical formulation
  1.2 Joint distribution
  1.3 Conditional/posterior distribution (or Kriging)
2 Expectations, covariances and samples
  2.1 Solving with the posterior precision
  2.2 Prior and posterior variances
  2.3 Leave-one-out cross-validation
  2.4 Sampling from the prior
  2.5 Sampling from the posterior
    Direct method
    Least squares
    Sampling with conditioning by Kriging
3 Likelihood evaluation
  3.1 Conditional marginal data likelihood
  3.2 Posterior parameter likelihood and total marginal data likelihood
  3.3 A note on non-Gaussian data and INLA
4 Linear constraints
  4.1 Non-interacting hard constraints
  4.2 Conditioning by Kriging
  4.3 Conditional covariances
  4.4 Constrained likelihood evaluation
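1 Basic spatial model in GMRF representation

Basic linear spatial model:

- Spatial basis expansion model $u(s) = \sum_{k=1}^{m} \psi_k(s) u_k$, with $m$ basis functions $\psi_k(s)$ with local (compact) support, and GMRF coefficients $(u_1, \dots, u_m) = u \sim N(\mu_u, Q_u^{-1})$, where $Q_u$ is derived from an SPDE construction with parameters $\theta_u$, and $\mu_u$ is usually 0.
- Covariate matrix $B$, with $p$ coefficients $b \sim N(\mu_b, Q_b^{-1})$. Typically, $\mu_b = 0$ and $Q_b = \tau_b I_p$ with $\tau_b \approx 0$.
- $n$ observations $y = (y_1, \dots, y_n)$ at locations $s_i$, with additive zero mean Gaussian noise $e_i$, where $(e_1, \dots, e_n) = \epsilon \sim N(0, Q_\epsilon^{-1})$. Typically, $Q_\epsilon = \tau_\epsilon I$.

Define $A$ so that $A_{ij} = \psi_j(s_i)$. The observation model is then

$$y = Bb + Au + \epsilon.$$

Since the covariate effects and the spatial field are jointly Gaussian, it is often practical to join them into a single vector $x = (u, b)$, with

$$\mu_x = \begin{bmatrix} \mu_u \\ \mu_b \end{bmatrix}, \qquad Q_x = \begin{bmatrix} Q_u & 0 \\ 0 & Q_b \end{bmatrix}, \qquad A_x = \begin{bmatrix} A & B \end{bmatrix}, \qquad y = A_x x + \epsilon.$$

1.1 Bayesian hierarchical formulation

Written as a full Bayesian hierarchical model, the model is

$$\theta = (\theta_u, \tau_b, \tau_\epsilon) \sim \pi(\theta)$$
$$(x \mid \theta) \sim N(\mu_x, Q_x^{-1})$$
$$(y \mid \theta, x) \sim N(A_x x, Q_\epsilon^{-1}).$$

It will be important to note that for local basis functions (and at most Markov dependent noise $\epsilon$), $A^\top Q_\epsilon A$ is sparse, and that

$$A_x^\top Q_\epsilon A_x = \begin{bmatrix} A^\top Q_\epsilon A & A^\top Q_\epsilon B \\ B^\top Q_\epsilon A & B^\top Q_\epsilon B \end{bmatrix}$$

is therefore also sparse if $p$ is small.

1.2 Joint distribution

The joint distribution for the observations and the latent variables, $(y, x)$, is given by

$$\begin{bmatrix} y \\ x \end{bmatrix} \sim N\left( \begin{bmatrix} A_x \mu_x \\ \mu_x \end{bmatrix}, \begin{bmatrix} Q_\epsilon & -Q_\epsilon A_x \\ -A_x^\top Q_\epsilon & Q_x + A_x^\top Q_\epsilon A_x \end{bmatrix}^{-1} \right).$$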
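To make the definition $A_{ij} = \psi_j(s_i)$ concrete, the following is a minimal sketch that builds $A$ for a one-dimensional toy case. The piecewise linear "hat" basis functions, the knot positions and the function name `hat_basis_matrix` are all assumptions chosen for illustration; the note itself does not prescribe a particular basis.

```python
def hat_basis_matrix(knots, locations):
    """Return A with A[i][j] = psi_j(s_i) for piecewise linear hat functions.

    `knots` must be sorted; psi_j equals 1 at knots[j] and 0 at all other knots.
    Locations outside the knot range get an all-zero row (hypothetical convention).
    """
    m = len(knots)
    A = [[0.0] * m for _ in locations]
    for i, s in enumerate(locations):
        for j in range(m - 1):
            lo, hi = knots[j], knots[j + 1]
            if lo <= s <= hi:
                t = (s - lo) / (hi - lo)
                # Only the two basis functions surrounding s are non-zero there.
                A[i][j] = 1.0 - t
                A[i][j + 1] = t
                break
    return A

knots = [0.0, 1.0, 2.0, 3.0]
A = hat_basis_matrix(knots, [0.25, 1.5, 3.0])
for row in A:
    print(row)
```

Each row has at most two non-zero entries, so for this basis $A^\top Q_\epsilon A$ with diagonal $Q_\epsilon$ is tridiagonal, which is the kind of sparsity the model relies on.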
1.3 Conditional/posterior distribution (or Kriging)

The conditional distribution for $x$ given $y$ and $\theta$ is given by

$$(x \mid \theta, y) \sim N(\mu_{x|y}, Q_{x|y}^{-1})$$
$$Q_{x|y} = Q_x + A_x^\top Q_\epsilon A_x$$
$$\mu_{x|y} = \mu_x + Q_{x|y}^{-1} A_x^\top Q_\epsilon (y - A_x \mu_x).$$

The proof is by completing the square in the exponent of the Gaussian conditional density expression

$$\pi(x \mid \theta, y) = \frac{\pi(x, y \mid \theta)}{\pi(y \mid \theta)} \propto \pi(x \mid \theta)\, \pi(y \mid \theta, x).$$

Note that the elements of $\mu_{x|y}$ are the basis function coefficients and covariate effect estimates in the universal Kriging predictor, and that we have arrived at it without going the long way around via the traditional covariance based Kriging equations and matrix inversion lemmas.

2 Expectations, covariances and samples

Direct calculations and simulation (for models smaller than approximately $m = 10^6$) are typically done with sparse Cholesky decomposition. Reordering methods are required to achieve sparsity (CAMD, nested dissection, etc.). Here, assume for ease of notation that the nodes are already in a suitable order, and that both the prior and posterior precision decompositions are sparse:

$$Q_x = L_x L_x^\top, \qquad Q_{x|y} = L_{x|y} L_{x|y}^\top,$$

where all $L$ are lower triangular sparse matrices. See the least squares approach in Section 2.5 and Section 4 for examples of how to handle models where only the prior precision has a sparse Cholesky decomposition. In several instances, the needed adjustments for the use of iterative equation solvers are also mentioned.

2.1 Solving with the posterior precision

Kriging requires solving a linear system $Q_{x|y} z = w$,

$$z = L_{x|y}^{-\top} \left( L_{x|y}^{-1} w \right),$$

requiring one forward solve and one backward solve. Several systems with the same posterior precision can be solved simultaneously by having a multi-column $w$. Often, practical implementations avoid actually transposing the large sparse matrices, effectively doing

$$z = \left\{ \left( L_{x|y}^{-1} w \right)^\top L_{x|y}^{-1} \right\}^\top$$

instead. The Kriging predictor (the posterior expectation) is obtained for $w = A_x^\top Q_\epsilon (y - A_x \mu_x)$, with $\mu_{x|y} = \mu_x + z$.
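The forward and backward solves of Section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3-node tridiagonal prior precision, $\mu_x = 0$, no covariates ($p = 0$), direct observations of the first and last node with $Q_\epsilon = 4I$, and dense list-of-lists storage standing in for a sparse Cholesky library. The helper names are hypothetical.

```python
import math

def cholesky(Q):
    """Lower triangular L with Q = L L^T (dense stand-in for a sparse factorisation)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L v = b by forward substitution."""
    v = [0.0] * len(b)
    for i in range(len(b)):
        v[i] = (b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i]
    return v

def backward_solve(L, v):
    """Solve L^T z = v by backward substitution, without forming L^T."""
    n = len(v)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (v[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def kriging_mean(Qx, mu, A, tau, y):
    """Posterior mean mu + Q_post^{-1} A^T Q_eps (y - A mu), assuming Q_eps = tau * I."""
    m, n = len(Qx), len(y)
    resid = [y[i] - sum(A[i][j] * mu[j] for j in range(m)) for i in range(n)]
    # Posterior precision Q_post = Qx + tau * A^T A and right-hand side w.
    Qpost = [[Qx[j][k] + tau * sum(A[i][j] * A[i][k] for i in range(n))
              for k in range(m)] for j in range(m)]
    w = [tau * sum(A[i][j] * resid[i] for i in range(n)) for j in range(m)]
    L = cholesky(Qpost)
    z = backward_solve(L, forward_solve(L, w))  # one forward, one backward solve
    return [mu[j] + z[j] for j in range(m)]

Qx = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
A = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # observe the first and last node
mu_post = kriging_mean(Qx, [0.0, 0.0, 0.0], A, 4.0, [1.0, -1.0])
print(mu_post)
```

In a real application the factor would be computed once by a sparse Cholesky routine with a fill-reducing ordering and reused for every right-hand side.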
2.2 Prior and posterior variances

Even though the full covariance matrix is too large to be stored, the variances and covariances of neighbouring Markov nodes can be calculated using an algorithm related to the Cholesky decomposition. In R-INLA, this is available as S=inla.qinv(Q), where

$$S_{ij} = \begin{cases} (Q^{-1})_{ij} & \text{when } Q_{ij} \neq 0 \text{ or } (i, j) \text{ is in the Cholesky infill, and} \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

This means that the posterior neighbour covariances $\mathrm{Cov}(x_i, x_j \mid \theta, y)$ for neighbours $(i, j)$ are obtained by calling inla.qinv() with the posterior precision $Q_{x|y}$. Provided that the desired posterior predictions $A_{pred}\, x$ have the same sparse connectivity structure as the prior model itself, e.g. when $A_{pred} = A_x$, the sparse symmetric $S$ matrix is sufficient to evaluate marginal posterior predictive variances as the diagonal of $A_{pred} S A_{pred}^\top$:

$$\mathrm{Var}\left( (A_{pred}\, x)_i \right) = \sum_{j=1}^{m+p} (A_{pred})_{ij} \sum_{k=1}^{m+p} S_{jk} (A_{pred})_{ik},$$

which in R would typically be implemented as rowSums(A * (A %*% S)). When the desired predictions are dense relative to the model (e.g. when predicting the overall spatial average), solves with $L_{x|y}$ are required. A small number of predictions (i.e. the number of rows of $A_{pred}$ is small) can be evaluated simultaneously with

    LAt <- solve(L, t(A))
    colSums(LAt * LAt)

Data predictive marginal variances for $A_{pred}\, x + \epsilon_{pred}$ are obtained by adding $\tau_\epsilon^{-1}$ to the variances for $A_{pred}\, x$.

2.3 Leave-one-out cross-validation

In the case of conditionally independent observations, expressions for expectations and variances of leave-one-out cross-validation predictive distributions can be obtained via rank one modification of the full posterior precision matrix, and are obtained for free by manipulating the above expressions. Let $S_{x|y}$ be the neighbour covariance matrix obtained from (1) applied to $Q_{x|y}$, and define

$$E_i = (A_x \mu_{x|y})_i, \qquad V_i = (A_x)_i\, S_{x|y}\, (A_x)_i^\top.$$

Then the predictive distribution for $(A_x x)_i$ based on the complete data is

$$\left( (A_x x)_i \mid \theta, y \right) \sim N(E_i, V_i),$$

and the leave-one-out predictive distribution is

$$\left( (A_x x)_i \mid \theta, \{ y_j, j \neq i \} \right) \sim N(E_{(i)}, V_{(i)}),$$

where

$$E_{(i)} = y_i - \frac{y_i - E_i}{1 - q_i V_i}, \qquad V_{(i)} = \frac{V_i}{1 - q_i V_i},$$

and $q_i$ is the conditional precision of observation $i$, from $Q_\epsilon = \mathrm{diag}(q_1, \dots, q_n)$.
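The leave-one-out expressions can be checked numerically against a brute-force refit that drops one observation at a time. The sketch below does this for a toy model; the 3-node prior precision, zero prior mean, the observation matrix and the precisions are assumptions for illustration, and a dense covariance computed by repeated solves stands in for the sparse partial inverse from (1).

```python
import math

def cholesky(Q):
    """Lower triangular L with Q = L L^T (dense, for illustration only)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) z = b with one forward and one backward substitution."""
    n = len(b)
    v = [0.0] * n
    for i in range(n):
        v[i] = (b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (v[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def posterior(Qx, A, q, y):
    """Posterior mean and dense covariance for x ~ N(0, Qx^{-1}), Q_eps = diag(q)."""
    m = len(Qx)
    Qp = [row[:] for row in Qx]
    w = [0.0] * m
    for a, qi, yi in zip(A, q, y):
        for j in range(m):
            w[j] += a[j] * qi * yi
            for k in range(m):
                Qp[j][k] += a[j] * qi * a[k]
    L = cholesky(Qp)
    mean = chol_solve(L, w)
    # Column c of the covariance; by symmetry this is also row c.
    cov = [chol_solve(L, [1.0 if r == c else 0.0 for r in range(m)]) for c in range(m)]
    return mean, cov

def predict(a, mean, cov):
    """Mean and variance of the linear combination a . x."""
    E = sum(aj * mj for aj, mj in zip(a, mean))
    V = sum(a[j] * cov[j][k] * a[k] for j in range(len(a)) for k in range(len(a)))
    return E, V

def loo_max_error(Qx, A, q, y):
    """Largest gap between the closed-form LOO moments and a brute-force refit."""
    mean, cov = posterior(Qx, A, q, y)
    worst = 0.0
    for i in range(len(y)):
        E, V = predict(A[i], mean, cov)
        E_loo = y[i] - (y[i] - E) / (1.0 - q[i] * V)
        V_loo = V / (1.0 - q[i] * V)
        keep = [j for j in range(len(y)) if j != i]
        mean_r, cov_r = posterior(Qx, [A[j] for j in keep],
                                  [q[j] for j in keep], [y[j] for j in keep])
        E_ref, V_ref = predict(A[i], mean_r, cov_r)
        worst = max(worst, abs(E_loo - E_ref), abs(V_loo - V_ref))
    return worst

Qx = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
A = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
max_err = loo_max_error(Qx, A, [4.0, 2.0, 1.0], [1.0, 0.3, -0.5])
print(max_err)
```

The closed-form route needs only $E_i$, $V_i$ and $q_i$ from the full-data fit, whereas the reference computation refactorises the posterior precision once per left-out observation.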
2.4 Sampling from the prior

Let $w \sim N(0, I_{m+p})$; then $x = \mu_x + L_x^{-\top} w$ is a sample from the prior model for $x$. Multiple samples are obtained by repeated triangular solves with $L_x^\top$.

2.5 Sampling from the posterior

Direct method

First compute $\mu_{x|y}$ as in Section 2.1, and then let $w \sim N(0, I_{m+p})$. Then $x = \mu_{x|y} + L_{x|y}^{-\top} w$ is a sample from the posterior for $x$ given $y$. Multiple samples are obtained by repeated triangular solves with $L_{x|y}^\top$.

Least squares

In some models, the posterior precision matrix does not have a sparse Cholesky decomposition, but iterative linear algebra methods may still be used to solve systems with $Q_{x|y}$. If the prior precision can be decomposed as $Q_x = H H^\top$ for some sparse matrix $H$ (not necessarily a Cholesky factor), then one approach to sampling is least squares. Given that the posterior expectation has already been calculated, using an iterative solution to the equations in Section 1.3, a sample of $x$ conditionally on $y$ is given by the solution to

$$Q_{x|y} (x - \mu_{x|y}) = \begin{bmatrix} H & A_x^\top L_\epsilon \end{bmatrix} w,$$

where $w \sim N(0, I_{m+p+n})$ and $Q_\epsilon = L_\epsilon L_\epsilon^\top$. The right-hand side has covariance $H H^\top + A_x^\top Q_\epsilon A_x = Q_{x|y}$, so that $x - \mu_{x|y}$ has covariance $Q_{x|y}^{-1}$.

Sampling with conditioning by Kriging

The conditioning by Kriging approach to sampling is to construct samples from the prior, and then correct them to become samples from the posterior. For GMRF models, this is less efficient than the direct method above, except when some observations are non-local, which leads to a dense posterior precision. In its traditional form, the samples are constructed as follows:

$$x^* = \mu_x + L_x^{-\top} w_x, \qquad w_x \sim N(0, I_{m+p})$$
$$y^* = y + L_\epsilon^{-\top} w_y, \qquad w_y \sim N(0, I_n)$$
$$x = x^* - Q_x^{-1} A_x^\top \left( A_x Q_x^{-1} A_x^\top + Q_\epsilon^{-1} \right)^{-1} (A_x x^* - y^*),$$

where the final expression is the covariance based Kriging equation. For anything but very small $n$, the inner matrix is dense. Using the Woodbury identity, the expression can be rewritten to recover the precision based equation from Section 1.3,

$$x = x^* + \left( Q_x + A_x^\top Q_\epsilon A_x \right)^{-1} A_x^\top Q_\epsilon (y^* - A_x x^*) = x^* + Q_{x|y}^{-1} A_x^\top Q_\epsilon (y^* - A_x x^*),$$

which is calculated as before, with forward and backward solves on $L_{x|y}$. If $N$ samples as well as the posterior mean are desired, conditioning by Kriging requires $N$ solves with $L_x$ and $L_\epsilon$ (to obtain $x^*$ and $y^*$), and $2 + 2N$ solves with $L_{x|y}$. The direct method only requires $2 + N$ solves with $L_{x|y}$, and no solves with $L_x$ or $L_\epsilon$, which means that it is nearly always preferable to conditioning by Kriging. The exception is non-local observations, which can be handled separately as soft constraints; see Section 4.

3 Likelihood evaluation

The expensive part of evaluating likelihoods is calculating precision determinants, but they come at no extra cost given the Cholesky factors:

$$\log \pi(x \mid \theta) = -\frac{m+p}{2} \log(2\pi) + \frac{1}{2} \log\det(Q_x) - \frac{1}{2} (x - \mu_x)^\top Q_x (x - \mu_x)$$
$$\log\det(Q_x) = 2 \sum_{j=1}^{m+p} \log (L_x)_{jj}.$$

Corresponding expressions hold for $\pi(y \mid \theta, x)$ and $\pi(x \mid \theta, y)$, using $L_\epsilon$ and $L_{x|y}$. Marginal data likelihoods require more effort.

3.1 Conditional marginal data likelihood

For any arbitrary $x^*$,

$$\pi(y \mid \theta) = \frac{\pi(\theta, y)}{\pi(\theta)} = \left. \frac{\pi(\theta, y)\, \pi(x \mid \theta, y)}{\pi(\theta)\, \pi(x \mid \theta, y)} \right|_{x = x^*} = \left. \frac{\pi(\theta, y, x)}{\pi(\theta)\, \pi(x \mid \theta, y)} \right|_{x = x^*} = \left. \frac{\pi(x \mid \theta)\, \pi(y \mid \theta, x)}{\pi(x \mid \theta, y)} \right|_{x = x^*}.$$

In practice, this is evaluated for $x^* = \mu_{x|y}$, which gives better numerical stability than $x^* = 0$. Written explicitly,

$$\log \pi(y \mid \theta) = -\frac{n}{2} \log(2\pi) + \frac{1}{2} \log\det(Q_x) + \frac{1}{2} \log\det(Q_\epsilon) - \frac{1}{2} \log\det(Q_{x|y}) - \frac{1}{2} (\mu_{x|y} - \mu_x)^\top Q_x (\mu_{x|y} - \mu_x) - \frac{1}{2} (y - A_x \mu_{x|y})^\top Q_\epsilon (y - A_x \mu_{x|y}),$$

where an anticipated third quadratic form, from $\pi(x \mid \theta, y)$, has been eliminated due to the choice of $x^*$. Comparing with a standard marginal Gaussian log-likelihood expression, the log-determinant and quadratic form can be identified,

$$\log\det(Q_y) = \log\det(Q_x) + \log\det(Q_\epsilon) - \log\det(Q_{x|y})$$
$$(y - A_x \mu_x)^\top Q_y (y - A_x \mu_x) = (\mu_{x|y} - \mu_x)^\top Q_x (\mu_{x|y} - \mu_x) + (y - A_x \mu_{x|y})^\top Q_\epsilon (y - A_x \mu_{x|y}),$$

even though the marginal precision matrix itself, $Q_y$, is dense and cannot be stored.
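The identity in Section 3.1 can be verified on a small example by comparing the precision based expression with a direct evaluation of the dense marginal Gaussian density. The following is a sketch under assumptions made purely for illustration: a 3-node prior precision, $\mu_x = 0$, $Q_\epsilon = \tau_\epsilon I$ with $\tau_\epsilon = 4$, and dense matrices in place of sparse ones. The function names are hypothetical.

```python
import math

def cholesky(Q):
    """Lower triangular L with Q = L L^T (dense, for illustration only)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) z = b with one forward and one backward substitution."""
    n = len(b)
    v = [0.0] * n
    for i in range(n):
        v[i] = (b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (v[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def logdet(L):
    """log det(L L^T) = 2 * sum_j log L_jj."""
    return 2.0 * sum(math.log(L[j][j]) for j in range(len(L)))

def loglik_precision(Qx, A, tau, y):
    """log pi(y | theta) from the precision based formula, evaluated at x* = mu_post."""
    m, n = len(Qx), len(y)
    Qpost = [[Qx[j][k] + tau * sum(A[i][j] * A[i][k] for i in range(n))
              for k in range(m)] for j in range(m)]
    w = [tau * sum(A[i][j] * y[i] for i in range(n)) for j in range(m)]
    Lx, Lpost = cholesky(Qx), cholesky(Qpost)
    mu = chol_solve(Lpost, w)  # posterior mean (prior mean is zero here)
    quad_x = sum(mu[j] * Qx[j][k] * mu[k] for j in range(m) for k in range(m))
    resid = [y[i] - sum(A[i][j] * mu[j] for j in range(m)) for i in range(n)]
    quad_y = tau * sum(r * r for r in resid)
    return (-0.5 * n * math.log(2.0 * math.pi) + 0.5 * logdet(Lx)
            + 0.5 * n * math.log(tau) - 0.5 * logdet(Lpost)
            - 0.5 * quad_x - 0.5 * quad_y)

def loglik_covariance(Qx, A, tau, y):
    """Reference value from the dense marginal y ~ N(0, A Qx^{-1} A^T + I / tau)."""
    m, n = len(Qx), len(y)
    Lx = cholesky(Qx)
    QinvAt = [chol_solve(Lx, A[i]) for i in range(n)]  # entry i holds Qx^{-1} A_i^T
    Sigma = [[sum(A[i][j] * QinvAt[k][j] for j in range(m))
              + (1.0 / tau if i == k else 0.0) for k in range(n)] for i in range(n)]
    Ls = cholesky(Sigma)
    alpha = chol_solve(Ls, y)
    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * logdet(Ls)
            - 0.5 * sum(y[i] * alpha[i] for i in range(n)))

Qx = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
A = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
y = [1.0, 0.3]
ll_prec = loglik_precision(Qx, A, 4.0, y)
ll_cov = loglik_covariance(Qx, A, 4.0, y)
print(ll_prec, ll_cov)
```

Only the first function scales to large models, since it never forms the dense $n \times n$ marginal covariance; the second is included solely as a check.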
3.2 Posterior parameter likelihood and total marginal data likelihood

Using the result for the conditional marginal data likelihood, we can express the posterior parameter likelihood as

$$\pi(\theta \mid y) = \frac{\pi(\theta, y)}{\pi(y)} = \frac{\pi(\theta)\, \pi(y \mid \theta)}{\pi(y)},$$

where the only unknown is the total marginal data likelihood $\pi(y)$, which can be approximated with numerical integration over $\theta$,

$$\pi(y) = \int \pi(\theta)\, \pi(y \mid \theta) \, d\theta.$$

3.3 A note on non-Gaussian data and INLA

For non-Gaussian data, Laplace approximation can be used, which in essence involves replacing the non-Gaussian conditional posterior $\pi(x \mid \theta, y)$ with a Gaussian approximation in the expression for $\pi(y \mid \theta)$, taking $x^*$ as the mode of $\pi(x \mid \theta, y)$. The INLA implementation finds the mode of $\pi(\theta \mid y)$ using numerical optimisation, and further approximates the marginal posteriors for the latent variables $x_k$, $\pi(x_k \mid y)$, with numerical integration of

$$\pi(x_k \mid y) = \int \pi(x_k \mid \theta, y)\, \pi(\theta \mid y) \, d\theta,$$

where a further Laplace approximation of $\pi(x_k \mid \theta, y)$ is made (as well as skewness corrections not discussed here),

$$\pi(x_k \mid \theta, y) = \left. \frac{\pi(x_k, x_{(k)} \mid \theta, y)}{\pi(x_{(k)} \mid \theta, y, x_k)} \right|_{x_{(k)} = x_{(k)}^*}.$$

This combination of Laplace approximations is the nested Laplace part of INLA. In the case of Gaussian data, the only approximation in the method is the numerical integration itself.

4 Linear constraints

There are two main classes of linear constraints:

1. Hard constraints, which fall into two subclasses:
   (a) Interacting hard constraints, where linear combinations of nodes are constrained.
   (b) Non-interacting hard constraints, where each constraint acts only on a single node, in effect specifying those nodes explicitly.
2. Soft constraints, where linear combinations are specified with some uncertainty, equivalent to posterior conditioning on observations of these linear combinations. Observations are treated under this framework when they break the Markov structure in the posterior distribution, e.g. by affecting all the nodes in the field directly.

All these cases can be written in matrix form as

$$A_c x + \epsilon_c = e_c,$$

where $\epsilon_c = 0$ in the hard constraint cases, and $\epsilon_c \sim N(0, Q_c^{-1})$ in the soft constraint case. The aim is to sample from, and evaluate properties of, the conditional distribution for $(x \mid e_c)$, when

$$x \sim N(\mu, Q^{-1})$$
$$(e_c \mid x) \sim N(A_c x, Q_c^{-1}),$$

where $x$ comes from a known GMRF model, typically either a prior or a posterior distribution. Let $r$ denote the number of constraints, i.e. the number of rows in $A_c$ and the number of elements in $e_c$.

4.1 Non-interacting hard constraints

Non-interacting hard constraints are easiest to handle, as the constrained nodes can be completely removed from all calculations in a preprocessing step, as they are in effect known, deterministic values, and the conditional precision for the remaining nodes can be trivially extracted as a block from the joint precision matrix. Without loss of practical generality, we can assume that all scaling is done in $e_c$, so that each row of $A_c$ has a single non-zero entry that is equal to 1. For ease of notation, assume that the nodes are ordered so that the unconstrained nodes, $U$, are followed by the constrained, $C$, so that the joint distribution is

$$\begin{bmatrix} x_U \\ x_C \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_U \\ \mu_C \end{bmatrix}, \begin{bmatrix} Q_{UU} & Q_{UC} \\ Q_{CU} & Q_{CC} \end{bmatrix}^{-1} \right).$$

The conditional distribution for the unconstrained nodes is given by

$$(x_U \mid x_C = e_c) \sim N\left( \mu_{U|C}, Q_{UU}^{-1} \right)$$
$$\mu_{U|C} = \mu_U - Q_{UU}^{-1} Q_{UC} (e_c - \mu_C).$$

The constrained mean and constrained samples are obtained with the same forward and backward solving techniques as in Section 1.3 and the direct sampling method of Section 2.5, using the Cholesky decomposition of $Q_{UU}$. The joint constrained distribution can be written as a degenerate Normal distribution,

$$\left( \begin{bmatrix} x_U \\ x_C \end{bmatrix} \,\middle|\, x_C = e_c \right) \sim N\left( \begin{bmatrix} \mu_{U|C} \\ e_c \end{bmatrix}, \begin{bmatrix} Q_{UU}^{-1} & 0 \\ 0 & 0 \end{bmatrix} \right).$$

4.2 Conditioning by Kriging

Conditioning on local observations preserves the Markov structure, and Markov breaking conditioning is therefore done as a final step, when computing posterior means and sampling from the posterior, as well as from the prior.
Using the conditioning by Kriging approach introduced for posterior sampling in Section 2.5, conditioning on linear constraints or on non-local observations can be written in a common framework. Let $Q_c = L_c L_c^\top$ be the Cholesky decomposition of $Q_c$ in the soft constraint case, and for notational convenience define $Q_c^{-1} = 0$ and $L_c^{-1} = 0$ for hard constraints.
The method proceeds by constructing a sample from the unconstrained model, and then correcting for the constraints:

$$x^* = \mu + L^{-\top} w, \qquad w \sim N(0, I_{m+p})$$
$$e_c^* = e_c + L_c^{-\top} w_c, \qquad w_c \sim N(0, I_r)$$
$$x = x^* - Q^{-1} A_c^\top \left( A_c Q^{-1} A_c^\top + Q_c^{-1} \right)^{-1} (A_c x^* - e_c^*).$$

In order to obtain the conditional mean instead of a sample, set $x^* = \mu$ and $e_c^* = e_c$. The dense but small inner $r \times r$ matrix can be evaluated using a single forward solve with $L$,

$$A_c Q^{-1} A_c^\top + Q_c^{-1} = \left( L^{-1} A_c^\top \right)^\top \left( L^{-1} A_c^\top \right) + Q_c^{-1}.$$

Defining $\tilde{A}_c = L^{-1} A_c^\top$, the full expression becomes

$$x = x^* - L^{-\top} \tilde{A}_c \left( \tilde{A}_c^\top \tilde{A}_c + Q_c^{-1} \right)^{-1} (A_c x^* - e_c^*),$$

where the remaining inner $r \times r$ solve is evaluated using e.g. Cholesky decomposition, and the result fed into a final backward solve with $L$. In order to avoid unnecessary recalculations for multiple samples, further precomputation can be used. A fully Cholesky based approach, that also simplifies later calculations, is

$$\tilde{L}_c \tilde{L}_c^\top = \tilde{A}_c^\top \tilde{A}_c + L_c^{-\top} L_c^{-1}$$
$$B = L^{-\top} \tilde{A}_c \tilde{L}_c^{-\top}$$
$$x = x^* - B \tilde{L}_c^{-1} (A_c x^* - e_c^*),$$

where $\tilde{L}_c$ is a dense $r \times r$ Cholesky factor, and $B$ is dense, $(m + p) \times r$. For models that require iterative solves for $Q$ instead of Cholesky decomposition, the pre-calculations are modified slightly,

$$\tilde{A}_c = Q^{-1} A_c^\top$$
$$\tilde{L}_c \tilde{L}_c^\top = A_c \tilde{A}_c + L_c^{-\top} L_c^{-1}$$
$$B = \tilde{A}_c \tilde{L}_c^{-\top},$$

but the final step for constructing $x$ remains unchanged.

4.3 Conditional covariances

From the conditioning by Kriging algorithm, we obtain

$$\mathrm{Cov}(x \mid e_c) = Q^{-1} - B B^\top,$$

which can be combined with the neighbour covariance algorithm from Section 2.2 to yield the conditional covariances between nodes that are neighbours in the unconstrained model,

$$(S_c)_{ij} = \begin{cases} S_{ij} - \sum_k B_{ik} B_{jk} & \text{when } Q_{ij} \neq 0 \text{ or } (i, j) \text{ is in the Cholesky infill, and} \\ 0 & \text{otherwise.} \end{cases}$$
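The Cholesky based conditioning by Kriging steps can be sketched for the simplest case of a single hard constraint ($r = 1$, $Q_c^{-1} = 0$), where the inner matrix is a scalar. The toy precision, mean and sum-to-zero constraint below are assumptions for illustration, as are the helper names; the code computes the conditional mean by setting $x^* = \mu$ and $e_c^* = e_c$.

```python
import math

def cholesky(Q):
    """Lower triangular L with Q = L L^T (dense, for illustration only)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L v = b by forward substitution."""
    v = [0.0] * len(b)
    for i in range(len(b)):
        v[i] = (b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i]
    return v

def backward_solve(L, v):
    """Solve L^T z = v by backward substitution."""
    n = len(v)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (v[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def constrained_mean(Q, mu, a_c, e_c):
    """Mean of x ~ N(mu, Q^{-1}) given the single hard constraint a_c . x = e_c.

    Returns the conditional mean and the vector B with Cov(x | e_c) = Q^{-1} - B B^T.
    """
    L = cholesky(Q)
    a_tilde = forward_solve(L, a_c)            # A~_c = L^{-1} A_c^T (one column)
    inner = sum(t * t for t in a_tilde)        # A~_c^T A~_c, a 1 x 1 matrix
    L_tilde = math.sqrt(inner)                 # its Cholesky factor
    B = [b / L_tilde for b in backward_solve(L, a_tilde)]  # L^{-T} A~_c L~_c^{-T}
    resid = sum(a * m for a, m in zip(a_c, mu)) - e_c      # A_c x* - e_c, x* = mu
    x = [mu[j] - B[j] * resid / L_tilde for j in range(len(mu))]
    return x, B

Q = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
a_c = [1.0, 1.0, 1.0]                          # sum-to-zero constraint
x_c, B = constrained_mean(Q, [1.0, 2.0, 3.0], a_c, 0.0)
print(x_c)
```

For a sample rather than the mean, the same `B` and `L_tilde` would be reused with $x^*$ drawn from the unconstrained model, which is the point of the precomputation.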
4.4 Constrained likelihood evaluation

The constrained likelihood is given by

$$\pi(x \mid e_c) = \frac{\pi(x)\, \pi(e_c \mid x)}{\pi(e_c)},$$

where $\pi(e_c \mid x)$ is a degenerate density in the hard constraint case. The marginal distributions for $x$ and $e_c$ are

$$x \sim N(\mu, Q^{-1}), \qquad e_c \sim N(A_c \mu, \tilde{L}_c \tilde{L}_c^\top),$$

and the densities can be evaluated with the usual approach, with

$$\log \pi(x) = -\frac{m+p}{2} \log(2\pi) + \log\det(L) - \frac{1}{2} (x - \mu)^\top Q (x - \mu)$$
$$\log \pi(e_c) = -\frac{r}{2} \log(2\pi) - \log\det(\tilde{L}_c) - \frac{1}{2} (e_c - A_c \mu)^\top \tilde{L}_c^{-\top} \tilde{L}_c^{-1} (e_c - A_c \mu).$$

In the hard constraint case, Rue and Held (2005) provide the degenerate conditional density through

$$\log \pi(e_c \mid x) = \log \pi(A_c x \mid x) = -\frac{1}{2} \log\det(A_c A_c^\top),$$

which can be evaluated using the Cholesky decomposition of the $r \times r$ matrix $A_c A_c^\top$. In the soft constraint case,

$$\log \pi(e_c \mid x) = -\frac{r}{2} \log(2\pi) + \log\det(L_c) - \frac{1}{2} (e_c - A_c x)^\top Q_c (e_c - A_c x).$$

References

Rue, H. and L. Held (2005). Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall/CRC.
More informationFisher Information in Gaussian Graphical Models
Fisher Information in Gaussian Graphical Models Jason K. Johnson September 21, 2006 Abstract This note summarizes various derivations, formulas and computational algorithms relevant to the Fisher information
More informationReview Questions REVIEW QUESTIONS 71
REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationLecture 4.2 Finite Difference Approximation
Lecture 4. Finite Difference Approimation 1 Discretization As stated in Lecture 1.0, there are three steps in numerically solving the differential equations. They are: 1. Discretization of the domain by
More informationScientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix
Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008
More informationGRAPH SIGNAL PROCESSING: A STATISTICAL VIEWPOINT
GRAPH SIGNAL PROCESSING: A STATISTICAL VIEWPOINT Cha Zhang Joint work with Dinei Florêncio and Philip A. Chou Microsoft Research Outline Gaussian Markov Random Field Graph construction Graph transform
More information6. Vector Random Variables
6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following
More informationPerturbation Theory for Variational Inference
Perturbation heor for Variational Inference Manfred Opper U Berlin Marco Fraccaro echnical Universit of Denmark Ulrich Paquet Apple Ale Susemihl U Berlin Ole Winther echnical Universit of Denmark Abstract
More informationEC5555 Economics Masters Refresher Course in Mathematics September 2014
EC5555 Economics Masters Refresher Course in Mathematics September 4 Lecture Matri Inversion and Linear Equations Ramakanta Patra Learning objectives. Matri inversion Matri inversion and linear equations
More informationLinear Least-Squares Data Fitting
CHAPTER 6 Linear Least-Squares Data Fitting 61 Introduction Recall that in chapter 3 we were discussing linear systems of equations, written in shorthand in the form Ax = b In chapter 3, we just considered
More informationHow to solve the stochastic partial differential equation that gives a Matérn random field using the finite element method
arxiv:1803.03765v1 [stat.co] 10 Mar 2018 How to solve the stochastic partial differential equation that gives a Matérn random field using the finite element method Haakon Bakka bakka@r-inla.org March 13,
More information(1) for all (2) for all and all
8. Linear mappings and matrices A mapping f from IR n to IR m is called linear if it fulfills the following two properties: (1) for all (2) for all and all Mappings of this sort appear frequently in the
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationLecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky
Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Tom Lyche and Michael Floater Centre of Mathematics for Applications, Department of Informatics,
More informationCSCI-6971 Lecture Notes: Probability theory
CSCI-6971 Lecture Notes: Probability theory Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu January 31, 2006 1 Properties of probabilities Let, A,
More informationExact and Approximate Numbers:
Eact and Approimate Numbers: The numbers that arise in technical applications are better described as eact numbers because there is not the sort of uncertainty in their values that was described above.
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationGraphical Model Selection
May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor
More informationMulti-resolution models for large data sets
Multi-resolution models for large data sets Douglas Nychka, National Center for Atmospheric Research National Science Foundation NORDSTAT, Umeå, June, 2012 Credits Steve Sain, NCAR Tia LeRud, UC Davis
More informationMobile Robotics 1. A Compact Course on Linear Algebra. Giorgio Grisetti
Mobile Robotics 1 A Compact Course on Linear Algebra Giorgio Grisetti SA-1 Vectors Arrays of numbers They represent a point in a n dimensional space 2 Vectors: Scalar Product Scalar-Vector Product Changes
More informationSpatial point processes in the modern world an
Spatial point processes in the modern world an interdisciplinary dialogue Janine Illian University of St Andrews, UK and NTNU Trondheim, Norway Bristol, October 2015 context statistical software past to
More informationLU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark
DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline
More informationChoosing among models
Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationUniversity of Alabama Department of Physics and Astronomy. PH 125 / LeClair Spring A Short Math Guide. Cartesian (x, y) Polar (r, θ)
University of Alabama Department of Physics and Astronomy PH 125 / LeClair Spring 2009 A Short Math Guide 1 Definition of coordinates Relationship between 2D cartesian (, y) and polar (r, θ) coordinates.
More informationAn EM algorithm for Gaussian Markov Random Fields
An EM algorithm for Gaussian Markov Random Fields Will Penny, Wellcome Department of Imaging Neuroscience, University College, London WC1N 3BG. wpenny@fil.ion.ucl.ac.uk October 28, 2002 Abstract Lavine
More informationNovember 2002 STA Random Effects Selection in Linear Mixed Models
November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear
More informationE(x i ) = µ i. 2 d. + sin 1 d θ 2. for d < θ 2 0 for d θ 2
1 Gaussian Processes Definition 1.1 A Gaussian process { i } over sites i is defined by its mean function and its covariance function E( i ) = µ i c ij = Cov( i, j ) plus joint normality of the finite
More informationThe Bayesian approach to inverse problems
The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu
More information1.Chapter Objectives
LU Factorization INDEX 1.Chapter objectives 2.Overview of LU factorization 2.1GAUSS ELIMINATION AS LU FACTORIZATION 2.2LU Factorization with Pivoting 2.3 MATLAB Function: lu 3. CHOLESKY FACTORIZATION 3.1
More informationLinear Algebra: Matrix Eigenvalue Problems
CHAPTER8 Linear Algebra: Matrix Eigenvalue Problems Chapter 8 p1 A matrix eigenvalue problem considers the vector equation (1) Ax = λx. 8.0 Linear Algebra: Matrix Eigenvalue Problems Here A is a given
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationFactor Analysis. Qian-Li Xue
Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More informationLecture 9: Numerical Linear Algebra Primer (February 11st)
10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template
More information