Gaussian Markov Random Fields: Theory and Applications


1 Gaussian Markov Random Fields: Theory and Applications. Håvard Rue, Department of Mathematical Sciences, NTNU, Norway. September 26, 2008

2 Part I Definition and basic properties

3 Outline I Introduction Why? Outline of this course What is a GMRF? The precision matrix Definition of a GMRF Example: Auto-regressive process Why are GMRFs important? Main features of GMRFs Properties of GMRFs Interpretation of elements of Q Markov properties Conditional density Specification through full conditionals Proof via Brook's lemma

4 Introduction Why? Why is it a good idea to learn about Gaussian Markov random fields (GMRFs)? That is a good question! What's there to learn? Isn't it all just a Gaussian? That is also a good question!

5 Introduction Why? Why? Gaussians (most often in the form of a GMRF) are used extensively in statistical models. However, their excellent computational properties are often not exploited, partly due to a lack of knowledge of general-purpose and near-optimal numerical algorithms.

6 Introduction Why? Why? Fast(er) MCMC-based inference. Recent developments (the really nice stuff...): near-instant (compared to MCMC) approximate Bayesian inference is often possible. It is deterministic, with a relative error that, in practice, amounts to no error.

7 Introduction Why? Longitudinal mixed effects model: Epil-example from BUGS

8 Introduction Why? Longitudinal mixed effects model: Epil-example from BUGS... In this example x = (a_0, a_Base, a_Trt, a_BT, a_Age, a_V4, {b1_j}, {b_jk}) is a GMRF. Two (hyper-)parameters: Precision(b1) and Precision(b). Poisson observations.

9 Outline of this course Outline of this course: Day I Lecture 1 GMRFs: definition and basic properties Lecture 2 Simulation algorithms for GMRFs Lecture 3 Numerical methods for sparse matrices Lecture 4 Intrinsic GMRFs Lecture 5 Hierarchical GMRF models and MCMC algorithms for such models Lecture 6 Case-studies

10 Outline of this course Outline of this course: Day II Lecture 1 Motivation for Approximate Bayesian inference Lecture 2 INLA: Integrated Nested Laplace Approximations Lecture 3 The inla-program: examples Lecture 4 Using R-interface to the inla-program (Sara Martino) Lecture 5 Case studies with R (Sara Martino) Lecture 6 Discussion

11 Outline of this course What is a GMRF? What is a Gaussian Markov random field (GMRF)? A GMRF is a simple construct: a normally distributed random vector x = (x_1, ..., x_n)^T with additional Markov properties: x_i ⊥ x_j | x_{-ij} means that x_i and x_j are conditionally independent (CI).

12 Outline of this course The precision matrix If x_i ⊥ x_j | x_{-ij} for a set of pairs {i, j}, then we need to constrain the parametrisation of the GMRF. Covariance matrix: difficult. Precision matrix: easy.

13 Outline of this course The precision matrix Conditional independence and the precision matrix The density of a zero-mean Gaussian is π(x) ∝ |Q|^(1/2) exp(−½ x^T Q x). Constraining the parametrisation to obey the CI properties: Theorem. x_i ⊥ x_j | x_{-ij} ⟺ Q_ij = 0

14 Outline of this course The precision matrix Use an (undirected) graph G = (V, E) to represent the CI properties: V vertices: 1, 2, ..., n; E edges {i, j}: no edge between i and j if x_i ⊥ x_j | x_{-ij}, an edge between i and j otherwise.

15 Outline of this course Definition of a GMRF Definition of a GMRF Definition (GMRF) A random vector x = (x_1, ..., x_n)^T is called a GMRF wrt the graph G = (V = {1, ..., n}, E) with mean µ and precision matrix Q > 0, iff its density has the form π(x) = (2π)^(−n/2) |Q|^(1/2) exp(−½ (x − µ)^T Q (x − µ)) and Q_ij ≠ 0 ⟺ {i, j} ∈ E for all i ≠ j.

16 Outline of this course Example: Auto-regressive process Simple example of a GMRF Auto-regressive process of order 1: x_t | x_{t−1}, ..., x_1 ~ N(φ x_{t−1}, 1), t = 2, ..., n, and x_1 ~ N(0, (1 − φ²)^{−1}). Tridiagonal precision matrix

Q =
[  1     −φ
  −φ   1 + φ²   −φ
         ⋱        ⋱        ⋱
                −φ    1 + φ²   −φ
                        −φ       1  ]
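The following is a minimal numerical sketch (not from the slides; NumPy assumed) that builds this tridiagonal AR(1) precision matrix and checks that it is the inverse of the familiar AR(1) covariance matrix with entries φ^|i−j| / (1 − φ²).

```python
# Sketch: construct the AR(1) precision matrix and verify Q = Sigma^{-1}.
import numpy as np

def ar1_precision(n, phi):
    Q = np.zeros((n, n))
    Q[0, 0] = Q[-1, -1] = 1.0
    for i in range(1, n - 1):
        Q[i, i] = 1.0 + phi**2
    for i in range(n - 1):
        Q[i, i + 1] = Q[i + 1, i] = -phi
    return Q

n, phi = 6, 0.7
Q = ar1_precision(n, phi)
Sigma = np.array([[phi**abs(i - j) / (1 - phi**2) for j in range(n)] for i in range(n)])
print(np.allclose(Q @ Sigma, np.eye(n)))   # True: the tridiagonal Q is the precision matrix
```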

17 Outline of this course Why are GMRFs important? Usage of GMRFs (I) Structural time-series analysis Autoregressive models. Gaussian state-space models. Computational algorithms based on the Kalman filter and its variants. Analysis of longitudinal and survival data temporal GMRF priors state-space approaches spatial GMRF priors used to analyse longitudinal and survival data.

18 Outline of this course Why are GMRFs important? Usage of GMRFs (II) Graphical models A key model Estimate Q and its (associated) graph from data. Often used in a larger context. Semiparametric regression and splines Model a smooth curve in time or a surface in space. Intrinsic GMRF models and random walk models Discretely observed integrated Wiener processes are GMRFs GMRFs models for coefficients in B-splines.

19 Outline of this course Why are GMRFs important? Usage of GMRFs (III) Image analysis The first main area for spatial models Image restoration using the Wiener filter. Texture modelling and texture discrimination. Segmentation Deformable templates Object identification 3D reconstruction Restoring ultrasound images

20 Outline of this course Why are GMRFs important? Usage of GMRFs (IV) Spatial statistics Latent GMRF model analysis of spatial binary data Geostatistics using GMRFs Analysis of data in social sciences and spatial econometrics Spatial and space-time epidemiology Environmental statistics Inverse problems

21 Outline of this course Main features of GMRFs Main features Analytically tractable. Modelling using conditional independence. Merging GMRFs using conditioning (hierarchical models). Unified framework for understanding: representation and computation using numerical methods for sparse matrices. Fits nicely into the MCMC world: can construct fast and reliable block-MCMC algorithms.

22 Properties of GMRFs Constraining the parametrisation to obey the CI properties: Theorem. x_i ⊥ x_j | x_{-ij} ⟺ Q_ij = 0

23 Properties of GMRFs Proof Start with the factorisation theorem. Theorem. x ⊥ y | z ⟺ π(x, y, z) = f(x, z) g(y, z) (1) for some functions f and g, and for all z with π(z) > 0. Example For π(x, y, z) ∝ exp(x + xz + yz) on some bounded region, we see that x ⊥ y | z. However, this is not the case for π(x, y, z) ∝ exp(xyz).

24 Properties of GMRFs Proof... π(x_i, x_j, x_{-ij}) ∝ exp(−½ Σ_{k,l} x_k Q_kl x_l) = exp(−½ x_i x_j (Q_ij + Q_ji) − ½ Σ_{{k,l} ≠ {i,j}} x_k Q_kl x_l). The first term involves x_i x_j iff Q_ij ≠ 0; the second term does not involve x_i x_j. We now see that π(x_i, x_j, x_{-ij}) = f(x_i, x_{-ij}) g(x_j, x_{-ij}) for some functions f and g iff Q_ij = 0. Use the factorisation theorem.

25 Properties of GMRFs Interpretation of elements of Q Interpretation of elements of Q Let x be a GMRF wrt G = (V, E) with mean µ and precision matrix Q > 0. Then E(x_i | x_{-i}) = µ_i − (1/Q_ii) Σ_{j: j∼i} Q_ij (x_j − µ_j), Prec(x_i | x_{-i}) = Q_ii, and Corr(x_i, x_j | x_{-ij}) = −Q_ij / √(Q_ii Q_jj), i ≠ j.
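As a quick sanity check of these formulas, the sketch below (NumPy assumed; all names illustrative) compares Corr(x_i, x_j | x_{-ij}) = −Q_ij/√(Q_ii Q_jj) with the correlation obtained by brute-force conditioning on the covariance scale.

```python
# Sketch: verify the interpretation of Q numerically for a small zero-mean GMRF.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4 * np.eye(4)              # some SPD precision matrix
Sigma = np.linalg.inv(Q)

i, j = 0, 1
print(-Q[i, j] / np.sqrt(Q[i, i] * Q[j, j]))   # -Q_ij / sqrt(Q_ii Q_jj)

# Brute force: condition (x_i, x_j) on the remaining components via the covariance.
a, b = [i, j], [2, 3]
S = Sigma[np.ix_(a, a)] - Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)]) @ Sigma[np.ix_(b, a)]
print(S[0, 1] / np.sqrt(S[0, 0] * S[1, 1]))    # the same number
```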

26 Properties of GMRFs Markov properties Markov properties Let x be a GMRF wrt G = (V, E). Then the following are equivalent. The pairwise Markov property: x_i ⊥ x_j | x_{-ij} if {i, j} ∉ E and i ≠ j.

27 Properties of GMRFs Markov properties Markov properties Let x be a GMRF wrt G = (V, E). Then the following are equivalent. The local Markov property: x_i ⊥ x_{−{i, ne(i)}} | x_{ne(i)} for every i ∈ V.

28 Properties of GMRFs Markov properties Markov properties Let x be a GMRF wrt G = (V, E). Then the following are equivalent. The global Markov property: x_A ⊥ x_B | x_C for all disjoint sets A, B and C where C separates A and B, and A and B are non-empty.

29 Properties of GMRFs Conditional density (Induced) subgraph Let A ⊂ V; G^A denotes the graph restricted to A: remove all nodes not belonging to A, and all edges where at least one node does not belong to A. Example A = {1, 2}: then V^A = {1, 2} and E^A = {{1, 2}}.

30 Properties of GMRFs Conditional density Conditional density I Let V = A ∪ B where A ∩ B = ∅, and

x = (x_A, x_B),   µ = (µ_A, µ_B),   Q = [ Q_AA  Q_AB
                                          Q_BA  Q_BB ].

Result: x_A | x_B is then a GMRF wrt the subgraph G^A with parameters µ_{A|B} and Q_{A|B} > 0, where µ_{A|B} = µ_A − Q_AA^{−1} Q_AB (x_B − µ_B) and Q_{A|B} = Q_AA.

31 Properties of GMRFs Conditional density Conditional density II µ_{A|B} = µ_A − Q_AA^{−1} Q_AB (x_B − µ_B) and Q_{A|B} = Q_AA. We have explicit knowledge of Q_{A|B}: it is the principal submatrix Q_AA. Conditioning does not change the structure within A; the subgraph G^A only removes the nodes outside A and their edges. The conditional mean only depends on nodes in A ∪ ne(A). If Q_AA is sparse, then µ_{A|B} is the solution of a sparse linear system: Q_AA (µ_{A|B} − µ_A) = −Q_AB (x_B − µ_B)

32–34 Properties of GMRFs Conditional density Conditional density III (figures only)

35 Properties of GMRFs Specification through full conditionals Specification through full conditionals An alternative to specifying a GMRF by its mean and precision matrix is to specify it implicitly through the full conditionals {π(x_i | x_{-i}), i = 1, ..., n}. This approach was pioneered by Besag (1974, 1975). Also known as conditional autoregressions or CAR models.

36 Properties of GMRFs Specification through full conditionals Example: AR-process Specify full conditionals: E(x_1 | x_{-1}) = 0.3 x_2, Prec(x_1 | x_{-1}) = 1; E(x_3 | x_{-3}) = 0.2 x_2 + … x_4, Prec(x_3 | x_{-3}) = 2; and so on.

37 Properties of GMRFs Specification through full conditionals Example: Spatial model Specify full conditionals for each i (assuming zero mean): E(x_i | x_{-i}) = Σ_j w_ij x_j, Prec(x_i | x_{-i}) = κ_i

38 Properties of GMRFs Specification through full conditionals Potential problems Even though the full conditionals are well defined, are we sure that the joint density exists and is unique? What are the compatibility conditions required for a joint GMRF to exist with the prescribed Markov properties?

39 Properties of GMRFs Specification through full conditionals In general Specify the full conditionals as normals with E(x_i | x_{-i}) = µ_i − Σ_{j: j∼i} β_ij (x_j − µ_j) (2) and Prec(x_i | x_{-i}) = κ_i > 0 (3) for i = 1, ..., n, for some {β_ij, i ≠ j}, and vectors µ and κ. Clearly, the neighbourhood structure ∼ is defined implicitly by the nonzero terms of {β_ij}.

40 Properties of GMRFs Specification through full conditionals E(x_i | x_{-i}) = µ_i − Σ_{j: j∼i} β_ij (x_j − µ_j) and Prec(x_i | x_{-i}) = κ_i > 0. There must be a joint density π(x) with these as its full conditionals. Comparing term by term with (2): Q_ii = κ_i and Q_ij = κ_i β_ij. Q must be symmetric, i.e., κ_i β_ij = κ_j β_ji. We have a candidate for a joint density provided Q > 0.

41 Properties of GMRFs Proof via Brook's lemma Lemma (Brook's lemma) Let π(x) be the density for x ∈ R^n and define Ω = {x ∈ R^n : π(x) > 0}. Let x, x' ∈ Ω; then

π(x)/π(x') = ∏_{i=1}^{n} π(x_i | x_1, ..., x_{i−1}, x'_{i+1}, ..., x'_n) / π(x'_i | x_1, ..., x_{i−1}, x'_{i+1}, ..., x'_n)
           = ∏_{i=1}^{n} π(x_i | x'_1, ..., x'_{i−1}, x_{i+1}, ..., x_n) / π(x'_i | x'_1, ..., x'_{i−1}, x_{i+1}, ..., x_n).

42 Properties of GMRFs Proof via Brook's lemma Assume µ = 0 and fix x' = 0. Then

log π(x)/π(0) = −½ Σ_{i=1}^{n} κ_i x_i² − Σ_{i=2}^{n} Σ_{j=1}^{i−1} κ_i β_ij x_i x_j   (4)

and

log π(x)/π(0) = −½ Σ_{i=1}^{n} κ_i x_i² − Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} κ_i β_ij x_i x_j.   (5)

Since (4) and (5) must be identical, κ_i β_ij = κ_j β_ji for i ≠ j, and

log π(x) = const − ½ Σ_{i=1}^{n} κ_i x_i² − ½ Σ_{i≠j} κ_i β_ij x_i x_j;

hence x is zero-mean multivariate normal provided Q > 0.

43 Part II Simulation algorithms for GMRFs

44 Outline I Introduction Simulation algorithms for GMRFs Task Summary of the simulation algorithms Basic numerical linear algebra Cholesky factorisation and the Cholesky triangle Solving linear equations Avoid computing the inverse Unconditional sampling Simulation algorithm Evaluating the log-density Conditional sampling Canonical parameterisation Conditional distribution in the canonical parameterisation Sampling from a canonical parameterised GMRF

45 Outline II Sampling under hard linear constraints Introduction General algorithm (slow) Specific algorithm with not too many constraints (fast) Example Evaluating the log-density Sampling under soft linear constraints Introduction Specific algorithm with not too many constraints (fast) The algorithm Evaluate the log-density Example

46 Introduction Sparse precision matrix Recall that E(x_i − µ_i | x_{-i}) = −(1/Q_ii) Σ_{j∼i} Q_ij (x_j − µ_j) and Prec(x_i | x_{-i}) = Q_ii. In most cases: the total number of neighbours is O(n), so only O(n) of the n² terms in Q will be non-zero. Use this to construct exact simulation algorithms for GMRFs, using numerical algorithms for sparse matrices.

47–49 Introduction Example of a typical precision matrix (figures). Each node has, on average, 7 neighbours.

50 Simulation algorithms for GMRFs Task Simulation algorithms for GMRFs Can we take advantage of the sparse structure of Q? It is faster to factorise a sparse Q compared to a dense Q. The speedup depends on the pattern in Q, not only the number of non-zero terms. Our task Formulate all algorithms to use only sparse matrices. Unconditional simulation Conditional simulation Condition on a subset of variables Condition on linear constraints Condition on linear constraints with normal noise Evaluation of the log-density in all cases.

51 Simulation algorithms for GMRFs Summary of the simulation algorithms The result In most cases, the cost is O(n) for temporal GMRFs, O(n^{3/2}) for spatial GMRFs, and O(n²) for spatio-temporal GMRFs, including evaluation of the log-density. Conditioning on k linear constraints adds O(k³). These are general algorithms depending only on the graph G, not on the numerical values in Q. The core is numerical algorithms for sparse matrices.

52 Basic numerical linear algebra Cholesky factorisation and the Cholesky triangle Cholesky factorisation If A > 0 is an n × n positive definite matrix, then there exists a unique Cholesky triangle L, such that L is a lower triangular matrix and A = L L^T. Computing L costs n³/3 flops. This factorisation is the basis for solving systems like Ax = b, or AX = B for k right-hand sides, or equivalently, computing x = A^{−1} b or X = A^{−1} B.

53 Basic numerical linear algebra Solving linear equations Algorithm 1 Solving Ax = b where A > 0 1: Compute the Cholesky factorisation, A = L L^T 2: Solve L v = b 3: Solve L^T x = v 4: Return x Step 2 is called forward substitution and costs O(n²) flops. The solution v is computed in a forward loop: v_i = (1/L_ii)(b_i − Σ_{j=1}^{i−1} L_ij v_j), i = 1, ..., n (6)

54 Basic numerical linear algebra Solving linear equations Algorithm 1 Solving Ax = b where A > 0 1: Compute the Cholesky factorisation, A = L L^T 2: Solve L v = b 3: Solve L^T x = v 4: Return x Step 3 is called back substitution and costs O(n²) flops. The solution x is computed in a backward loop: x_i = (1/L_ii)(v_i − Σ_{j=i+1}^{n} L_ji x_j), i = n, ..., 1 (7)

55 Basic numerical linear algebra Avoid computing the inverse To compute A^{−1} B, where B is an n × k matrix, we compute the solution X_j of A X_j = B_j for each of the k columns of B. Algorithm 2 Solving AX = B where A > 0 1: Compute the Cholesky factorisation, A = L L^T 2: for j = 1 to k do 3: Solve L v = B_j 4: Solve L^T X_j = v 5: end for 6: Return X
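A minimal sketch of Algorithms 1-2 using standard dense routines (NumPy/SciPy assumed); a production implementation would of course use sparse or banded factorisations as discussed later.

```python
# Sketch: solve A x = b (or A X = B column by column) via A = L L^T.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def chol_solve(A, B):
    L = cholesky(A, lower=True)                      # A = L L^T
    V = solve_triangular(L, B, lower=True)           # L V = B   (forward substitution)
    return solve_triangular(L.T, V, lower=False)     # L^T X = V (back substitution)

A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
print(chol_solve(A, b), np.linalg.solve(A, b))       # the two solutions agree
```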

56 Unconditional sampling Simulation algorithm Sample x ~ N(µ, Q^{−1}) If Q = L L^T and z ~ N(0, I), then x defined by L^T x = z has covariance Cov(x) = Cov(L^{−T} z) = (L L^T)^{−1} = Q^{−1}. Algorithm 3 Sampling x ~ N(µ, Q^{−1}) 1: Compute the Cholesky factorisation, Q = L L^T 2: Sample z ~ N(0, I) 3: Solve L^T v = z 4: Compute x = µ + v 5: Return x
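A sketch of Algorithm 3 with dense linear algebra (NumPy/SciPy assumed): sample z ~ N(0, I), solve L^T v = z, add the mean, and check the empirical covariance against Q^{-1}.

```python
# Sketch: sample x ~ N(mu, Q^{-1}) via the Cholesky triangle of Q.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_gmrf(mu, Q, rng):
    L = cholesky(Q, lower=True)                    # Q = L L^T
    z = rng.standard_normal(len(mu))               # z ~ N(0, I)
    v = solve_triangular(L.T, z, lower=False)      # L^T v = z, so Cov(v) = Q^{-1}
    return mu + v

rng = np.random.default_rng(1)
Q = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
xs = np.array([sample_gmrf(np.zeros(3), Q, rng) for _ in range(20000)])
print(np.round(np.cov(xs.T), 2))                   # close to Q^{-1}
print(np.round(np.linalg.inv(Q), 2))
```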

57 Unconditional sampling Evaluating the log-density The log-density is log π(x) = −(n/2) log 2π + Σ_{i=1}^{n} log L_ii − ½ (x − µ)^T Q (x − µ), where the last quadratic form defines q := (x − µ)^T Q (x − µ). If x has just been sampled, then q = z^T z; otherwise, compute this term as u = x − µ, v = Q u, q = u^T v.

58 Conditional sampling Canonical parameterisation The canonical parameterisation A GMRF x wrt G having a canonical parameterisation (b, Q) has density π(x) ∝ exp(−½ x^T Q x + b^T x), (8) i.e., precision matrix Q and mean µ = Q^{−1} b. Write this as x ~ N_C(b, Q). (9) The relation to the Gaussian distribution is N(µ, Q^{−1}) = N_C(Qµ, Q).

59 Conditional sampling Conditional distribution in the canonical parameterisation Conditional simulation of a GMRF Decompose x as

(x_A, x_B) ~ N( (µ_A, µ_B), [ Q_AA  Q_AB ; Q_BA  Q_BB ]^{−1} ).

Then x_A | x_B has canonical parameterisation

x_A − µ_A | x_B ~ N_C( −Q_AB (x_B − µ_B), Q_AA ).   (10)

Simulate using an algorithm for the canonical parameterisation.

60 Conditional sampling Sampling from a canonically parameterised GMRF Sample x ~ N_C(b, Q) Recall that N_C(b, Q) = N(Q^{−1} b, Q^{−1}), so we need to compute the mean as well. Algorithm 4 Sampling x ~ N_C(b, Q) 1: Compute the Cholesky factorisation, Q = L L^T 2: Solve L w = b 3: Solve L^T µ = w 4: Sample z ~ N(0, I) 5: Solve L^T v = z 6: Compute x = µ + v 7: Return x
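A corresponding sketch of Algorithm 4 (NumPy/SciPy assumed): two extra triangular solves give the mean µ = Q^{-1} b, after which sampling proceeds as in Algorithm 3.

```python
# Sketch: sample x ~ N_C(b, Q) = N(Q^{-1} b, Q^{-1}).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_canonical(b, Q, rng):
    L = cholesky(Q, lower=True)
    w = solve_triangular(L, b, lower=True)         # L w = b
    mu = solve_triangular(L.T, w, lower=False)     # L^T mu = w, so mu = Q^{-1} b
    z = rng.standard_normal(len(b))
    v = solve_triangular(L.T, z, lower=False)      # v ~ N(0, Q^{-1})
    return mu + v

rng = np.random.default_rng(5)
Q = np.array([[2., -1.], [-1., 2.]]); b = np.array([1., 0.])
print(np.mean([sample_canonical(b, Q, rng) for _ in range(20000)], axis=0))
print(np.linalg.solve(Q, b))                       # empirical mean is close to Q^{-1} b
```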

61 Sampling under hard linear constraints Introduction Sampling from x | Ax = e A is a k × n matrix, 0 < k < n, with rank k; e is a vector of length k. This case occurs quite frequently: a sum-to-zero constraint corresponds to k = 1, A = 1^T and e = 0. We will denote this problem sampling under a hard constraint. Note that π(x | Ax) is singular with rank n − k.

62 Sampling under hard linear constraints Introduction The linear constraint makes the conditional distribution Gaussian, but it is singular, as the rank of the constrained covariance matrix is n − k; more care must be exercised when sampling from this distribution. We have that E(x | Ax = e) = µ − Q^{−1} A^T (A Q^{−1} A^T)^{−1} (Aµ − e) (11) and Cov(x | Ax = e) = Q^{−1} − Q^{−1} A^T (A Q^{−1} A^T)^{−1} A Q^{−1}. (12) This is typically a dense-matrix case, which must be solved using general O(n³) algorithms (see the next frame for details).

63 Sampling under hard linear constraints General algorithm (slow) We can sample from this distribution as follows. As the covariance matrix is singular, we compute the eigenvalues and eigenvectors and write the covariance matrix as V Λ V^T, where the columns of V are the eigenvectors and Λ is a diagonal matrix with the eigenvalues on the diagonal. This corresponds to a different factorisation than the Cholesky triangle, but any matrix C which satisfies C C^T = Σ will do. Note that k of the eigenvalues are zero. We can produce a sample by computing v = C z, where C = V Λ^{1/2} and z ~ N(0, I), and then adding the conditional mean. We can compute the log-density as log π(x | Ax = e) = −((n − k)/2) log 2π − ½ Σ_{i: Λ_ii > 0} log Λ_ii − ½ (x − µ*)^T Σ* (x − µ*), (13) where µ* is (11) and Σ* = V Λ* V^T with (Λ*)_ii = Λ_ii^{−1} if Λ_ii > 0 and zero otherwise. In total, this is a quite computationally demanding procedure, as the algorithm is not able to take advantage of the sparse structure of Q.

64 Sampling under hard linear constraints Specific algorithm with not too many constraints (fast) Conditioning via kriging There is an alternative algorithm which corrects for the constraints at nearly no cost if k ≪ n. Let x ~ N(µ, Q^{−1}), then compute x* = x − Q^{−1} A^T (A Q^{−1} A^T)^{−1} (Ax − e). (14) Now x* has the correct conditional distribution! A Q^{−1} A^T is a k × k matrix, hence its factorisation is fast to compute for small k.

65 Sampling under hard linear constraints Specific algorithm with not too many constraints (fast) Algorithm 5 Sampling x | Ax = e when x ~ N(µ, Q^{−1}) 1: Compute the Cholesky factorisation, Q = L L^T 2: Sample z ~ N(0, I) 3: Solve L^T v = z 4: Compute x = µ + v 5: Compute V_{n×k} = Q^{−1} A^T with Algorithm 2 6: Compute W_{k×k} = A V 7: Compute U_{k×n} = W^{−1} V^T using Algorithm 2 8: Compute c = Ax − e 9: Compute x* = x − U^T c 10: Return x* If z = 0 in Algorithm 5, then x* is the conditional mean. The extra cost is only O(k³) for large k!
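The kriging correction itself is only a few lines; the sketch below (NumPy assumed, dense solves for clarity, illustrative names) applies it to a sum-to-zero constraint.

```python
# Sketch: correct an unconditional sample x so that A x = e holds exactly.
import numpy as np

def condition_on_hard_constraint(x, Q, A, e):
    V = np.linalg.solve(Q, A.T)                    # V = Q^{-1} A^T      (n x k)
    W = A @ V                                      # W = A Q^{-1} A^T    (k x k)
    c = A @ x - e
    return x - V @ np.linalg.solve(W, c)           # x* = x - Q^{-1} A^T W^{-1} (A x - e)

rng = np.random.default_rng(2)
Q = np.diag([1., 2., 4.])                          # independent components, as in the example below
x = rng.standard_normal(3)
A, e = np.ones((1, 3)), np.zeros(1)                # sum-to-zero constraint
xstar = condition_on_hard_constraint(x, Q, A, e)
print(xstar.sum())                                 # ~0 up to rounding error
```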

66 Sampling under hard linear constraints Example Example Let x_1, ..., x_n be independent Gaussian variables with variances σ_i² and means µ_i. To sample x conditioned on Σ_i x_i = 0, we first sample x_i ~ N(µ_i, σ_i²), i = 1, ..., n, and compute the constrained sample x* as x_i* = x_i − c σ_i² (15), where c = Σ_j x_j / Σ_j σ_j².

67 Sampling under hard linear constraints Evaluating the log-density The log-density can be rapidly computed using the following identity: π(x | Ax) = π(x) π(Ax | x) / π(Ax). (16) Each term on the rhs is easier to compute than the lhs. π(x)-term: this is a GMRF and the log-density is easy to compute using L from Algorithm 5, step 1.

68 Sampling under hard linear constraints Evaluating the log-density The log-density can be rapidly computed using the identity π(x | Ax) = π(x) π(Ax | x) / π(Ax). (16) π(Ax | x)-term: this is a degenerate density, which is either zero or a constant, log π(Ax | x) = −½ log |A A^T|. (17) The determinant of a k × k matrix can be found from its Cholesky factorisation.

69 Sampling under hard linear constraints Evaluating the log-density The log-density can be rapidly computed using the identity π(x | Ax) = π(x) π(Ax | x) / π(Ax). (16) π(Ax)-term: Ax is Gaussian with mean Aµ and covariance matrix A Q^{−1} A^T, with the Cholesky triangle of this k × k matrix available from Algorithm 5, step 7.

70 Sampling under soft linear constraints Introduction Sampling from x | Ax = ε Let x be a GMRF which is observed through e, where e | x ~ N(Ax, Σ_ε): e is a vector of length k < n, A is a k × n matrix of rank k, and Σ_ε > 0 is the covariance matrix of the noise. The conditional for x is log π(x | e) = −½ (x − µ)^T Q (x − µ) − ½ (e − Ax)^T Σ_ε^{−1} (e − Ax) + const, (18) so x | e ~ N_C(Qµ + A^T Σ_ε^{−1} e, Q + A^T Σ_ε^{−1} A). (19) This precision matrix is most often a full matrix and the nice sparse structure of Q is lost.

71 Sampling under soft linear constraints Introduction Example If we observe the sum of the x_i's with unit-variance noise, the posterior precision is Q + 1 1^T, which is a dense matrix. In general, we would have to sample from (18) using a general algorithm, which is computationally expensive for large n.

72 Sampling under soft linear constraints Specific algorithm with not too many constraints (fast) Alternative approach This is a similar problem to sampling under a hard constraint Ax = e, but now e is stochastic: Ax = ε, where ε ~ N(e, Σ_ε). (20) We denote this case sampling under a soft constraint. If we extend (14) to x* = x − Q^{−1} A^T (A Q^{−1} A^T + Σ_ε)^{−1} (Ax − ε), (21) where ε and x are samples from their respective distributions, ...then x* has the correct conditional distribution.

73 Sampling under soft linear constraints The algorithm Algorithm 6 Sampling x | Ax = ε when ε ~ N(e, Σ_ε) and x ~ N(µ, Q^{−1}) 1: Compute the Cholesky factorisation, Q = L L^T 2: Sample z ~ N(0, I) 3: Solve L^T v = z 4: Compute x = µ + v 5: Compute V_{n×k} = Q^{−1} A^T using Algorithm 2 and L 6: Compute W_{k×k} = A V + Σ_ε 7: Compute U_{k×n} = W^{−1} V^T using Algorithm 2 8: Sample ε ~ N(e, Σ_ε) 9: Compute c = Ax − ε 10: Compute x* = x − U^T c 11: Return x* When z = 0 and ε = e, then x* is the conditional mean.

74 Sampling under soft linear constraints Evaluate the log-density The log-density is computed via π(x | e) = π(x) π(e | x) / π(e). (22) All Cholesky triangles required are available from the simulation algorithm. π(x)-term: this is a GMRF. π(e | x)-term: e | x is Gaussian with mean Ax and covariance Σ_ε. π(e)-term: e is Gaussian with mean Aµ and covariance matrix A Q^{−1} A^T + Σ_ε.

75 Sampling under soft linear constraints Example Example Let x_1, ..., x_n be independent Gaussian variables with variances σ_i² and means µ_i. We now observe e ~ N(Σ_i x_i, σ_ε²). To sample from π(x | e) we first sample x_i ~ N(µ_i, σ_i²), i = 1, ..., n, and then ε ~ N(e, σ_ε²). A conditional sample x* is then x_i* = x_i − c σ_i², where c = (Σ_j x_j − ε) / (Σ_j σ_j² + σ_ε²).

76 Part III Numerical methods for sparse matrices

77 Outline I Introduction Cholesky factorisation The Cholesky triangle Interpretation The zero-pattern in L Example: A simple graph Example: auto-regressive processes Band matrices Bandwidth is preserved Cholesky factorisation for band-matrices Reordering schemes Introduction Reordering to band-matrices Reordering using the idea of nested dissection What to do with sparse matrices?

78 Outline II A numerical case-study Introduction GMRF-models in time GMRF-models in space GMRF-models in time space

79 Introduction Numerical methods for sparse matrices We have shown that computations on GMRFs can be expressed such that the main tasks are 1. compute the Cholesky factorisation of Q = L L^T, and 2. solve L v = b and L^T x = z. The second task is much faster than the first, but sparsity will be of advantage here as well.

80 Introduction The goal is to explain why a sparse Q allows for fast factorisation, how we can take advantage of it, why we gain if we permute the vertices before factorising the matrix, and how statisticians can benefit from recent research in this area by numerical mathematicians. At the end we present a small case study factorising some typical matrices for GMRFs, using classical and more recent methods for factorising matrices.

81 Cholesky factorisation How to compute the Cholesky factorisation Q_ij = Σ_{k=1}^{j} L_ik L_jk, i ≥ j. Define v_i = Q_ij − Σ_{k=1}^{j−1} L_ik L_jk, i ≥ j. Then L_jj² = v_j, and L_ij L_jj = v_i for i > j. If we know {v_i} for fixed j, then L_jj = √v_j and L_ij = v_i / √v_j, for i = j + 1, ..., n. This gives the jth column of L.

82 Cholesky factorisation Cholesky factorization of Q > 0 Algorithm 7 Computing the Cholesky triangle L of Q 1: for j = 1 to n do 2: v_{j:n} = Q_{j:n, j} 3: for k = 1 to j − 1 do v_{j:n} = v_{j:n} − L_{j:n, k} L_{jk} 4: L_{j:n, j} = v_{j:n} / √v_j 5: end for 6: Return L The overall process involves n³/3 flops.
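A direct transcription of Algorithm 7 into code (NumPy assumed), kept dense and unoptimised so the correspondence with the pseudo-code stays obvious.

```python
# Sketch: column-wise Cholesky factorisation, Q = L L^T.
import numpy as np

def cholesky_columns(Q):
    n = Q.shape[0]
    L = np.zeros_like(Q, dtype=float)
    for j in range(n):
        v = Q[j:, j].copy()                  # v_{j:n} = Q_{j:n, j}
        for k in range(j):                   # subtract contributions of earlier columns
            v -= L[j:, k] * L[j, k]
        L[j:, j] = v / np.sqrt(v[0])         # L_{j:n, j} = v_{j:n} / sqrt(v_j)
    return L

Q = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
L = cholesky_columns(Q)
print(np.allclose(L @ L.T, Q))               # True
```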

83 The Cholesky triangle Interpretation Interpretation of L (I) Let Q = L L^T; then the solution of L^T x = z, where z ~ N(0, I), is N(0, Q^{−1}) distributed. Since L is lower triangular, x_n = z_n / L_nn, x_{n−1} = (z_{n−1} − L_{n,n−1} x_n) / L_{n−1,n−1}, ...

84 The Cholesky triangle Interpretation Interpretation of L (II) Theorem Let x be a GMRF wrt the labelled graph G, with mean µ and precision matrix Q > 0. Let L be the Cholesky triangle of Q. Then for i ∈ V, E(x_i | x_{(i+1):n}) = µ_i − (1/L_ii) Σ_{j=i+1}^{n} L_ji (x_j − µ_j) and Prec(x_i | x_{(i+1):n}) = L_ii².

85 The Cholesky triangle The zero-pattern in L Determine the zero-pattern in L (I) Theorem Let x be a GMRF wrt G, with mean µ and precision matrix Q > 0. Let L be the Cholesky triangle of Q and define for 1 ≤ i < j ≤ n the set F(i, j) = {i + 1, ..., j − 1, j + 1, ..., n}, which is the future of i except j. Then x_i ⊥ x_j | x_{F(i,j)} ⟺ L_ji = 0. If we can verify that L_ji is zero, we do not have to compute it when factorising Q.

86 The Cholesky triangle The zero-pattern in L Proof Assume µ = 0 and fix 1 ≤ i < j ≤ n. Theorem 11 gives that π(x_{i:n}) ∝ exp(−½ Σ_{k=i}^{n} (L_kk x_k + Σ_{j'=k+1}^{n} L_{j'k} x_{j'})²) = exp(−½ x_{i:n}^T Q^{(i:n)} x_{i:n}), where Q^{(i:n)}_{ij} = L_ii L_ji. Then x_i ⊥ x_j | x_{F(i,j)} ⟺ L_ii L_ji = 0, which is equivalent to L_ji = 0 since L_ii > 0 as Q^{(i:n)} > 0.

87 The Cholesky triangle The zero-pattern in L Determine the zero-pattern in L (II) The global Markov property provides a simple and sufficient criterion for checking whether L_ji = 0. Corollary If F(i, j) separates i < j in G, then L_ji = 0. Corollary If i ∼ j then F(i, j) does not separate i and j. The idea is simple: use the global Markov property to check if L_ji = 0, and compute only the non-zero terms in L, so that Q = L L^T.

88–89 The Cholesky triangle Example: A simple graph Example (the precision matrix Q and its Cholesky triangle L are shown as figures)

90 The Cholesky triangle Example: auto-regressive processes Example: AR(1)-process x_t | x_{1:(t−1)} ~ N(φ x_{t−1}, σ²), t = 1, ..., n (the tridiagonal Q and its Cholesky triangle L are shown as figures)

91 Band matrices Bandwidth is preserved Bandwidth is preserved Similarly, for an AR(p)-process, Q has bandwidth p and L has lower bandwidth p. Theorem Let Q > 0 be a band matrix with bandwidth p and dimension n; then the Cholesky triangle of Q has (lower) bandwidth p. ⇒ ...easy to modify existing Cholesky-factorisation code to use only entries where |i − j| ≤ p.

92 Band matrices Cholesky factorisation for band-matrices Avoid computing L_ij and reading Q_ij for |i − j| > p. Algorithm 8 Band-Cholesky factorization of Q with bandwidth p 1: for j = 1 to n do 2: λ = min{j + p, n} 3: v_{j:λ} = Q_{j:λ, j} 4: for k = max{1, j − p} to j − 1 do 5: i = min{k + p, n} 6: v_{j:i} = v_{j:i} − L_{j:i, k} L_{jk} 7: end for 8: L_{j:λ, j} = v_{j:λ} / √v_j 9: end for 10: Return L The cost is now n(p² + 3p) flops, assuming n ≫ p.

93 Reordering schemes Introduction Reorder the vertices We can permute the vertices: select one of the n! possible permutations and define the corresponding permutation matrix P, such that i^P = P i, where i = (1, ..., n)^T, is the new ordering of the vertices. Choose P, if possible, such that Q^P = P Q P^T (23) is a band matrix with a small bandwidth.

94 Reordering schemes Introduction It is impossible in general to obtain the optimal permutation; n! is too large! A sub-optimal ordering will do as well. Solve Qµ = b as follows: compute b^P = P b, solve Q^P µ^P = b^P, and map the solution back, µ = P^T µ^P.
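A sketch of this permute-solve-map-back recipe using SciPy's reverse Cuthill-McKee ordering (a different bandwidth-reduction heuristic than the Gibbs-Poole-Stockmeyer algorithm used later in the case study, but the same idea); the random sparse Q here is purely illustrative.

```python
# Sketch: reorder a sparse SPD matrix to reduce bandwidth, solve, and map the solution back.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spsolve

n = 200
rng = np.random.default_rng(3)
G = sp.random(n, n, density=0.02, random_state=42).tocsr()
W = G + G.T
W.data[:] = 1.0                                   # symmetric 0/1 "graph" matrix
Q = (W + sp.diags(np.asarray(W.sum(axis=1)).ravel() + 1.0)).tocsr()  # diagonally dominant, so SPD

perm = reverse_cuthill_mckee(Q, symmetric_mode=True)
QP = Q[perm, :][:, perm]                          # Q^P = P Q P^T

def bandwidth(M):
    i, j = M.nonzero()
    return int(np.abs(i - j).max())

print(bandwidth(Q), bandwidth(QP))                # the bandwidth typically drops a lot

b = rng.standard_normal(n)
muP = spsolve(QP.tocsc(), b[perm])                # solve Q^P mu^P = b^P, with b^P = P b
mu = np.empty(n); mu[perm] = muP                  # map back: mu = P^T mu^P
print(np.allclose(Q @ mu, b))                     # True
```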

95–96 Reordering schemes Reordering to band-matrices Reordering to band-matrices (figures)

97 Reordering schemes Reordering to band-matrices Reordering to band-matrices Factorisation of Q^P required about … seconds; solving Qµ = b required about … seconds, on a 1200 MHz laptop.

98 Reordering schemes Reordering using the idea of nested dissection More optimal reordering schemes

99 Reordering schemes Reordering using the idea of nested dissection Nested dissection reordering (I) The idea generalises as follows. Select a (small) set of nodes whose removal divides the graph into two disconnected subgraphs of almost equal size. Order the chosen nodes after ordering all the nodes in both subgraphs. Apply this procedure recursively to the nodes in each subgraph. Costs in the spatial case: factorisation O(n^{3/2}), fill-in O(n log n). Optimal in the order sense.

100 Reordering schemes Reordering using the idea of nested dissection Nested dissection reordering (II)

101 What to do with sparse matrices? Statisticians and numerical methods for sparse matrices Gupta (2002) summarises his findings on recent advances for sparse linear solvers:... recent sparse solvers have significantly improved the state of the art of the direct solution of general sparse systems.... recent years have seen some remarkable advances in the general sparse direct-solver algorithms and software. Good news We just use their software

102 A numerical case-study Introduction A numerical case-study of typical GMRFs Divide GMRF models into three categories. 1. GMRF models in time or on a line: auto-regressive models and models for smooth functions. Neighbours to x_t are then those {x_s} such that |s − t| ≤ p. 2. Spatial GMRF models: regular lattice, or irregular lattice induced by a tessellation or regions of land. Neighbours to x_i (spatial index i) are those j spatially close to i, where close is defined from its context. 3. Spatio-temporal GMRF models. Often an extension of spatial models to also include dynamic changes.

103 A numerical case-study Introduction Also include some global nodes: let x be a GMRF with a common mean µ and assume µ ~ N(0, σ²), x | µ ~ N(µ1, Q^{−1}); (24) ...then (x, µ) is also a GMRF, where the node µ is a neighbour of all the x_i's.

104 A numerical case-study Introduction Two different algorithms 1. The band-cholesky factorisation (BCF) as in Algorithm 8. Here we use the LAPACK-routines DPBTRF and DTBSV, for the factorisation and the forward/back-substitution, respectively, and the Gibbs-Poole-Stockmeyer algorithm for bandwidth reduction. 2. The Multifrontal Supernodal Cholesky factorisation (MSCF) implementation in the library TAUCS using the nested dissection reordering from the library METIS. Both solvers are available in the GMRFLib-library

105 A numerical case-study Introduction The tasks we want to investigate are 1. factorising Q into L L^T, and 2. solving L L^T µ = b. Producing a random sample from the GMRF is half the cost of solving the linear system in step 2. All tests reported here were conducted on a 1200 MHz laptop with 512 MB memory running Linux. New machines are nearly 3–10 times as fast...

106 A numerical case-study GMRF-models in time GMRF models in time Let Q be a band matrix with bandwidth p and dimension n. For such a problem, using BCF will be (theoretically) optimal, as the fill-in will be zero. (Table: CPU time to factorise and solve, for n = 10³, 10⁴, 10⁵ and p = 5, 25.) Long and thin are fast! The MSCF is less optimal for band matrices: fill-in and more complicated data structures.

107 A numerical case-study GMRF-models in time Add 10 global nodes: (Table: CPU time to factorise and solve, for n = 10³, 10⁴, 10⁵ and p = 5, 25.) The fill-in is about pn. The nested dissection ordering gives good results in all cases considered so far.

108 A numerical case-study GMRF-models in space Spatial GMRF models Regular lattice

109 A numerical case-study GMRF-models in space Spatial GMRF models Irregular lattice

110 A numerical case-study GMRF-models in space (Table: CPU time to factorise and solve with the BCF and MSCF methods for three lattice sizes.) For the largest lattice the MSCF really outperforms the BCF. The reason is the O(n^{3/2}) cost for the MSCF compared to O(n²) for the BCF.

111 A numerical case-study GMRF-models in time space Spatio-temporal GMRF models Spatio-temporal GMRF models are often an extension of spatial GMRF models to account for time variation. Use this model and the graph of Germany, for T = 10 and T = 100.

112 A numerical case-study GMRF-models in time space 10 global nodes (Table: CPU time to factorise and solve, for T = 10 and T = 100.) The results show a quite heavy dependency on T. Spatio-temporal GMRFs are more demanding than spatial ones: O(n²) for an n^{1/3} × n^{1/3} × n^{1/3} cube.

113 Part IV Intrinsic GMRFs

114 Outline I Introduction Definition of IGMRFs Improper GMRFs IGMRFs of first order Forward differences On the line with regular locations On the line with irregular locations Why IGMRFs are useful in applications IGMRFs of first order on irregular lattices IGMRFs of first order on regular lattices IGMRFs of second order RW2 model for regular locations The CRWk model for regular and irregular locations IGMRFs of general order on regular lattices IGMRFs of second order on regular lattices

115 Introduction Intrinsic GMRFs (IGMRF) IGMRFs are improper, i.e., they have precision matrices not of full rank. Often used as prior distributions in various applications. Of particular importance are IGMRFs that are invariant to any trend that is a polynomial of the locations of the nodes up to a specific order.

116 Definition of IGMRFs Improper GMRFs Improper GMRF Definition Let Q be an n × n SPSD matrix with rank n − k > 0. Then x = (x_1, ..., x_n)^T is an improper GMRF of rank n − k with parameters (µ, Q) if its density is π(x) = (2π)^{−(n−k)/2} (|Q|*)^{1/2} exp(−½ (x − µ)^T Q (x − µ)). (25) Further, x is an improper GMRF wrt the labelled graph G = (V, E), where Q_ij ≠ 0 ⟺ {i, j} ∈ E for all i ≠ j. Here |·|* denotes the generalised determinant: the product of the non-zero eigenvalues.

117 Definition of IGMRFs Improper GMRFs Markov properties The Markov properties are interpreted as those obtained from the limit of a proper density. Let the columns of A^T span the null space of Q and define Q(γ) = Q + γ A^T A. (26) The conditional distributions under Q(γ) tend to the corresponding ones under Q as γ → 0; in particular, E(x_i | x_{-i}) = µ_i − (1/Q_ii) Σ_{j∼i} Q_ij (x_j − µ_j) is interpreted as the γ → 0 limit.

118 IGMRFs of first order IGMRF of first order An intrinsic GMRF of first order is an improper GMRF of rank n − 1 where Q 1 = 0. The condition Q 1 = 0 means that Σ_j Q_ij = 0 for i = 1, ..., n.

119 IGMRFs of first order Let µ = 0, so E(x_i | x_{-i}) = −(1/Q_ii) Σ_{j: j∼i} Q_ij x_j (27), and −Σ_{j: j∼i} Q_ij / Q_ii = 1. The conditional mean of x_i is a weighted mean of its neighbours, with weights summing to one. There is no shrinking towards an overall level. Many IGMRFs are constructed such that the deviation from the overall level is a smooth curve in time or a smooth surface in space.

120 IGMRFs of first order Forward differences Definition (Forward difference) Define the first-order forward difference of a function f(·) as Δf(z) = f(z + 1) − f(z). Higher-order forward differences are defined recursively: Δ^k f(z) = Δ(Δ^{k−1} f(z)), so Δ² f(z) = f(z + 2) − 2 f(z + 1) + f(z) (28), and in general for k = 1, 2, ..., Δ^k f(z) = (−1)^k Σ_{j=0}^{k} (−1)^j (k choose j) f(z + j).

121 IGMRFs of first order Forward differences For a vector z = (z_1, z_2, ..., z_n)^T, Δz has elements Δz_i = z_{i+1} − z_i, i = 1, ..., n − 1. The forward difference of kth order is an approximation to the kth derivative of f(z): f'(z) = lim_{h→0} (f(z + h) − f(z)) / h.

122 IGMRFs of first order On the line with regular locations IGMRFs of first order on the line The location of node i is i (think "time"). Assume independent increments Δx_i ~ iid N(0, κ^{−1}), i = 1, ..., n − 1, (29) so x_j − x_i ~ N(0, (j − i) κ^{−1}) for i < j. (30) If the intersection between {i, ..., j} and {k, ..., l} is empty for i < j and k < l, then Cov(x_j − x_i, x_l − x_k) = 0. (31) These properties coincide with those of a Wiener process.

123 IGMRFs of first order On the line with regular locations The density for x is π(x | κ) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n−1} (Δx_i)²) (32) = κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n−1} (x_{i+1} − x_i)²), (33) or π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) x^T R x). (34)

124 IGMRFs of first order On the line with regular locations The structure matrix is

R =
[  1  −1
  −1   2  −1
       −1   2  −1
            ⋱    ⋱    ⋱
                 −1   2  −1
                      −1   1 ]   (35)

The n − 1 independent increments ensure that the rank of Q is n − 1. Denote this model by RW1(κ), or RW1 for short.
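The structure matrix above is just R = D^T D for the (n−1) × n first-difference matrix D; the following sketch (NumPy assumed) uses this to confirm the two claims: R 1 = 0 and rank(R) = n − 1.

```python
# Sketch: RW1 precision Q = kappa * D^T D built from first differences.
import numpy as np

def rw1_precision(n, kappa=1.0):
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # (D x)_i = x_{i+1} - x_i
    return kappa * D.T @ D

Q = rw1_precision(6)
print(Q.astype(int))                               # the structure matrix R (kappa = 1)
print(np.allclose(Q @ np.ones(6), 0))              # invariant to adding a constant
print(np.linalg.matrix_rank(Q))                    # n - 1 = 5
```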

125 IGMRFs of first order On the line with regular locations Full conditionals x_i | x_{-i}, κ ~ N(½ (x_{i−1} + x_{i+1}), 1/(2κ)), 1 < i < n. (36) Alternative interpretation: fit a first-order polynomial p(j) = β_0 + β_1 j (37) locally through the points (i − 1, x_{i−1}) and (i + 1, x_{i+1}). The conditional mean is p(i).

126 IGMRFs of first order On the line with regular locations Example Samples with n = 99 and κ = 1, obtained by conditioning on the constraint Σ_i x_i = 0.

127 IGMRFs of first order On the line with regular locations Example Marginal variances Var(x i ) for i = 1,..., n.

128 IGMRFs of first order On the line with regular locations Example Corr(x n/2, x i ) for i = 1,..., n

129 IGMRFs of first order On the line with irregular locations The first-order RW for irregular locations The location of x_i is s_i, but s_{i+1} − s_i is not constant. Assume s_1 < s_2 < ... < s_n and let δ_i := s_{i+1} − s_i. (38)

130 IGMRFs of first order On the line with irregular locations Wiener process Consider x_i as the realisation of a Brownian motion in continuous time, i.e. a Wiener process W(t), at time s_i. Definition (Wiener process) A Wiener process with precision κ is a continuous-time stochastic process W(t), t ≥ 0, with W(0) = 0 and such that the increments W(t) − W(s) are Gaussian with mean 0 and variance (t − s)/κ for any 0 ≤ s < t. Furthermore, increments over non-overlapping time intervals are independent. For κ = 1, this process is called a standard Wiener process.

131 IGMRFs of first order On the line with irregular locations Brownian bridge The full conditional is in this case known as the Brownian bridge: E(x_i | x_{-i}, κ) = (δ_i x_{i−1} + δ_{i−1} x_{i+1}) / (δ_{i−1} + δ_i) (39) and Prec(x_i | x_{-i}, κ) = κ (1/δ_{i−1} + 1/δ_i). (40) Here κ is a precision parameter.

132 IGMRFs of first order On the line with irregular locations The precision matrix is now, for 1 < i < n,

Q_ij = κ × { 1/δ_{i−1} + 1/δ_i   if j = i,
             −1/δ_i              if j = i + 1,
             0                   otherwise (apart from symmetry).   (41)

A proper correction at the boundary gives the remaining diagonal terms Q_11 = κ/δ_1 and Q_nn = κ/δ_{n−1}.

133 IGMRFs of first order On the line with irregular locations The joint density is π(x | κ) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n−1} (x_{i+1} − x_i)² / δ_i), (42) and is invariant to the addition of a constant.

134 IGMRFs of first order On the line with irregular locations The interpretation of the RW1 model as a discretely observed Wiener process justifies the corrections needed. The underlying model is the same; it is only observed differently.

135 IGMRFs of first order Why IGMRFs are useful in applications When the mean is only locally constant Alternative IGMRF: π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n} (x_i − x̄)²), (43) where x̄ is the empirical mean of x.

136 IGMRFs of first order Why IGMRFs are useful in applications Assume n is even and x_i = 0 for 1 ≤ i ≤ n/2, x_i = 1 for n/2 < i ≤ n. (44) Thus x is locally constant with two levels. Evaluating the density at this configuration under the RW1 model and the alternative (43), we obtain κ^{(n−1)/2} exp(−κ/2) and κ^{(n−1)/2} exp(−nκ/8), (45) respectively. The log-ratio of the densities is then of order O(n).

137 IGMRFs of first order Why IGMRFs are useful in applications The RW1 model only penalises the local deviation from a constant level. The alternative penalises the global deviation from a constant level. This local behaviour is advantageous in applications if the mean level of x is approximately or locally constant. A similar argument also applies for polynomial IGMRFs of higher order, constructed using forward differences of order k as independent Gaussian increments.

138 IGMRFs of first order IGMRFs of first order on irregular lattices First order IGMRFs on irregular lattices Consider the map of the 544 regions in Germany. Two regions are neighbours if they share a common border.

139 IGMRFs of first order IGMRFs of first order on irregular lattices Between neighbouring regions i and j, say, we define an independent Gaussian increment x_i − x_j ~ N(0, κ^{−1}). (46) Assuming independent increments yields π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i∼j} (x_i − x_j)²). (47) Here i ∼ j denotes the set of all unordered pairs of neighbours. The number of increments i ∼ j is larger than n, but the rank of the corresponding precision matrix is still n − 1.

140 IGMRFs of first order IGMRFs of first order on irregular lattices There are hidden constraints in the increments due to the more complicated geometry on a lattice than on the line. Example Let n = 3, where all nodes are neighbours. Then x_1 − x_2 = ε_1, x_2 − x_3 = ε_2, and x_3 − x_1 = ε_3, where ε_1, ε_2 and ε_3 are the increments. This implies that ε_1 + ε_2 + ε_3 = 0, which is the hidden linear constraint.

141 IGMRFs of first order IGMRFs of first order on irregular lattices Let n_i denote the number of neighbours of region i. The precision matrix Q is Q_ij = κ × { n_i if i = j; −1 if i ∼ j; 0 otherwise }, (48) and x_i | x_{-i}, κ ~ N((1/n_i) Σ_{j: j∼i} x_j, 1/(n_i κ)). (49)
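The precision matrix in (48) is easy to assemble from a 0/1 adjacency matrix; the sketch below (NumPy assumed, toy graph) also confirms the rank deficiency of one for a connected graph.

```python
# Sketch: first-order IGMRF (ICAR) precision Q = kappa * (diag(n_i) - W) on an irregular graph.
import numpy as np

def icar_precision(W, kappa=1.0):
    n_i = W.sum(axis=1)                    # number of neighbours of each region
    return kappa * (np.diag(n_i) - W)

W = np.array([[0, 1, 0, 0],                # toy map: 4 regions on a line
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = icar_precision(W)
print(Q @ np.ones(4))                      # zeros: invariant to adding a constant
print(np.linalg.matrix_rank(Q))            # n - 1 = 3 for a connected graph
```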

142 IGMRFs of first order IGMRFs of first order on irregular lattices Example

143 IGMRFs of first order IGMRFs of first order on irregular lattices In general there is no longer an underlying continuous stochastic process that we can relate to this density. If we change the spatial resolution or split a region into two new ones, we change the model.

144 IGMRFs of first order IGMRFs of first order on irregular lattices Weighted variants Incorporate symmetric weights w_ij for each pair of adjacent nodes i and j, for example w_ij = 1/d(i, j). Assuming independent increments x_i − x_j ~ N(0, 1/(w_ij κ)), (50) then π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i∼j} w_ij (x_i − x_j)²). (51)

145 IGMRFs of first order IGMRFs of first order on irregular lattices The precision matrix is Q_ij = κ × { Σ_{k: k∼i} w_ik if i = j; −w_ij if i ∼ j; 0 otherwise }, (52) and the full conditional of x_i has mean and precision Σ_{j: j∼i} x_j w_ij / Σ_{j: j∼i} w_ij and κ Σ_{j: j∼i} w_ij, (53) respectively.

146 IGMRFs of first order IGMRFs of first order on regular lattices First order IGMRFs on regular lattices For a lattice I_N with n = n_1 n_2 nodes, let i = (i_1, i_2) denote the node in the i_1-th row and i_2-th column. Use the nearest four sites of i as its neighbours: (i_1 + 1, i_2), (i_1 − 1, i_2), (i_1, i_2 + 1), (i_1, i_2 − 1). The precision matrix is Q_ij = κ × { n_i if i = j; −1 if i ∼ j; 0 otherwise }, (54) and the full conditionals for x_i are x_i | x_{-i}, κ ~ N((1/n_i) Σ_{j: j∼i} x_j, 1/(n_i κ)). (55)

147 IGMRFs of first order IGMRFs of first order on regular lattices Limiting behaviour This model has an important property: the process converges to the de Wijs process, a Gaussian process with variogram log(distance). This is important: there is a limiting process. The de Wijs process has a dense precision matrix, whereas the IGMRF is very sparse!

148 IGMRFs of second order RW2 model for regular locations The RW2 model for regular locations Let s_i = i for i = 1, ..., n, with a constant distance between consecutive nodes. Use the second-order increments Δ² x_i ~ iid N(0, κ^{−1}) (56) for i = 1, ..., n − 2, to define the joint density of x: π(x) ∝ κ^{(n−2)/2} exp(−(κ/2) Σ_{i=1}^{n−2} (x_i − 2 x_{i+1} + x_{i+2})²) (57) = κ^{(n−2)/2} exp(−(κ/2) x^T R x). (58)

149 IGMRFs of second order RW2 model for regular locations The structure matrix is

R =
[  1  −2   1
  −2   5  −4   1
   1  −4   6  −4   1
        ⋱    ⋱    ⋱    ⋱    ⋱
             1  −4   6  −4   1
                  1  −4   5  −2
                       1  −2   1 ]   (59)

Verify directly that Q S_1 = 0, where the columns of S_1 span the constant and the linear trend, and that the rank of Q is n − 2. This is an IGMRF of second order: invariant to adding a line to x. It is known as the second-order random walk model, denoted by RW2(κ) or simply the RW2 model.
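Analogously to the RW1 case, R here is D2^T D2 for the (n−2) × n second-difference matrix D2; the sketch below (NumPy assumed) checks that R annihilates both constants and linear trends and has rank n − 2.

```python
# Sketch: RW2 structure matrix R = D2^T D2 from second differences.
import numpy as np

def rw2_structure(n):
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]          # x_i - 2 x_{i+1} + x_{i+2}
    return D2.T @ D2

R = rw2_structure(7)
line = np.arange(7, dtype=float)                    # a first-order polynomial in the locations
print(np.allclose(R @ np.ones(7), 0), np.allclose(R @ line, 0))   # True True
print(np.linalg.matrix_rank(R))                     # n - 2 = 5
```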

150 IGMRFs of second order RW2 model for regular locations Remarks This is the RW2 model defined and used in the literature. We cannot extend it consistently to the case where the locations are irregular. Similar problems occur if we increase the resolution from n to 2n locations, say. This is in contrast to the RW1 model. There exists an alternative (somewhat more involved) formulation with the desired continuous time interpretation.

151 IGMRFs of second order RW2 model for regular locations The conditional mean and precision are E(x_i | x_{-i}, κ) = (4/6)(x_{i+1} + x_{i−1}) − (1/6)(x_{i+2} + x_{i−2}) (60) and Prec(x_i | x_{-i}, κ) = 6κ, (61) respectively, in the interior (2 < i < n − 1).

152 IGMRFs of second order RW2 model for regular locations Consider the second-order polynomial p(j) = β_0 + β_1 j + ½ β_2 j² (62) and compute the coefficients by a local least-squares fit to the points (i − 2, x_{i−2}), (i − 1, x_{i−1}), (i + 1, x_{i+1}), (i + 2, x_{i+2}). (63) It turns out that E(x_i | x_{-i}, κ) = p(i).

153 IGMRFs of second order RW2 model for regular locations Samples with n = 99 and κ = 1 by conditioning on the constraints.

154 IGMRFs of second order RW2 model for regular locations Marginal variances Var(x i ) for i = 1,..., n.

155 IGMRFs of second order RW2 model for regular locations Corr(x n/2, x i ) for i = 1,..., n

156 The CRWk model for regular and irregular locations Random walk models: irregular locations Locations s 1 < s 2 <... < s n. Discretely observed (integrated) Wiener process Augment with the velocities Valid for any order Connection to spline-models

157 The CRWk model for regular and irregular locations Random walk models: irregular locations Locations s_1 < s_2 < ... < s_n. Discretely observed (integrated) Wiener process. Augment with the velocities. Valid for any order. Connection to spline models. The precision matrix is block tridiagonal:

[ A_1       B_1
  B_1^T     A_2 + C_1   B_2
            B_2^T       A_3 + C_2   B_3
                          ⋱           ⋱          B_{n−1}
                                      B_{n−1}^T  C_{n−1} ]

158 The CRWk model for regular and irregular locations Random walk models: irregular locations Locations s_1 < s_2 < ... < s_n. Discretely observed (integrated) Wiener process. Augment with the velocities. Valid for any order. Connection to spline models. With δ_i = s_{i+1} − s_i, the 2 × 2 blocks are

A_i = [ 12/δ_i³    6/δ_i²  ;   6/δ_i²   4/δ_i ],
B_i = [ −12/δ_i³   6/δ_i²  ;  −6/δ_i²   2/δ_i ],
C_i = [ 12/δ_i³   −6/δ_i²  ;  −6/δ_i²   4/δ_i ].

159 IGMRFs of general order on regular lattices IGMRFs of higher order on regular lattices The precision matrix is orthogonal to a polynomial design matrix of a certain degree. Definition (IGMRF of order k in dimension d) An IGMRF of order k in dimension d is an improper GMRF of rank n − m_{k−1,d} where Q S_{k−1,d} = 0. Here S_{k−1,d} is the design matrix of the polynomials of degree up to k − 1 in the d-dimensional locations, and m_{k−1,d} is its number of columns.

160 IGMRFs of second order on regular lattices A second-order IGMRF in two dimensions Consider a regular lattice I_N in d = 2 dimensions where s_{i_1} = i_1 and s_{i_2} = i_2. Choose the independent increments x_{(i_1+1, i_2)} + x_{(i_1−1, i_2)} + x_{(i_1, i_2+1)} + x_{(i_1, i_2−1)} − 4 x_{(i_1, i_2)}. (64) The motivation for this choice is that (64) equals (Δ²_{i_1} + Δ²_{i_2}) x_{(i_1−1, i_2−1)}, (65) which is invariant to adding a first-order polynomial p_{1,2}(i_1, i_2) = β_00 + β_10 i_1 + β_01 i_2.

161 IGMRFs of second order on regular lattices The precision matrix (apart from boundary effects) has non-zero elements given by the stencil of (Δ²_{i_1} + Δ²_{i_2})² = Δ⁴_{i_1} + 2 Δ²_{i_1} Δ²_{i_2} + Δ⁴_{i_2}, (66) which is a difference approximation to the biharmonic differential operator (∂²/∂x² + ∂²/∂y²)² = ∂⁴/∂x⁴ + 2 ∂⁴/(∂x² ∂y²) + ∂⁴/∂y⁴. (67) The fundamental solution of the biharmonic equation (∂⁴/∂x⁴ + 2 ∂⁴/(∂x² ∂y²) + ∂⁴/∂y⁴) φ(x, y) = 0 (68) is the thin plate spline.

162 IGMRFs of second order on regular lattices Full conditionals in the interior: E(x_i | x_{-i}) is given by the stencil (69) with weights 1/20 × (8 for each of the four nearest neighbours, −2 for each of the four diagonal neighbours, −1 for each of the four second-order neighbours along the axes), and Prec(x_i | x_{-i}) = 20κ. (70)

163 IGMRFs of second order on regular lattices Alternative IGMRFs in two dimensions Using only the nearest-neighbour terms (71) to obtain a difference approximation to ∂²/∂x² + ∂²/∂y² (72) is not optimal. The discretisation error is different at 45 degrees to the main directions, hence we should expect a directional effect.

164 IGMRFs of second order on regular lattices Use an isotropic approximation (e.g. the Mehrstellen stencil) (73); the resulting full conditionals have E(x_i | x_{-i}) given by the corresponding stencil (74) and Prec(x_i | x_{-i}) = 13κ. (75)

165–166 IGMRFs of second order on regular lattices Samples (figures).

167 Part V Hierarchical GMRF models

168 Outline I Hierarchical GMRFs models A simple example Classes of hierarchical GMRF models A brief introduction to MCMC Blocking Rate of convergence Example One-block algorithm Block algorithms for hierarchical GMRF models General setup The one-block algorithm The sub-block algorithm GMRF approximations Merging GMRFs Introduction

169 Outline II Why is it so?

170 Hierarchical GMRFs models Hierarchical GMRF models Characterised through several stages of observables and parameters. A typical scenario is as follows. Stage 1 Formulate a distributional assumption for the observables, dependent on latent parameters. For a time series of binary observations y, we may assume y_i ~ B(p_i), i = 1, ..., n, and that the observations are conditionally independent.

171 Hierarchical GMRFs models Hierarchical GMRF models Characterised through several stages of observables and parameters. A typical scenario is as follows. Stage 2 Assign a prior model, i.e. a GMRF, for the unknown parameters, here p_i. Choose an autoregressive model for the logit-transformed probabilities x_i = logit(p_i).

172 Hierarchical GMRFs models Hierarchical GMRF models Characterised through several stages of observables and parameters. A typical scenario is as follows. Stage 3 Assign priors to the unknown parameters (or hyperparameters) of the GMRF, e.g. the precision parameter κ controlling the strength of dependency. Further stages if needed.

173 Hierarchical GMRFs models A simple example Tokyo rainfall data Stage 1 Binomial data: y_i ~ Binomial(2, p(x_i)) for most days, and Binomial(1, p(x_i)) otherwise.

174 Hierarchical GMRFs models A simple example Tokyo rainfall data Stage 2 Assume a smooth latent x: x ~ RW2(κ), logit(p_i) = x_i.

175 Hierarchical GMRFs models A simple example Tokyo rainfall data Stage 3 Gamma(α, β)-prior on κ

176 Hierarchical GMRFs models Classes of hierarchical GMRF models Classes of hierarchical GMRF models Normal data: block-MCMC. Non-normal data that allow for a normal-mixture representation (Student-t distribution; logistic and Laplace for binary regression): block-MCMC with auxiliary variables. Non-normal data (Poisson and others...): block-MCMC with GMRF approximations.

177 A brief introduction to MCMC MCMC Construct a Markov chain θ^(1), θ^(2), ..., θ^(k), ... that converges (under some conditions) to π(θ). 1. Start with θ^(0) where π(θ^(0)) > 0. Set k = 1. 2. Generate a proposal θ* from some proposal kernel q(θ* | θ^(k−1)). Set θ^(k) = θ* with probability α = min{1, [π(θ*) q(θ^(k−1) | θ*)] / [π(θ^(k−1)) q(θ* | θ^(k−1))]}; otherwise set θ^(k) = θ^(k−1). 3. Set k = k + 1 and go back to 2.
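For concreteness, here is a minimal sketch of the scheme above for a symmetric random-walk proposal on a one-dimensional target; the target density and all names are illustrative, not part of the course material.

```python
# Sketch: Metropolis-Hastings with a random-walk (Metropolis) proposal.
import numpy as np

def metropolis_hastings(log_pi, theta0, n_iter, step, rng):
    theta, chain = theta0, np.empty(n_iter)
    for k in range(n_iter):
        prop = theta + step * rng.standard_normal()    # symmetric proposal: q-terms cancel
        if np.log(rng.uniform()) < log_pi(prop) - log_pi(theta):
            theta = prop                               # accept with probability alpha
        chain[k] = theta
    return chain

rng = np.random.default_rng(4)
chain = metropolis_hastings(lambda t: -0.5 * t**2, 0.0, 5000, 1.0, rng)   # target N(0, 1)
print(chain.mean(), chain.var())                       # roughly 0 and 1
```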

178 A brief introduction to MCMC Depending on the specific choice of the proposal kernel q(θ* | θ), very different algorithms result. When q(θ* | θ) does not depend on the current value of θ, the proposal is called an independence proposal. When q(θ* | θ) = q(θ | θ*) we have a Metropolis proposal; these include so-called random-walk proposals. The rate of convergence towards π(θ) and the degree of dependence between successive samples of the Markov chain (mixing) will depend on the chosen proposal.

179 A brief introduction to MCMC Single-site algorithms Most MCMC algorithms have been based on updating each scalar component θ_i, i = 1, ..., p, of θ conditional on θ_{−i}: apply the MH-algorithm in turn to every component θ_i of θ with arbitrary proposal kernels q_i(θ_i* | θ_i, θ_{−i}). As long as we update each component of θ, this algorithm will converge to the target distribution π(θ).

180 A brief introduction to MCMC However... Such single-site updating can be disadvantageous if parameters are highly dependent in the posterior distribution π(θ). The problem is that the Markov chain may move around very slowly in its target (posterior) distribution. A general approach to circumvent this problem is to update parameters in larger blocks θ_j: a vector of components of θ. The choice of blocks is often controlled by what is possible to do in practice. Ideally, we should choose a small number of blocks with large dependence within the blocks but with less dependence between blocks.

181 Blocking Blocking Assume the following: θ = (κ, x), where κ is the precision and x is a GMRF. Two-block approach: sample x ~ π(x | κ), and sample κ ~ π(κ | x). There is often strong dependence between κ and x in the posterior; this is resolved using a joint update of (x, κ). Why this modification is important and why it works is discussed next.

182 Blocking Rate of convergence Rate of convergence Let θ^(1), θ^(2), ... denote a Markov chain with target distribution π(θ) and initial value θ^(0) ~ π(θ). The rate of convergence ρ measures how quickly E(h(θ^(t)) | θ^(0)) approaches the stationary value E(h(θ)) for all square π-integrable functions h(·). Let ρ be the minimum number such that for all h(·) and for all r > ρ, lim_{k→∞} E[(E(h(θ^(k)) | θ^(0)) − E(h(θ)))² r^{−2k}] = 0. (76)

183 Blocking Example Example Let x be a first-order autoregressive process x_t − µ = γ(x_{t−1} − µ) + ν_t, t = 2, ..., n, (77) where |γ| < 1, {ν_t} are iid normals with zero mean and variance σ², and x_1 ~ N(µ, σ²/(1 − γ²)). Let γ, σ², and µ be fixed parameters. At each iteration a single-site Gibbs sampler will sample x_t from the full conditional π(x_t | x_{−t}) for t = 1, ..., n:

x_t | x_{−t} ~ N(µ + γ(x_2 − µ), σ²)                                        for t = 1,
x_t | x_{−t} ~ N(µ + γ/(1 + γ²) (x_{t−1} + x_{t+1} − 2µ), σ²/(1 + γ²))       for t = 2, ..., n − 1,
x_t | x_{−t} ~ N(µ + γ(x_{n−1} − µ), σ²)                                     for t = n.

184 Blocking Example For large n, the rate of convergence is ρ = 4γ²/(1 + γ²)². (78) For γ close to one the rate of convergence can be slow: if γ = 1 − δ for small δ > 0, then ρ = 1 − δ² + O(δ³). To circumvent this problem, we may update x in one block. This is possible as x is a GMRF, and yields immediate convergence.

185 Blocking Example Relax the assumption of fixed hyperparameters. Consider a hierarchical formulation where the mean µ of x_t is unknown and assigned a standard normal prior: µ ~ N(0, 1) and x | µ ~ N(µ1, Q^{−1}), where Q is the precision matrix of the GMRF x | µ. The joint density of (µ, x) is normal. We have two natural blocks, µ and x.

186 Blocking Example A two-block Gibbs sampler updates µ and x with samples from their full conditionals: µ^(k) | x^(k−1) ~ N(1^T Q x^(k−1) / (1 + 1^T Q 1), (1 + 1^T Q 1)^{−1}) (79) and x^(k) | µ^(k) ~ N(µ^(k) 1, Q^{−1}). The presence of the hyperparameter µ will slow down the convergence compared to the case when µ is fixed. Due to the nice structure of (79) we can characterise explicitly the marginal chain of {µ^(k)}.

187 Blocking Example Theorem The marginal chain µ^(1), µ^(2), ... from the two-block Gibbs sampler defined in (79), started in equilibrium, is a first-order autoregressive process µ^(k) = φ µ^(k−1) + ε_k, where φ = 1^T Q 1 / (1 + 1^T Q 1) and ε_k ~ iid N(0, 1 − φ²).

188 Blocking Example Proof. It follows directly that the marginal chain µ^(1), µ^(2), ... is a first-order autoregressive process. The coefficient φ is found by computing the covariance at lag 1:

Cov(µ^(k), µ^(k+1)) = E(µ^(k) µ^(k+1))
  = E(µ^(k) E(µ^(k+1) | µ^(k)))
  = E(µ^(k) E(E(µ^(k+1) | x^(k)) | µ^(k)))
  = E(µ^(k) E(1^T Q x^(k) / (1 + 1^T Q 1) | µ^(k)))
  = [1^T Q 1 / (1 + 1^T Q 1)] Var(µ^(k)),

which is known to be φ times the variance Var(µ^(k)) for a first-order autoregressive process.

189 Blocking Example For our model, φ = [n(1 − γ)²/σ²] / [1 + n(1 − γ)²/σ²] = 1 − Var(x_t)(1 − γ²)/(n(1 − γ)²) + O(1/n²). When n is large, φ is close to 1 and the chain will both mix and converge slowly even though we use a two-block Gibbs sampler. Note that increasing the variance of x_t improves the convergence. However, φ → 1 as n → ∞, which is bad.

190 Blocking Example What is going on? (a) Trace of µ (k), and (b) the pairs (µ (k), 1 T Qx (k) ), with µ (k) on the horizontal axis.

191 Blocking One-block algorithm One-block algorithm So far: blocking improves mixing mainly within the block. If there is strong dependence between blocks, the MCMC algorithm may still suffer from slow convergence. Update (µ, x) jointly by delaying the accept/reject step until x also is updated: µ* ~ q(µ* | µ^(k−1)), x* | µ* ~ N(µ* 1, Q^{−1}), (80) then accept/reject (µ*, x*) jointly. Here, q(µ* | µ^(k−1)) can be a simple random-walk proposal or some other suitable proposal distribution.

192 Blocking One-block algorithm What is going on? (a) Trace of µ (k), and (b) the pairs (µ (k), 1 T Qx (k) ), with µ (k) on the horizontal axis.

193 Blocking One-block algorithm With a symmetric µ-proposal, α = min{1, exp(−½ ((µ*)² − (µ^(k−1))²))}. (81) Only the marginal density of µ is needed in (81): we effectively integrate x out of the target density. The minor modification of delaying the accept/reject step until x is updated as well can give a large improvement. In this case: a random walk on a one-dimensional density.

194 Block algorithms for hierarchical GMRF models General setup A more general setup: hyperparameters θ (low dimension); a GMRF x | θ of size n; observe x with data y. The posterior is π(x, θ | y) ∝ π(θ) π(x | θ) π(y | x, θ). Assume we are able to sample from π(x | θ, y), i.e., the full conditional of x is a GMRF.

195 Block algorithms for hierarchical GMRF models The one-block algorithm The one-block algorithm The following proposal updates (θ, x) in one block:
θ* ~ q(θ* | θ^(k−1)), x* ~ π(x | θ*, y).   (82)
The proposal (θ*, x*) is then accepted/rejected jointly. We denote this as the one-block algorithm.

196 Block algorithms for hierarchical GMRF models The one-block algorithm Consider the θ-chain: we are in fact sampling from the posterior marginal π(θ | y) using the proposal θ* ~ q(θ* | θ^(k−1)). The dimension of θ is typically low (1-5, say), so the proposed algorithm should not experience any serious mixing problems.

197 Block algorithms for hierarchical GMRF models The one-block algorithm The one-block algorithm is not always feasible The one-block algorithm is not always feasible for the following reasons: 1. The full conditional of x can be a GMRF with a precision matrix that is not sparse. This will prohibit a fast factorisation, hence a joint update is feasible but not computationally efficient. 2. The data can be non-normal so the full conditional of x is not a GMRF and sampling x using (82) is not possible (in general).

198 Block algorithms for hierarchical GMRF models The sub-block algorithm ...not sparse precision matrix These cases can often be approached using sub-blocks of (θ, x), the sub-block algorithm. Assume a natural splitting exists for both θ and x into (θ_a, x_a), (θ_b, x_b) and (θ_c, x_c). (83) The sets a, b, and c do not need to be disjoint. One class of examples where such an approach is fruitful is (geo-)additive models, where a, b, and c represent three different covariate effects with their respective hyperparameters.

199 Block algorithms for hierarchical GMRF models GMRF approximations ...non-Gaussian full conditionals Auxiliary variables can help achieve Gaussian full conditionals: logit and probit regression models for binary and multi-categorical data, and Student-t_ν distributed observations. GMRF approximations can approximate π(x | θ, y) using a second-order Taylor expansion. Prominent example: Poisson regression. Such approximations can be surprisingly accurate in many cases and can be interpreted as integrating x approximately out of π(x, θ | y).

200 Merging GMRFs Introduction Merging GMRFs using conditioning (I) µ ~ N(0, 1); x | µ ~ AR(1) with mean µ; z | x ~ N(x, I); y | z ~ N(z, I). The joint vector (µ, x, z, y) is a GMRF, and its conditional given y is a GMRF.

201 Merging GMRFs Introduction Merging GMRFs using conditioning (I) µ ~ N(0, 1); x | µ ~ AR(1) with mean µ; z | x ~ N(x, I); y | z ~ N(z, I). The joint vector (µ, x, z, y) is a GMRF, and its conditional given y is a GMRF. Additional hyperparameters θ.

202 Merging GMRFs Introduction Merging GMRFs using conditioning (II)

203 Merging GMRFs Why is it so? Merging GMRFs using conditioning (III) If
x ~ N(0, Q⁻¹), y | x ~ N(x, K⁻¹), z | x, y ~ N(y, H⁻¹),
then
Prec(x, y, z) = [ Q + K   −K      0
                  −K      K + H   −H
                  0       −H      H ],
which is sparse if Q, K and H are.
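A small sketch showing how this joint precision could be assembled with SciPy sparse matrices, assuming Q, K and H are given as sparse symmetric positive-definite matrices (names illustrative):

```python
import scipy.sparse as sp

# Assemble Prec(x, y, z) from the three model blocks on slide 203.
# The result inherits the sparsity of Q, K and H.
def joint_precision(Q, K, H):
    Z = None  # scipy.sparse.bmat treats None as a zero block
    return sp.bmat([[Q + K, -K,    Z ],
                    [-K,    K + H, -H],
                    [Z,     -H,    H ]], format="csc")
```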

204 Part VI Applications

205 Outline I Normal data Munich rental data Normal-mixture response models Introduction Hierarchical-t formulations Binary regression Summary Non-normal response models GMRF approximation Joint analysis of diseases

206 Normal data Normal data: Munich rental guide Response variable y_i: rent per m². Covariates: location, floor space, year of construction, and various indicator variables, such as no central heating, no bathroom, large balcony. German law: an increase in the rent is based on an average rent of comparable flats.

207 Normal data Munich rental data Spatial regression model y_i ~ N(µ + x^S(i) + x^C(i) + x^L(i) + z_iᵀβ, 1/κ_y). x^S(i): floor space (continuous RW2); x^C(i): year of construction (continuous RW2); x^L(i): location (first-order IGMRF); β: parameters for the indicator variables. 380 spatial locations, 2035 observations. Sum-to-zero constraint on the spatial IGMRF model and the year-of-construction covariate.

208 Normal data Munich rental data Inference using MCMC Sub-block approach: (x^S, κ_S), (x^C, κ_C), (x^L, κ_L), and (β, µ, κ_y). Update one block at a time, using κ_L* ~ q(κ_L* | κ_L), x^L* ~ π(x^L | κ_L*, the rest), and then accept/reject (κ_L*, x^L*) jointly.

209 Normal data Munich rental data Log-RW proposal for precisions It is convenient to propose new precisions using κ_new = κ_old · f, where f ~ π(f) ∝ 1 + 1/f for f in [1/F, F] and F > 1. With this choice q(κ_old | κ_new)/q(κ_new | κ_old) = 1, and Var(κ_new | κ_old) ∝ (κ_old)².
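A sketch of this scaling proposal (illustrative names; here the factor f is drawn by simple rejection sampling from the density ∝ 1 + 1/f on [1/F, F]):

```python
import numpy as np

def sample_scaling_factor(F, rng):
    """Draw f with density proportional to 1 + 1/f on [1/F, F] (rejection sampling)."""
    while True:
        f = rng.uniform(1.0 / F, F)
        # the density is bounded by 1 + F on [1/F, F], so accept with prob (1 + 1/f)/(1 + F)
        if rng.uniform() < (1.0 + 1.0 / f) / (1.0 + F):
            return f

def propose_precision(kappa_old, F, rng):
    """Scaling proposal kappa_new = kappa_old * f; the proposal ratio
    q(kappa_old | kappa_new) / q(kappa_new | kappa_old) equals 1,
    so it cancels in the Metropolis-Hastings acceptance probability."""
    return kappa_old * sample_scaling_factor(F, rng)
```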

210 Normal data Munich rental data Full conditional π(x^L | the rest) Introduce fake data ỹ, with ỹ_i = y_i − (µ + x^S(i) + x^C(i) + z_iᵀβ). The full conditional of x^L is
π(x^L | the rest) ∝ exp(−(κ_L/2) Σ_{i∼j} (x^L_i − x^L_j)²) exp(−(κ_y/2) Σ_k (ỹ_k − x^L(k))²).
The data ỹ do not introduce extra dependence between the x^L_i's, as ỹ_i acts as a noisy observation of x^L_i.

211 Normal data Munich rental data Denote by n_i the number of neighbours of location i, and let L(i) = {k : x^L(k) = x^L_i}, with size |L(i)|. The full conditional of x^L is a GMRF with parameters (Q⁻¹b, Q), where b_i = κ_y Σ_{k ∈ L(i)} ỹ_k and
Q_ij = κ_L n_i + κ_y |L(i)| if i = j, −κ_L if i ∼ j, and 0 otherwise.
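One possible way to assemble (b, Q) from a neighbourhood structure, assuming (hypothetically) that neighbours[i] lists the neighbours of location i and loc[k] gives the location index of observation k:

```python
import numpy as np
import scipy.sparse as sp

# Build the canonical parameters (b, Q) of the full conditional of x^L on slide 211.
# neighbours: list of neighbour lists for the m locations; loc: location index of each observation;
# ytilde: the "fake data" defined on slide 210.
def full_conditional_xL(neighbours, loc, ytilde, kappa_L, kappa_y):
    m = len(neighbours)
    counts = np.bincount(loc, minlength=m)                        # |L(i)|
    b = kappa_y * np.bincount(loc, weights=ytilde, minlength=m)   # b_i = kappa_y * sum of ytilde at location i
    Q = sp.lil_matrix((m, m))
    for i, nb in enumerate(neighbours):
        Q[i, i] = kappa_L * len(nb) + kappa_y * counts[i]
        for j in nb:
            Q[i, j] = -kappa_L
    return b, Q.tocsc()
```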

212 Normal data Munich rental data Results Floor space

213 Normal data Munich rental data Results Year of construction

214 Normal data Munich rental data Results Location

215 Normal-mixture response models Introduction Normal-mixture representation Theorem (Kelker, 1971) If x has density f(x) symmetric around 0, then there exist independent random variables z and v, with z standard normal, such that x = z/v iff the derivatives of f satisfy (−d/dy)^k f(√y) ≥ 0 for y > 0 and for k = 1, 2,.... Examples: Student-t, Logistic and Laplace.

216 Normal-mixture response models Introduction Corresponding mixing distribution for the precision parameter λ that generates these distributions as scale mixtures of normals:
Student-t_ν: λ ~ G(ν/2, ν/2);
Logistic: λ = 1/(2K)², where K is Kolmogorov-Smirnov distributed;
Laplace: λ = 1/(2E), where E is exponentially distributed.
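A quick numerical check of the Student-t row of this correspondence (Python/NumPy; note that NumPy's gamma sampler is parametrised by a scale, so rate ν/2 becomes scale 2/ν):

```python
import numpy as np

rng = np.random.default_rng(1)
nu, n = 5.0, 200_000

# Student-t_nu as a scale mixture of normals:
# lambda ~ Gamma(nu/2, rate=nu/2), x | lambda ~ N(0, 1/lambda)
lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
x_mix = rng.normal(size=n) / np.sqrt(lam)

x_t = rng.standard_t(df=nu, size=n)          # direct Student-t draws for comparison
print(np.quantile(x_mix, [0.05, 0.5, 0.95]))
print(np.quantile(x_t, [0.05, 0.5, 0.95]))   # the two sets of quantiles should roughly agree
```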

217 Normal-mixture response models Hierarchical-t formulations RW1 with t_ν increments Replace the assumption of normally distributed increments by a Student-t_ν distribution to allow for larger jumps in the sequence x. Introduce n − 1 independent G(ν/2, ν/2) scale-mixture variables λ_i: Δx_i | λ_i ~ N(0, (κλ_i)⁻¹), i = 1,..., n − 1.

218 Normal-mixture response models Hierarchical-t formulations Observe data y_i ~ N(x_i, κ_y⁻¹) for i = 1,..., n. The posterior density for (x, λ) is π(x, λ | y) ∝ π(x | λ) π(λ) π(y | x). Note that x | (y, λ) is now a GMRF, while λ_1,..., λ_{n−1} | (x, y) are conditionally independent gamma distributed with parameters (ν + 1)/2 and (ν + κ(Δx_i)²)/2. Use the sub-block algorithm with blocks (θ, x) and λ.
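The gamma full conditional for λ translates into a one-line Gibbs update; a sketch with illustrative names:

```python
import numpy as np

# One Gibbs update of the scale-mixture variables lambda given x, for the RW1-with-t-increments model.
# Assumed (hypothetical) inputs: x (latent vector), kappa (increment precision), nu (degrees of freedom).
def update_lambda(x, kappa, nu, rng):
    dx = np.diff(x)                                  # RW1 increments, length n - 1
    shape = (nu + 1.0) / 2.0
    rate = (nu + kappa * dx**2) / 2.0
    return rng.gamma(shape=shape, scale=1.0 / rate)  # lambda_i | x ~ Gamma((nu+1)/2, rate=(nu+kappa*dx_i^2)/2)
```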

219 Normal-mixture response models Binary regression Example: Binary regression GMRF x and Bernoulli data y_i ~ B(g⁻¹(x_i)), where g(p) = log(p/(1 − p)) (logit link) or g(p) = Φ⁻¹(p) (probit link). Equivalent representation using auxiliary variables w for the probit link: ɛ_i ~ iid N(0, 1), w_i = x_i + ɛ_i, and y_i = 1 if w_i > 0, 0 otherwise.

220 Normal-mixture response models Binary regression MCMC Inference The full conditional for the GMRF prior is a GMRF. The full conditional for the auxiliary variables factorises, π(w | ...) = ∏_i π(w_i | ...). Use the sub-block algorithm with blocks (θ, x) and w.
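Since π(w | ...) factorises into univariate truncated normals, the w-update is straightforward; a sketch using SciPy (illustrative names):

```python
import numpy as np
from scipy.stats import truncnorm

# One update of the auxiliary variables w in the probit model:
# w_i | x, y_i is N(x_i, 1) truncated to (0, inf) if y_i = 1 and to (-inf, 0] if y_i = 0.
# Assumed (hypothetical) inputs: x (current GMRF sample), y (0/1 data), rng.
def update_w(x, y, rng):
    lower = np.where(y == 1, -x, -np.inf)   # standardized truncation bounds
    upper = np.where(y == 1, np.inf, -x)
    return x + truncnorm.rvs(lower, upper, size=x.shape, random_state=rng)
```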

221 Normal-mixture response models Binary regression Simple example: Tokyo rainfall data Stage 1 Binomial data y_i ~ Binomial(2, p(x_i)) or Binomial(1, p(x_i)).

222 Normal-mixture response models Binary regression Simple example: Tokyo rainfall data Stage 2 Assume a smooth latent x: x ~ RW2(κ), logit(p_i) = x_i.

223 Normal-mixture response models Binary regression Simple example: Tokyo rainfall data Stage 3 Gamma(α, β)-prior on κ

224 Normal-mixture response models Binary regression Simple example: Tokyo rainfall data (Figure: precision, block vs. single-site updating.)

225 Normal-mixture response models Binary regression Simple example: Tokyo rainfall data (Figure: probability, block vs. single-site updating.)

226 Normal-mixture response models Summary These auxiliary-variable methods work better than one might think. The dependence between the blocks (θ, x) and w is not that strong. Nearly as good as with normal data.

227 Non-normal response models GMRF approximation Non-normal response models Example of a full conditional:
π(x | y) ∝ exp(−½ xᵀQx − Σ_i exp(x_i)) ≈ exp(−½ (x − µ)ᵀ(Q + diag(c_i))(x − µ)).
Construct the GMRF approximation: locate the mode x*, expand to second order to obtain the GMRF approximation. The graph is unchanged.

228 Non-normal response models GMRF approximation Why use the mode?

229 Non-normal response models GMRF approximation How to find the mode? Numerical gradient-search methods: evaluate the gradient and the Hessian in O(n) flops; faster for large GMRFs. Newton-Raphson method: take the current x as the initial value, Taylor-expand around x, compute the mean of the GMRF approximation, and repeat until convergence. Further complications arise with linear constraints Ax = b.
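A minimal sketch of such a Newton-Raphson mode search for the toy full conditional on slide 227, assuming Q is a SciPy sparse SPD precision matrix (function name and convergence settings are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Newton-Raphson mode search for
#   log pi(x | y) = -0.5 x^T Q x - sum_i exp(x_i) + const.
# Each step solves the sparse system (Q + diag(c)) x_new = b, i.e. computes the mean of the
# Gaussian obtained by a second-order expansion of the non-quadratic term around the current x.
def gmrf_approximation_mode(Q, x0, n_iter=20, tol=1e-10):
    x = x0.copy()
    for _ in range(n_iter):
        c = np.exp(x)                       # minus the second derivative of -exp(x_i)
        b = c * (x - 1.0)                   # linear term from the expansion: b_i = exp(x_i)(x_i - 1)
        H = Q + sp.diags(c)                 # precision of the Gaussian approximation
        x_new = splu(H.tocsc()).solve(b)    # mean of the approximation
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, H
        x = x_new
    return x, H
```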

230 Non-normal response models GMRF approximation Taylor or not? (I) Consider a non-quadratic function g(x): g(x) = g(x_0) + (x − x_0)g′(x_0) + ½(x − x_0)² g″(x_0) + .... The error is zero at x_0 and typically increases with |x − x_0|.

231 Non-normal response models GMRF approximation Taylor or not? (II) Alternative approximation: g(x) ≈ ĝ(x) = a + bx − ½cx², (84) where a, b and c are such that g(x_0) = ĝ(x_0) and g(x_0 ± h) = ĝ(x_0 ± h). More uniformly distributed error; better for distribution approximation ... if h is a rough guess of the region of interest. Preference for numerical approximations of derivatives over analytical expressions.
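A sketch of fitting ĝ by matching g at the three points x_0 − h, x_0, x_0 + h instead of using derivatives (illustrative names):

```python
import numpy as np

# Fit g_hat(x) = a + b*x - 0.5*c*x**2 by matching g at x0 - h, x0, x0 + h.
def fit_quadratic_through_three_points(g, x0, h):
    xs = np.array([x0 - h, x0, x0 + h])
    # Solve the 3x3 linear system for (a, b, c) from g_hat(xs) = g(xs).
    A = np.column_stack([np.ones(3), xs, -0.5 * xs**2])
    a, b, c = np.linalg.solve(A, g(xs))
    return a, b, c

# Example: approximate g(x) = -exp(x) around x0 = 0 with h = 1.
a, b, c = fit_quadratic_through_three_points(lambda x: -np.exp(x), 0.0, 1.0)
```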

232 Non-normal response models Joint analysis of diseases Joint analysis of diseases SMR Oral SMR Lung

233 Non-normal response models Joint analysis of diseases Separate analysis: model y_ij: number of cases in area i for cancer type j; y_ij ~ P(e_ij exp(η_ij)), where η_ij is the log relative risk in area i for disease j and the e_ij are constants. Decompose the log relative risk as η = µ1 + u + v, where µ is the overall mean, u is a spatially structured component (IGMRF), and v is an unstructured component (random effects).

234 Non-normal response models Joint analysis of diseases Separate analysis: model Posterior, with x = (µ, u, η):
π(x, κ | y) ∝ κ_v^(n/2) κ_u^((n−1)/2) exp(−½ xᵀQx) exp(Σ_i [y_i η_i − e_i exp(η_i)]) π(κ),
Q = [ κ_µ + nκ_v   κ_v 1ᵀ          −κ_v 1ᵀ
      κ_v 1        κ_u R + κ_v I   −κ_v I
      −κ_v 1       −κ_v I          κ_v I ].
Q is a (2n + 1) × (2n + 1) matrix, where n = 544. The spatially structured term u has a sum-to-zero constraint, 1ᵀu = 0.

235 Non-normal response models Joint analysis of diseases Separate analysis: MCMC Joint update of all parameters: κ_u* ~ q(κ_u* | κ_u), κ_v* ~ q(κ_v* | κ_v), x* ~ π(x | κ*, y).

236 Non-normal response models Joint analysis of diseases Separate analysis: Results Oral Lung

237 Non-normal response models Joint analysis of diseases Joint analysis: Model Oral cavity and lung cancer both relate to tobacco smoking (u_1); oral cancer also relates to alcohol consumption (u_2). Latent shared and specific spatial components:
η_1 | u_1, u_2, µ, κ ~ N(µ_1 + δu_1 + u_2, κ_{η_1}⁻¹ I),
η_2 | u_1, u_2, µ, κ ~ N(µ_2 + δ⁻¹u_1, κ_{η_2}⁻¹ I).

238 Non-normal response models Joint analysis of diseases Joint analysis: MCMC Update all parameters in one block: simple (log-)RW proposals for the hyperparameters; sample (µ_1, η_1, u_1, µ_2, η_2, u_2) from the GMRF approximation; accept/reject jointly.

239 Non-normal response models Joint analysis of diseases Joint analysis: Results tobacco alcohol
