Gibbs Sampler


Andrea Cunningham

Componentwise sampling directly from the target density $\pi(x)$, $x \in \mathbb{R}^n$. Define a transition kernel

$$K(x, y) = \prod_{i=1}^{n} \pi(y_i \mid y_1, \ldots, y_{i-1}, x_{i+1}, \ldots, x_n),$$

and set $r(x) = 0$ (move every time).

This transition kernel does not in general satisfy the detailed balance equation,

$$\pi(y) K(y, x) = \pi(x) K(x, y),$$

but it satisfies the balance equation,

$$\int \pi(y) K(y, x)\,dx = \int \pi(x) K(x, y)\,dx.$$

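Proof in two dimensions

Since $\pi(y)$ does not depend on $x$,

$$\int \pi(y) K(y, x)\,dx = \pi(y) \int K(y, x)\,dx.$$

We have

$$K(x, y) = \pi(y_1 \mid x_2)\,\pi(y_2 \mid y_1),$$

and therefore

$$K(y, x) = \pi(x_1 \mid y_2)\,\pi(x_2 \mid x_1).$$

Integrate with respect to $x$:

$$\begin{aligned}
\int K(y, x)\,dx &= \iint \pi(x_1 \mid y_2)\,\pi(x_2 \mid x_1)\,dx_1\,dx_2 \\
&= \int dx_1\,\pi(x_1 \mid y_2) \underbrace{\int \pi(x_2 \mid x_1)\,dx_2}_{=1} \\
&= \int \pi(x_1 \mid y_2)\,dx_1 = 1.
\end{aligned}$$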

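Hence,

$$\int \pi(y) K(y, x)\,dx = \pi(y).$$

Right-hand side:

$$\pi(x) K(x, y) = \pi(x)\,\pi(y_1 \mid x_2)\,\pi(y_2 \mid y_1),$$

so

$$\begin{aligned}
\int \pi(x) K(x, y)\,dx_1 &= \pi(y_1 \mid x_2)\,\pi(y_2 \mid y_1) \underbrace{\int \pi(x_1, x_2)\,dx_1}_{=\pi(x_2)} \\
&= \underbrace{\pi(y_1 \mid x_2)\,\pi(x_2)}_{=\pi(y_1, x_2)}\,\pi(y_2 \mid y_1) \\
&= \pi(y_1, x_2)\,\pi(y_2 \mid y_1).
\end{aligned}$$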

Integrating with respect to $x_2$, we obtain

$$\begin{aligned}
\int \pi(y_1, x_2)\,\pi(y_2 \mid y_1)\,dx_2 &= \pi(y_2 \mid y_1) \underbrace{\int \pi(y_1, x_2)\,dx_2}_{=\pi(y_1)} \\
&= \pi(y_2 \mid y_1)\,\pi(y_1) = \pi(y_2, y_1) = \pi(y),
\end{aligned}$$

and the proof is complete.
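The two-dimensional argument can be checked numerically on a small discrete example, with sums replacing the integrals. The sketch below is only an illustration: the joint probability table `pi` is a made-up set of values, and the helper names are hypothetical. It builds the kernel $K(x, y) = \pi(y_1 \mid x_2)\,\pi(y_2 \mid y_1)$ and measures how far the balance and detailed balance equations are from holding.

```python
from itertools import product

# Hypothetical joint distribution pi(x1, x2) on {0, 1} x {0, 1}.
# The values are illustrative only; any positive table summing to 1 works.
pi = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
states = list(product((0, 1), repeat=2))


def marginal_first(a):
    """pi(x1 = a), obtained by summing out x2."""
    return sum(pi[(a, v)] for v in (0, 1))


def marginal_second(b):
    """pi(x2 = b), obtained by summing out x1."""
    return sum(pi[(u, b)] for u in (0, 1))


def kernel(x, y):
    """Gibbs kernel K(x, y) = pi(y1 | x2) * pi(y2 | y1)."""
    p_y1_given_x2 = pi[(y[0], x[1])] / marginal_second(x[1])
    p_y2_given_y1 = pi[(y[0], y[1])] / marginal_first(y[0])
    return p_y1_given_x2 * p_y2_given_y1


# Balance: sum_x pi(x) K(x, y) should reproduce pi(y) for every y.
balance_gap = max(
    abs(sum(pi[x] * kernel(x, y) for x in states) - pi[y]) for y in states
)

# Detailed balance: pi(x) K(x, y) == pi(y) K(y, x) is not expected to hold.
detailed_gap = max(
    abs(pi[x] * kernel(x, y) - pi[y] * kernel(y, x))
    for x in states
    for y in states
)

print("balance gap:", balance_gap)
print("detailed balance gap:", detailed_gap)
```

For this table the balance gap is at the level of rounding error, while the detailed balance gap is clearly nonzero, in line with the statement that the kernel satisfies balance but not, in general, detailed balance.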

Algorithm

Componentwise updating:

1. Initialize $x = x^1$ and set $k = 1$.
2. Update $x^k \to x^{k+1}$:
   Draw $x_1^{k+1}$ from $t \mapsto \pi(t, x_2^k, x_3^k, \ldots, x_n^k)$,
   Draw $x_2^{k+1}$ from $t \mapsto \pi(x_1^{k+1}, t, x_3^k, \ldots, x_n^k)$,
   $\vdots$
   Draw $x_n^{k+1}$ from $t \mapsto \pi(x_1^{k+1}, x_2^{k+1}, \ldots, x_{n-1}^{k+1}, t)$.
3. Increase $k \to k + 1$ and repeat from 2 until the desired sample size is reached.
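The componentwise updating loop can be sketched on a target whose conditionals are known in closed form. The example below assumes a target that is not part of the slides: a zero-mean bivariate Gaussian with unit variances and correlation `rho`, for which each conditional is $N(\rho \cdot \text{other}, 1 - \rho^2)$. The function name and its parameters are hypothetical.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, x_init=(0.0, 0.0), seed=0):
    """Componentwise Gibbs sampling for an assumed bivariate Gaussian target.

    Target (an assumption for illustration): zero mean, unit variances,
    correlation rho. Each conditional is N(rho * other, 1 - rho**2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    x1, x2 = x_init                      # step 1: initialize
    samples = []
    for _ in range(n_samples):           # step 3: repeat
        # Step 2: update each component using the newest values of the others.
        x1 = rng.gauss(rho * x2, cond_sd)   # draw from t -> pi(t, x2_old)
        x2 = rng.gauss(rho * x1, cond_sd)   # draw from t -> pi(x1_new, t)
        samples.append((x1, x2))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
n = len(samples)
mean1 = sum(s[0] for s in samples) / n
mean2 = sum(s[1] for s in samples) / n
cov = sum((s[0] - mean1) * (s[1] - mean2) for s in samples) / n
var1 = sum((s[0] - mean1) ** 2 for s in samples) / n
var2 = sum((s[1] - mean2) ** 2 for s in samples) / n
print("sample correlation:", cov / (var1 * var2) ** 0.5)
```

Note that the second draw conditions on the freshly updated first component, which is exactly the "updated / old" split used in the algorithm.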

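Implementation: an example

Consider the following particular case. Observation model:

$$b = Ax + e, \quad e \sim \mathcal{N}(0, \sigma^2 I).$$

Whitening of the noise: observe that

$$w = \frac{1}{\sigma} e \sim \mathcal{N}(0, I),$$

so the observation equation is equivalent to

$$\tilde{b} = \tilde{A} x + w, \quad w \sim \mathcal{N}(0, I),$$

where

$$\tilde{A} = \frac{1}{\sigma} A, \quad \tilde{b} = \frac{1}{\sigma} b.$$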

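Without loss of generality, we may therefore assume that $\sigma = 1$. The likelihood density is

$$\pi(b \mid x) \propto \exp\left(-\frac{1}{2} \|Ax - b\|^2\right).$$

Prior: assume that we have an a priori inequality constraint

$$Cx \geq r,$$

the inequality understood componentwise.

Example: the bounds $v_j \leq x_j \leq u_j$, $1 \leq j \leq n$, can be written as

$$\underbrace{\begin{bmatrix} I \\ -I \end{bmatrix}}_{=C} x \geq \underbrace{\begin{bmatrix} v \\ -u \end{bmatrix}}_{=r}.$$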

The prior can be written as

$$\pi_{\text{prior}}(x) \propto \Theta(Cx - r),$$

where $\Theta$ is the multivariate Heaviside step function.

Observe: the prior may be an improper density, i.e., its integral may not be finite.

Write

$$\pi_{\text{post}}(x) = \pi(x \mid b) \propto \pi(b \mid x)\,\pi_{\text{prior}}(x).$$

Updating of the $j$th component: we need the conditional densities

$$\pi(x_j \mid \underbrace{x_1, x_2, \ldots, x_{j-1}}_{\text{updated}}, \underbrace{x_{j+1}, \ldots, x_n}_{\text{old}}, b),$$

that is, we have to consider the mapping

$$x_j \mapsto \pi_{\text{post}}(\underbrace{x_1, x_2, \ldots, x_{j-1}}_{\text{updated}}, x_j, \underbrace{x_{j+1}, \ldots, x_n}_{\text{old}}),$$

where all components other than the $j$th one are fixed.

Likelihood: write

$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}.$$

Then

$$Ax = \sum_{k=1}^{n} x_k a_k.$$

Denote

$A_{-j}$ = matrix $A$ with the $j$th column eliminated,
$a_j$ = $j$th column of $A$,
$x_{-j}$ = vector $x$ with the $j$th entry eliminated.

We have

$$Ax - b = x_j a_j + \sum_{k=1,\,k \neq j}^{n} x_k a_k - b = x_j a_j + A_{-j} x_{-j} - b = x_j a_j - b_j,$$

where

$$b_j = b - A_{-j} x_{-j}.$$

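Now,

$$\begin{aligned}
\|Ax - b\|^2 &= \|x_j a_j - b_j\|^2 \\
&= \|a_j\|^2 x_j^2 - 2 x_j\,a_j^T b_j + \|b_j\|^2 \\
&= \left(\|a_j\| x_j - \frac{a_j^T b_j}{\|a_j\|}\right)^2 + \|b_j\|^2 - \frac{(a_j^T b_j)^2}{\|a_j\|^2}.
\end{aligned}$$

Therefore, by denoting

$$t_j = \|a_j\| x_j, \quad \bar{t}_j = \frac{a_j^T b_j}{\|a_j\|},$$

we have

$$\pi(x_j \mid x_{-j}, b) \propto \exp\left(-\frac{1}{2} (t_j - \bar{t}_j)^2\right).$$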
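The completion of the square can be verified numerically. The following sketch, with a hypothetical helper name and small made-up test data, computes the scaling factor (the norm of the $j$th column), the center of the conditional Gaussian, and the leftover constant, and then compares the two sides of the identity.

```python
def conditional_center(A, b, x, j):
    """Return (norm_aj, t_bar, const) for the j-th component (0-based index).

    A is a list of rows, b and x are lists. The returned values satisfy
    ||A x - b||^2 = (norm_aj * x[j] - t_bar)^2 + const.
    """
    m, n = len(A), len(x)
    a_j = [A[i][j] for i in range(m)]
    # Residual data vector: b minus the contribution of all other components.
    b_j = [
        b[i] - sum(A[i][k] * x[k] for k in range(n) if k != j)
        for i in range(m)
    ]
    norm_aj = sum(v * v for v in a_j) ** 0.5
    t_bar = sum(u * v for u, v in zip(a_j, b_j)) / norm_aj
    # ||b_j||^2 - (a_j^T b_j)^2 / ||a_j||^2, which does not depend on x[j].
    const = sum(v * v for v in b_j) - t_bar ** 2
    return norm_aj, t_bar, const


# Made-up data for illustration only.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]
j = 1

norm_aj, t_bar, const = conditional_center(A, b, x, j)
t = norm_aj * x[j]
direct = sum(
    (sum(A[i][k] * x[k] for k in range(len(x))) - b[i]) ** 2
    for i in range(len(A))
)
print(direct, (t - t_bar) ** 2 + const)  # the two numbers should agree
```

Only the squared term depends on the $j$th component, which is why the constant can be dropped from the conditional density.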

Prior, i.e., the bounds for $t_j$, assuming that $x_{-j}$ is given. Again, we write

$C_{-j}$ = matrix $C$ with the $j$th column eliminated,
$c_j$ = $j$th column of $C$,

so the bound constraint

$$Cx = x_j c_j + C_{-j} x_{-j} \geq r$$

implies

$$x_j c_j \geq r - C_{-j} x_{-j},$$

and by scaling with $\|a_j\|$, we have

$$t_j c_j \geq q, \quad q = \|a_j\| \left(r - C_{-j} x_{-j}\right). \tag{1}$$

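Denote

$$c_j = \begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{Nj} \end{bmatrix} \in \mathbb{R}^N.$$

Make a permutation of the elements $c_{ij}$ of $c_j$, and correspondingly of $q$, so that the $c_{ij}$ are in decreasing order. Assume that the first $l$ elements are positive,

$$c_{1j} \geq \cdots \geq c_{lj} > 0,$$

while the entries starting from index $k + 1$, $k \geq l$, are negative,

$$0 > c_{k+1,j} \geq \cdots \geq c_{Nj}.$$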

Writing the inequality (1) component by component and taking the signs into account, we obtain

$$c_{ij} t_j \geq q_i, \quad \text{or} \quad t_j \geq \frac{q_i}{c_{ij}}, \quad 1 \leq i \leq l,$$

and

$$c_{ij} t_j \geq q_i, \quad \text{or} \quad t_j \leq \frac{q_i}{c_{ij}}, \quad k + 1 \leq i \leq N.$$

In addition, one should check that the inequalities corresponding to zero entries are valid, that is,

$$0 \geq q_i, \quad l + 1 \leq i \leq k.$$

This is a consistency check and has no contribution to the sampling strategy.

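We therefore have lower and upper bounds for $t_j$,

$$t_{j,\min} = \max_{1 \leq i \leq l} \left(\frac{q_i}{c_{ij}}\right), \quad t_{j,\max} = \min_{k+1 \leq i \leq N} \left(\frac{q_i}{c_{ij}}\right).$$

The conditional probability density of $t_j$ is

$$\pi(t_j \mid x_{-j}, b) \propto \exp\left(-\frac{1}{2} (t_j - \bar{t}_j)^2\right), \quad t_{j,\min} \leq t_j \leq t_{j,\max},$$

and the random draw has to be done from this density.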
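The bounds can be computed without actually permuting the entries, since the ordering only serves to organize the notation. The sketch below assumes the componentwise constraint has the form "column entry times $t$ is at least $q_i$", consistent with the Heaviside prior on $Cx - r$; the function name is hypothetical. Positive entries contribute lower bounds, negative entries contribute upper bounds, and zero entries are only checked for consistency.

```python
import math


def bounds_for_t(c_j, q):
    """Bounds implied by c_j[i] * t >= q[i] for all i.

    Returns (t_min, t_max, consistent). If no row gives a lower (upper)
    bound, t_min (t_max) is -inf (+inf).
    """
    t_min, t_max = -math.inf, math.inf
    consistent = True
    for c, q_i in zip(c_j, q):
        if c > 0:
            t_min = max(t_min, q_i / c)   # t >= q_i / c
        elif c < 0:
            t_max = min(t_max, q_i / c)   # dividing by c < 0 flips the sign
        elif q_i > 0:
            consistent = False            # 0 >= q_i is violated
    return t_min, t_max, consistent


# Illustration: box constraint -1 <= x_1 <= 3 in two dimensions, with an
# assumed column norm of 2, so that t = 2 * x_1 must lie in [-2, 6].
c_col = [1.0, 0.0, -1.0, 0.0]        # first column of C = [I; -I]
q_vec = [-2.0, -2.0, -6.0, -8.0]     # made-up values of norm * (r - C_{-j} x_{-j})
print(bounds_for_t(c_col, q_vec))
```

An empty interval (lower bound above the upper bound) or a failed consistency flag would indicate that the current values of the other components are not compatible with the constraint.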

To do the draws properly, we have to consider three possibilities, each one treated separately below.

1. $\bar{t}_j > t_{j,\max}$. This means that we have to draw from the left tail of the Gaussian distribution. The maximum value of this tail is achieved at $t_j = t_{j,\max}$. We scale the density so that this maximum value equals one:

$$\pi(t_j) = \exp\left(-\frac{1}{2} (t_j - \bar{t}_j)^2 + p\right), \quad p = \frac{1}{2} (t_{j,\max} - \bar{t}_j)^2.$$

We seek the effective interval where this density is bigger than a prescribed threshold value $\delta > 0$.

Write $\pi(t_j) = \delta$, take the logarithm of both sides to obtain

$$\frac{1}{2} (t_j - \bar{t}_j)^2 - p = \log\frac{1}{\delta},$$

and solve for $t_j$, bearing in mind that $t_j < \bar{t}_j$:

$$t_j = t_* = \bar{t}_j - \left(2p + 2 \log\frac{1}{\delta}\right)^{1/2}.$$

Hence, the effective draw interval is

$$\max(t_{j,\min}, t_*) \leq t_j \leq t_{j,\max}.$$

2. $\bar{t}_j < t_{j,\min}$. This time, we need to draw from the right tail of the distribution. The maximum is attained at $t_j = t_{j,\min}$, and the scaled density is now

$$\pi(t_j) = \exp\left(-\frac{1}{2} (t_j - \bar{t}_j)^2 + p\right), \quad p = \frac{1}{2} (t_{j,\min} - \bar{t}_j)^2.$$

Again, we seek the effective interval, which this time is

$$t_{j,\min} \leq t_j \leq \min(t_*, t_{j,\max}),$$

where

$$t_* = \bar{t}_j + \left(2p + 2 \log\frac{1}{\delta}\right)^{1/2}.$$

3. $t_{j,\min} \leq \bar{t}_j \leq t_{j,\max}$, the maximum thus being within the interval. In this case, the scaled density is directly

$$\pi(t_j) = \exp\left(-\frac{1}{2} (t_j - \bar{t}_j)^2\right),$$

and solving the equation $\pi(t_j) = \delta$ leads to the solutions

$$t_j = t_{*,\pm} = \bar{t}_j \pm \left(2 \log\frac{1}{\delta}\right)^{1/2},$$

so the active interval for $t_j$ in this case is

$$\max(t_{j,\min}, t_{*,-}) \leq t_j \leq \min(t_{j,\max}, t_{*,+}).$$

Assume now that we have updated the interval to be the effective interval.
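The three cases can be collected into one routine. The sketch below uses hypothetical function names: `effective_interval` returns the effective interval together with the shift $p$ that scales the maximum of the density to one. The draw itself is not specified in this part of the text, so `draw_t` is an assumption made for illustration: a simple rejection sampler that proposes uniformly on the effective interval and accepts with probability equal to the scaled density. Other draw strategies are possible.

```python
import math
import random


def effective_interval(t_bar, t_min, t_max, delta):
    """Return (lo, hi, p): effective interval and log-scale shift p."""
    spread = 2.0 * math.log(1.0 / delta)
    if t_bar > t_max:                      # case 1: left tail
        p = 0.5 * (t_max - t_bar) ** 2
        t_star = t_bar - math.sqrt(2.0 * p + spread)
        return max(t_min, t_star), t_max, p
    if t_bar < t_min:                      # case 2: right tail
        p = 0.5 * (t_min - t_bar) ** 2
        t_star = t_bar + math.sqrt(2.0 * p + spread)
        return t_min, min(t_star, t_max), p
    width = math.sqrt(spread)              # case 3: maximum inside the interval
    return max(t_min, t_bar - width), min(t_max, t_bar + width), 0.0


def draw_t(t_bar, t_min, t_max, delta=1e-6, rng=random):
    """Assumed draw strategy: uniform proposal plus accept/reject.

    The scaled density exp(-(t - t_bar)^2 / 2 + p) is at most one on the
    effective interval, so it can serve directly as acceptance probability.
    """
    lo, hi, p = effective_interval(t_bar, t_min, t_max, delta)
    while True:
        t = rng.uniform(lo, hi)
        if rng.random() < math.exp(-0.5 * (t - t_bar) ** 2 + p):
            return t


# Example: center at 5 but the constraint forces t <= 0 (left-tail case).
rng = random.Random(1)
lo, hi, p = effective_interval(5.0, -10.0, 0.0, 1e-6)
draws = [draw_t(5.0, -10.0, 0.0, rng=rng) for _ in range(200)]
print(lo, hi, min(draws), max(draws))
```

Restricting the proposal to the effective interval keeps the acceptance rate reasonable even when the original bounds are very wide or infinite, at the cost of ignoring probability mass below the threshold.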
