Likelihood Inference for Lattice Spatial Processes

Slide 1: Likelihood Inference for Lattice Spatial Processes. Donghoh Kim, November 30, 2004.

Slide 2: Outline
- Lattice Processes
- Model: The Ising Model (1925), The Potts Model (1952), The Unitary Cell Model (1991)
- Phase Transition and Critical Value
- Likelihood Inference
- Simulation

Slide 4: Lattice Spatial Processes
[Figure: a 3 × 3 lattice with vertices i_1, ..., i_9 and interaction parameters on the edges, e.g. β_jk between vertices j and k and β_jl between j and l.]
- X_i on each vertex takes q distinct values.
- Neighbors: i_2, i_4, i_6, i_8 are nearest neighbors of i_5, and i_1, i_3, i_7, i_9 are second-order neighbors of i_5.
- Interaction: β_ij between vertices i and j.
- Lattice process models: unnormalized exponential family densities on an m × n lattice.

Slide 5: The Ising Model (Ising, 1925)
- A random variable X_i takes two values in {−1, +1}. Only nearest neighbors are considered.
- The one-parameter Ising model: h(x) = exp(β t_2(x)), β ∈ R, where t_2(x) = Σ_{i~j} x_i x_j and ~ means neighbor.
- t_2(x): number of concordant pairs of variables minus number of discordant pairs of variables.
- The two-parameter Ising model (Pickard, 1977): h(x) = exp(β_h t_2h(x) + β_v t_2v(x)), where t_2h(x) = Σ_{i,j} x_ij x_i(j+1) and t_2v(x) = Σ_{i,j} x_ij x_(i+1)j.
- Allows different strengths of interaction between vertical nearest-neighbor pairs and horizontal nearest-neighbor pairs.
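
As a concrete aid (not part of the talk), the following numpy sketch computes the statistics t_2, t_2h and t_2v for a ±1 configuration on an m × n lattice; free boundaries are an assumption of the sketch.

```python
import numpy as np

def ising_statistics(x):
    """Sufficient statistics of a +/-1 configuration x on an m x n lattice
    with free boundaries (an assumed convention).

    t2h : sum of x[i, j] * x[i, j+1] over horizontal nearest-neighbor pairs
    t2v : sum of x[i, j] * x[i+1, j] over vertical nearest-neighbor pairs
    t2  : t2h + t2v, i.e. concordant pairs minus discordant pairs
    """
    t2h = np.sum(x[:, :-1] * x[:, 1:])
    t2v = np.sum(x[:-1, :] * x[1:, :])
    return t2h + t2v, t2h, t2v

# Example: a random 8 x 8 configuration
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(8, 8))
print(ising_statistics(x))
```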

Slide 6: 2 × 2 Unitary Cell Model (Aguilar and Braun, 1991a)
- A 2 × 2 unitary cell: a 2 × 2 small lattice with 8 interactions (θ_11, θ_12, θ_21, θ_22, γ_11, γ_12, γ_21, γ_22), repeated m × n times.
[Figure: the lattice tiled by copies of the 2 × 2 cell, with horizontal interactions θ_11, θ_12, θ_21, θ_22 and vertical interactions γ_11, γ_12, γ_21, γ_22 repeating periodically.]
- X_i takes only two values {−1, +1}.
- h(x) = exp( Σ_{i,j} θ_[i][j] x_ij x_i(j+1) + γ_[i][j] x_ij x_(i+1)j ).
- Includes the Ising model, the lattice model, etc., by changing the shape of the unitary cell and manipulating the interaction terms.
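
One plausible reading of the unitary cell statistic, assuming that [i][j] denotes the position of site (i, j) inside the repeated 2 × 2 cell (i.e. the indices modulo 2) and free boundaries, is the following sketch (not from the talk).

```python
import numpy as np

def unitary_cell_energy(x, theta, gamma):
    """Exponent sum_{i,j} theta[i%2, j%2] x[i,j] x[i,j+1]
                        + gamma[i%2, j%2] x[i,j] x[i+1,j]
    for a +/-1 configuration x, assuming free boundaries and that [i][j]
    means the position of (i, j) inside the repeated 2 x 2 cell.
    theta, gamma : 2 x 2 arrays of interaction parameters."""
    m, n = x.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            if j + 1 < n:   # horizontal interaction
                total += theta[i % 2, j % 2] * x[i, j] * x[i, j + 1]
            if i + 1 < m:   # vertical interaction
                total += gamma[i % 2, j % 2] * x[i, j] * x[i + 1, j]
    return total
```

Setting all θ and γ entries to a common β reduces the exponent to β t_2(x), which is one sense in which the unitary cell model includes the one-parameter Ising model.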

Slide 7: Why Lattice Process Models?
- Spatial lattice data: plant ecology data (Besag 1974; Cressie 1993).
- Bayesian image restoration (Geman and Geman, 1984): lattice process prior.
- Phase transitions: the Ising model for magnetism. A magnet consists of a large number of particles, and each particle has two states, N and S. At high temperature the material loses its magnetism; at low temperature it is magnetized. Magnetism appears suddenly (a phase transition) as the parameter value passes through a specific value, called the critical value (β = 0.5 sinh^{-1}(1) ≈ 0.4407 for the one-parameter Ising model; see the next slide).
- In statistical mechanics β = 1/(kT), where k is the Boltzmann constant and T the temperature.

Slide 8: Phase Transitions and Critical Values
- The one-parameter Ising model (Onsager, 1944): β_c = 0.5 sinh^{-1}(1) ≈ 0.4407.
- The two-parameter Ising model (Pickard, 1977): the critical curve is {(β_h, β_v) : β_h = 0.5 tanh^{-1}(cos ψ), β_v = 0.5 tanh^{-1}(sin ψ), ψ ∈ (0, π/2)}.
- The 2 × 2 symmetric hexagonal lattice model (Aguilar and Braun, 1991b): all six parameters equal sinh^{-1}(√3)/2.

Slide 9: Likelihood Inference for Lattice Processes
- Unnormalized exponential family density: h_θ(x), θ ∈ R^k.
- Normalizing constant: c(θ) = ∫ h_θ(x) dx.
- Normalized density: (1/c(θ)) h_θ(x).
- Problem: c(θ) is unknown, so the likelihood function is also unknown.
- Approach 1: asymptotic distribution of the MLE, calculating c(θ) asymptotically.
- Approach 2: use Markov chain Monte Carlo (MCMC) to approximate c(θ) (umbrella sampling).

Slide 10: Asymptotic Distribution of the MLE
- The one-parameter Ising model: when the lattice size is N = m × n and β = 0.5 sinh^{-1}(1),
  (1/√(N log N)) (t_2(X_N) − E t_2(X_N)) → N(0, 4/π) in distribution,
  and √(N log N) (β̂_N − β) → N(0, π/4) in distribution.
- The two-parameter Ising model (Pickard, 1977): on the critical surface β_h = 0.5 tanh^{-1}(cos ψ), β_v = 0.5 tanh^{-1}(sin ψ), ψ ∈ (0, π/2),
  (1/√(N log N)) ( t_2h(X_N) − E t_2h(X_N), t_2v(X_N) − E t_2v(X_N) )′ → N(0, Σ) in distribution,
  where Σ = (1/π) [ tan ψ, 1 ; 1, cot ψ ].
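
As a small worked consequence (my own arithmetic, not on the slides), the second limit gives an approximate standard error of √(π / (4 N log N)) for β̂_N at the critical point; a minimal sketch:

```python
import math

def critical_beta():
    """Critical value of the one-parameter Ising model, 0.5 * asinh(1)."""
    return 0.5 * math.asinh(1.0)           # = 0.5 * log(1 + sqrt(2)) ~ 0.4407

def mle_standard_error(m, n):
    """Approximate standard error of beta_hat at the critical point, from
    sqrt(N log N) (beta_hat - beta) -> N(0, pi/4) with N = m * n."""
    N = m * n
    return math.sqrt(math.pi / (4.0 * N * math.log(N)))

print(critical_beta(), mle_standard_error(64, 64))
```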

Slide 11: Approximation of the Likelihood Function by MCMC
- A family {h_θ : θ ∈ Θ} of unnormalized densities on a state space X. The normalizing constant c(θ) is unknown.
- The log likelihood for an observation x: l(θ) = log( (1/c(θ)) h_θ(x) ) = log h_θ(x) − log c(θ).
- Take the ratio against an arbitrary unnormalized density g with c_g = ∫ g(x) dx:
  l(θ) = log( h_θ(x)/g(x) ) − log( c(θ)/c_g ).
- Domination condition: the support of g contains the support of h_θ for all θ.
- c(θ)/c_g = E_g[ h_θ(X)/g(X) ].

Slide 12: Approximation of the Likelihood Function by MCMC
- Approximation: generate an irreducible Markov chain X_1, ..., X_n from g; then
  (1/n) Σ_{i=1}^n h_θ(X_i)/g(X_i) → c(θ)/c_g almost surely,
  and l(θ) ≈ l_n(θ) = log( h_θ(x)/g(x) ) − log( (1/n) Σ_{i=1}^n h_θ(X_i)/g(X_i) ).
- Call g the background density and its sample the background sample.
- Maximization: the MLE of the approximated likelihood is called the Monte Carlo MLE (MCMLE).
- What distribution for g: any distribution, as long as the domination condition holds (it should cover a large part of the state space).
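
A minimal sketch of l_n(θ), assuming the exponential family form h_θ(x) = exp(θ·t(x)) and a single background density g = h_ψ (both assumptions of the sketch, not the talk's exact setup):

```python
import numpy as np

def mc_log_likelihood(theta, t_obs, t_background, psi):
    """Monte Carlo approximation l_n(theta) of the log likelihood (up to an
    additive constant not depending on theta) for an exponential family
    h_theta(x) = exp(theta . t(x)), using a single background density g = h_psi.

    t_obs        : statistic t(x) of the observation, shape (k,)
    t_background : statistics t(X_1), ..., t(X_n) of an MCMC sample from h_psi,
                   shape (n, k)
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    psi = np.atleast_1d(np.asarray(psi, dtype=float))
    t_obs = np.atleast_1d(np.asarray(t_obs, dtype=float))
    T = np.asarray(t_background, dtype=float)
    if T.ndim == 1:
        T = T[:, None]
    # log h_theta(x)/g(x) = (theta - psi) . t(x)
    log_ratio_obs = (theta - psi) @ t_obs
    # log (1/n) sum_i h_theta(X_i)/g(X_i), computed stably via log-sum-exp
    log_ratios = T @ (theta - psi)
    log_mean = np.logaddexp.reduce(log_ratios) - np.log(T.shape[0])
    return log_ratio_obs - log_mean
```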

Slide 13: Approximation of the Likelihood Function by MCMC: Mixture Methods
- Simulated tempering (ST) (Marinari and Parisi, 1992; Geyer and Thompson, 1995).
- Umbrella sampling (US) (Torrie and Valleau, 1977).
- Goals: to spread the background samples out, and to approximate the likelihood for a wide range of parameter values.
- Idea: move around a finite set of distributions.

Slide 14: Approximation of the Likelihood Function by MCMC: Mixture Methods
- The new state is the pair (X, I), with I indicating the i-th unnormalized density.
- Conditional on I = i, X has unnormalized density h_{ψ_i}. Hence the joint distribution must be of the form h(x, i) = h_{ψ_i}(x) a_i, where the a_i are constants called the pseudo-prior.
- h(x, i) is used as the background density.

Slide 15: Approximation of the Likelihood Function by MCMC: Mixture Methods
One-parameter case (Geyer and Thompson, 1995):
1. Given the m distributions h_{ψ_1}, ..., h_{ψ_m}:
   - Update X ~ h(x | I_n = i) = (1/c(ψ_i)) h_{ψ_i}(x). An updating scheme for h_{ψ_i}(x) is already available (Metropolis-Hastings or Gibbs).
   - Update I using a Metropolis-Hastings step with proposal density q(i, j) = q_ij, j = i ± 1 (sketched below).
2. Add more distributions and repeat step 1.
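
A minimal sketch of the index update in step 1, assuming a one-parameter exponential family h_ψ(x) = exp(ψ t(x)) and equal proposal probabilities for j = i − 1 and j = i + 1, with out-of-range proposals rejected (these are assumptions of the sketch):

```python
import numpy as np

def update_index(i, t_x, psi, log_a, rng):
    """One Metropolis-Hastings update of the tempering index I, keeping the
    current configuration x fixed (only its statistic t(x) is needed).

    i     : current index into the grid of parameter values psi
    t_x   : statistic t(x) of the current configuration
    psi   : array of parameter values psi_1, ..., psi_m
    log_a : log pseudo-priors log a_1, ..., log a_m
    """
    m = len(psi)
    j = i + rng.choice([-1, 1])   # propose a neighboring distribution, prob 1/2 each
    if j < 0 or j >= m:
        return i                  # out of range: reject, stay at i
    # Hastings ratio h_{psi_j}(x) a_j / (h_{psi_i}(x) a_i); the proposal is symmetric
    log_r = (psi[j] - psi[i]) * t_x + log_a[j] - log_a[i]
    if np.log(rng.uniform()) < log_r:
        return j
    return i
```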

Slide 16: Approximation of the Likelihood Function by MCMC: Mixture Methods
How to choose the pseudo-prior a_i:
- The unnormalized marginal density of I is ∫ h(x, i) dx = c(ψ_i) a_i, with c(ψ_i) = ∫ h_{ψ_i}(x) dx.
- If the pseudo-prior is a_i = 1/c(ψ_i), the marginal density of I is uniform.
- Any a_i works, but when the marginal is uneven the sampler does not mix well.
- Geyer and Thompson (1995) adjust a_i to a_i/o_i, where o_i is the occupation number of the i-th unnormalized density in an MCMC run.
How to choose the m distributions:
- Arrange the m distributions according to their parameter values and call the distance between parameters the spacing.
- Adjust the spacing, and add more distributions during an MCMC run, so that the acceptance rates of the Metropolis-Hastings updates for I are between 20% and 40%.
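
The occupation-number adjustment can be sketched in a few lines; the +1 added to the counts to avoid empty cells is an assumption of the sketch, not part of the slide:

```python
import numpy as np

def adjust_pseudo_prior(log_a, index_trace):
    """Adjust pseudo-priors a_i to a_i / o_i, where o_i is the occupation
    number of distribution i in the previous MCMC run (in the spirit of
    Geyer and Thompson, 1995). The +1 guards against empty counts and is
    an assumption of this sketch."""
    m = len(log_a)
    counts = np.bincount(np.asarray(index_trace), minlength=m) + 1
    return np.asarray(log_a) - np.log(counts)
```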

Slide 17: Approximation of the Likelihood Function by MCMC: Mixture Methods
Multi-parameter case: open problem.
- How to choose the m distributions? There is no natural ordering of the distributions.
- How many distributions? A large number of distributions is needed.
- It is difficult to adjust the a_i using the method of Geyer and Thompson (1995).
- Torrie and Valleau (1977) did not give any clear proposals.

Slide 18: Umbrella Sampling
1. Given the mixture distributions h_{ψ_1}, ..., h_{ψ_m} in the model, and given pseudo-priors a_i.
2. Generate preliminary background samples (sketched below):
   - Update the state X given i: h(x | i) ∝ h_{ψ_i}(x).
   - Update I given x by a Gibbs update: h(i | x) = h_{ψ_i}(x) a_i / Σ_{k=1}^m h_{ψ_k}(x) a_k.
3. Tune the pseudo-prior a_k: estimate the normalizing constants from the mixture distribution itself and set a_k = 1/ĉ(ψ_k), with ĉ(ψ_k) = (1/n) Σ_{i=1}^n h_{ψ_k}(X_i) / ( h_{ψ_{I_i}}(X_i) a_{I_i} ).
4. Generate mixture background samples.
5. If satisfactory, stop. Otherwise, add more distributions and repeat from step 2.
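
A sketch of steps 2 and 3, assuming a one-parameter exponential family h_ψ(x) = exp(ψ t(x)) for concreteness; the tuning function implements the estimator reconstructed above, which determines the a_k only up to a common factor, which is all that matters for the pseudo-prior:

```python
import numpy as np

def gibbs_update_index(t_x, psi, log_a, rng):
    """Gibbs update of I given x: P(I = k | x) proportional to h_{psi_k}(x) a_k,
    with h_psi(x) = exp(psi * t(x)) assumed for concreteness."""
    logp = np.asarray(psi) * t_x + np.asarray(log_a)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return rng.choice(len(psi), p=p)

def tune_pseudo_prior(t_trace, i_trace, psi, log_a):
    """Re-estimate log c(psi_k), up to a common constant, from the mixture
    sample and set log a_k = -log c_hat(psi_k), as in step 3 of the slide.
    t_trace, i_trace : statistics t(X_i) and indices I_i from the run."""
    t_trace = np.asarray(t_trace, dtype=float)
    i_trace = np.asarray(i_trace)
    psi, log_a = np.asarray(psi, dtype=float), np.asarray(log_a, dtype=float)
    log_c_hat = np.empty(len(psi))
    for k in range(len(psi)):
        # log h_{psi_k}(X_i) - log( h_{psi_{I_i}}(X_i) a_{I_i} )
        log_w = psi[k] * t_trace - (psi[i_trace] * t_trace + log_a[i_trace])
        log_c_hat[k] = np.logaddexp.reduce(log_w) - np.log(len(log_w))
    return -log_c_hat
```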

Slide 19: Umbrella Sampling
- 2000 observations x from the symmetric hexagonal lattice model h_θ. All six parameters are sinh^{-1}(√3)/2.
[Figure: quantile plot of (t(X) − E t(X))′ K̂^{-1} (t(X) − E t(X)) against the quantiles of the chi-square distribution with 6 d.f.]
- If t(X) is close to normal, then the constrained MLE θ̂ is close to N(θ, Σ) and Q is close to a chi-square distribution with 5 d.f., where Q = (θ̂ − θ)′ Σ^− (θ̂ − θ).

Slide 20: Approximate likelihood by g
- Approximate the likelihood using g:
  l(θ) ≈ l_n(θ) = log( h_θ(x)/g(x) ) − log( (1/n) Σ_{i=1}^n h_θ(X_i)/g(X_i) ).
- The MCMLE θ̂ satisfies the critical equation (sketched below)
  ∇ l_n(θ) = t(x) − Σ_{i=1}^n t(X_i) w_θ(X_i) = 0,
  where w_θ(x) = ( h_θ(x)/g(x) ) / Σ_{i=1}^n h_θ(X_i)/g(X_i).
- If the observed statistic lies on the boundary of the convex hull of the background sample statistics, the MCMLE does not exist.
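
One way to solve the critical equation numerically is a Newton iteration on the Monte Carlo weights; the sketch below assumes the exponential family form h_θ(x) = exp(θ·t(x)) and a single background density g = h_ψ:

```python
import numpy as np

def mcmle(t_obs, t_background, psi, n_iter=50):
    """Solve t(x) - sum_i t(X_i) w_theta(X_i) = 0 by Newton's method, where
    w_theta(X_i) is proportional to h_theta(X_i)/g(X_i) = exp((theta - psi) . t(X_i)).
    Assumes h_theta(x) = exp(theta . t(x)) and g = h_psi; the iteration can
    diverge when t(x) lies outside the convex hull of the background
    statistics, reflecting the nonexistence issue noted on the slide."""
    t_obs = np.atleast_1d(np.asarray(t_obs, dtype=float))
    T = np.asarray(t_background, dtype=float)
    if T.ndim == 1:
        T = T[:, None]
    psi = np.atleast_1d(np.asarray(psi, dtype=float))
    theta = psi.copy()                      # start the iteration at psi
    for _ in range(n_iter):
        logw = T @ (theta - psi)
        w = np.exp(logw - logw.max())
        w /= w.sum()                        # Monte Carlo weights w_theta(X_i)
        mean = w @ T                        # weighted mean of the t(X_i)
        score = t_obs - mean                # gradient of l_n at theta
        cov = T.T @ (w[:, None] * T) - np.outer(mean, mean)  # minus the Hessian
        theta = theta + np.linalg.solve(cov, score)
    return theta
```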

Slide 21: Umbrella Sampling
- In the first run, 36 distributions are chosen, covering up to the 60th percentile of Q. The MCMLE does not exist for 10 observations.
- In the second run, 20 more distributions are added between the 60th and 80th percentiles of Q. The MCMLE does not exist for 5 observations.
- In the third run, 45 more distributions are added from the 80th to the 95th percentile of Q.
- Why the 60th, 80th and 95th percentiles?
  - Not too far from the previous mixture: otherwise it is difficult to estimate the normalizing constants, since the state space of the new mixture is not covered.
  - Not too close to the previous mixture: otherwise the convex hull of the mixture does not spread out.
  - Any choice works as long as the sampler mixes well.

Slide 22:
[Figure: quantile-quantile plot of Q against the quantiles of the chi-square distribution with 5 d.f.]

Slide 23: Markov Chain Monte Carlo (MCMC)
- Use a Markov chain X_0, X_1, ... satisfying P(X_{n+1} | X_0, X_1, ..., X_n) = P(X_{n+1} | X_n).
- The distribution of a Markov chain is defined through an initial distribution ν (the distribution of X_0) and a transition probability P(·, ·).
- π is called invariant if π(A) = ∫ π(dx) P(x, A) for all A ⊂ S.
- Construct P(x, A) so that the Markov chain has π as its unique invariant distribution.
- Consider a random vector X = (X_1, X_2, ..., X_k) in the state space S = S_1 × ... × S_k with distribution π w.r.t. μ = μ_1 × ... × μ_k, and let h be an unnormalized density w.r.t. μ.
- Let X_1 be the coordinate to be updated and let X_{-1} = (X_2, ..., X_k) collect the remaining coordinates (and similarly X_{-j} = (X_1, ..., X_{j-1}, X_{j+1}, ..., X_k) for a general coordinate j).

Slide 24: Markov Chain Monte Carlo (MCMC)
- The Gibbs update: update X_1 by drawing from π(· | X_{-1}).
- The Metropolis-Hastings update: from the current state x = (x_1, x_2, ..., x_k) ∈ S, the proposal density q_1(x, ·) proposes a candidate value y_1 ∈ S_1, giving the proposal y = (y_1, x_2, ..., x_k). The proposal y is accepted with probability a(x_1, y_1) = min(1, R(x_1, y_1)), where R(x_1, y_1) = h(y) q_1(y, x_1) / ( h(x) q_1(x, y_1) ) is the Hastings ratio.
- The Metropolis-Hastings update may reject the proposal, whereas the Gibbs update always accepts.
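
To make the contrast concrete, here is a sketch of a single-site Gibbs update and a single-site Metropolis update for the one-parameter Ising model; free boundaries and a deterministic spin-flip proposal are assumptions of the sketch:

```python
import numpy as np

def neighbor_sum(x, i, j):
    """Sum of the nearest-neighbor spins of site (i, j), free boundaries."""
    m, n = x.shape
    s = 0
    if i > 0:
        s += x[i - 1, j]
    if i < m - 1:
        s += x[i + 1, j]
    if j > 0:
        s += x[i, j - 1]
    if j < n - 1:
        s += x[i, j + 1]
    return s

def gibbs_update(x, i, j, beta, rng):
    """Draw x[i, j] from its full conditional pi(x_ij | rest); always 'accepts'."""
    s = neighbor_sum(x, i, j)
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))   # P(x_ij = +1 | rest)
    x[i, j] = 1 if rng.uniform() < p_plus else -1

def metropolis_update(x, i, j, beta, rng):
    """Propose flipping x[i, j]; accept with probability min(1, Hastings ratio).
    For the symmetric flip proposal the ratio is h(y)/h(x) = exp(-2 beta x_ij s)."""
    s = neighbor_sum(x, i, j)
    log_r = -2.0 * beta * x[i, j] * s
    if np.log(rng.uniform()) < log_r:
        x[i, j] = -x[i, j]
```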
