Markov Chain Monte Carlo
1 Markov Chain Monte Carlo
Jamie Monogan, University of Georgia, Spring 2013
For more information, including R programs, properties of Markov chains, and Metropolis-Hastings, please see:
Jamie Monogan (UGA) Markov Chain Monte Carlo Spring 2013 / 40
2 Objectives
By the end of this meeting, participants should be able to:
Use WinBUGS to estimate a model.
Describe the properties of Markov chains.
Explain the algorithm behind the Gibbs sampler and apply it to data analysis.
3 Using WinBUGS
4 Specifying Models with BUGS
BUGS vocabulary:
node: values and variables in the model, specified by the researcher.
parent: a node with direct influence on other nodes.
descendant: the opposite of a parent node, though a descendant can also itself be a parent.
constant: a founder node; constants are fixed and have no parents.
stochastic: a node modelled as a random variable (parameters or data).
deterministic: a node that is a logical consequence of other nodes.
5 Linear BUGS Example
Consider the following economic data from the Organization for Economic Cooperation and Development (OECD). They highlight the relationship between commitment to employment protection, measured on an interval scale (0 to 4) indicating the quantity and extent of national legislation to protect jobs, and the difference in total factor productivity growth rates between two periods (see The Economist, September 23, 2000 for a discussion).
6 Linear BUGS Example (cont.)
[Table of employment protection (Prot.) and productivity (Prod.) values for 18 countries: United States, Canada, Australia, New Zealand, Ireland, Denmark, Finland, Austria, Belgium, Japan, Sweden, Netherlands, France, Germany, Greece, Portugal, Italy, Spain. The numeric entries were lost in transcription.]
7 Linear BUGS Example (cont.)
We know from Gauss-Markov theory that the posterior distribution of both the intercept and the slope coefficient is Student's t with n - k - 1 = 17 degrees of freedom. So why are we running BUGS on a linear model? Consider how different the estimation process is here:

β̂ = (X'X)⁻¹X'y

versus the Gibbs alternation

α[1] ~ f(α | β[0]),  β[1] ~ f(β | α[1])
α[2] ~ f(α | β[1]),  β[2] ~ f(β | α[2])
...
α[m] ~ f(α | β[m-1]),  β[m] ~ f(β | α[m]).
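The closed-form half of this comparison is easy to verify numerically. Below is a minimal Python/numpy sketch of the Gauss-Markov estimator; the data values are invented for illustration and are not the OECD numbers from the slides.

```python
import numpy as np

# Hypothetical data standing in for the OECD values:
# x is employment protection, y is productivity growth.
x = np.array([0.2, 0.5, 1.1, 1.6, 2.3, 2.9, 3.4, 3.8])
y = np.array([1.4, 1.2, 1.1, 0.9, 0.6, 0.5, 0.2, 0.1])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# The closed form: beta_hat = (X'X)^{-1} X'y, computed via a linear solve
# rather than an explicit matrix inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # [intercept, slope]
```

The Gibbs route, by contrast, never computes this expression; it arrives at comparable point estimates by averaging over the chain's draws.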
8 Linear BUGS Example (cont.)
First define the statistical structure of the model:

mu[i] <- alpha + beta*x[i];
y[i] ~ dnorm(mu[i], tau);

Note that we are indexing across the data here (not chaining!). Now define the variables in the model and their distributional assumptions:

alpha ~ dnorm(0.0, );
beta ~ dnorm(0.0, );
tau ~ dgamma(0.1, 0.1);

The second normal parameter is a precision, not a variance, by convention.
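The precision convention is worth internalizing: dnorm(mu, tau) in BUGS means a normal with variance 1/tau, so tau = 4 implies a standard deviation of 0.5, not 0.25. A quick Python check of what the convention implies (the value tau = 4 is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 4.0                    # precision, as in BUGS's dnorm(mu, tau)
sigma = 1.0 / np.sqrt(tau)   # implied standard deviation: 0.5

# Draw many values from N(0, 1/tau) and confirm the empirical spread.
draws = rng.normal(loc=0.0, scale=sigma, size=200_000)
print(draws.std())  # close to 0.5, not 1/tau = 0.25
```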
9 Linear BUGS Example (cont.)
Data are handled in a data statement for BUGS:

data x, y in oecd.dat;
inits in oecd.in;

and generally at the bottom of the source file for WinBUGS:

list(x = c( , , , , , , , , , , , , , , , , , ),
     y = c(0.5, 0.6, 1.3, 0.4, 0.1, 0.9, 0.7, 0.1, 0.4, 0.4,
           -0.5, -0.6, -0.9, -0.2, -0.3, -0.3, -0.3, -1.5),
     N = 18)
list(alpha = 0.0, beta = 0.0, tau = 1.0)

Note the R-like handling of data.
10 Linear BUGS Example (cont.)
Looping through the data (which can be much more complex) is also done in a very R-like manner:

for (i in 1:N) {
   mu[i] <- alpha + beta*x[i];
   y[i] ~ dnorm(mu[i], tau);
}

Programming notes:
In Unix all statements must end with ";"; this is not true in WinBUGS.
The order of the distributional statements and the logical looping does not matter.
The production of chain values is not stipulated by the user in the code.
It is good practice to use defined constants to size vectors and matrices (N = 18 here) rather than embed integers in var definitions.
11 Linear BUGS Example (cont.)

model oecd;
{
   for (i in 1:N) {
      mu[i] <- alpha + beta*x[i];
      y[i] ~ dnorm(mu[i], tau);
   }
   alpha ~ dnorm(0.0, );
   beta ~ dnorm(0.0, );
   tau ~ dgamma(0.1, 0.1);
}
list(x = c( , , , , , , , , , , , , , , , , , ),
     y = c(0.5, 0.6, 1.3, 0.4, 0.1, 0.9, 0.7, 0.1, 0.4, 0.4,
           -0.5, -0.6, -0.9, -0.2, -0.3, -0.3, -0.3, -1.5),
     N = 18);
list(alpha = 0.0, beta = 0.0, tau = 1.0);
12 Linear BUGS Example (cont.)
The next steps are to compile the model in BUGS and run the chain, recording values. The following steps are given for the Unix implementation, where each command corresponds to a specific button in WinBUGS.
Compile:

Bugs> compile("oecd.bug")

Run the chain for a burn-in period:

Bugs> update(10000)
time for updates was 00:00:01
13 Linear BUGS Example (cont.)
Turn on chain value recording:

Bugs> monitor(alpha)
Bugs> monitor(beta)

Run the chain for a much longer series of values:

Bugs> update(50000)
time for updates was 00:00:05

Ask for summary statistics:

Bugs> stats(alpha)
Bugs> stats(beta)
14 Linear BUGS Example (cont.)
Using the posterior mean as a point estimate, we can compare with lm:
[Table comparing the OLS model (estimate, std. error) with the MCMC posterior (mean, std. dev.) for the intercept and slope, plus the number of observations; the numeric entries were lost in transcription.]
15 Details on WinBUGS
Minor stuff:
WinBUGS has lots of bells and whistles to explore, such as running the model straight from the doodle.
The data from any plot can be recovered by double-clicking on it.
Setting the seed may be important to you: leaving the seed as-is exactly replicates chains.
Other features: encoding/doodling documents; fonts/colors in documents; log files to summarize time and errors; fancy windowing schemes.
Note that many WinBUGS programs are saved as .odc files, an omnibus format. Full programs can also be saved as plain text.
16 Specification Tool Window
check model: checks the syntax of your code.
load data: loads data from the same or another file.
num of chains: sets the number of parallel chains to run.
compile: compiles your code as specified.
load inits: loads the starting values for the chain(s).
gen inits: lets WinBUGS specify initial values.
17 Update Window
updates: you specify the number of chain iterations to run this cycle.
refresh: the number of updates between screen redraws for traceplots and other displays.
update: hit this button to begin iterations.
thin: the number of values to thin out of the chain between saved values.
18 Update Window (cont.)
iteration: current status of iterations, by the UPDATE parameter.
over relax: click the box for an option that generates multiple samples at each cycle and picks the sample with the greatest negative correlation to the current value. This trades cycle time for mixing quality.
19 Update Window (cont.)
adapting: this box is clicked automatically while the Metropolis or slice-sampling algorithm (the latter uses intentionally introduced auxiliary variables to improve convergence and mixing) is still tuning its optimization parameters (the first 4000 and 500 iterations, respectively). Other options are greyed out during this period.
20 Sampling Window
node: sets each node of interest for monitoring; type the name and click SET for each variable of interest. Enter * in the window when you are done to do a full monitor.
chains: 1 to 10; sets subsets of chains to monitor if multiple chains are being run.
21 Sampling Window (cont.)
beg, end: the beginning and ending chain values to be monitored. BEG is 1 unless you know the burn-in period.
thin: yet another opportunity to thin the chain.
clear: clears a node from being monitored.
22 Sampling Window (cont.)
trace: dynamic traceplots for monitored nodes.
history: displays a traceplot for the complete history.
density: displays a kernel density estimate.
23 Sampling Window (cont.)
quantiles: displays the running mean with a running 95% CI by iteration number.
auto cor: plots of autocorrelations for each node at lags 1 to 50.
coda: displays the chain history in a window in CODA format; another window appears with CODA ordering information.
24 Sampling Window (cont.)
stats: summary statistics on each monitored node: mean, sd, MC error, current iteration value, starting point of the chain, and the percentiles from the PERCENTILES window.
Notes on stats: WinBUGS regularly provides both

naive SE = √(sample variance / n)

and

MC error = √(spectral density variance / n), the asymptotic SE.
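The gap between the two quantities shows up clearly on a deliberately sticky chain. The Python sketch below uses batch means as a simple stand-in for the spectral-density estimate WinBUGS reports; the AR(1) chain, its coefficient, and the batch size are all arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A strongly autocorrelated "chain": AR(1) with coefficient 0.9.
n, rho = 50_000, 0.9
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + rng.normal()

# Naive SE ignores autocorrelation entirely.
naive_se = chain.std(ddof=1) / np.sqrt(n)

# Batch-means MC error: split the chain into long batches and use the
# variability of the batch means (a crude stand-in for a spectral estimate).
b = 500
means = chain[: n // b * b].reshape(-1, b).mean(axis=1)
mc_error = means.std(ddof=1) / np.sqrt(len(means))

print(naive_se, mc_error)  # MC error is several times the naive SE here
```

For independent draws the two estimates would roughly agree; the inflation factor is what the chain's stickiness costs you.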
25 Theory
26 What is a Markov Chain?
A type of stochastic process that will help us estimate posterior quantities.
A stochastic process is a consecutive set of random quantities defined on some known state space, Θ, indexed so that the order is known: {θ[t] : t ∈ T}.
Frequently, but not necessarily, T is the set of positive integers, implying consecutive, evenly spaced time intervals: {θ[t=0], θ[t=1], θ[t=2], ...}.
A stochastic process must also be defined with respect to a state space, Θ, which identifies the range of possible values of θ. This state space is either discrete or continuous depending on how the variable of interest is measured.
27 What is a Markov Chain? (cont.)
A Markov chain is a stochastic process with the property that any specified state in the series, θ[t], is dependent only on the previous value of the chain, θ[t-1]. Therefore values are conditionally independent of all other previous values: θ[0], θ[1], ..., θ[t-2]. Formally:

P(θ[t] ∈ A | θ[0], θ[1], ..., θ[t-2], θ[t-1]) = P(θ[t] ∈ A | θ[t-1]),

where A is any identified set (an event or range of events) on the complete state space. (We will use this A notation extensively.)
Colloquially: a Markov chain wanders around the state space remembering only where it has been in the last period.
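A two-state simulation makes the property concrete: each draw below conditions only on the current state, never on the earlier history. The transition matrix is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Row-stochastic transition matrix on the state space {0, 1}:
# row = current state, column = next state.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

state, path = 0, [0]
for _ in range(10_000):
    # The next draw conditions only on the current state: the Markov property.
    state = rng.choice(2, p=P[state])
    path.append(state)

print(np.mean(path))  # long-run fraction of time spent in state 1
```

For this kernel the long-run fraction of time in state 1 settles near 3/7, the stationary probability implied by P, regardless of the starting state.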
28 What is a Markov Chain? (cont.)
This short-term memory property is very useful: when the chain eventually finds the region of the state space with the highest density, it will wander around there, producing a sample that is only modestly nonindependent. If this is the posterior region, then we can use these empirical values as legitimate posterior sample values.
Thus difficult posterior calculations can be done with MCMC by letting the chain wander around sufficiently long and then producing summary statistics from the recorded values. Sounds simple.
29 What is a Markov Chain? (cont.)
How does the Markov chain decide to move? Define the transition kernel, K, as a general mechanism for describing the probability of moving to some other specified state based on the current chain status.
K(θ, A) is a defined probability measure for all θ points in the state space to the set A ⊆ Θ. So K(θ, A) maps potential transition events to their probability of occurrence.
30 What is a Markov Chain? (cont.)
When the state space is discrete, K is a k × k matrix mapping for the k discrete elements in A, where each cell defines the probability of a state transition from the first term to all possible states:

P_A = [ p(θ1, θ1)  ...  p(θ1, θk) ]
      [    :                :     ]
      [ p(θk, θ1)  ...  p(θk, θk) ]

where the row indicates where the chain is at this period and the column indicates where the chain is going in the next period. Each matrix element is a well-behaved probability: p(θi, θj) ≥ 0 for all i, j ∈ A.
The rows of P_A sum to one and each defines a conditional PMF, since all the entries in a row are specified for the same starting value and cover each possible destination in the state space: for row i, Σ_{j=1}^{k} p(θi, θj) = 1.
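In code, the discrete kernel is literally a row-stochastic matrix, and K(θ, A) is a row sum over the states in A. The numbers below are an arbitrary illustrative kernel, not anything from the slides.

```python
import numpy as np

# A 3-state transition matrix; each row is a conditional PMF.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Every row sums to one: each row is a proper conditional distribution.
print(P.sum(axis=1))  # [1. 1. 1.]

# K(theta_i, A): the probability of moving from state i into the set A,
# here A = {state 1, state 2} starting from state 0.
A = [1, 2]
K_0_A = P[0, A].sum()
print(K_0_A)  # 0.3 + 0.2 = 0.5
```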
31 What is a Markov Chain? (cont.)
When the state space is continuous, K is a conditional PDF: f(θ | θi), meaning a properly defined probability statement for all θ ∈ A, given some current state θi.
Continuous state space Markov chains have more involved theory, so it is often convenient to think about discrete Markov chains at first.
32 What is a Markov Chain? (cont.)
Transition probabilities between two selected states for arbitrary numbers of steps m can be calculated multiplicatively. The probability of transitioning from the state θi = x at time 0 to the state θj = y in exactly m steps is given by the multiplicative series:

p^m(θi[0] = x, θj[m] = y) = Σ_{θ1} Σ_{θ2} ... Σ_{θm-1} p(θi, θ1) p(θ1, θ2) ... p(θm-1, θj),

where the sums run over all possible paths and each product collects the transition probabilities along one path. So p^m(θi[0] = x, θj[m] = y) is also a stochastic transition matrix: it specifies the product of all the required intermediate steps, summed over all possible paths that reach y from x.
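For a discrete chain this sum over paths is exactly the (i, j) entry of the mth matrix power, which can be checked by brute-force enumeration. The 3-state kernel below is illustrative.

```python
import numpy as np
from itertools import product

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
m, i, j = 3, 0, 2   # probability of going from state 0 to state 2 in 3 steps

# Explicit sum over all intermediate paths theta_1, ..., theta_{m-1}.
total = 0.0
for mid in product(range(3), repeat=m - 1):
    states = (i, *mid, j)
    prob = 1.0
    for a, b in zip(states, states[1:]):
        prob *= P[a, b]   # multiply the one-step transition probabilities
    total += prob

# The same quantity from the m-th matrix power.
print(total, np.linalg.matrix_power(P, m)[i, j])  # the two agree
```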
33 Marginal Distributions
We want the marginal distribution at some mth step from the transition kernel. For the discrete case, the marginal distribution of the chain at the mth step is obtained by inserting the current value of the chain, θi[m], into the row of the transition kernel for the mth step, p^m:

π^m(θ) = [p^m(θ1), p^m(θ2), ..., p^m(θk)].

So the marginal distribution at the first step of a discrete Markov chain is given by:

π^1(θ) = p^1 π^0(θ),

where π^0 is the initial starting value assigned to the chain and p^1 = p is a transition matrix.
34 Marginal Distributions (cont.)
The marginal distribution at some (possibly distant) step for a given starting value is:

π^n = p π^{n-1} = p(p π^{n-2}) = p^2(p π^{n-3}) = ... = p^n π^0.

Since successive products of probabilities quickly result in lower probability values, the property above shows how Markov chains eventually forget their starting points.
The marginal distribution for the continuous case is only slightly more involved, since we cannot simply list the quantity as a vector:

π^m(θj) = ∫_Θ p(θ, θj) π^{m-1}(θ) dθ,

which is the marginal distribution of the chain, given that it is currently on point θj at step m.
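The forgetting of starting points can be watched directly: iterate the marginal update from two very different initial distributions and see them agree. The sketch below uses the row-vector convention π_n = π_{n-1} p (a transposed version of the slides' formula), with an illustrative 3-state kernel.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Two very different starting distributions: mass entirely on state 0 or 2.
pi_a = np.array([1.0, 0.0, 0.0])
pi_b = np.array([0.0, 0.0, 1.0])

# Iterate the marginal update pi_n = pi_{n-1} P (row-vector convention).
for _ in range(100):
    pi_a = pi_a @ P
    pi_b = pi_b @ P

print(pi_a)
print(pi_b)  # the two marginals coincide: the chain forgot its start
```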
35 Properties Markov Chains May Possess (Some Good, Some Bad)
Homogeneity
Irreducibility
Recurrence
Harris recurrence
Stationarity
Periodicity
Ergodicity
36 The Gibbs Sampler
The Gibbs sampler is a transition kernel created by a series of full conditional distributions. It is a Markovian updating scheme based on conditional probability statements.
If the limiting distribution of interest is π(θ), where θ is a k-length vector of coefficients to estimate, then the objective is to produce a Markov chain that cycles through these conditional statements, moving toward and then around this distribution.
The set of full conditional distributions for θ is denoted Θ and defined by π(θ) = π(θi | θ-i) for i = 1, ..., k, where the notation θ-i indicates a specific parametric form from Θ without the θi coefficient.
37 The Gibbs Sampler (cont.)
Steps:
1. Choose starting values: θ[0] = [θ1[0], θ2[0], ..., θk[0]].
2. At the jth iteration, starting at j = 1, complete a single cycle by drawing values from the k distributions given by:

θ1[j] ~ π(θ1 | θ2[j-1], θ3[j-1], ..., θk-1[j-1], θk[j-1])
θ2[j] ~ π(θ2 | θ1[j], θ3[j-1], ..., θk-1[j-1], θk[j-1])
θ3[j] ~ π(θ3 | θ1[j], θ2[j], ..., θk-1[j-1], θk[j-1])
...
θk-1[j] ~ π(θk-1 | θ1[j], θ2[j], θ3[j], ..., θk[j-1])
θk[j] ~ π(θk | θ1[j], θ2[j], θ3[j], ..., θk-1[j])

3. Increment j and repeat until convergence.
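These steps can be sketched for the textbook case of a bivariate normal target with correlation ρ, where both full conditionals are known normal distributions. This is a standard illustration, not the OECD regression model from earlier, and ρ = 0.8 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                    # correlation of the target bivariate normal
sd = np.sqrt(1 - rho ** 2)   # conditional standard deviation

n_iter, burn_in = 20_000, 2_000
theta1, theta2 = 0.0, 0.0    # step 1: starting values
draws = np.empty((n_iter, 2))

for j in range(n_iter):
    # Step 2: one full cycle of draws from the full conditionals,
    # each conditioning on the most recent value of the other parameter.
    theta1 = rng.normal(rho * theta2, sd)   # theta1 | theta2
    theta2 = rng.normal(rho * theta1, sd)   # theta2 | theta1
    draws[j] = theta1, theta2

# Step 3 in practice: discard the burn-in and summarize the rest.
posterior = draws[burn_in:]
print(posterior.mean(axis=0))            # near (0, 0)
print(np.corrcoef(posterior.T)[0, 1])    # near rho
```

The recorded draws behave like a (mildly autocorrelated) sample from the target, which is exactly how the BUGS-produced chains are used.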
38 Gibbs Sampler Theory
Properties of the Gibbs sampler:
Since the Gibbs sampler conditions only on values from the last iteration of its chain, it clearly has the Markovian property.
The Gibbs sampler has the true posterior distribution of the parameter vector as its limiting distribution:

θ[i] →d θ ~ π(θ) as i → ∞.

The Gibbs sampler is a homogeneous Markov chain: the consecutive probabilities are independent of n, the current length of the chain.
The Gibbs sampler converges at a geometric rate: the total variation distance between an arbitrary time and the point of convergence decreases at a geometric rate in time (t).
The Gibbs sampler is ergodic.
39 Comments on Burn-in
The burn-in period is the initial portion of the chain that is considered to be pre-stationarity. We run the chain for some time after the starting point and throw those values away.
Convergence assessment is essential (diagnostics to come).
It pays to be conservative in deciding the length of the burn-in period. There is no golden rule here.
40 For March 20
Present me with a map of the data for your final project.
From BCG p. 127, work exercise #4.
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public
More informationShortfalls of Panel Unit Root Testing. Jack Strauss Saint Louis University. And. Taner Yigit Bilkent University. Abstract
Shortfalls of Panel Unit Root Testing Jack Strauss Saint Louis University And Taner Yigit Bilkent University Abstract This paper shows that (i) magnitude and variation of contemporaneous correlation are
More informationBayesian Inference and Decision Theory
Bayesian Inference and Decision Theory Instructor: Kathryn Blackmond Laskey Room 2214 ENGR (703) 993-1644 Office Hours: Tuesday and Thursday 4:30-5:30 PM, or by appointment Spring 2018 Unit 6: Gibbs Sampling
More informationReminder of some Markov Chain properties:
Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent
More informationBrief introduction to Markov Chain Monte Carlo
Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical
More informationOn the Optimal Scaling of the Modified Metropolis-Hastings algorithm
On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationHamiltonian Monte Carlo
Hamiltonian Monte Carlo within Stan Daniel Lee Columbia University, Statistics Department bearlee@alum.mit.edu BayesComp mc-stan.org Why MCMC? Have data. Have a rich statistical model. No analytic solution.
More informationMCMC for Cut Models or Chasing a Moving Target with MCMC
MCMC for Cut Models or Chasing a Moving Target with MCMC Martyn Plummer International Agency for Research on Cancer MCMSki Chamonix, 6 Jan 2014 Cut models What do we want to do? 1. Generate some random
More informationTEORIA BAYESIANA Ralph S. Silva
TEORIA BAYESIANA Ralph S. Silva Departamento de Métodos Estatísticos Instituto de Matemática Universidade Federal do Rio de Janeiro Sumário Numerical Integration Polynomial quadrature is intended to approximate
More informationInference in Bayesian Networks
Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationSimulation. Where real stuff starts
1 Simulation Where real stuff starts ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2 1. What is a simulation? 3 What is a simulation?
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017 Today we are going to learn... 1 Markov Chains
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationMarkov Chains Handout for Stat 110
Markov Chains Handout for Stat 0 Prof. Joe Blitzstein (Harvard Statistics Department) Introduction Markov chains were first introduced in 906 by Andrey Markov, with the goal of showing that the Law of
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationMonte Carlo Methods for Inference and Learning
Monte Carlo Methods for Inference and Learning Ryan Adams University of Toronto CIFAR NCAP Summer School 14 August 2010 http://www.cs.toronto.edu/~rpa Thanks to: Iain Murray, Marc Aurelio Ranzato Overview
More informationHomework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.
Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for
More informationMCMC Review. MCMC Review. Gibbs Sampling. MCMC Review
MCMC Review http://jackman.stanford.edu/mcmc/icpsr99.pdf http://students.washington.edu/fkrogsta/bayes/stat538.pdf http://www.stat.berkeley.edu/users/terry/classes/s260.1998 /Week9a/week9a/week9a.html
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationReducing The Computational Cost of Bayesian Indoor Positioning Systems
Reducing The Computational Cost of Bayesian Indoor Positioning Systems Konstantinos Kleisouris, Richard P. Martin Computer Science Department Rutgers University WINLAB Research Review May 15 th, 2006 Motivation
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationLect4: Exact Sampling Techniques and MCMC Convergence Analysis
Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationGeneral Construction of Irreversible Kernel in Markov Chain Monte Carlo
General Construction of Irreversible Kernel in Markov Chain Monte Carlo Metropolis heat bath Suwa Todo Department of Applied Physics, The University of Tokyo Department of Physics, Boston University (from
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationMarkov chain Monte Carlo General Principles
Markov chain Monte Carlo General Principles Aaron A. King May 27, 2009 Contents 1 The Metropolis-Hastings algorithm 1 2 Completion (AKA data augmentation) 7 3 The Gibbs sampler 8 4 Block and hybrid MCMC
More informationSpring 2006: Introduction to Markov Chain Monte Carlo (MCMC)
36-724 Spring 2006: Introduction to Marov Chain Monte Carlo (MCMC) Brian Juner February 16, 2006 Hierarchical Normal Model Direct Simulation An Alternative Approach: MCMC Complete Conditionals for Hierarchical
More informationHierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31
Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,
More informationR Demonstration ANCOVA
R Demonstration ANCOVA Objective: The purpose of this week s session is to demonstrate how to perform an analysis of covariance (ANCOVA) in R, and how to plot the regression lines for each level of the
More information1. Introduction. Hang Qian 1 Iowa State University
Users Guide to the VARDAS Package Hang Qian 1 Iowa State University 1. Introduction The Vector Autoregression (VAR) model is widely used in macroeconomics. However, macroeconomic data are not always observed
More information