eqr094: Hierarchical MCMC for Bayesian System Reliability


Alyson G. Wilson
Statistical Sciences Group, Los Alamos National Laboratory
P.O. Box 1663, MS F600, Los Alamos, NM 87545 USA
Phone: 505-667-9167; Fax: 505-667-4470; E-mail: agw@lanl.gov

June 27, 2006

Abstract

Hierarchical models are one of the central tools of Bayesian analysis. They offer many advantages, including the ability to borrow strength to estimate individual parameters and the ability to specify complex models that reflect engineering and physical realities. Markov chain Monte Carlo is a set of algorithms that allow Bayesian inference in a variety of models. We illustrate hierarchical models and Markov chain Monte Carlo in a

Bayesian system reliability example.

Keywords: hierarchical model, system reliability, Markov chain Monte Carlo

1 HIERARCHICAL MODELS

Hierarchical models are one of the central tools of Bayesian analysis. Broadly, Bayesian models have two parts (see eqr081): the likelihood function and the prior distribution. We construct the likelihood function from the sampling distribution of the data, which describes the probability of observing the data before the experiment is performed. After we perform the experiment and observe the data, we can consider the sampling distribution as a function of the unknown parameters; this function is called the likelihood function.

The prior distribution describes the uncertainty about the parameters of the likelihood function. We update the prior distribution to the posterior distribution after observing data. We use Bayes' Theorem to perform the update, which shows that the posterior distribution is computed (up to a proportionality constant) by multiplying the likelihood function by the prior distribution.

Denote the sampling distribution as $f(y \mid \theta)$ and the prior distribution as $g(\theta \mid \alpha)$, where $\alpha$ represents the parameters of the prior distribution (often called hyperparameters). It may be the case that we know $g(\theta \mid \alpha)$ completely, including a specific value for $\alpha$. However, suppose that we do not know $\alpha$, and that we choose to quantify our uncertainty about $\alpha$ using a distribution $h(\alpha)$ (often called the hyperprior). This is the most general form of a hierarchical model.
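The prior-to-posterior update is easiest to see in a conjugate special case. A minimal numerical sketch (our own example, not from the article), assuming a Gamma(a, b) prior on a Poisson rate, for which the posterior is again a Gamma distribution:

```python
# Bayes' Theorem for a Poisson rate lambda with a conjugate Gamma prior:
#   prior      g(lambda) = Gamma(a, b)   (shape a, rate b)
#   data       y_1..y_n  ~ Poisson(lambda)
#   posterior            = Gamma(a + sum(y), b + n)

def gamma_poisson_update(a, b, data):
    """Return (shape, rate) of the posterior Gamma distribution."""
    return a + sum(data), b + len(data)

a, b = 0.5, 1.0          # hypothetical prior hyperparameters
data = [2, 0, 3, 1, 2]   # hypothetical Poisson counts

post_shape, post_rate = gamma_poisson_update(a, b, data)
post_mean = post_shape / post_rate
print(post_shape, post_rate, post_mean)  # 8.5 6.0 1.4166...
```

Multiplying the likelihood by the prior here amounts to adding the total count to the prior shape and the sample size to the prior rate.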

A hierarchical model has three parts [1]:

1. The observational model for the data: $(Y_i \mid \theta) \sim f(y_i \mid \theta)$, $i = 1, \ldots, k$.

2. The structural model for the parameters of the likelihood: $(\theta \mid \alpha) \sim g(\theta \mid \alpha)$.

3. The hyperparameter model for the parameters of the structural model: $\alpha \sim h(\alpha)$.

While this is the most general form of the hierarchical model, one of its most common applications is in problems where there are multiple parameters that are related by the structure of the problem. Consider the data in Table 1, which are analyzed in [2] and [3]. These data summarize the number of failures of pumps ($s_i$) over a period of time ($t_i$) in several systems ($i$) of the nuclear power plant Farley 1.

What is the appropriate observational model for the data in Table 1? We could assume that each pump has an identical failure rate, $\lambda$. We could then model the number of failures as $S_i \sim \text{Poisson}(\lambda t_i)$, $i = 1, \ldots, 10$, or equivalently, $\sum_{i=1}^{10} S_i \sim \text{Poisson}(\lambda \sum_{i=1}^{10} t_i)$.

Table 1: Pump Failures ($t_i$ in thousands of hours)

System $i$    $s_i$    $t_i$
 1             5       94.320
 2             1       15.720
 3             5       62.880
 4            14      125.760
 5             3        5.240
 6            19       31.440
 7             1        1.048
 8             1        1.048
 9             4        2.096
10            22       10.480

The assumption of a common failure rate, however, is quite strong and may be inappropriate. Suppose that instead of a common failure rate, we consider the failure rate for each pump, $\lambda_i$, as a sample from an overall population distribution. We have no information, other than the data, that distinguishes the failure rates, and we have no ordering or grouping of the rates. Formally, this suggests that we treat the rates as exchangeable, which implies that their joint distribution $g(\lambda_1, \ldots, \lambda_{10})$ is invariant to permutations of the indices of the parameters. By treating the rates as exchangeable, we propose the following hierarchical model:

1. An observational model for the data: $(S_i \mid \lambda_i) \sim \text{Poisson}(\lambda_i t_i)$, $i = 1, \ldots, 10$.

2. A structural model for the parameters of the likelihood: $(\lambda_i \mid \alpha, \beta) \sim \text{Gamma}(\alpha, \beta)$, $i = 1, \ldots, 10$.

3. A hyperparameter model for the parameters of the structural model: $\alpha \sim \text{Uniform}(0.0, 5.0)$, $\beta \sim \text{Gamma}(0.1, 1.0)$.

This model allows the pump for each system to have its own failure rate, but it also models each failure rate as coming from a common population distribution. This hierarchical structure allows us to borrow strength for the estimation of each $\lambda_i$. In particular, the estimation of each $\lambda_i$ is improved by using the failure data from the other pumps.

2 MARKOV CHAIN MONTE CARLO

Given the hierarchical model for the data in Table 1, we now need to estimate the posterior distributions for $\lambda_i$, $i = 1, \ldots, 10$, $\alpha$, and $\beta$. The posterior distribution is

$$p(\lambda_1, \ldots, \lambda_{10}, \alpha, \beta \mid s, t) \propto \exp\Big(-\beta - \sum_{i=1}^{10} (\beta + t_i) \lambda_i\Big) \Big(\prod_{i=1}^{10} \lambda_i^{s_i + \alpha - 1}\Big) \frac{\beta^{10\alpha + 0.1 - 1}}{(\Gamma(\alpha))^{10}}\, I(\alpha \in (0, 5)).$$
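This unnormalized posterior density can be evaluated directly in code. A sketch using the Table 1 data; the function name and the evaluation point are our choices, and we work on the log scale for numerical stability:

```python
import numpy as np
from math import lgamma, log

# Table 1: pump failure counts s_i and observation times t_i (thousands of hours)
s = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], dtype=float)
t = np.array([94.320, 15.720, 62.880, 125.760, 5.240,
              31.440, 1.048, 1.048, 2.096, 10.480])

def log_posterior(lam, alpha, beta):
    """Unnormalized log of p(lambda_1..10, alpha, beta | s, t)."""
    if not (0.0 < alpha < 5.0) or beta <= 0.0 or np.any(lam <= 0.0):
        return -np.inf                                 # outside the support
    return (-beta - np.sum((beta + t) * lam)           # exponential terms
            + np.sum((s + alpha - 1.0) * np.log(lam))  # product of lambda powers
            + (10.0 * alpha + 0.1 - 1.0) * log(beta)   # beta power
            - 10.0 * lgamma(alpha))                    # 1 / Gamma(alpha)^10

# Evaluate at a plausible point, e.g. lambda_i = s_i / t_i (a hypothetical choice)
print(log_posterior(s / t, 1.0, 1.0))
```

Evaluations like this are all an MCMC sampler needs, since every algorithm below uses the target density only through ratios, where the unknown normalizing constant cancels.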

(For more details on the derivation, see [3].) In Bayesian analysis, the posterior distribution provides the answer; however, it is not immediately obvious how to use this expression for $p(\lambda_1, \ldots, \lambda_{10}, \alpha, \beta \mid s, t)$ to answer questions of interest. In particular, suppose that we would like to know the expected rate of failure for an arbitrary pump from this population given the observed data. Because the mean of the $\text{Gamma}(\alpha, \beta)$ population distribution is $\alpha/\beta$, we need to calculate

$$\int \frac{\alpha}{\beta}\, p(\lambda_1, \ldots, \lambda_{10}, \alpha, \beta \mid s, t)\, d\lambda_1 \cdots d\lambda_{10}\, d\alpha\, d\beta. \quad (1)$$

Monte Carlo integration can approximate this integral: we draw a sample $X^{(j)} = (\lambda_1^{(j)}, \ldots, \lambda_{10}^{(j)}, \alpha^{(j)}, \beta^{(j)})$, $j = 1, \ldots, n$, from $p(\lambda_1, \ldots, \lambda_{10}, \alpha, \beta \mid s, t)$ and then calculate $\frac{1}{n} \sum_{j=1}^{n} \alpha^{(j)}/\beta^{(j)}$.

Markov chain Monte Carlo (MCMC) provides the algorithms to generate the random sample from the posterior distribution, or more generally, from any distribution of interest, $\pi(\cdot)$. The strategy is to construct a Markov chain with $\pi(\cdot)$ as its stationary distribution. A Markov chain is a sequence of random variables $X^{(j)}$, $j = 0, 1, \ldots$, such that for each $j$, $X^{(j)}$ is sampled from a distribution $P(X^{(j)} \mid X^{(j-1)})$: given $X^{(j-1)}$, the next state $X^{(j)}$ depends only on $X^{(j-1)}$ and not on the rest of the history of the chain. The distribution $P(\cdot \mid \cdot)$ is called the transition kernel of the chain. Subject to certain regularity conditions [4], the chain, regardless of its starting value $X^{(0)}$, eventually converges (after a burn-in of $m$ samples) to a unique stationary distribution. As $j$ increases, the samples $X^{(j)}$ become dependent samples approximately from $\pi(\cdot)$.
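Monte Carlo integration itself is easy to demonstrate on a toy target where the answer is known exactly. A standalone sketch (not the pump posterior; the Gamma(2, 3) target and sample size are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo integration: approximate E[X] by the average over n draws,
# (1/n) * sum_j X^(j).  Toy target: X ~ Gamma(shape=2, rate=3), E[X] = 2/3.
shape, rate = 2.0, 3.0
draws = rng.gamma(shape, 1.0 / rate, size=200_000)  # numpy uses scale = 1/rate

estimate = draws.mean()
print(estimate)  # close to 2/3
```

The same averaging is applied to the dependent draws an MCMC sampler produces; the only extra work is generating those draws in the first place.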

Constructing a Markov chain with the appropriate stationary distribution is straightforward using the Metropolis-Hastings algorithm [5]. Suppose that we want to draw $X^{(j+1)}$. We sample a candidate point $Y$ from a proposal distribution $q(\cdot \mid X^{(j)})$. The proposal distribution may depend on the current point $X^{(j)}$. The candidate point is accepted (i.e., $X^{(j+1)} = Y$) with probability $\alpha(X^{(j)}, Y)$, where

$$\alpha(X, Y) = \min\left(1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)}\right). \quad (2)$$

Otherwise, $X^{(j+1)} = X^{(j)}$. The rate of convergence depends strongly on the relationship between the stationary distribution and the proposal distribution; see [4] for strategies for choosing good proposal distributions.

There are canonical choices for proposal distributions. The Metropolis algorithm uses symmetric proposals, where $q(Y \mid X) = q(X \mid Y)$. A special case of the Metropolis algorithm is random-walk Metropolis, where $q(Y \mid X) = q(|X - Y|)$. The independence sampler [6] has a proposal density that does not depend on the current state of the chain, $q(Y \mid X) = q(Y)$. Independence samplers work well when $q(\cdot)$ is a good approximation to $\pi(\cdot)$ but has heavier tails.

Two of the most widely used choices for proposal distributions are the single-component Metropolis-Hastings algorithm and the Gibbs sampler. Instead of updating the entire vector $X$ in one step, these algorithms update smaller groups of the components of $X$, often only one component at a time. Let $X = (X_1, \ldots, X_p)$ and let $X_{-i} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_p)$. Define the

full conditional distribution of $X_i$ as the distribution of the $i$th component of $X$ conditional on all of the remaining components, $X_{-i}$. If $X \sim \pi(\cdot)$, the full conditional distribution of $X_i$ is

$$\pi(X_i \mid X_{-i}) = \frac{\pi(X)}{\int \pi(X)\, dX_i}.$$

The single-component Metropolis-Hastings algorithm generates candidate points $Y_i$ from proposal densities $q_i(Y_i \mid X_i^{(j-1)}, X_{-i}^{(j)})$, where $X_{-i}^{(j)} = (X_1^{(j)}, \ldots, X_{i-1}^{(j)}, X_{i+1}^{(j-1)}, \ldots, X_p^{(j-1)})$, so that components with indices smaller than $i$ have already been updated $j$ times. The acceptance probability for $Y_i$ is

$$\alpha(X_{-i}, X_i, Y_i) = \min\left(1, \frac{\pi(Y_i \mid X_{-i})\, q_i(X_i \mid Y_i, X_{-i})}{\pi(X_i \mid X_{-i})\, q_i(Y_i \mid X_i, X_{-i})}\right).$$

The Gibbs sampler [7] is a special case of the single-component Metropolis-Hastings algorithm in which the proposal distribution is the full conditional distribution, $q_i(Y_i \mid X_i, X_{-i}) = \pi(Y_i \mid X_{-i})$. The acceptance probabilities for the Gibbs sampler are all one.
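The accept/reject step of Equation (2) can be sketched with a random-walk Metropolis sampler on a toy target; with a symmetric proposal the $q$ terms cancel. The standard normal target, step size, and chain length are our choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Unnormalized log density of a standard normal: pi(x) proportional to exp(-x^2/2)
    return -0.5 * x * x

def random_walk_metropolis(n, x0=0.0, step=1.0):
    chain = np.empty(n)
    x = x0
    for j in range(n):
        y = x + step * rng.normal()   # symmetric proposal q(y|x) = q(x|y)
        # Acceptance probability min(1, pi(y)/pi(x)); the q ratio cancels,
        # and comparing on the log scale avoids overflow
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y                     # accept the candidate
        chain[j] = x                  # otherwise keep the current state
    return chain

chain = random_walk_metropolis(50_000)
print(chain[10_000:].mean(), chain[10_000:].std())  # approximately 0 and 1
```

Because rejected candidates repeat the current state, the chain is a dependent sample; the averages still converge to the corresponding expectations under $\pi(\cdot)$.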

3 APPLICATION TO BAYESIAN SYSTEM RELIABILITY

For illustration, suppose that we have a simple parallel system (see eqr341) that comprises the 10 pumps described in Table 1. The system works if at least one of the pumps is working. We would like to estimate the probability that the system is working at 10,000 hours. (Remember that the times in Table 1 are in thousands of hours.) Because we describe the number of failures as a Poisson process (see eqr055), $\text{Poisson}(\lambda_i t_i)$, the time to the first failure of a pump is distributed $\text{Exponential}(\lambda_i)$. Assume for this example that once a pump fails, it is not repaired. (Of course, in a real nuclear power plant, pumps are always repaired.) The probability, $R_S(t)$, that the system is working at 10,000 hours is

$$R_S(10) = 1 - \prod_{i=1}^{10} (1 - \exp(-10 \lambda_i)).$$

(See [8] for the derivation of the expression for system reliability.)

We use a combination of the Gibbs sampler and single-component Metropolis-Hastings algorithm to estimate the failure rates for each pump. This requires the full conditional distributions for each parameter:

$$[\lambda_i \mid \lambda_{-i}, \alpha, \beta] \sim \text{Gamma}(s_i + \alpha, \beta + t_i)$$
$$[\beta \mid \alpha, \lambda_1, \ldots, \lambda_{10}] \sim \text{Gamma}\Big(10\alpha + 0.1,\; 1.0 + \sum_{i=1}^{10} \lambda_i\Big)$$
$$[\alpha \mid \beta, \lambda_1, \ldots, \lambda_{10}] \propto \Big(\prod_{i=1}^{10} \lambda_i^{s_i + \alpha - 1}\Big)\, \beta^{10\alpha - 0.9}\, (\Gamma(\alpha))^{-10}\, I(\alpha \in (0, 5))$$
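These full conditionals translate directly into a Gibbs/Metropolis-within-Gibbs sampler. A sketch for the pump data; the seed, chain length, burn-in, and starting values are our choices, and the Normal proposal for $\alpha$ uses a standard deviation of 0.25:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(42)  # seed is our choice

# Table 1 data: failure counts s_i and observation times t_i (thousands of hours)
s = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], dtype=float)
t = np.array([94.320, 15.720, 62.880, 125.760, 5.240,
              31.440, 1.048, 1.048, 2.096, 10.480])

def log_cond_alpha(alpha, lam, beta):
    """Log of the alpha full conditional, up to a constant, on (0, 5)."""
    if not (0.0 < alpha < 5.0):
        return -np.inf
    return (alpha * np.sum(np.log(lam))
            + 10.0 * alpha * np.log(beta)
            - 10.0 * lgamma(alpha))

n_iter, burn_in = 20_000, 5_000
alpha, beta = 1.0, 1.0           # starting values
reliability = []

for j in range(n_iter):
    # Gibbs updates; numpy's gamma takes (shape, scale) with scale = 1/rate
    lam = rng.gamma(s + alpha, 1.0 / (beta + t))
    beta = rng.gamma(10.0 * alpha + 0.1, 1.0 / (1.0 + lam.sum()))
    # Metropolis step for alpha: Normal(alpha, sd=0.25) proposal, candidates
    # outside (0, 5) are rejected via the -inf log density
    y = alpha + 0.25 * rng.normal()
    if np.log(rng.uniform()) < log_cond_alpha(y, lam, beta) - log_cond_alpha(alpha, lam, beta):
        alpha = y
    if j >= burn_in:
        # System reliability at 10,000 hours: R_S(10) = 1 - prod(1 - exp(-10*lam_i))
        reliability.append(1.0 - np.prod(1.0 - np.exp(-10.0 * lam)))

reliability = np.array(reliability)
print(reliability.mean())  # posterior mean of R_S(10)
```

Up to Monte Carlo error, the estimate should be comparable to the posterior mean of 0.92 reported in the article; exact values depend on the seed and chain settings.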

For the $(j+1)$st iteration of the MCMC, for $i = 1, \ldots, 10$, we set $\lambda_i^{(j+1)}$ equal to a random draw from a $\text{Gamma}(s_i + \alpha^{(j)}, \beta^{(j)} + t_i)$ distribution, and we set $\beta^{(j+1)}$ equal to a random draw from a $\text{Gamma}(10\alpha^{(j)} + 0.1,\; 1.0 + \sum_{i=1}^{10} \lambda_i^{(j+1)})$. To sample $\alpha^{(j+1)}$, we set a candidate point $Y$ equal to a random draw from a proposal distribution: for example, $\text{Normal}(\alpha^{(j)}, 0.25)\, I(0 < \alpha < 5)$. As a general heuristic, we choose the standard deviation of the proposal distribution so that the candidate acceptance probability is between 0.25 and 0.45 [9]. For this proposal distribution, the candidate acceptance probability is

$$\min\left(1, \frac{\prod_{i=1}^{10} (\lambda_i^{(j+1)})^{s_i + Y - 1}\, (\beta^{(j+1)})^{10Y - 0.9}\, \Gamma(Y)^{-10}}{\prod_{i=1}^{10} (\lambda_i^{(j+1)})^{s_i + \alpha^{(j)} - 1}\, (\beta^{(j+1)})^{10\alpha^{(j)} - 0.9}\, \Gamma(\alpha^{(j)})^{-10}}\right).$$

We could also use an MCMC package like WinBUGS [10] to perform the calculations. Once we have a sample from the posterior distribution, we evaluate $R_S(10)$ for each draw. The posterior mean of $R_S(10)$ is 0.92, with a 95% credible interval of (0.77, 0.99). Figure 1 is a plot of the posterior distribution of the system reliability at 10,000 hours. (For more information on creating summaries using MCMC draws, see [11].)

4 SUMMARY

Hierarchical models offer many advantages, including the ability to borrow strength to estimate individual parameters and the ability to specify complex models that reflect engineering and physical realities. MCMC provides a general set of

Figure 1: The posterior distribution of $R_S(10)$, the system reliability at 10,000 hours.

algorithms that enable Bayesian inference in a variety of complex models.

5 RELATED ARTICLES

eqr055, eqr081, eqr086, eqr110, eqr341, eqr353

References

[1] Draper D, Gaver D, Goel P, Greenhouse J, Hedges L, Morris C, Tucker J, Waternaux C. Combining Information: Statistical Issues and Opportunities for Research; National Academies Press: Washington, DC, 1992.

[2] Gaver D, O'Muircheartaigh I. Empirical Bayes analyses of event rates. Technometrics 1987; 29: 1-15.

[3] Gelfand A, Smith A. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990; 85: 398-409.

[4] Gilks W, Richardson S, Spiegelhalter D. Markov Chain Monte Carlo in Practice; Chapman & Hall: London, 1996.

[5] Chib S, Greenberg E. Understanding the Metropolis-Hastings algorithm. The American Statistician 1995; 49: 327-335.

[6] Tierney L. Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 1994; 22: 1701-1762.

[7] Casella G, George E. Explaining the Gibbs sampler. The American Statistician 1992; 46: 167-174.

[8] Rausand M, Høyland A. System Reliability Theory: Models, Statistical Methods, and Applications, 2nd Edition; John Wiley & Sons: Hoboken, NJ, 2003.

[9] Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis; Chapman & Hall: Boca Raton, FL, 1995.

[10] The BUGS Project: WinBUGS. MRC Biostatistics Unit, Cambridge, UK. 26 June 2006. http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.

[11] Smith A, Roberts G. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B 1993; 55: 3-23.