STAT 6350 Analysis of Lifetime Data. Probability Plotting

Purpose of Probability Plots
Probability plots are an important tool for analyzing data and have been particularly popular in the analysis of life data. Probability plots are used to:
- Assess the adequacy of a particular distributional model.
- Detect multiple failure modes or a mixture of different populations.
- Obtain graphical estimates of model parameters (e.g., by fitting a straight line through the points on a probability plot).
- Display the results of a parametric maximum likelihood fit along with the data.
- Obtain, by drawing a smooth curve through the points, a semiparametric estimate of failure probabilities and distributional quantiles.

Probability Plot Scales: Linearizing a CDF
Main idea: For a given cdf, $F(t)$, one can linearize the { $t$ versus $F(t)$ } plot by finding transformations of $F(t)$ and $t$ such that the relationship between the transformed variables is linear.
- The transformed axes can be relabeled in terms of the original probability and time variables.
- The resulting probability axis is generally nonlinear and is called the probability scale.
- The data axis is usually a linear axis or a log axis.

Linearizing the Exponential CDF
CDF: $p = F(t; \theta, \gamma) = 1 - \exp\left[-\frac{t - \gamma}{\theta}\right]$, $t \ge \gamma$.
Quantile: $t_p = \gamma - \theta \log(1 - p)$.
Conclusion: The { $t_p$ versus $-\log(1 - p)$ } plot is a straight line.
- Plot $t_p$ on the horizontal axis (x-axis) and $p$ on the vertical axis.
- $\gamma$ is the intercept on the time axis, and $1/\theta$ is the slope of the cdf line (standard quantile $-\log(1 - p)$ versus time).
- Changing $\theta$ changes the slope of the line, and changing $\gamma$ changes the position of the line.
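As a quick numerical check of this linearization, the following Python sketch evaluates the exponential quantile function at a few probabilities and computes the slope of the cdf line between consecutive points. The parameter values and function names are illustrative assumptions, not part of the course notes.

```python
import math

def exp_quantile(p, theta, gamma=0.0):
    """Quantile t_p = gamma - theta * log(1 - p) of the two-parameter exponential."""
    return gamma - theta * math.log(1.0 - p)

def exp_prob_scale(p):
    """Transformed probability axis for an exponential plot: -log(1 - p)."""
    return -math.log(1.0 - p)

theta, gamma = 50.0, 10.0                      # illustrative values (assumed)
probs = [0.1, 0.3, 0.5, 0.7, 0.9]
x = [exp_quantile(p, theta, gamma) for p in probs]   # time axis
y = [exp_prob_scale(p) for p in probs]               # probability scale

# Slope of the cdf line between consecutive points (probability scale vs. time).
slopes = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(probs) - 1)]
```

Every entry of `slopes` should equal $1/\theta$ up to rounding, and the line meets the time axis at $\gamma$ because `exp_quantile(0.0, theta, gamma)` returns $\gamma$.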

Linearizing the Normal CDF
CDF: $p = F(y; \mu, \sigma) = \Phi_{\mathrm{nor}}\left(\frac{y - \mu}{\sigma}\right)$, $-\infty < y < \infty$.
Quantile: $y_p = \mu + \sigma \Phi_{\mathrm{nor}}^{-1}(p)$, where $\Phi_{\mathrm{nor}}^{-1}(p)$ is the $p$ quantile of the standard normal distribution.
Conclusion: The { $y_p$ versus $\Phi_{\mathrm{nor}}^{-1}(p)$ } plot is a straight line.
- $\mu$ is the point on the time axis where the cdf intersects the $\Phi_{\mathrm{nor}}^{-1}(p) = 0$ line (i.e., $p = 0.5$).
- The slope of the cdf line on the graph is $1/\sigma$.
- Any normal cdf plots as a straight line with positive slope. Also, any straight line with positive slope corresponds to a normal distribution.

Linearizing the Lognormal CDF
CDF: $p = F(t; \mu, \sigma) = \Phi_{\mathrm{nor}}\left[\frac{\log(t) - \mu}{\sigma}\right]$, $t > 0$.
Quantile: $t_p = \exp\left[\mu + \sigma \Phi_{\mathrm{nor}}^{-1}(p)\right]$. Then $\log(t_p) = \mu + \sigma \Phi_{\mathrm{nor}}^{-1}(p)$.
Conclusion: The { $\log(t_p)$ versus $\Phi_{\mathrm{nor}}^{-1}(p)$ } plot is a straight line.
- $\exp(\mu)$ can be read from the time axis at the point where the cdf intersects the $\Phi_{\mathrm{nor}}^{-1}(p) = 0$ line.
- The slope of the cdf line on the graph is $1/\sigma$.
- Any given lognormal cdf plots as a straight line with positive slope. Also, any straight line with positive slope corresponds to a lognormal distribution.
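The lognormal linearization can be checked numerically with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of $\Phi_{\mathrm{nor}}^{-1}$. This is a minimal sketch; the values of $\mu$ and $\sigma$ are assumed for illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def lognormal_quantile(p, mu, sigma):
    """Quantile t_p = exp[mu + sigma * Phi_nor^{-1}(p)]."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))

mu, sigma = 4.0, 0.5                           # illustrative values (assumed)
probs = [0.05, 0.25, 0.5, 0.75, 0.95]
log_t = [math.log(lognormal_quantile(p, mu, sigma)) for p in probs]  # log time axis
z = [STD_NORMAL.inv_cdf(p) for p in probs]                           # probability scale

# Slope of the cdf line (probability scale vs. log time) between neighbors.
slopes = [(z[i + 1] - z[i]) / (log_t[i + 1] - log_t[i]) for i in range(len(probs) - 1)]

# The time at which the line crosses Phi_nor^{-1}(p) = 0, i.e. the median.
median = lognormal_quantile(0.5, mu, sigma)
```

Each slope should be $1/\sigma$ and `median` should equal $\exp(\mu)$. Replacing `log_t` with the untransformed normal quantiles gives the corresponding check for the normal cdf.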

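Linearizing the Weibull CDF
CDF: $p = F(t; \eta, \beta) = \Phi_{\mathrm{sev}}\left[\frac{\log(t) - \mu}{\sigma}\right] = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$, $t > 0$, where $\eta = \exp(\mu)$ and $\beta = 1/\sigma$.
Quantile: $t_p = \exp\left[\mu + \Phi_{\mathrm{sev}}^{-1}(p)\,\sigma\right] = \eta\left[-\log(1 - p)\right]^{1/\beta}$.
This leads to
$\log(t_p) = \mu + \sigma \log[-\log(1 - p)] = \log(\eta) + \frac{1}{\beta} \log[-\log(1 - p)]$.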

Linearizing the Weibull CDF (continued)
Conclusion: The { $\log(t_p)$ versus $\log[-\log(1 - p)]$ } plot is a straight line.
- $\eta = \exp(\mu)$ can be read from the time axis at the point where the cdf intersects the $\log[-\log(1 - p)] = 0$ line, which corresponds to $p \approx 0.632$.
- The slope of the cdf line on the graph is $\beta = 1/\sigma$ (but in the computations use base e logarithms for the times rather than the base 10 logarithms used for the figures).
- Any Weibull cdf plots as a straight line with positive slope. Also, any straight line with positive slope corresponds to a Weibull distribution.
- Exponential cdfs plot as straight lines with slopes equal to 1.
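The two facts used to read a Weibull plot (slope $\beta$, and $\eta$ at $p \approx 0.632$) can be verified with a short Python sketch. The parameter values below are assumptions chosen only for illustration.

```python
import math

def weibull_cdf(t, eta, beta):
    """Weibull cdf F(t) = 1 - exp[-(t / eta)^beta]."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def weibull_prob_scale(p):
    """Transformed probability axis for a Weibull plot: log[-log(1 - p)]."""
    return math.log(-math.log(1.0 - p))

eta, beta = 200.0, 1.8                         # illustrative values (assumed)
times = [50.0, 100.0, 200.0, 400.0]
x = [math.log(t) for t in times]               # natural-log time axis
y = [weibull_prob_scale(weibull_cdf(t, eta, beta)) for t in times]

# Slope of the cdf line between consecutive points; should equal beta.
slopes = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(times) - 1)]

# Probability at t = eta, where log[-log(1 - p)] = 0: p = 1 - exp(-1).
p_at_eta = weibull_cdf(eta, eta, beta)
```

Natural logarithms are used for the times, in line with the note about computations; with base 10 logarithms the computed slope would be scaled by $\log(10)$.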

Probability Plotting Positions
- To construct a probability plot, one must decide how to plot the nonparametric estimate of $F(t)$ on the probability scales.
- The discontinuity and randomness of $\hat{F}(t)$ make it difficult to choose a definition for the pairs of points $(t, \hat{F})$ to plot.
- When times are reported as exact, it has been traditional to plot { $t_i$ versus $\hat{F}(t_i)$ } at the observed failure times.
General idea: Plot an estimate of $F$ at some specified set of points in time, and define plotting positions consisting of a corresponding estimate of $F$ at these points in time.

Criteria for Choosing Plotting Positions
Criteria for choosing plotting positions should depend on the application or purpose for constructing the probability plot. Some applications that suggest criteria:
- Checking distributional assumptions
- Estimation of parameters
- Display of maximum likelihood fits with data

Criteria for Choosing Plotting Positions: Checking Distributional Assumptions
- Probability plotting is used to check whether the observed data are well approximated by the postulated parametric distribution.
- Some bias in the slope and location of the fitted line is not a serious problem.
- It is generally suggested that the choice of plotting positions is not so important in moderate-to-large samples.

Criteria for Choosing Plotting Positions: Estimation of Parameters
- Probability plotting is used to estimate parameters of a particular distribution based on a fitted line.
- The best plotting positions will depend on the assumed underlying model and the functions to be estimated.
- For complete data, letting $i$ index the ordered observations, there is some general agreement on the plotting positions $p_i = \frac{i - 0.5}{n}$.
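Graphical parameter estimation from a fitted line can be sketched in Python as follows. The sketch assumes complete data, a Weibull model, plotting positions $(i - 0.5)/n$, and ordinary least squares as the line-fitting rule; least squares is only one possible choice and is not prescribed by the notes. The data are synthetic, noise-free Weibull quantiles, so the fit should recover the generating parameters.

```python
import math

def plotting_positions(n):
    """Plotting positions p_i = (i - 0.5) / n for complete data."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def weibull_graphical_estimates(times):
    """Fit a line on Weibull probability scales; return (eta_hat, beta_hat)."""
    t = sorted(times)
    p = plotting_positions(len(t))
    x = [math.log(ti) for ti in t]                    # log time (horizontal axis)
    y = [math.log(-math.log(1.0 - pi)) for pi in p]   # Weibull probability scale
    intercept, slope = fit_line(x, y)
    beta_hat = slope                                  # slope of the cdf line
    eta_hat = math.exp(-intercept / slope)            # time where the line crosses 0
    return eta_hat, beta_hat

# Synthetic data (assumed): exact Weibull quantiles at the plotting positions.
true_eta, true_beta = 100.0, 2.0
synthetic = [true_eta * (-math.log(1.0 - p)) ** (1.0 / true_beta)
             for p in plotting_positions(8)]
eta_hat, beta_hat = weibull_graphical_estimates(synthetic)
```

With real data the points scatter around the line, and the estimates depend on both the plotting positions and the fitting rule, which is why the best choice depends on the model and the quantities to be estimated.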

Criteria for Choosing Plotting Positions: Display of Maximum Likelihood Fits with Data
- Probability plotting is used to display the ML fit and to compare it with the corresponding nonparametric estimate.
- With a poor choice of plotting positions, the ML line may not fit the plotted points, and the probability plot can give the false impression that the parametric model and the data disagree.
- Plotting simultaneous confidence bands on the probability plot will indicate the amount of sampling variability one might expect to see.

Choice of Plotting Positions
Three cases to consider:
- Continuous inspection (or small inspection intervals resulting in exact failures)
- Interval-censored data with relatively large intervals
- Arbitrarily censored data

Choice of Plotting Positions: Continuous Inspection Data and Single Censoring (or Complete Data)
- $\hat{F}(t)$ is a step function increasing by an amount $1/n$ at each reported failure time until the last reported failure time.
- A reasonable compromise plotting position is the midpoint of the jump:
  $\frac{i - 0.5}{n} = \frac{1}{2}\left[\hat{F}\left(t_{(i)}\right) + \hat{F}\left(t_{(i-1)}\right)\right]$.
Plotting positions: { $t_{(i)}$ versus $\frac{i - 0.5}{n}$ }.

Choice of Plotting Positions: Continuous Inspection Data and Single Censoring (or Complete Data), Justification
- The median of the $i$-th order statistic in a sample of size $n$ is approximately $F^{-1}[(i - 0.5)/n]$.
- Plotting at the bottom/top of the step would lead to bias in the plotted points, and the ML line would tend to be above/below the plotted points.
- If the last reported time is a failure, it is not possible to plot a point at $\hat{F}(t) = 1$.

Choice of Plotting Positions: Continuous Inspection Data and Multiple Censoring
- $\hat{F}(t)$ is a step function until the last reported failure time, but the step increases may differ from $1/n$.
Plotting positions: { $t_{(i)}$ versus $p_i$ } with
$p_i = \frac{1}{2}\left\{\hat{F}\left[t_{(i)}\right] + \hat{F}\left[t_{(i-1)}\right]\right\}$.
Justification: This is consistent with the definition for single censoring.
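The midpoint rule can be sketched in Python for exact failure times with right censoring. The sketch assumes that the nonparametric estimate $\hat{F}$ is the Kaplan-Meier (product-limit) estimator, which this slide does not name explicitly, and the small data sets are hypothetical.

```python
from itertools import groupby

def km_plotting_positions(times, failed):
    """Return [(t, p)] at each distinct failure time, where p is the midpoint
    of the jump in the Kaplan-Meier estimate of F at t.

    times  : reported times (failures or right-censored running times)
    failed : True for a failure, False for a censored observation
    """
    data = sorted(zip(times, failed))
    n_at_risk = len(data)
    surv, f_prev = 1.0, 0.0
    positions = []
    for t, group in groupby(data, key=lambda rec: rec[0]):
        flags = [flag for _, flag in group]
        deaths = sum(flags)
        if deaths:
            surv *= 1.0 - deaths / n_at_risk     # product-limit update
            f_now = 1.0 - surv
            positions.append((t, 0.5 * (f_prev + f_now)))
            f_prev = f_now
        n_at_risk -= len(flags)                  # failures and censored units leave
    return positions

# Hypothetical complete sample: positions reduce to (i - 0.5) / n.
complete = km_plotting_positions([10, 20, 30, 40, 50], [True] * 5)

# Hypothetical multiply censored sample: the jumps are no longer 1/n.
censored = km_plotting_positions([10, 20, 30, 40, 50],
                                 [True, False, True, True, False])
```

For the complete sample the positions are 0.1, 0.3, 0.5, 0.7, 0.9, matching $(i - 0.5)/n$. For the censored sample the failure times 10, 30, and 40 are plotted at 0.1, 1/3, and 0.6.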

Choice of Plotting Positions: Interval-Censored Inspection Data
- Let the inspection intervals be $(t_0, t_1], (t_1, t_2], \ldots, (t_{m-1}, t_m]$.
- The upper endpoints of the inspection intervals, $t_i$, $i = 1, 2, \ldots$, are convenient plotting times.
Plotting positions: { $t_i$ versus $p_i$ } with $p_i = \hat{F}(t_i)$.
- When there are no censored observations beyond $t_m$, $\hat{F}(t_m) = 1$ and this point cannot be plotted on the probability paper.
Justification: With no losses, from standard binomial theory, $E\left[\hat{F}(t_i)\right] = F(t_i)$.
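For inspection data with no losses, $\hat{F}(t_i)$ is simply the cumulative fraction failing by the upper endpoint $t_i$. The following Python sketch computes the plotting positions and drops any point that cannot be shown on a probability scale. The inspection times and failure counts are hypothetical.

```python
def interval_plotting_positions(upper_endpoints, failures_per_interval, n):
    """Return [(t_i, p_i)] with p_i = (failures up to t_i) / n.

    Assumes no losses (no units censored before the last inspection).
    Points with p_i = 0 or p_i = 1 are omitted because they cannot be
    placed on a probability scale.
    """
    positions = []
    cumulative = 0
    for t, d in zip(upper_endpoints, failures_per_interval):
        cumulative += d
        p = cumulative / n
        if 0.0 < p < 1.0:
            positions.append((t, p))
    return positions

# Hypothetical inspection data: 20 units, all failed by the last inspection.
positions = interval_plotting_positions(
    upper_endpoints=[100, 200, 300, 400],
    failures_per_interval=[2, 5, 8, 5],
    n=20,
)
```

Here `positions` is `[(100, 0.1), (200, 0.35), (300, 0.75)]`; the point at $t = 400$ has $\hat{F} = 1$ and is not plotted.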

Extensions of Probability Plots
The probability plotting techniques can be extended to:
- Construct probability plots for distributions that are not members of the location-scale family.
- Help identify, graphically, the need for a non-zero threshold parameter.
- Estimate a shape parameter graphically.

Distribution with a Threshold Parameter
- The lognormal, Weibull, gamma, and other similar distributions can be generalized by the addition of a threshold parameter, $\gamma$, to shift the beginning of the distribution away from 0.
- These distributions are particularly useful for fitting skewed distributions that are shifted far to the right of 0.
- For example, the cdf and quantiles of the three-parameter lognormal distribution can be expressed as
  $p = F(t; \mu, \sigma, \gamma) = \Phi_{\mathrm{nor}}\left[\frac{\log(t - \gamma) - \mu}{\sigma}\right]$, $t > \gamma$,
  $t_p = \gamma + \exp\left[\mu + \sigma \Phi_{\mathrm{nor}}^{-1}(p)\right]$.
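One way to see how a probability plot can reveal the need for a threshold parameter is to compare local slopes on lognormal scales before and after subtracting $\gamma$. The Python sketch below does this for exact quantiles of a three-parameter lognormal distribution, obtained by shifting the two-parameter lognormal quantile by $\gamma$. All parameter values are assumed for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal3_quantile(p, mu, sigma, gamma):
    """Quantile of the lognormal distribution shifted by the threshold gamma."""
    return gamma + math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))

def local_slopes(x, y):
    """Slopes between consecutive plotted points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

mu, sigma, gamma = 3.0, 0.5, 100.0             # illustrative values (assumed)
probs = [0.05, 0.25, 0.5, 0.75, 0.95]
t = [lognormal3_quantile(p, mu, sigma, gamma) for p in probs]
z = [STD_NORMAL.inv_cdf(p) for p in probs]     # lognormal probability scale

# Ignoring the threshold: log(t) versus z is curved (slopes change).
raw_slopes = local_slopes([math.log(ti) for ti in t], z)

# Accounting for the threshold: log(t - gamma) versus z is straight.
shifted_slopes = local_slopes([math.log(ti - gamma) for ti in t], z)
```

The entries of `shifted_slopes` all equal $1/\sigma$, while `raw_slopes` decreases steadily from left to right, which is the kind of systematic curvature that suggests a non-zero threshold.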

Distributions with Specified Shape Parameters
For distributions with one or more unknown shape parameters, the plotting scales depend on the given value or estimate of the shape parameter. There are two approaches to specifying an unknown shape parameter for a probability plot:
- Plot the data with different given values of the shape parameter in an attempt to find a value that gives a probability plot that is nearly linear.
- Use parametric ML methods to estimate the shape parameter, and use the estimated value to construct the probability plotting scales.
These two approaches generally lead to approximately the same plot.

Linearizing the 3-Parameter Gamma CDF
CDF: $p = F(t; \theta, \kappa, \gamma) = \Gamma_I\left(\frac{t - \gamma}{\theta}; \kappa\right)$, $t > \gamma$.
Quantile: $t_p = \gamma + \theta\, \Gamma_I^{-1}(p; \kappa)$, where
$\Gamma_I(z; \kappa) = \frac{\int_0^z x^{\kappa - 1} e^{-x}\, dx}{\Gamma(\kappa)}$ and $\Gamma(\kappa) = \int_0^{\infty} x^{\kappa - 1} e^{-x}\, dx$.
Conclusion: The { $t_p$ versus $\Gamma_I^{-1}(p; \kappa)$ } plot is a straight line.
- The probability axis depends on specification of the shape parameter $\kappa$.
- $\gamma$ is the intercept on the time axis (because $\Gamma_I^{-1}(p; \kappa) = 0$ when $p = 0$).
- The slope of the cdf line is equal to $1/\theta$.
- Changing $\theta$ changes the slope of the line, and changing $\gamma$ changes the position of the line.

Linearizing the 3-Parameter Weibull CDF
CDF: $p = F(t; \mu, \sigma, \gamma) = \Phi_{\mathrm{sev}}\left[\frac{\log(t - \gamma) - \mu}{\sigma}\right]$, $t > \gamma$.
Quantile: $t_p = \gamma + \eta\left[-\log(1 - p)\right]^{1/\beta}$, where $\Phi_{\mathrm{sev}}(z) = 1 - \exp[-\exp(z)]$, $\eta = \exp(\mu)$, and $\beta = 1/\sigma$.
Conclusion: The { $t_p$ versus $\left[-\log(1 - p)\right]^{1/\beta}$ } plot is a straight line.
- The probability axis for this linear-time-axis Weibull probability plot requires specification of the shape parameter $\beta$.
- $\gamma$ is the intercept on the time axis.
- The slope of the cdf line is equal to $1/\eta$.
- The plot allows graphical estimation of the threshold parameter $\gamma$.
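Graphical estimation of the threshold on a linear-time-axis Weibull plot can be sketched in Python as a straight-line fit of the ordered times against $\left[-\log(1 - p_i)\right]^{1/\beta}$ for a given $\beta$. The sketch assumes complete data, plotting positions $(i - 0.5)/n$, and least squares as the fitting rule; none of these choices is dictated by the slide, and the data are synthetic, noise-free quantiles.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def weibull3_linear_time_fit(times, beta):
    """For a given shape beta, fit t = gamma + eta * w with
    w_i = [-log(1 - p_i)]^(1/beta); return (gamma_hat, eta_hat)."""
    t = sorted(times)
    n = len(t)
    w = [(-math.log(1.0 - (i - 0.5) / n)) ** (1.0 / beta) for i in range(1, n + 1)]
    gamma_hat, eta_hat = fit_line(w, t)   # time-axis intercept and 1 / (cdf-line slope)
    return gamma_hat, eta_hat

# Synthetic data (assumed): exact 3-parameter Weibull quantiles.
true_eta, true_beta, true_gamma = 100.0, 2.0, 500.0
n = 10
synthetic = [true_gamma + true_eta * (-math.log(1.0 - (i - 0.5) / n)) ** (1.0 / true_beta)
             for i in range(1, n + 1)]
gamma_hat, eta_hat = weibull3_linear_time_fit(synthetic, beta=true_beta)
```

Regressing time on the transformed probability gives the intercept $\gamma$ and slope $\eta$ directly; on the plot itself, with time on the horizontal axis, the same line has slope $1/\eta$.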

Linearizing the Generalized Gamma CDF
CDF: $p = F(t; \theta, \beta, \kappa) = \Gamma_I\left[\left(\frac{t}{\theta}\right)^{\beta}; \kappa\right]$.
Quantile: $t_p = \theta\left[\Gamma_I^{-1}(p; \kappa)\right]^{1/\beta}$. Then $\log(t_p) = \log(\theta) + \frac{1}{\beta} \log\left[\Gamma_I^{-1}(p; \kappa)\right]$.
Conclusion: The { $\log(t_p)$ versus $\log\left[\Gamma_I^{-1}(p; \kappa)\right]$ } plot is a straight line.
- The scale parameter $\theta$ is the intercept on the time scale, corresponding to the time where the cdf crosses the horizontal line at $\log\left[\Gamma_I^{-1}(p; \kappa)\right] = 0$.
- The slope of the line on the graph with time on the horizontal axis is $\beta$.
- The probability scale for the generalized gamma (GENG) probability plot requires a given value of the shape parameter $\kappa$.

Application of Probability Plotting
Using simulation to help interpret probability plots:
- Try different assumed distributions and compare the results.
- Assess linearity, allowing for more variability in the tails.
- Use simultaneous nonparametric confidence bands.
- Use simulation or bootstrap to calibrate.
Possible reasons for a bend in a probability plot:
- A sharp bend or change in slope generally indicates an abrupt change in a failure process.
- Causes for such behavior could include two or more failure modes or a mixture of different subpopulations.
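Calibration by simulation can be sketched in Python as follows: simulate many samples of the same size from the assumed model, summarize the straightness of each simulated probability plot, and compare the observed plot against that reference set. The sketch uses the correlation between the ordered data and the normal scores as the straightness summary. That summary, the sample size, the number of simulations, and the skewed example sample are all assumptions made for illustration and are not taken from the notes.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    return [STD_NORMAL.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def plot_correlation(sample):
    """Correlation between ordered data and normal scores: a rough measure
    of how straight the normal probability plot is."""
    y = sorted(sample)
    x = normal_scores(len(y))
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulated_reference(n, n_sim=200, seed=1):
    """Sorted plot correlations for n_sim simulated normal samples of size n."""
    rng = random.Random(seed)
    return sorted(plot_correlation([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(n_sim))

n = 20
reference = simulated_reference(n)
lower_5pct = reference[int(0.05 * len(reference))]   # rough 5th percentile

# Illustrative strongly right-skewed sample (assumed, not real data).
skewed = [math.exp(1.5 * z) for z in normal_scores(n)]
r_skewed = plot_correlation(skewed)
departs_from_normal = r_skewed < lower_5pct
```

A single summary number cannot replace looking at the plot, but the simulated reference set gives a sense of how much departure from a straight line is ordinary sampling variability for the given sample size.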