Monitoring aggregated Poisson data with probability control limits


XXV Simposio Internacional de Estadística 2015
Armenia, Colombia, August 5-8, 2015

Victor Hugo Morales Ospina (1, a), José Alberto Vargas Navas (2, b)

(1) Departamento de Matemáticas y Estadística, Facultad de Ciencias, Universidad de Córdoba, Montería, Colombia
(2) Departamento de Estadística, Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
(a) Assistant Professor. E-mail: vmorales@sinu.unicordoba.edu.co
(b) Full Professor. E-mail: javargasn@unal.edu.co

Abstract

The aggregation of data is a common practice in many applications, especially those with large numbers of events, so it is important to determine its impact and to have schemes that allow adequate monitoring in such situations. If the events of interest follow a Poisson distribution, then aggregating them over a period of d time units again yields Poisson-distributed data, so the schemes traditionally used to monitor processes with such data can also be used when the data are aggregated. These schemes generally require adequate knowledge of the sample sizes involved. In settings such as healthcare surveillance this knowledge is not available, because the sample size is related to the population at risk, which frequently changes over time. If we aggregate data when the sample size varies over time, and the count of events or nonconformities follows a (conditionally) independent Poisson distribution given the corresponding sample size, the result is a process with time-varying sample sizes and a (conditional) Poisson distribution.

For situations with time-varying sample sizes, proposals such as those of Zhou et al. (2012) and Ryan & Woodall (2010) allow adequate monitoring of the process. However, as Shen et al. (2013) pointed out, these and other works assume that the sample size follows a pre-specified random or deterministic model that is known a priori, so that appropriate control limits can be established before the control chart starts. As Zhou et al. (2012) pointed out, traditional control charts are very sensitive to the correct specification of sample sizes. Furthermore, as Shen et al. (2013) pointed out, an inappropriate assumption about the distribution function may lead to unexpected chart performance, for example excessive false alarms in the early runs of the chart, which in turn hurt an operator's confidence in valid alarms.

To overcome this drawback, Shen et al. (2013) propose the use of probability control limits in an EWMA control chart for monitoring Poisson count data with time-varying sample sizes in Phase II. In that work, no matter what the (unknown) time-varying sample sizes are, the in-control run length of the proposed EWMA chart always follows the geometric distribution. Essentially, the chart uses dynamic control limits that are determined online and depend only on the current and past sample size observations. Nothing needs to be specified before implementation except the desired false alarm rate; in particular, no sample size model is required. Given the importance of determining the effect of data aggregation on monitoring schemes, and considering the large number of applications that use time-varying sample sizes with little knowledge about their distribution, we propose to study the effect of aggregating data in monitoring schemes that employ probability control limits.

Keywords: aggregation of data, Poisson distribution, probability control limits, time-varying sample sizes.

1. Aggregated Poisson data

Many applications rely on methodologies and practices that facilitate the analysis of the available information by correcting, or at least attenuating, the difficulties caused by certain characteristics of the data, such as an excess of zeros. One of these practices is data aggregation.

The effect of this practice was studied by Reynolds & Stoumbos (2000). They considered two types of CUSUM chart for monitoring changes in the proportion of defective items. The first chart was based on binomial variables obtained by counting the total number of defective items in a sample of size n. The second chart was based on Bernoulli observations corresponding to the individual items inspected in the samples. They concluded that the Bernoulli CUSUM chart showed better overall performance, especially for detecting large increases in the proportion of nonconforming items.

Reynolds & Stoumbos (2004a) examined whether observations should be grouped for effective process monitoring, that is, whether it is better to use n = 1 or n > 1. They found that a combination of CUSUM charts for the mean and the variance generally gives the best statistical performance in detecting small and large shifts, sustained or transient, in $\mu$ or in $\sigma$ when each sample consists of one observation. In addition, if the Shewhart chart is used, n = 1 is better when small sustained shifts must be detected.

Reynolds & Stoumbos (2004b) investigate whether n = 1 is better than n > 1 from the perspective of statistical performance in monitoring the process mean and variance. To this end, the performance of Shewhart, EWMA and CUSUM charts is compared.
Their results show that it is not reasonable to use the Shewhart control chart with individual observations, and that the EWMA and CUSUM charts have better statistical performance over a wide range of sample sizes and out-of-control conditions, such as process drifts. No significant differences were found between the statistical performance of the EWMA and CUSUM charts. With these charts, n = 1 gives better statistical performance than n > 1.

One of the largest areas of application of data aggregation comprises situations that generate Poisson-type variables, such as public health surveillance. Several authors have dealt with data aggregation in this area. For example, Burkom et al. (2004) explore aggregating data by space, time and data category, and discuss, among other things, the relevance of data aggregation to the efficacy of alerting algorithms used in public health surveillance. The authors concluded that a judicious data aggregation strategy plays an important role in improving biomonitoring systems.

Gan (1994) compares the performance of two CUSUM charts. One chart considers the time between Poisson events, which has an exponential distribution, and the other considers aggregated counts, which have a Poisson distribution. Schuh et al. (2013) extend Gan's (1994) investigation by exploring in greater depth the relative performance of the exponential CUSUM and Poisson CUSUM control charts for longer aggregation periods.

2. The EWMAG chart

As indicated earlier, Shen et al. (2013) proposed the EWMAG chart, which uses probability control limits in an EWMA control chart for monitoring Poisson count data with time-varying sample sizes in Phase II.
The proposed EWMA chart is called the EWMAG chart because its in-control (IC) run length distribution is theoretically identical to the geometric distribution; that is, the false alarm rate depends neither on the monitoring time nor on the sample sizes being monitored.

Let $X_t$ be the count of an adverse event during the fixed time period $(t-1, t]$ (the count of events at time $t$). Suppose $X_t$ independently follows the Poisson distribution with mean $\theta n_t$ conditional on $n_t$, where $\theta$ and $n_t$ denote the occurrence rate of the event and the sample size at time $t$, respectively. It is desired to detect an abrupt change in the occurrence rate from $\theta_0$ to another unknown value $\theta_1 > \theta_0$. The EWMAG chart uses

$$Z_i = (1-\lambda) Z_{i-1} + \lambda \frac{X_i}{n_i}, \qquad i = 1, 2, \ldots, t \tag{1}$$

as the charting statistic, where $Z_0 = \theta_0$ and $\lambda \in (0, 1]$ is a smoothing parameter that determines the weights of past observations. The control limits of the EWMAG chart must satisfy

$$P(Z_1 > h_1(\alpha) \mid n_1) = \alpha,$$
$$P(Z_t > h_t(\alpha) \mid Z_i < h_i(\alpha),\ 1 \le i < t,\ n_t) = \alpha \quad \text{for } t > 1, \tag{2}$$

where $h_t(\alpha)$ is the control limit at time $t$ and $\alpha$ is the pre-specified false alarm rate. At time $t$, the probability control limit is determined right after the value of $n_t$ is observed. Consequently, the EWMAG chart needs no assumption about future sample sizes and does not suffer from wrong model assumptions. This property makes the EWMAG chart significantly different from previous control charts.

Because of the intricacy of the conditional probability in (2), $h_t(\alpha)$ cannot be obtained analytically, so computational procedures are necessary. The procedure for finding the probability control limits is summarized in the following algorithm:

1. If there is no out-of-control (OC) signal at time $t-1$ ($t = 1, 2, \ldots$), generate $\hat{X}_{t,i}$ ($i = 1, \ldots, M$) from the distribution Poisson($\theta_0 n_t$), where $n_t$ is exactly known. Accordingly, $M$ values of the pseudo charting statistic $Z_t$ are obtained through

$$\hat{Z}_{t,i} = (1-\lambda)\hat{Z}_{t-1,j} + \lambda \frac{\hat{X}_{t,i}}{n_t}, \tag{3}$$

where $i = 1, \ldots, M$ and $\hat{Z}_{t-1,j}$, $j \in \{1, \ldots, M'\}$, is uniformly selected from $\hat{\mathbf{Z}}_{t-1,M'}$, which contains the first $M'$ ranked pseudo $Z_{t-1}$ (for $t = 1$, every $\hat{Z}_{0,j}$ equals $Z_0 = \theta_0$).
2. Sort the $M$ values in ascending order and use their upper $\alpha$ empirical quantile to approximate the control limit $h_t(\alpha)$.
3. Compare the value of $\hat{Z}_t$, which is calculated from the observed $X_t$ and $n_t$, with $h_t(\alpha)$ to decide whether to issue an OC signal or to continue to the next time point.
4. If monitoring continues, set $M' = \lfloor M(1-\alpha) \rfloor$ and eliminate the values $\hat{Z}_{t,(M'+1)}, \ldots, \hat{Z}_{t,(M)}$. Then go back to step 1.

3. The EWMAG chart for aggregated Poisson data

Our proposal is to determine the effect of aggregating data when a Poisson process is monitored with sample sizes that vary over time. For this purpose, we adapt the proposal of Shen et al. (2013) to this context, using simulation to find the control limits of the EWMAG chart with observations aggregated over time. The effect of aggregation is assessed by comparing the sensitivity of the EWMAG chart in detecting out-of-control processes under different aggregation periods with the sensitivity of the chart as proposed by Shen et al. (2013). The comparison criterion will be the average run length (ARL). The simulations will consider $t$ sampling points corresponding to an equal number of samples. The resulting observations will be monitored with the EWMAG chart, both as they are and after aggregation over different periods according to our proposal. The procedure is described in more detail in the remainder of this section.
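Since the ARL is the comparison criterion, it may help to spell out what the geometric in-control run length of Section 2 implies. The following is a short worked derivation rather than a result quoted from the source; $RL$ is notation introduced here for the run length, meaning the index of the first OC signal.

```latex
% RL: index of the first out-of-control signal (notation assumed here).
% Conditions (2) give a constant conditional false alarm probability alpha,
% so under the in-control model:
\begin{align*}
P(RL = t) &= \alpha (1-\alpha)^{t-1}, \qquad t = 1, 2, \ldots \\
ARL_0 = E(RL) &= \sum_{t \ge 1} t\,\alpha(1-\alpha)^{t-1} = \frac{1}{\alpha}.
\end{align*}
% Example: alpha = 0.005 gives ARL_0 = 200, whatever the sample sizes n_t are.
```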

Suppose that at times $t_1, t_2, \ldots, t_i, \ldots, t_n, \ldots$ we observe Poisson data $X_1, X_2, \ldots, X_i, \ldots, X_n, \ldots$ with means $\theta_0 n_{t_i}$ conditional on $n_{t_i}$, that is, $X_i \mid n_{t_i} \sim \text{Poisson}(\theta_0 n_{t_i})$, where $\theta_0$ is the occurrence rate of the event and $n_{t_i}$ is the sample size at time $t_i$. Then in the time period $[t_i, t_j]$, $i, j \in \mathbb{Z}^+$, we obtain the observation

$$Y_{ij} = \sum_{k=i}^{j} X_k,$$

which has a Poisson distribution with mean $\theta_0 \sum_{k=i}^{j} n_{t_k}$ conditional on $\sum_{k=i}^{j} n_{t_k}$. Thus, following Shen et al. (2013), the EWMAG chart can be implemented for the variable $Y$.

Let $Y_{1r}$ be the variable $Y$ observed for the time period $[t_1, t_r]$. Under the IC condition, $Y_{1r}$ follows the Poisson distribution with mean $\theta_0 \sum_{k=1}^{r} n_{t_k}$ conditional on $\sum_{k=1}^{r} n_{t_k}$, where $\sum_{k=1}^{r} n_{t_k}$ is exactly known. We can therefore obtain the control limit for the first aggregation period by randomly generating $\hat{Y}_{1r,i}$, $i = 1, \ldots, M$, from the distribution Poisson($\theta_0 \sum_{k=1}^{r} n_{t_k}$) and calculating the corresponding $M$ values of pseudo $Z_{1r}$ from (1) with $Z_0 = \theta_0$, say $\hat{Z}_{1r,1}, \ldots, \hat{Z}_{1r,M}$. These values are sorted in ascending order and stored in a vector $\hat{\mathbf{Z}}_{1r,M}$. The control limit $h_{1r}(\alpha)$ can then be approximated by the $M'$-th ordered value in $\hat{\mathbf{Z}}_{1r,M}$, with $M' = \lfloor M(1-\alpha) \rfloor$, where $\lfloor M(1-\alpha) \rfloor$ denotes the largest integer less than or equal to $M(1-\alpha)$.

After determining the control limit $h_{1r}(\alpha)$, we compare it with the value of $\hat{Z}_{1r}$, which is calculated from the observed $Y_{1r}$ and $\sum_{k=1}^{r} n_{t_k}$. An out-of-control (OC) signal is issued if $\hat{Z}_{1r} > h_{1r}(\alpha)$. Otherwise, we move forward to the next time period, $[t_{r+1}, t_s]$, $s > r + 1$. According to (2), to determine the control limit $h_{r+1,s}(\alpha)$ for the time period $[t_{r+1}, t_s]$, we must ensure that the values of pseudo $Z_{1r}$ are less than or equal to $h_{1r}(\alpha)$. Hence only the ranked values $\hat{Z}_{1r,(1)}, \ldots, \hat{Z}_{1r,(M')}$ are kept to determine $h_{r+1,s}(\alpha)$. We store these $M'$ ranked pseudo $Z_{1r}$ in a vector $\hat{\mathbf{Z}}_{1r,M'}$.
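The aggregation step and the first-period control limit described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `aggregate` and `first_period_limit`, the use of NumPy, the fixed aggregation length `d`, the choice to drop a trailing incomplete block, and the numbers in the example are all assumptions made here.

```python
import numpy as np

def aggregate(counts, sizes, d):
    """Sum counts X and sample sizes n over consecutive blocks of d time points.

    Assumption made here: a trailing block with fewer than d points is dropped.
    """
    counts = np.asarray(counts)
    sizes = np.asarray(sizes, dtype=float)
    k = len(counts) // d                       # number of complete periods
    y = counts[:k * d].reshape(k, d).sum(axis=1)
    n_sum = sizes[:k * d].reshape(k, d).sum(axis=1)
    return y, n_sum

def first_period_limit(theta0, n_sum, lam, alpha, M, rng):
    """Approximate h_{1r}(alpha) from M simulated in-control pseudo statistics."""
    y_hat = rng.poisson(theta0 * n_sum, size=M)            # pseudo Y_{1r}
    z_hat = np.sort((1 - lam) * theta0 + lam * y_hat / n_sum)  # eq. (1), Z_0 = theta0
    m_keep = int(np.floor(M * (1 - alpha)))                # M'
    h = z_hat[m_keep - 1]                                  # M'-th ordered value
    return h, z_hat[:m_keep]                               # limit and retained values

# Illustrative (made-up) data: six time points aggregated in periods of d = 2.
y, n = aggregate([1, 2, 3, 4, 5, 6], [10, 10, 20, 20, 30, 30], d=2)
rng = np.random.default_rng(1)
h, kept = first_period_limit(theta0=1.0, n_sum=n[0], lam=0.1,
                             alpha=0.005, M=10000, rng=rng)
```

Here `y` holds the aggregated counts (3, 7, 11) and `n` the summed sample sizes (20, 40, 60); `kept` plays the role of the retained vector of $M'$ ranked pseudo statistics.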
Let $Y_{r+1,s}$ be the variable $Y$ observed for the aggregation period $[t_{r+1}, t_s]$. Given $\sum_{k=r+1}^{s} n_{t_k}$, a vector $\hat{\mathbf{Z}}_{r+1,s,M}$ of dimension $M$ can be obtained through

$$\hat{Z}_{r+1,s,i} = (1-\lambda)\hat{Z}_{1r,j} + \lambda \frac{\hat{Y}_{r+1,s,i}}{\sum_{k=r+1}^{s} n_{t_k}}, \tag{4}$$

where $i = 1, \ldots, M$, $\hat{Z}_{1r,j}$ is uniformly selected from $\hat{\mathbf{Z}}_{1r,M'}$ with $j \in \{1, \ldots, M'\}$, and the $\hat{Y}_{r+1,s,i}$ are randomly generated from Poisson($\theta_0 \sum_{k=r+1}^{s} n_{t_k}$). After sorting the $M$ elements of $\hat{\mathbf{Z}}_{r+1,s,M}$ in ascending order, we obtain the control limit $h_{r+1,s}(\alpha)$ by setting it at the $(1-\alpha)$ quantile of the $M$ elements. An out-of-control (OC) signal is issued if $\hat{Z}_{r+1,s} > h_{r+1,s}(\alpha)$. Otherwise, we move forward to the next time period, $[t_{s+1}, t_u]$, $u > s + 1$, and repeat the above procedure by simulating $M$ samples from Poisson($\theta_0 \sum_{k=s+1}^{u} n_{t_k}$), and so on.

Finally, for different out-of-control states, the variables $X$ and $Y$ will be monitored with the proposal of Shen et al. (2013), and the results will be compared in terms of the ARL in each case.
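The full procedure of this section can be sketched in code as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: `ewmag_monitor` and its parameter names are hypothetical, NumPy's random generator stands in for whatever simulator the authors used, the default values of `lam`, `alpha` and `M` are arbitrary, the example data are made up, and the returned signal index is 0-based. The same function covers the unaggregated chart when it receives the raw $X_t$ and $n_t$, and the aggregated chart when it receives the period sums.

```python
import numpy as np

def ewmag_monitor(y, n, theta0, lam=0.1, alpha=0.005, M=20000, seed=0):
    """Run an EWMAG-style chart on counts y with (possibly summed) sample sizes n.

    Returns (index of the first signal or None, list of control limits).
    """
    rng = np.random.default_rng(seed)
    z_obs = theta0                       # Z_0 = theta0
    pool = np.full(M, theta0)            # pseudo statistics that have not signalled
    m_keep = int(np.floor(M * (1 - alpha)))   # M'
    limits = []
    for t, (y_t, n_t) in enumerate(zip(y, n)):
        # Draw past pseudo statistics uniformly and add a new in-control count.
        prev = rng.choice(pool, size=M, replace=True)
        y_hat = rng.poisson(theta0 * n_t, size=M)
        z_hat = np.sort((1 - lam) * prev + lam * y_hat / n_t)
        # Control limit: the M'-th ordered value, i.e. the (1 - alpha) quantile.
        h_t = z_hat[m_keep - 1]
        limits.append(h_t)
        # Update the observed charting statistic and check for a signal.
        z_obs = (1 - lam) * z_obs + lam * y_t / n_t
        if z_obs > h_t:
            return t, limits
        # Keep only the pseudo statistics at or below the limit.
        pool = z_hat[:m_keep]
    return None, limits

# Made-up example: theta0 = 1 per unit of sample size, upward shift at time 10.
x = np.array([10] * 10 + [50] * 10)       # raw counts X_t
n = np.full(20, 10.0)                     # raw sample sizes n_t
d = 2                                     # aggregation length (assumed fixed)
starts = np.arange(0, len(x), d)
y_agg = np.add.reduceat(x, starts)        # Y for each aggregation period
n_agg = np.add.reduceat(n, starts)        # summed sample sizes per period
signal_raw, limits_raw = ewmag_monitor(x, n, theta0=1.0)
signal_agg, limits_agg = ewmag_monitor(y_agg, n_agg, theta0=1.0)
```

In this example `signal_raw` is counted in original time points and `signal_agg` in aggregation periods, so the two indices are on different time scales. An ARL comparison would repeat such runs over many simulated series for each out-of-control state.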

4. Bibliography

Burkom, H., Elbert, Y., Feldman, A., and Lin, J. (2004). Role of Data Aggregation in Biosurveillance Detection Strategies with Applications from ESSENCE, 53(Supplement), pp. 67-73.

Gan, F. F. (1994). Design of Optimal Exponential CUSUM Control Charts, Journal of Quality Technology 26(2), pp. 109-124.

Reynolds, M. R., Jr. and Stoumbos, Z. G. (2000). A General Approach to Modelling CUSUM Charts for a Proportion, IIE Transactions 32, pp. 515-535.

Reynolds, M. R., Jr. and Stoumbos, Z. G. (2004a). Should Observations Be Grouped for Effective Process Monitoring?, Journal of Quality Technology 36(4), pp. 343-366.

Reynolds, M. R., Jr. and Stoumbos, Z. G. (2004b). Control Charts and the Efficient Allocation of Sampling Resources, Technometrics 46(2), pp. 200-214.

Ryan, A. G. and Woodall, W. H. (2010). Control charts for Poisson count data with varying sample sizes, Journal of Quality Technology 42, pp. 260-274.

Schuh, A., Woodall, W. H., and Camelio, J. A. (2013). The Effect of Aggregating Data When Monitoring a Poisson Process, Journal of Quality Technology 45(3), pp. 260-271.

Shen, X., Zou, C., Tsung, F., and Jiang, W. (2013). Monitoring Poisson count data with probability control limits when sample sizes are time-varying, Naval Research Logistics 60(8), pp. 625-636.

Zhou, Q., Zou, C., Wang, Z., and Jiang, W. (2012). Likelihood-based EWMA charts for Monitoring Poisson count data with time varying sample sizes, J Am Stat Assoc 107, pp. 1049-1062.