Estimating and Simulating Nonhomogeneous Poisson Processes

Estimating and Simulating Nonhomogeneous Poisson Processes Larry Leemis Department of Mathematics The College of William & Mary Williamsburg, VA 2387 8795 USA 757 22 234 E-mail: leemis@math.wm.edu May 23, 23 Outline. Motivation 2. Probabilistic properties 3. Estimating Λ(t) from k realizations on (, S] 4. Estimating Λ(t) from overlapping realizations 5. Software 6. Conclusions Note: Portions of this work are with Brad Arkin (RST Corporation), Andy Glen (United States Military Academy), John Drew (William & Mary), and Diane Evans (Rose Hulman). Part of this work was supported by an NSF grant supporting the UMSA (Undergraduate Modeling and Simulation Analysis) REU (Research Experience for Undergraduates) at The College of William & Mary.

. Motivation Although easy to estimate and simulate, HPPs and renewal processes do not allow for varying rates. The use of an NHPP is often more appropriate for modeling. Example: Customer arrivals to a fast food restaurant 6 5 4 3 2 rate Breakfast Lunch Dinner 5 5 2 time Other examples: Cyclone arrival times in the Arctic Sea (Lee, Wilson, and Crawford, 99) Database transaction times (Lewis and Shedler, 976) Calls for on-line analysis of electrocardiograms at a hospital in Houston (Kao and Chang, 988) Respiratory cancer deaths near a steel complex in Scotland (Lawson, 988) Repairable systems: blood analyzers, fan motors, power supplies, turbines (Nelson, 988)

2. Probabilistic properties Notation t time N(t) number of events by time t λ(t) instantaneous arrival rate at time t (intensity function) Λ(t) = t λ(τ)dτ cumulative intensity function Property [ b a λ(τ)dτ ] n e b a λ(τ)dτ Pr (N(b) N(a) = n) = n! Property 2 E[N(t)] = Λ(t) n =,,... Property 3 (Çinlar, 975) If E, E 2,... are event times in a unit HPP then Λ (E ), Λ (E 2 ),... are event times in an NHPP with cumulative intensity function Λ(t). Λ( t ) E 5 E 4 E 3 E 2 E T T 2 T 3 T 4 T 5 t

3. Estimating Λ(t) from k realizations on (, S] Data t time (, S] time interval where observations are collected k number of realizations collected n, n 2,..., n k number of observations per realization t (), t (2),..., t (n) superposition of observations n = k i= n i total number of observations collected Intuitive solution (Law and Kelton, 2): partition time axis and let λ(t) be piecewise constant. Problems (a) Determining cell width Small cell width sampling variability Large cell width miss trend (b) Subjective due to arbitrary parameters 6 5 4 3 2 rate Breakfast Lunch Dinner 5 5 2 time

O O O O O O O O O Piecewise-linear nonparametric cumulative intensity function estimator in ˆΛ(t) = (n + )k + n ( ) t t (i) (n + ) k ( ) t (i) < t t (i+) t (i+) t (i) for i =,, 2,..., n. Rationale: ˆΛ(S) = n/k n k Λ( t ) 2n (n+)k n (n+)k t () t (2) S t

Properties Handles ties as expected Consistency lim ˆΛ(t) = Λ(t) k Confidence interval (asymptotically exact) for Λ(t) ˆΛ(t) ± z α/2 ˆΛ(t) Variate generation straightforward Input: Number of observed arrivals n Number of active realizations k Superpositioned observations t (), t (2),..., t (n+) Output: Event times T, T 2,..., T i on (, S] k i generate U i U(, ) E i log e ( U i ) while E i < n/k do begin m (n+)kei n [initialize variate counter] [generate initial random number] [generate initial exponential variate] [set m ˆΛ(t (m) ) < E i ˆΛ(t (m+) )] ( T i t (m) + [t (m+) t (m) ] (n+)kei n m [generate event time] i i + [increment variate counter] generate U i U(, ) [generate next random number] E i E i log e ( U i ) [generate next HPP event time] end )

4. Estimating Λ(t) from overlapping realizations Data t time (, S] time interval where observations are collected r # time intervals where the # realizations is fixed (s j, s j+ ] interval j +, j =,,..., r k j+ # realizations observed on (s j, s j+ ], j =,,..., r n j+ number of observations on (s j, s j+ ] t (), t (),..., t (n+r) superposition of observations, s, s,..., s r Note: s =, s r = S ˆΛ(t) = j q= n q k q + ( i j q= (n q + ) ) ( ) n j+ n j+ + t t(i) ( ) (n j+ + ) k j+ (n j+ + ) k j+ t(i+) t (i) t (i) < t t (i+) ; i =,, 2,..., n + r, Rationale: ˆΛ(s j+ ) = j+ s j < t s j+ ; j =,,..., r, q= n q /k q Single segment of ˆΛ(t) in the (j + )st region:, j + q = n q k q k j+ realizations n j+ observations ˆΛ(t (i+) ) ˆΛ(t (i) ) jq = n q k q s j t (i) t (i+) s j+

Example: Lunchwagon arrivals (Klein and Roberts, 984) Depiction of three regions for lunchwagon arrivals from : AM to 2:3 PM for k =, k 2 = 2, and k 3 = ; s =.5, s 2 = 3, and s 3 = 4.5. 6 5 4 Λ(t) 3 2.5.5 2 2.5 t 3 3.5 4 4.5

An asymptotically exact ( α)% confidence interval for Λ(t) Λ(t) ˆΛ(t) < z α/2 ˆΛ(t) ˆΛ(s j ) k j+ + j q= ˆΛ(s q ) ˆΛ(s q ) Parent cumulative intensity function, nonparametric estimator, and 95% confidence bands for lunchwagon arrivals from : AM to 2:3 PM for k =, k 2 = 2, and k 3 = ; s =.5, s 2 = 3, and s 3 = 4.5. k q 6 5 4 Λ(t) 3 2.5.5 2 2.5 t 3 3.5 4 4.5

Variate Generation Input: Number of partitions r Number of active realizations k, k 2,..., k r Number of observed arrivals per partition n, n 2,..., n r Superpositioned values t (), t (),..., t (n+r) Output: Event times T, T 2,..., T i on (, S] i j MAX r q= n q /k q generate U i U(, ) E i log e ( U i ) [initialize variate counter] [initialize region counter] [set MAX to ˆΛ(S)] [generate initial random number] [generate initial exponential variate] while E i < MAX do begin while E i > j+ q= n q /k q do [update j if necessary] begin j j + [increment region counter] end m (nj+ +)k j+(e i j q= n q/k q) n j+ ( T i t (m) +[t (m+) t (m) ] (n j++)k j+ + j q= (n q + ) [set m ˆΛ(t (m) ) < E i ˆΛ(t (m+) )] ) j E i q= nq/kq n j+ ( m j q= (n q + ) ) i i + generate U i U(, ) E i E i log e ( U i ) end [generate event time] [increment variate counter] [generate next random number] [generate next HPP event time]

Example : Monte Carlo evaluation of the confidence interval for Λ(t) Coverages in the lunchwagon example (nominal coverage.95;, replications; k =, k 2 = 2, k 3 = ; s =, s =.5, s 2 = 3, s 3 = 4.5). Time Actual Coverage Misses High Misses Low.9.95.3.487.35.9386.48.566.8.955.2.296 2.25.9466.96.339 2.7.9498.74.329 3.5.959.295.96 3.6.9498.25.25 4.5.957.67.36

Example 2: Failure times for 2 copy machines (Zaino and Berke 992) Failure times 45 Machine Number 4 35 3 25 234567 Actuations ˆΛ(t) for each machine ˆΛ(t) 8 7 6 5 4 3 2 234567 Actuations

ˆΛ(t) for the copy machine failure times 8 ˆΛ(t) 6 4 2 234567 Actuations

Example 3: Failure times of heat pump compressors (Nelson 99) The compressors are located in five separate buildings, each under repair contract for a time span (a, b], indicated by the boldface values. The data set consists of n = 28 failure times, and yields r = 29 regions. Bldg Num of Comp Entry time, Failure Times, Exit time B 64 2.59, 3.3, 4.62, 4.62, 5.75, 5.75, 7.42, 7.42, 8.77, 9.27, D 356 4.45, 4.47, 4.47, 5.56, 5.57, 5.8, 6.3, 7.2, 7. E 458., 2,85, 4.65, 4.79, 5.85, 6.73, 7.33 H 49.,.7,.7,.34, 5.9 K 95., 2.7, 3.65, 4.4, 4.4 ˆΛ(t) for the heat pump compressor failure times..8 ˆΛ(t).6.4.2 2 3 4 5 t (years) 6 7 8 9

5. Software Civilization advances by extending the number of important operations which we can perform without thinking about them. Alfred North Whitehead (86 947) APPL (A Probability Programming Language) is a Maplebased language with data structures for discrete and continuous random variables and algorithms for their manipulation. Example : Let X, X 2,..., X be independent and identically distributed U(,) random variables. Find Typical approaches Pr Central limit theorem Simulation 4 < i= n := ; X := UniformRV(, ); Y := Convolution(X, n); CDF(Y, 6) - CDF(Y, 4); 65577 972 X i < 6

Example 2: X T riangular(, 2, 4) Y T riangular(, 2, 3) Find the distribution of V = XY. 4 y 3 xy = 2 2 2 3 4 xy = 8 xy = 6 xy = 4 xy = 3 xy = 2 xy = x X := TriangularRV(, 2, 4); Y := TriangularRV(, 2, 3); V := Product(X, Y); which returns the probability density function of V as f V (v) = 4 3 v + 2 3 ln v + 2v 3 ln v + 4 3 < v 2 8 + 4 3 ln 2 + 7v 3 ln 2 + 3 v 4 ln v 5v 3 ln v 2 < v 3 4 + 4 3 ln 2 + 7v 3 ln 2 + 2v 2 ln v v ln v 2 ln 3 2v 3 ln 3 44 3 < v 4 3 4 ln 2 7v 3 ln 2 8 3 v 2 ln 3 + 22 3 ln v 2v 3 ln 3 + 4v 3 ln v 4 < v 6 8 3 8 ln 2 4v 3 ln 2 2 3 v + 4 3 ln v + v 3 ln v + 4 ln 3 + v 3 ln 3 6 < v 8 8 + 8 ln 2 + 2v 3 ln 2 + 2 3 v + 4 ln 3 4 ln v + v 3 ln 3 v 3 ln v 8 < v < 2

Example 3: Kolmogorov Smirnov test statistic (all parameters known) Defining formula: D n = sup x F (x) F n (x), Computational formula: D n = max i=,2,...,n i n x (i), i n x (i) The cdf for the test statistic is (Birnbaum, 952) P D n < 2n + v for v 2n 2n, where for u u 2 u n. = n! 2n +v 2n v 3 2n +v 3 2n +v v... 2n 2n v 2n 2n g(u, u 2,..., u n ) du n... du 2 du g(u, u 2,..., u n ) = CASE I: n = F D (t) = Pr(D t) = t 2 2t 2 < t < t CASE II: n = 2 F D2 (t) = Pr(D 2 t) = t 4 8 ( t ) 2 4 4 < t < 2 2( t) 2 2 < t < t

CASE I: n =. F(x).8.6.4.2. x..2.4.6.8. CASE II: n = 2. F(x).8.6.4.2. x..2.4.6.8.

Goal: X := KSRV(n); CASE III: n = 6. F(x).8.6.4.2. x..2.4.6.8. y < 2 468 y 6 234 y 5 + 48 y 4 6 3 y 3 + 3 y2 9 y + 5 324 2 y < 6 288 y 6 48 y 5 + 236 y 4 28 3 y 3 + 235 9 y2 + 27 y 5 8 6 y < 4 32 y 6 + 32 y 5 26 3 y 4 + 424 9 y 3 785 9 y2 + 45 27 y 35 296 4 y < 3 28 y F D6 (y) = 6 + 56 y 5 5 3 y 4 + 55 9 y3 + 525 54 y2 565 8 y + 5 6 3 y < 5 2 4 y 6 24 y 5 + 295 y 4 985 9 y 3 + 775 9 y2 7645 648 y + 5 5 6 2 y < 2 2 y 6 + 32 y 5 85 9 y3 + 75 36 y2 + 337 648 y y < 2 3 y 6 38 y 5 + 6 3 y4 265 9 y3 5 8 y2 + 465 648 y 2 3 y < 5 6 2 y 6 + 2 y 5 3 y 4 + 4 y 3 3 y 2 5 + 2 y 6 y < y.

Example 4:. Let X, X 2,..., X be iid geometric( / 4) random variables (parameterized from ). Find the mean and variance of X (2). Y := OrderStat(GeometricRV( / 4),, 2); Mean(Y); Variance(Y); yielding µ = 3583658956 2399275947 and σ 2 = 39699845747469522944 52329477258653395669 Example 5:. Fit a power law process to the odometer failure data over (, ]: 2,942 28,489 65,56 78,254 83,639 85,63 88,43 9,89 92,36 94,78 98,23 99,9. CarFailures := [2942, 28489, 6556, 78254, 83639, 8563, 8843, 989, 9236, 9478, 9823, 999]; X := WeibullRV(lambda, kappa); hat := MLENHPP(X, CarFailures, [lambda, kappa], ); PlotEmpVsFittedCIF(X, Sample, [lambda = hat[], kappa = hat[2]],, ); ˆλ =.2637 ˆκ = 2.568

2 8 CIF 6 4 2 2 4 6 8 x 6. Conclusions (a) Nonparametric estimation and simulation for NHPPs is straightforward. (b) Collecting data across overlapping intervals does not pose any significant problems. (c) Once coded, this approach requires less effort than a parametric renewal process in order to simulate the observations. (d) There may be potential for a probability package analogous to statistical packages such as SAS, SPSS, or S-Plus. (e) I am searching for situations where an exact probability calculation is needed (typically not the case in economics, OR, engineering, classical statistics; possibly the case in biology, chemistry, physics).

Bibliography Arkin, B., and Leemis, L., Nonparametric Estimation of the Cumulative Intensity Function for a Nonhomogeneous Poisson Process from Overlapping Intervals, Management Science, 46, 7, 2, 989 998. Drew, J.H., Glen, A.G., Leemis, L.M., Computing the Cumulative Distribution Function of the Kolmogorov-Smirnov Statistic, Computational Statistics and Data Analysis, 34,, 2, 5. Evans, D.L. and Leemis, L.M., Input Modeling Using a Computer Algebra System, in Proceedings of the 2 Winter Simulation Conference, J.A. Joines, R.R. Barton, K. Kang, P.A. Fishwick, eds., Institute of Electrical and Electronics Engineers, Orlando, Florida, 2, 577 586. Glen, A.G., Leemis, L.M., and Drew, J.H., A Generalized Univariate Change-of-Variable Transformation Technique, INFORMS Journal on Computing, 9, 3, 997, 288 295. Glen, A.G., Evans, D.L., and Leemis, L.M., APPL: A Probability Programming Language, The American Statistician, 55, 2, 2, 56 66. Glen, A.G., Leemis, L.M., and Drew, J.H., Computing the Distribution of the Product of Two Continuous Random Variables, forthcoming, Computational Statistics and Data Analysis. Leemis, L.M., Nonparametric Estimation of the Intensity Function for a Nonhomogeneous Poisson Process, Management Science, 37, 7, 99, 886 9.