International Journal of Statistics and Applications 26, 6(3): 68-76 DOI:.5923/j.statistics.2663. On a Generalization of Weibull Distribution and Its Applications M. Girish Babu Department of Statistics, Government Arts and Science College, Meenchanda, Kozhikode, India Abstract The Weibull and related models have been used in many applications for solving a variety of problems from many disciplines. Here we introduce a new family of distributions, namely Weibull-truncated negative binomial distribution (WTNB) and study some properties of it. Eponential-truncated negative binomial (ETNB) and Marshall-Olkin Weibull (MOW) distribution are special cases of this distribution. We have analyzed a real data set of serum creatinine values and found that this new distribution is a good fit to model the data, compared to Weibull, Gamma, Eponentiated Weibull, ETNB and MOW models. Keywords Eponential truncated negative binomial distribution, Marshall-Olkin family of distributions, Shaon entropy, Weibull distribution. Introduction The Weibull distribution is a very popular distribution for modeling lifetime data and for modeling phenomenon with monotone failure rates. The distribution is named after Waloddi Weibull who was the first to promote the usefulness of this to model the breaking strength of materials (Weibull, 939). A similar model was proposed earlier by Rosin and Rammler (933) in the contet of modeling the variability in the diameter of powder particles being greater than a specific size. The earlier known publication dealing with the Weibull distribution is a work by Fisher and Tippet (928) where this distribution is obtained as the limiting distribution of the smallest etremes in a sample. Gumbel (958) refers to the Weibull distribution as the third asymptotic distribution of the smallest etremes. The Weibull, and related models have been used in many applications, and for solving a variety of problems from many disciplines. Jayakumar and Girish (25) studied some generalizations of Weibull distribution and related time series models. Marshall and Olkin (997) proposed a new method of generating a family of distributions by introducing an additional parameter. Many authors have studied properties of various univariate distributions belonging to the family of Marshall-Olkin distributions, see, Alice and Jose (23, 25), Ghitany et al. (25, 27) and Jayakumar and Thomas (28). A generalization of the Marshall-Olkin distributions was * Corresponding author: giristat@gmail.com (M. Girish Babu) Published online at http://journal.sapub.org/statistics Copyright 26 Scientific & Academic Publishing. All Rights Reserved introduced by Nadarajah et al. (23) as follows: Let XX, XX 2, be a sequence of independent and identically distributed random variables with survival function FF(). Le NN be a truncated negative binomial random variable with parameters αα (,) and θθ >, that is, αα PP(NN = ) = θθ + αα θθ θθ θθ ( αα), =,2,. Consider UU NN = min(xx, XX 2,, XX NN ). Then PP(UU NN > ) = GG (; αα, θθ) = ααθθ + θθ ααθθ θθ ( αα)ff () = = ααθθ αα θθ FF() + ααff () θθ (.) Similarly, if αα > and NN is a truncated negative binomial random variable with parameters αα and θθ >, then VV NN = ma(xx, XX 2,, XX NN ) also has the survival function given in (.). Here note that in (.), if αα, then GG (; αα, θθ) FF(). If θθ =, then this family reduces to the family of Marshall Olkin distributions. Thus the family of distributions described in (.) is a generalization of the family of Marshall-Olkin distributions. This family can be interpreted as follows. Suppose the failure times of a device are observed. Every time a failure occurs the device is repaired to resume function. Suppose also that the device is deemed no longer useable when a failure occurs that eceeds a certain level of severity. Let XX, XX 2, denote the failure times and NN denote the number of failures. Then UU NN will represent the time to the first failure of the device and VV NN will represent the life time of the device. Thus this family can be used to model both the time to the first failure and the life time. The probability density function of (.) is
International Journal of Statistics and Applications 26, 6(3): 68-76 69 gg(; αα, θθ) = θθαα θθ ( αα)ff() αα θθ FF()+ααFF() The hazard rate function is given by h(; αα, θθ) = θθ( αα)ff()h FF () FF()+ ααff() FF()+ααFF() θθ θθ + (.2) (.3) where h FF () = ff() is the hazard rate corresponding to F. FF() Nadarajah et al. (23) introduced and studied a new family of distribution as eponential-truncated negative binomial distribution with parameters αα, θθ and λλ by substituting FF() = ee λλλλ, λλ >, > in the survival function (.). That is, ααθθ GG (; αα, θθ, λλ) = αα θθ ee λλλλ + ααee λλλλ θθ (.4) for αα >, θθ >, λλ > and >. The paper is organized as follows. In section 2, we propose a new family of distributions, namely Weibull-truncated negative binomial distribution (WTNB). We study some properties of WTNB, such as the behavior of hazard rate, moments, Shaon and Renyi entropies and distributions of order statistics. Also the maimum likelihood method of estimation is used to obtain the estimates of parameters of WTNB. In section 3, we analyze a real data set of serum creatinine (mg.dl) values and found that WTNB is the most appropriate model for the data. The performance of the WTNB is compared to the well known models such as Weibull, Gamma, eponentiated Weibull, ETNB and Marshall-Olkin Weibull using AIC, BIC, K-S statistic and P-values. 2. Weibull-Truncated Negative Binomial Distribution We propose a new family of distribution named as Weibull-truncated negative binomial (WTNB) distribution with parameters αα >, θθ >, cc > and > by substituting FF() = ee cc, cc >, > in the survival function (.). Then, αα θθ GG (; αα, θθ, cc) = αα θθ ee cc + ααee cc θθ (2.) The probability density function of this distribution is gg(; αα, θθ, cc) = ( αα)θθθθ θθ cc cc ee cc αα θθ ee cc +ααee cc The hazard rate function is given by h(; αα, θθ, cc) = ( αα)θθθθ cc ee cc θθ + (2.2) ee cc +ααee cc [ ee cc +ααee cc θθ ] (2.3) The shape of density and hazard rate functions are shown in the following figures. The cumulative probabilities at different choices of parameters are computed by using MATHCAD software and the results are shown in Table 2.. Some distributions arise as special cases of the WTNB(αα, θθ, cc) Case I : For cc =, ααθθ GG (; αα, θθ) = ( αα θθ ee + ααee ) θθ (2.4) This is the survival function of two parameter Eponential-truncated negative binomial distribution..5.2 g(,.5,.5, ) g(,.5,.5,.5) g(,.5,,.5) g(,.5,.5, 3).9.6.3 2 3 4 5 Figure 2.. Probability density function of WTNB distribution
7 M. Girish Babu: On a Generalization of Weibull Distribution and Its Applications 2.6 h(,.5,, ) h(,.5,.5,.5) h(,.5, 2,.5) h(,.5, 5,.5).2.8.4 2 3 4 5 Figure 2.2. Hazard rate function of WTNB distribution X Table 2.. Cumulative Probabilities of WTNB distribution θθ =, αα =.5, cc =.5 θθ =.5, αα =.5, cc =.5 θθ =.5, αα =.5, cc =.5 θθ = 2, αα =.5, cc =.5.775.996.85.833.862.997.883.9 2.93.997.98.932 3.927.997.939.95 4.944.997.953.96 5.955.998.963.969 6.963.998.97.975 7.97.998.975.979 8.974.998.979.983 9.978.998.982.985.982.999.985.988.984.999.987.989 2.986.999.989.99 3.988.999.99.992 4.989.999.99.993 5.99.999.992.994 6.992.999.993.995 7.993.999.994.995 8.994.999.995.996 9.994.999.995.996 2.995.999.996.997 2.995.999.996.997 22.996.999.997.997 23.996.999.997.998 24.997.999.997.998 25.997.997.998 26.997.998.998 27.997.998.998 28.998.998.998 29.998.998.999 3.998.998.999 3.998.999.999 32.998.999.999 33.999.999.999 Case II: For θθ =, GG (, αα, cc) = αα ee cc (2.5) ( αα)ee cc This is the survival function of Marshall-Olkin etended Weibull distribution with parameters αα and cc. Case III: For θθ =, αα = GG (, cc) = ee cc (2.6) This is the survival function of one parameter Weibull distribution. Case IV: For θθ =, αα = 2 GG (, cc) = 2 (2.7) +ee cc This is the survival function of the generalized half logistic distribution. Case V: When αα, WTNB(αα, θθ, cc) reduces to the one parameter Weibull distribution. A random sample from WTNB(αα, θθ, cc) distribution can be simulated as XX = ln ααααθθ +YY αα θθ θθ αα cc for YY~UU(,). (2.8)
International Journal of Statistics and Applications 26, 6(3): 68-76 7 2.. Moments Suppose that X has the WTNB(αα, θθ, cc) distribution. The n th moment can be written as Taking uu = ee cc, (2.9) reduces to EE(XX ) = ( αα)θθααθθ cc ( αα θθ ) EE(XX ) = ( αα)θθααθθ ( αα θθ ) If αα <, then by the series epansion ( ) mm = mm + kk kk= kk, equation (2.) can be written as mm where uu kk ( log uu) cc Therefore, If αα < αα, then we have EE(XX ) = ( αα)θθααθθ ( αα θθ ) = cc cc cc ( ) (kk+) + +cc ee cc [ ( αα)ee cc ] θθ + ( log (uu)) cc [ ( αα)uu] θθ + + kk θθ ( αα)kk uu kk ( log uu) cc θθ kk= EE(XX ) = ( αα)θθααθθ cc cc cc ( ) ( αα θθ ) EE(XX ) = ( αα)θθ ( αα θθ )αα ( log ( uu)) cc [+ αα uu]θθ + αα (2.9) (2.) θθ + kk ( αα)kk kk= θθ (kk+) +. (2.) Taking uu = vv in equation (2.2) and using equation (2.6.5.3) of Prudnikov et al. (986), we have 2.2. Shaon Entropy EE(XX ) = = = ( αα)θθ ( αα θθ )αα ( log vv) [ + αα αα ( vv)]θθ+ ( αα)θθ ( αα θθ )αα ( log vv) cc θθ αα θθ kk= cc + kk θθ θθ ( )kk αα αα ( ) kk θθ + kk αα θθ αα kk= = θθ cc cc cc ( ) kk kk+ ( log vv) cc ( vv) kk ( vv) kk (2.2) θθ + kk αα θθ θθ ( )kk αα ( ) jj (kk+ jj) αα kk+ kk jj kk= jj= (2.3) jj!(jj+) cc + Entropy is a measure of variation or uncertainty. The Renyi entropy of a random variable with probability density function gg(. ) is defined as The Renyi entropy of WTNB(αα, θθ, cc) is Letting uu = ee cc, we have II RR (γγ) = II RR (γγ) = The Shaon entropy is given by II RR (γγ) = log γγ ggγγ (), γγ >, γγ. (2.4) γγ log ( αα)θθαα θθ cc cc ee cc ( αα θθ )( ( αα)ee cc ) θθ+ γγ log cc γγ ( αα)θθααθθ αα θθ γγ ( log uu) γγ cc uu γγ ( αα)ee cc γγ (θθ +) (2.5)
72 M. Girish Babu: On a Generalization of Weibull Distribution and Its Applications 2.3. Order Statistics ααθθ EE[ log gg(xx)] = log (cc )EE(log XX) + ( αα)θθαα θθ cc EE(XXcc ) + (θθ + ) EElog ( αα)ee XXcc (2.6) Let XX, XX 2,, XX are independent random variables having the WTNB (αα, θθ, cc) distribution. Let XX ii: denote the i th order statistic. The probability density function of XX ii: is! gg ii: (; αα, θθ, cc) = (ii )! ( ii)! gg(; αα, θθ, cc) GGii (; αα, θθ, cc)gg ii (; αα, θθ, cc) = ( ) ii!( αα)θθαα θθ ( + ii) cc. cc ee cc (ii )!( ii)! αα θθ ( αα)ee cc θθ + ii αα θθ ( αα)ee cc θθ Using the binomial series epansion, the probability density function can be written as gg ii: (; αα, θθ, cc) = ii ii ( ) ii! αα θθ( ii) (ii )! ( ii)! ( αα θθ ii ) kk kk= ll= ii ( αα)ee cc θθ. (2.7) ii ( )kk+ll αα θθ(kk+ll+) ll αα θθθθ (kk + ll + ) gg(; αα, θθ(kk + ll + ), cc). (2.8) This shows that XX ii: is a finite miture of WTNB random variables. 2.4. Estimation For a given sample(, 2,, ), the log-likelihood function is given by log LL(αα, θθ, cc) = log ( αα)θθααθθ cc cc + (cc ) log αα θθ ii= ii ii= ii (θθ + ) log ( αα)ee ii cc ii= (2.9) The partial derivatives of the log-likelihood function with respect to the parameters are log LL log LL = +( θθ)ααθθ +θθαα θθ ( αα) αα θθ log LL = log αα cc + (θθ + ) ee ii αα ii= (2.2) ( αα)ee ii cc + log ( αα θθ θθ αα)ee ii ii= (2.2) cc ii cc log ii ee ii cc = + log cc ii= ii ii= cc ii log ii (θθ + )( αα) ii= (2.22) ( αα)ee ii cc Equating these partial derivatives equal to zero and solving these equations numerically, we get the maimum likelihood estimates of αα, θθ and cc. 3. Application In this section, we analyze a real data set and found that WTNB distribution gives the best fit for the data. We consider a real data set of serum creatinine (mg/dl) value of 3 samples reported on February 2 nd, 26 at the Biochemistry Laboratory of Govt. Medical College, Calicut, Kerala. The data are follows:.83.5.37..26.39.3.66.65.74.7.64.6.38.88.82.68.5.68.33.5.53.5.77.86.3.2.22.69.7.2.7.79.34.4.6.69.63.76.49.55.42.62.42.5.72.43.46.88.35.48.43.57.58.39.99.85..85.73.66.8.43.47.24.46.27.32.42.3.5.59.72.29.58.35.8.93.3.9.67.98.73.35.89.2.35.94.33.59.43.29.4.77.68.38.73.48.33.3.68.47.6.84..7.5.58.7.28.23.85.96.2.36.73.53.75.7.62.7.69.25.9.44.86.67.8.3.6.56.2.53.27.64.64.47.29.57.94.36.6.2.7.75.3.34.43.9.79.58.56.83.99.5.4.57.22.52.25.57.75.3.58.87.65.5.57.6.27.8.64.33.94.44.83.2.82.27.83.7.78.4.66.8.2.33.54.55.67.6.93.8.49.68.49.79.26.58.43.88.5.58.7.4.9.73.89.57.4.65.3.72.92.72.53.92.54.8.2.37.88.33.56.5.7.39.87.7.68.77.66.47.5.4.38.73.26.82.84.85.4.5.35.69.7.4.79.94.53.34.9.25.26.79.96.43.46.52.84.9.64.96.97.72.62.2..77.79.63.49.4.42.79.78.75.97.72.2.45.56.3.5.56.4.94.37.39.7.2.99.76.43.33.32.78.44.5.77 Creatine is a chemical made by the body and is used to supply energy mainly to muscle. Serum creatinine is a chemical waste product of creatine. Creatinine is removed from the body entirely by the kidneys. If kidney function is not normal, the creatinine level will increase. The normal range of creatinine is about.7 to.3 mg/dl. The descriptive measures of the 3 samples are as follows:
International Journal of Statistics and Applications 26, 6(3): 68-76 73 Initially a histogram of the data is plotted and a normal curve is embedded in it and we observe that normal distribution is not suitable to model the data since the data is highly positively skewed. Table 3.. Descriptive statistics of Creatinine (mg/dl) Mean Median SD Skewness Kurtosis Min Ma.798.755.337.476 -.22.6.87 Figure 3.. Empirical structure of the serum creatinine data The P-P plot and Q-Q plot of the creatinine data is as follows
74 M. Girish Babu: On a Generalization of Weibull Distribution and Its Applications Figure 3.2. P-P plot and Q-Q plot of the serum creatinine data Since the data shows a positively skewed nature, the symmetrical distributions will not be a suitable choice. So we fitted the data using WTNB with three parameters and compared with the well known distributions like Weibull, Gamma, Eponentiated Weibull, Marshall-Olkin Weibull and ETNB. For comparing the goodness of fit of the model we used the information criteria, Akaike Information Criterion (AIC = -2 log L+2k), Bayesian Information Criterion (BIC = -2log L+k log n) and the Kolmogorov-Smirnov statistic, where k is the number of unknown parameters, log L is the log-likelihood function value and n is the sample size. The results are presented in Table 3.2. Table 3.2. Parameter estimates and goodness of fit statistics for various models fitted to the serum creatinine data Weibull Gamma Distribution Parameters AIC BIC K-S P value ff(; λλ, cc) = ccλλ cc cc ee (λλλλ )cc, λλ, cc >. ff(; ββ, λλ) = Eponentiated Weibull λλββ Γ(ββ) ββ ee λλλλ, ββ, λλ >. ff(; λλ, θθ, cc) = ccλλ cc θθ cc ee (λλλλ )cc ee (λλλλ )cc θθ, λλ, θθ, cc >. ETNB ff(; αα, θθ, λλ) = ( αα)θθαα θθ λλee λλλλ ( αα θθ )( ( αα)ee λλλλ ) θθ+, αα, θθ, λλ > Marshall-Olkin Weibull ff(; αα, λλ, cc) = ααααλλcc cc (λλλλ )cc ee [ ( αα)ee (λλλλ )cc ] 2, αα, cc, λλ > WTNB ff(; αα, θθ, cc) = ( αα)θθαα θθ cc cc cc ee ( αα θθ )( ( αα)ee cc ) θθ+, αα, θθ, cc > λλ =., cc =2.55. ββ =5.22, λλ =6.54. λλ =.36, θθ =.66, cc =.95. αα =2.53, θθ =7.7, λλ =3.73. αα=.39, λλ =.9, cc =3.6. αα =.5, θθ =.52, cc =2.84. 83.8 84.7.457.558 84.3 85.3.432.6287 83. 84.6.393.7428 9.4 92.8.482.4877 83.6 85..358.8364 82.9 84.3.338.8832 From the above Table 3.2., we can see that the WTNB distribution is the suitable model for the given data. The probability plots for the fitted models are presented in figure 3.3.
International Journal of Statistics and Applications 26, 6(3): 68-76 75..2.4.6.8..2 Weibull distribution..2.4.6.8..2 Weibuli-truncated negativ..5..5 2. Gamma distribution..5..5 2...2.4.6.8..2.4..2.4.6.8..2 Marshall-Olkin Weibull dis..5..5 2...5..5 2. Eponential-truncated nega Eponentiated Weibull dis..2.4.6.8..2.4..2.4.6.8..2..5..5 2...5..5 2. Figure 3.3. Fitted probability density function for the serum creatinine (mg/dl) data
76 M. Girish Babu: On a Generalization of Weibull Distribution and Its Applications We have used the nlm function of R software to compare the empirical distribution and the theoretical distribution of all the si distributions mentioned above. From the results out of the si fitted models, WTNB is closer to the empirical distribution of the serum creatinine data. The following figure shows the closeness of the distribution functions. REFERENCES [] Alice, T. and K.K. Jose (23): Marshall-Olkin Pareto process, Far East Journal of Theoretical Statistics, 9, 7-32. [2] Alice, T. and Jose, K.K. (25): Marshall-Olkin semi-weibull minification processes, Recent Advances in Statistical Theory and Applications,, 6-7. [3] Fisher, R.A. and Tippett, L.H.C. (928): Limiting forms of the frequency distribution of the largest and smallest member of a sample, Proceedings of the Cambridge Philosophical Society, 24, 8-9. [4] Ghitany, M.E., Al-Hussaini, E.K. and Al-Jaralla, R.A. (25): Marshall-Olkin etended Weibull distribution and its application to censored data, Journal of Applied Statistics, 32, 25-34. [5] Ghitany, M.E., Al-Awadgi, F.A. and Alkhalfan, I.A. (27): Marshall-Olkin etended Loma distribution and its application to censored data, Communication in Statistical Theory and Methods, 36, 855-866. [6] Gumbel, E.J. (958): Statistics of Etremes, Columbia University Press, New York. [7] Jayakumar, K. and Girish, B.M. (25): Some generalizations of Weibull distribution and related processes, Journal of Statistical Theory and Applications, 4, 425-434. Figure 3.4. WTNB distribution function and empirical distribution 4. Conclusions In the present paper, we have studied a new family of distributions, namely Weibull truncated negative binomial distributions. Some properties of this distribution such as hazard rate, moments, Shaon and Renyi entropies, distributions of order statistics are studied. This distribution is found to be the most appropriate model to fit the serum creatinine data compared to the Weibull, Gamma, Eponentiated Weibull, ETNB and MOW models. We hope that the new family will attract wider application in life time modeling. [8] Jayakumar, K. and Thomas, M. (28): On a generalization to Marshall-Olkin scheme and its application to Burr type XII distribution, Statistical Papers, 49, 42-439. [9] Marshall, A.W. and Olkin, I. (997): A new method for adding parameter to a family of distributions with application to eponential and Weibull families, Biometrika, 84, 64-652. [] Nadarajah, S., Jayakumar, K. and Ristic, M.M. (23): A new family of lifetime models, Journal of Statistical Computation and Simulation, 83, 389-44. [] Prudnikov, A.P., Brychkov, Y.A., and Marichev, O.I. (986): Integrals and Series Vol., Gordon and Breach Sciences, Amsterdam, Netherlands. [2] Rosin, P. and Rammler, E. (933): The laws governing the fineness of powdered coal, Journal of the Institute of Fuels, 6, 29-36. [3] Weibull, W. (939): A statistical theory of the strength of material, Ingeniors Vetenskaps Akademiens Handligar Report No.5, Stockholm.