Fitting mixtures of Erlangs to censored and truncated data using the EM algorithm


Katrien Antonio 1,2, Andrei Badescu 3, Lan Gong 3, Sheldon Lin 3, and Roel Verbelen 2

1 Faculty of Economics and Business, University of Amsterdam, The Netherlands. 2 LStat, Faculty of Economics and Business, KU Leuven, Belgium. 3 Department of Statistics, University of Toronto, Canada.

Corresponding author. Address: roel.verbelen@kuleuven.be

January 9, 2014

Abstract

We present a calibration procedure for fitting mixtures of Erlangs to censored and truncated data by iteratively using the EM algorithm. Mixtures of Erlangs form a very versatile, yet analytically tractable, class of distributions making them suitable for modeling purposes. The effectiveness of the proposed algorithm is demonstrated on simulated data as well as real data sets.

Keywords: Mixture of Erlang distributions with a common scale parameter; Censoring; Truncation; Expectation-maximization algorithm; Maximum likelihood.

1 Introduction

The class of mixtures of Erlangs with a common scale parameter is very flexible in terms of the possible shapes of its members. Tijms (1994) shows that mixtures of Erlangs are dense in the space of positive distributions in the sense that there always exists a series of mixtures of Erlangs that weakly converges, i.e. converges in distribution, to any positive distribution. As such, any continuous distribution can be approximated by a mixture of Erlang distributions to any accuracy. Furthermore, via direct manipulation of the Laplace transform, a wide variety of distributions whose membership in this class is not immediately obvious can be written as a mixture of Erlangs. The class of mixtures of Erlang distributions with a common scale is also closed under mixture, convolution and compounding. At the same time, it is possible to work analytically with this class, leading to explicit expressions for e.g. the Laplace transform, the hazard rate, a quantile or Value-at-Risk (VaR) and stop-loss moments. Klugman et al. (2013), Willmot and Lin (2011) and Willmot and Woo (2007) give an overview of these analytical and computational properties of mixtures of Erlangs.

Mixture models are often used to reflect the heterogeneity in a population consisting of multiple groups or clusters (McLachlan and Peel, 2001). In some applications, these clusters can be

physically identified and used to interpret the fitted distributions. This is however not the approach we follow; the components in the mixture will not be identified with existing groups. Mixtures of Erlangs are discussed here for their great flexibility in modeling data and should be regarded as a semiparametric density estimation technique. The densities in the mixture are parametrically specified as Erlangs, whereas the associated weights form the nonparametric part. The number of Erlangs in the mixture with non-zero weights can be viewed as a smoothing parameter. Mixtures of Erlangs have much of the flexibility of nonparametric approaches, while retaining some of the advantages of parametric approaches (such as a reasonable number of parameters) and furthermore allowing for tractable results. Lee and Lin (2010) iteratively use the EM algorithm (Dempster et al., 1977) for finite mixtures to estimate the parameters of a mixture of Erlang distributions with a common scale parameter. They demonstrate the usefulness of mixtures of Erlangs in the context of quantitative risk management for the insurance business. This industry requires on the one hand the flexibility of nonparametric density estimation techniques to describe the insurance losses and on the other hand the feasibility to analytically quantify the risk, which is exactly what the class of mixtures of Erlangs offers. However, modeling censored and/or truncated losses is not covered by the approach in Lee and Lin (2010). In many practical problems data are censored and/or truncated. In biostatistics a typical example is a study which follows patients over a period of time. In case the event of interest has not yet occurred before the end of the study, or the patient drops out or dies from another cause independent of the cause of interest, the event time is right censored.
In actuarial science, insurance losses (as studied by Lee and Lin, 2010) are often censored and truncated due to policy modifications such as deductibles (left truncation) and policy limits (right censoring). The modeling of data on operational risk (OR), as required by the Basel Committee on Banking Supervision, is another example. Banks use data from their internal OR database, and refer to the Operational Riskdata eXchange Association (ORX) for external loss data. Usually, the internal OR database of banks in North America is subject to a minimum recording threshold of $10,000 to $15,000, while in the external database operational losses start from $30,000. In both cases, when fitting a parametric distribution to the loss data, one should take the left truncation of the data into consideration. Motivated by the large number of areas where censored and truncated data are encountered, we develop an extension of the iterative EM algorithm of Lee and Lin (2010) for fitting mixtures of Erlangs with common scale parameter to censored and truncated data. The traditional way of dealing with grouped (and) truncated data using the EM algorithm involves treating the unknown number of truncated observations as a random variable and including it into the complete data vector (Dempster et al., 1977; McLachlan and Krishnan, 2007, p. 66; McLachlan and Peel, 2001, p. 257; McLachlan and Jones, 1988). We will not follow this approach and rather only include the uncensored observations and the component-label vectors in the complete data vector, as is also done in Lee and Scott (2012). We demonstrate the fitting procedure on a wide range of applications, from econometrics, actuarial science and biostatistics. Our R implementation is available online. In the following, we briefly introduce mixtures of Erlangs with a common scale parameter in Section 2. The adjusted EM algorithm, able to deal with censored and truncated data, is presented in Section 3.
The procedures used to initialize the parameters, to adjust the shapes of the Erlangs in the mixture and to choose the number of components are discussed in Section 4. Examples follow in Section 5 and Section 6 concludes.

2 Mixtures of Erlangs with a common scale parameter

The Erlang distribution is a positive continuous distribution with density function

    f(x; r, θ) = x^(r−1) e^(−x/θ) / (θ^r (r−1)!)   for x > 0,   (1)

where r, a positive integer, is the shape parameter and θ > 0 the scale parameter (the inverse λ = 1/θ is called the rate parameter). The cumulative distribution function is obtained by integrating (1) by parts r times:

    F(x; r, θ) = ∫_0^x z^(r−1) e^(−z/θ) / (θ^r (r−1)!) dz = 1 − Σ_{n=0}^{r−1} e^(−x/θ) (x/θ)^n / n!.   (2)

Using the lower incomplete gamma function, defined as γ(s, x) = ∫_0^x z^(s−1) e^(−z) dz, the cumulative distribution function becomes

    F(x; r, θ) = (1/(r−1)!) ∫_0^{x/θ} u^(r−1) e^(−u) du = γ(r, x/θ) / (r−1)!.   (3)

Following Lee and Lin (2010) we consider mixtures of M Erlang distributions with common scale parameter θ > 0, having density

    f(x; α, r, θ) = Σ_{j=1}^M α_j x^(r_j−1) e^(−x/θ) / (θ^(r_j) (r_j−1)!) = Σ_{j=1}^M α_j f(x; r_j, θ)   for x > 0,   (4)

where the positive integers r = (r_1, ..., r_M) with r_1 < ... < r_M are the shape parameters of the Erlang distributions and α = (α_1, ..., α_M) with α_j > 0 and Σ_{j=1}^M α_j = 1 are the weights used in the mixture. Similarly, the cumulative distribution function can be written as a weighted sum of terms (2) or (3). Tijms (1994, p. 163) shows that the class of mixtures of Erlang distributions with a common scale parameter is dense in the space of distributions on R+. Lee and Lin (2010) give an alternative proof using characteristic functions.

3 The EM algorithm for censored and truncated data

Lee and Lin (2010) formulate the EM algorithm customized for fitting mixtures of Erlangs with a common scale parameter to complete data. In an addendum available online, we work out the details with zero-one component indicators inspired by McLachlan and Peel (2001) and Lee and Scott (2012). Here, we construct an adjusted EM algorithm which is able to deal with censored and truncated data. We represent a censored sample truncated to the range [t_l, t_u] by

    X = {(l_i, u_i) | i = 1, ..., n},

where t_l and t_u represent the lower and upper truncation points, l_i and u_i the lower and upper censoring points, and t_l ≤ l_i ≤ u_i ≤ t_u for i = 1, ..., n. t_l = 0 and t_u = ∞ mean no truncation from below and above, respectively. The censoring status is determined as follows:

    Uncensored:        t_l ≤ l_i = u_i =: x_i ≤ t_u
    Left censored:     t_l = l_i < u_i < t_u
    Right censored:    t_l < l_i < u_i = t_u
    Interval censored: t_l < l_i < u_i < t_u

For example, when the truncation interval equals [t_l, t_u] = [0, 10], an uncensored observation at 1 is denoted by (l_i, u_i) = (1, 1), an observation left censored at 2 by (l_i, u_i) = (0, 2), an observation right censored at 3 by (l_i, u_i) = (3, 10) and an observation censored between 4 and 5 by (l_i, u_i) = (4, 5). Thus, l_i and u_i should be seen as the lower and upper endpoints of the interval which contains observation i. The parameter vector to be estimated is Θ = (α, θ). The number of Erlangs M in the mixture and the corresponding positive integer shapes r are fixed. The value of M is, in most applications, however unknown and has to be inferred from the available data, along with the shape parameters; see Section 4. The portion of the likelihood containing the unknown parameter vector Θ is given by

    L(Θ; X) = ∏_{i∈U} f(x_i; Θ) / (F(t_u; Θ) − F(t_l; Θ)) · ∏_{i∈C} (F(u_i; Θ) − F(l_i; Θ)) / (F(t_u; Θ) − F(t_l; Θ)),

where U is the subset of observations in {1, ..., n} which are uncensored and C is the subset of left, right and interval censored observations. In case there is no truncation, i.e. [t_l, t_u] = [0, ∞], the contribution of a left censored observation to the likelihood equals F(u_i; Θ) since l_i = 0, of a right censored observation 1 − F(l_i; Θ) with u_i = ∞, and of an interval censored observation F(u_i; Θ) − F(l_i; Θ). The corresponding log likelihood is

    l(Θ; X) = Σ_{i∈U} ln( Σ_{j=1}^M α_j f(x_i; r_j, θ) ) + Σ_{i∈C} ln( Σ_{j=1}^M α_j (F(u_i; r_j, θ) − F(l_i; r_j, θ)) )
              − n ln( Σ_{j=1}^M α_j (F(t_u; r_j, θ) − F(t_l; r_j, θ)) ),   (5)

which is difficult to optimize numerically.
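The four censoring cases and the (l_i, u_i) encoding above are easy to operationalize. A minimal sketch in Python (the paper's own implementation is in R; the helper name here is ours):

```python
def censoring_status(l, u, t_l, t_u):
    # Classify an observation (l, u) inside the truncation interval [t_l, t_u],
    # following the four cases of Section 3
    if l == u:
        return "uncensored"          # t_l <= l_i = u_i = x_i <= t_u
    if l == t_l and u < t_u:
        return "left censored"       # t_l = l_i < u_i < t_u
    if l > t_l and u == t_u:
        return "right censored"      # t_l < l_i < u_i = t_u
    return "interval censored"       # t_l < l_i < u_i < t_u
```

With the truncation interval [0, 10] of the example, (1, 1) is uncensored, (0, 2) left censored, (3, 10) right censored and (4, 5) interval censored.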
3.1 Construction of the complete data vector

The EM algorithm provides a computationally much easier way to fit this finite mixture to the censored and truncated data. The key idea is to regard the censored sample X as being incomplete, since the uncensored observations x = (x_1, ..., x_n) and their associated component-indicator vectors z = (z_1, ..., z_n), with

    z_ij = 1 if observation x_i comes from the jth component density f(x; r_j, θ), and z_ij = 0 otherwise,   (6)

for i = 1, ..., n and j = 1, ..., M, are not available. The component-label vectors z_1, ..., z_n are taken to be realized values of the random vectors Z_1, ..., Z_n, i.i.d. Mult_M(1, α). The complete data vector, Y = (x_1, ..., x_n, z) = {(x_i, z_i) | i = 1, ..., n}, contains all uncensored observations x_i and their corresponding mixing component vector z_i. The probability density function of an uncensored observation x_i after truncation to [t_l, t_u] is given by

    f(x_i; t_l, t_u, Θ) = f(x_i; Θ) / (F(t_u; Θ) − F(t_l; Θ))
      = Σ_{j=1}^M [ α_j (F(t_u; r_j, θ) − F(t_l; r_j, θ)) / (F(t_u; Θ) − F(t_l; Θ)) ] · [ f(x_i; r_j, θ) / (F(t_u; r_j, θ) − F(t_l; r_j, θ)) ]
      = Σ_{j=1}^M β_j f(x_i; t_l, t_u, r_j, θ),

for t_l ≤ x_i ≤ t_u and zero otherwise. This is again a mixture, with mixing weights and component density functions given by, respectively,

    β_j = α_j (F(t_u; r_j, θ) − F(t_l; r_j, θ)) / (F(t_u; Θ) − F(t_l; Θ))   (7)

and

    f(x_i; t_l, t_u, r_j, θ) = f(x_i; r_j, θ) / (F(t_u; r_j, θ) − F(t_l; r_j, θ)).   (8)

The component density functions f(x_i; t_l, t_u, r_j, θ) are truncated versions of the original component density functions f(x_i; r_j, θ). The weights β_j are re-weighted versions of the original weights α_j by means of the probabilities of the corresponding component to lie in the truncation interval. The log likelihood of the complete sample Y then becomes

    l(Θ; Y) = Σ_{i=1}^n Σ_{j=1}^M z_ij ln( β_j f(x_i; t_l, t_u, r_j, θ) ),   (9)

which has a simpler form than the incomplete log likelihood (5) as it does not contain logarithms of sums. The EM algorithm deals with the censored and truncated data from the mixture of Erlangs with common scale in the following steps.

3.2 Initial step

Initialization of the parameters θ and α is done in the same way as in the uncensored case (Lee and Lin, 2010) and is based on the denseness property (Tijms, 1994, p. 163):

    θ^(0) = max(d) / r_M   and   α_j^(0) = (1 / n_d) Σ_{i=1}^{n_d} I( r_{j−1} θ^(0) < d_i ≤ r_j θ^(0) ),   for j = 1, ..., M,   (10)

with r_0 = 0 for notational convenience. The data d used in (10), of which the size is denoted by n_d, can be restricted to the uncensored data points only. Alternatively, the censored data can also be used by treating the left and right censored data points as being observed, i.e. use u_i and l_i, respectively, and by using the midpoint for the interval censored data points, i.e. use (l_i + u_i)/2. In this manner, censored observations at the low end or the high end of the sample range, which can be of importance at initialization, will not be ignored. The truncation is only taken into account to transform the initial values for α into the initial values for β via (7).

3.3 E-step

In the kth iteration of the E-step, we take the conditional expectation of the complete log likelihood (9) given the incomplete data X, using the current estimate Θ^(k−1) for Θ:

    Q(Θ; Θ^(k−1)) = E( l(Θ; Y) | X; Θ^(k−1) )
      = Σ_{i∈U} E( Σ_j Z_ij ln( β_j f(x_i; t_l, t_u, r_j, θ) ) | X; Θ^(k−1) )
        + Σ_{i∈C} E( Σ_j Z_ij ln( β_j f(X_i; t_l, t_u, r_j, θ) ) | X; Θ^(k−1) )
      = Q_u(Θ; Θ^(k−1)) + Q_c(Θ; Θ^(k−1)),   (11)

where Q_u(Θ; Θ^(k−1)) and Q_c(Θ; Θ^(k−1)) are the conditional expectations of the uncensored and censored part of the complete log likelihood, respectively.

Uncensored case. The truncation does not complicate the computation of the expectation for the uncensored data, as

    Q_u(Θ; Θ^(k−1)) = Σ_{i∈U} Σ_j E( Z_ij | X; Θ^(k−1) ) ln( β_j f(x_i; t_l, t_u, r_j, θ) )
      = Σ_{i∈U} Σ_j uz_ij^(k) ln( β_j f(x_i; t_l, t_u, r_j, θ) )
      = Σ_{i∈U} Σ_j uz_ij^(k) [ ln(β_j) + (r_j − 1) ln(x_i) − x_i/θ − r_j ln(θ) − ln((r_j − 1)!) − ln( F(t_u; r_j, θ) − F(t_l; r_j, θ) ) ],   (12)

with, for i ∈ U and j = 1, ..., M,

    uz_ij^(k) = P( Z_ij = 1 | x_i, t_l, t_u; Θ^(k−1) )
      = β_j^(k−1) f(x_i; t_l, t_u, r_j, θ^(k−1)) / Σ_{m=1}^M β_m^(k−1) f(x_i; t_l, t_u, r_m, θ^(k−1))   [by (8)]
      = [ β_j^(k−1) f(x_i; r_j, θ^(k−1)) / (F(t_u; r_j, θ^(k−1)) − F(t_l; r_j, θ^(k−1))) ]
        / [ Σ_{m=1}^M β_m^(k−1) f(x_i; r_m, θ^(k−1)) / (F(t_u; r_m, θ^(k−1)) − F(t_l; r_m, θ^(k−1))) ]   [by (7)]
      = α_j^(k−1) f(x_i; r_j, θ^(k−1)) / Σ_{m=1}^M α_m^(k−1) f(x_i; r_m, θ^(k−1)).   (13)

The E-step for the uncensored part only requires the computation of the posterior probabilities uz_ij^(k) that observation i belongs to the jth component in the mixture, which are the same in the truncated case and in the untruncated case.

Censored case. Denote by cz_im^(k) the posterior probability that a censored observation i belongs to the mth component in the mixture. Then

    Q_c(Θ; Θ^(k−1)) = Σ_{i∈C} E( Σ_m Z_im ln( β_m f(X_i; t_l, t_u, r_m, θ) ) | l_i, u_i, t_l, t_u; Θ^(k−1) )
      = Σ_{i∈C} Σ_{m=1}^M cz_im^(k) E( ln( β_m f(X_i; t_l, t_u, r_m, θ) ) | Z_im = 1, l_i, u_i, t_l, t_u; θ^(k−1) )
      = Σ_{i∈C} Σ_{m=1}^M cz_im^(k) [ ln(β_m) + (r_m − 1) E( ln(X_i) | Z_im = 1, l_i, u_i, t_l, t_u; θ^(k−1) )
          − (1/θ) E( X_i | Z_im = 1, l_i, u_i, t_l, t_u; θ^(k−1) ) − r_m ln(θ) − ln((r_m − 1)!)
          − ln( F(t_u; r_m, θ) − F(t_l; r_m, θ) ) ].   (14)

Again using Bayes' rule, we can compute these posterior probabilities, for i ∈ C and m = 1, ..., M, as

    cz_im^(k) = P( Z_im = 1 | l_i, u_i, t_l, t_u; Θ^(k−1) )
      = β_m^(k−1) ( F(u_i; t_l, t_u, r_m, θ^(k−1)) − F(l_i; t_l, t_u, r_m, θ^(k−1)) )
        / Σ_j β_j^(k−1) ( F(u_i; t_l, t_u, r_j, θ^(k−1)) − F(l_i; t_l, t_u, r_j, θ^(k−1)) )
      = [ β_m^(k−1) ( F(u_i; r_m, θ^(k−1)) − F(l_i; r_m, θ^(k−1)) ) / ( F(t_u; r_m, θ^(k−1)) − F(t_l; r_m, θ^(k−1)) ) ]
        / [ Σ_j β_j^(k−1) ( F(u_i; r_j, θ^(k−1)) − F(l_i; r_j, θ^(k−1)) ) / ( F(t_u; r_j, θ^(k−1)) − F(t_l; r_j, θ^(k−1)) ) ]   [by (7)]
      = α_m^(k−1) ( F(u_i; r_m, θ^(k−1)) − F(l_i; r_m, θ^(k−1)) ) / Σ_j α_j^(k−1) ( F(u_i; r_j, θ^(k−1)) − F(l_i; r_j, θ^(k−1)) ).   (15)
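Both posteriors (13) and (15) reduce to simple weighted ratios in the original parameters α. A minimal sketch in Python (the paper's implementation is in R; these helper names are ours):

```python
import math

def erlang_pdf(x, r, theta):
    # Erlang density (1): x^(r-1) e^(-x/theta) / (theta^r (r-1)!)
    return x**(r - 1) * math.exp(-x / theta) / (theta**r * math.factorial(r - 1))

def erlang_cdf(x, r, theta):
    # Erlang CDF via the finite sum (2); x = infinity means "no upper bound"
    if math.isinf(x):
        return 1.0
    return 1.0 - sum(math.exp(-x / theta) * (x / theta)**n / math.factorial(n)
                     for n in range(r))

def posterior_uncensored(x, alphas, shapes, theta):
    # eq. (13): the truncation probabilities cancel, so only the untruncated
    # component densities are needed
    num = [a * erlang_pdf(x, r, theta) for a, r in zip(alphas, shapes)]
    s = sum(num)
    return [v / s for v in num]

def posterior_censored(l, u, alphas, shapes, theta):
    # eq. (15): densities replaced by the interval probabilities F(u) - F(l)
    num = [a * (erlang_cdf(u, r, theta) - erlang_cdf(l, r, theta))
           for a, r in zip(alphas, shapes)]
    s = sum(num)
    return [v / s for v in num]
```

Each call returns a probability vector over the M components; a right censored point uses u = infinity.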

The terms in (14) for Q_c(Θ; Θ^(k−1)) containing E( ln(X_i) | Z_im = 1, l_i, u_i, t_l, t_u; θ^(k−1) ) will not play a role in the EM algorithm as they do not depend on the unknown parameter vector Θ. The E-step does require the computation of the expected value of X_i conditional on the censoring times and the mixing component Z_i for the current value Θ^(k−1) of Θ:

    E( X_i | Z_im = 1, l_i, u_i, t_l, t_u; θ^(k−1) ) = ∫_{l_i}^{u_i} x f(x; r_m, θ^(k−1)) / ( F(u_i; r_m, θ^(k−1)) − F(l_i; r_m, θ^(k−1)) ) dx
      = r_m θ^(k−1) / ( F(u_i; r_m, θ^(k−1)) − F(l_i; r_m, θ^(k−1)) ) · ∫_{l_i}^{u_i} x^{r_m} e^{−x/θ^(k−1)} / ( (θ^(k−1))^{r_m+1} r_m! ) dx
      = r_m θ^(k−1) ( F(u_i; r_m + 1, θ^(k−1)) − F(l_i; r_m + 1, θ^(k−1)) ) / ( F(u_i; r_m, θ^(k−1)) − F(l_i; r_m, θ^(k−1)) ),

for i ∈ C and m = 1, ..., M, which is a closed-form expression.

3.4 M-step

In the M-step, we maximize the expected value (11) of the complete data log likelihood obtained in the E-step with respect to the parameter vector Θ, over all (β, θ) with β_j > 0, Σ_{j=1}^M β_j = 1 and θ > 0. The expressions for Q_u(Θ; Θ^(k−1)) and Q_c(Θ; Θ^(k−1)) are given in (12) and (14), respectively. The maximization over the mixing weights β requires the maximization of

    Σ_{i∈U} Σ_j uz_ij^(k) ln(β_j) + Σ_{i∈C} Σ_j cz_ij^(k) ln(β_j),

which can be done analogously to the uncensored case (see the addendum online). We implement the restriction Σ_{j=1}^M β_j = 1 by setting β_M = 1 − Σ_{j=1}^{M−1} β_j. Setting the partial derivatives at β^(k) equal to zero implies that the optimizer satisfies

    ( Σ_{i∈U} uz_ij^(k) + Σ_{i∈C} cz_ij^(k) ) / β_j^(k) = ( Σ_{i∈U} uz_iM^(k) + Σ_{i∈C} cz_iM^(k) ) / β_M^(k)   for j = 1, ..., M − 1.

By the sum constraint we have β_M^(k) = ( Σ_{i∈U} uz_iM^(k) + Σ_{i∈C} cz_iM^(k) ) / n, and the same form also follows for j = 1, ..., M − 1:

    β_j^(k) = ( Σ_{i∈U} uz_ij^(k) + Σ_{i∈C} cz_ij^(k) ) / n   for j = 1, ..., M.   (16)

The new estimate for the prior probability β_j in the truncated mixture is the average of the posterior probabilities of belonging to the jth component in the mixture.
The optimizer indeed corresponds to a maximum, since the matrix of second order partial derivatives is negative definite with a compound symmetry structure.
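The closed-form conditional expectation used in the E-step is a one-liner given an Erlang CDF. A sketch (our own helper names; the weight update (16) is just the average of the posteriors and needs no separate routine):

```python
import math

def erlang_cdf(x, r, theta):
    # Erlang CDF via the finite sum (2); x = infinity means "no upper bound"
    if math.isinf(x):
        return 1.0
    return 1.0 - sum(math.exp(-x / theta) * (x / theta)**n / math.factorial(n)
                     for n in range(r))

def cond_mean(l, u, r, theta):
    # E[X | component with shape r, l < X <= u]
    #   = r * theta * (F(u; r+1, theta) - F(l; r+1, theta))
    #               / (F(u; r,   theta) - F(l; r,   theta))
    num = erlang_cdf(u, r + 1, theta) - erlang_cdf(l, r + 1, theta)
    den = erlang_cdf(u, r, theta) - erlang_cdf(l, r, theta)
    return r * theta * num / den
```

As a sanity check, for r = 1 (an exponential component) and u = infinity the formula reduces to l + θ, the memoryless property; for l = 0 and u = infinity it returns the unconditional mean rθ.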

In order to maximize Q(Θ; Θ^(k−1)) with respect to θ, we set the first order partial derivatives equal to zero (see Appendix A). This leads to the following M-step equation for θ:

    θ^(k) = [ ( Σ_{i∈U} x_i + Σ_{i∈C} E( X_i | l_i, u_i, t_l, t_u; θ^(k−1) ) ) / n − T^(k) ] / Σ_{j=1}^M β_j^(k) r_j,   (17)

with

    T^(k) = Σ_{j=1}^M β_j^(k) [ t_l^{r_j} e^{−t_l/θ} − t_u^{r_j} e^{−t_u/θ} ] / [ θ^{r_j−1} (r_j − 1)! ( F(t_u; r_j, θ) − F(t_l; r_j, θ) ) ]   evaluated at θ = θ^(k).

As in the uncensored case, the new estimate θ^(k) in (17) for the common scale parameter θ again has the interpretation of the sample mean divided by the average shape parameter in the mixture, but in the formula for the sample mean we now take the expected value of the censored data points given the censoring times, and subtract a correction term T^(k) due to the truncation. However, T^(k) in (17) depends on θ^(k) and has a complicated form. Therefore, it is not possible to find an analytical solution of (17). We circumvent this issue by evaluating T in the previous value of θ, i.e. θ^(k−1), and use (17) with T^(k−1) in the right-hand side as an approximation for finding the maximum in the kth EM step. The convergence of the EM algorithm to the maximum likelihood estimator can then no longer be theoretically guaranteed, but works in practice, based on our experiments. Lee and Scott (2012) use the same approach to fit multivariate Gaussian mixture models to censored and truncated data. The E- and M-steps are iterated until l(Θ^(k); X) − l(Θ^(k−1); X) is sufficiently small. The maximum likelihood estimator of the original mixing weights α_j for j = 1, ..., M can be retrieved by inverting expression (7). This is most easily done by first computing

    α_j ∝ β_j / ( F(t_u; r_j, θ) − F(t_l; r_j, θ) )   for j = 1, ..., M,

where β_j and θ denote the values in the final EM step, and then normalizing the weights such that they sum to 1.
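The scale update (17), with T evaluated at the previous scale, and the final inversion of (7) can both be sketched compactly (our own Python; the paper's implementation is in R):

```python
import math

def erlang_cdf(x, r, theta):
    # Erlang CDF via the finite sum (2); x = infinity means "no upper bound"
    if math.isinf(x):
        return 1.0
    return 1.0 - sum(math.exp(-x / theta) * (x / theta)**n / math.factorial(n)
                     for n in range(r))

def scale_update(completed_sum, n, betas, shapes, theta_prev, t_l, t_u):
    # M-step equation (17), with the correction term T evaluated at theta_prev.
    # completed_sum = sum of the uncensored x_i plus E[X_i | censoring] for i in C.
    def w(t, r):
        # t^r e^(-t/theta_prev); vanishes at t = infinity (no upper truncation)
        return 0.0 if math.isinf(t) else t**r * math.exp(-t / theta_prev)
    T = sum(b * (w(t_l, r) - w(t_u, r))
            / (theta_prev**(r - 1) * math.factorial(r - 1)
               * (erlang_cdf(t_u, r, theta_prev) - erlang_cdf(t_l, r, theta_prev)))
            for b, r in zip(betas, shapes))
    return (completed_sum / n - T) / sum(b * r for b, r in zip(betas, shapes))

def alpha_from_beta(betas, shapes, theta, t_l, t_u):
    # Invert (7) at the final EM step: alpha_j ~ beta_j / P(component j in [t_l, t_u])
    raw = [b / (erlang_cdf(t_u, r, theta) - erlang_cdf(t_l, r, theta))
           for b, r in zip(betas, shapes)]
    return [v / sum(raw) for v in raw]
```

Without truncation (t_l = 0, t_u = infinity) the correction T is zero, (17) reduces to the sample mean over the average shape, and alpha_from_beta returns β unchanged, matching the uncensored case.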
4 Choice of the shape parameters and of the number of Erlangs in the mixture

4.1 Initialization

We start by making an initial choice for the number of Erlangs M in the mixture and set the shapes equal to r_j = j for j = 1, 2, ..., M. Extending Lee and Lin (2010) (we acknowledge the help of Simon Lee, who suggested this approach in personal communication), we introduce a spread factor s by which we multiply the shapes in order to get a wider spread at the initial step, i.e. r_j = s · j for j = 1, 2, ..., M. The initialization of θ and α is based on the denseness of mixtures of Erlangs (see Tijms, 1994, p. 163), as explained in Section 3.2. The initialization procedure can be based either on the

uncensored data only, or the censored data can also be employed. The initial scale θ^(0) is chosen such that θ^(0) r_M equals the maximum data point, and the initial weights α_j^(0) for j = 1, 2, ..., M are set to the relative frequency of data points in the interval (r_{j−1} θ^(0), r_j θ^(0)]. If this interval does not contain any data points for some j, the initial weight corresponding to the Erlang in the mixture with shape r_j will be zero, and consequently the weight α_j will remain zero at each subsequent iteration. This is clear from the updating scheme (16) in the M-step and the expressions (13) and (15) of the posterior probabilities in the E-step. The shapes r_j with initial weight α_j^(0) equal to zero are therefore removed from the mixture at the initial step. Numerical experiments show that the iterative scheme performs well and results in fast convergence using the above choice of initial estimates for θ and α.

4.2 Adjusting the shapes

Since the initial shape parameters are pre-fixed and hence not estimated, the fitted mixture might be sub-optimal. Adjustment of the shape parameters is necessary. Ideally, for a given number of Erlangs M, we want to choose optimal values for the shapes. The choice of the shapes for a given M is however an optimization problem over N^M which is infeasible to solve exactly. We have to resort to a practical procedure which explores the parameter space efficiently in order to obtain a satisfying choice for the shapes. After applying the EM algorithm once to obtain the maximum likelihood estimates corresponding to the initial choice of the shape parameters, we perform stepwise variations of the shapes, each time refitting the scale and the weights using the EM algorithm, and compare the log likelihoods of the results. We hereby follow the procedure proposed by Lee and Lin (2010): 1.
Run the algorithm starting from the shapes {r_1, ..., r_{M−1}, r_M + 1}, with initial scale θ and weights {α_1, ..., α_{M−1}, α_M} the final estimates of the previous execution of the EM algorithm. Repeat this step for as long as the log likelihood improves, each time replacing the old set of parameters by the new ones. This procedure is then applied to the (M−1)th shape, and so forth until all the shapes are treated. 2. Run the algorithm starting from the shapes {r_1 − 1, r_2, ..., r_M}, with initial scale θ and weights {α_1, α_2, ..., α_M} the final estimates of the previous execution of the EM algorithm. Repeat this step for as long as the log likelihood improves, each time replacing the old set of parameters by the new ones. This procedure is then applied to the 2nd shape, and so forth until all the shapes are treated. 3. Repeat the loops described in the previous steps until the log likelihood can no longer be increased. Using this algorithm we eventually reach a local maximum of the log likelihood, by which we mean that the fit can no longer be improved by either increasing or decreasing any of the r_j.

4.3 Reducing the number of Erlangs

Too many Erlangs in the mixture will result in overfitting, which is always a concern in statistical modeling. A decision rule such as Akaike's information criterion (AIC, Akaike, 1974) or Schwarz's Bayesian information criterion (BIC, Schwarz, 1978) helps to decide on the

value of M. Any other information criterion (IC) or objective function could be optimized, depending on the purpose for which the model is used. We use a backward stepwise search. As mixtures of Erlangs are dense in the space of continuous distributions, we start from a close-fitting mixture of M Erlangs resulting from the shape adjustment procedure described in Section 4.2 and compute the value of the IC. We next reduce the number of Erlangs M in the mixture by deleting the shape r_j with smallest weight α_j, refit the scale and weights using the EM algorithm, and readjust the shapes using the same shape adjustment procedure. If the resulting fit with M − 1 Erlangs attains a lower value of the IC, the new parameter values replace the old ones. We continue reducing the number of Erlangs in the mixture until the value of the IC no longer decreases by deleting an additional shape. A backward selection has the advantage of providing initial values close to the maximum likelihood estimates corresponding to the new set of shapes, which greatly reduces the run time (Lee and Lin, 2010). In contrast, when using a forward stepwise procedure it is not clear which additional shape parameter to use and how the parameters from the previous run can provide useful information on parameter initialization. As a guideline, we recommend starting from an initial choice for the number of Erlangs M and a spread s resulting in a close fit or even an overfit of the data.

4.4 Comparing the final fit using different starting values

Since the log likelihood may have multiple local maxima due to the presence of the positive integer shapes, the quality of the initial estimate can greatly influence the final result, and it is therefore wise to try and compare the results of several initial starting points.
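The stepwise shape adjustment of Section 4.2 and the backward reduction of Section 4.3 can be sketched as search loops around a fitting callback (`fit` and `fit_ic` below are hypothetical stand-ins for a full EM run and, for `fit_ic`, the IC evaluation; the real procedure lives in the paper's R code):

```python
def adjust_shapes(shapes, fit):
    # Section 4.2: nudge shapes up (starting from r_M) and down (starting from
    # r_1) while the log likelihood fit(shapes) improves.
    def valid(sh):
        return sh[0] >= 1 and all(a < b for a, b in zip(sh, sh[1:]))
    best, improved = fit(shapes), True
    while improved:
        improved = False
        moves = [(j, +1) for j in range(len(shapes) - 1, -1, -1)] + \
                [(j, -1) for j in range(len(shapes))]
        for j, delta in moves:
            while True:
                cand = shapes[:j] + [shapes[j] + delta] + shapes[j + 1:]
                if not valid(cand):
                    break
                ll = fit(cand)
                if ll <= best:
                    break
                shapes, best, improved = cand, ll, True
    return shapes, best

def reduce_mixture(shapes, fit_ic):
    # Section 4.3: delete the smallest-weight shape while the IC improves.
    ic, weights = fit_ic(shapes)
    while len(shapes) > 1:
        j = min(range(len(weights)), key=lambda i: weights[i])
        cand = shapes[:j] + shapes[j + 1:]
        cand_ic, cand_weights = fit_ic(cand)
        if cand_ic >= ic:
            break
        shapes, weights, ic = cand, cand_weights, cand_ic
    return shapes, weights, ic
```

Both loops only ever accept a candidate that strictly improves the objective, so they terminate at the kind of local optimum described above.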
We compare the final fits, after the shape adjustment procedure and reduction of the number of Erlangs using an IC, starting from different sets of initial values obtained by varying the spread factor s in the initial step.

5 Examples

The usefulness of the proposed fitting procedure is demonstrated in four applications. A first example involves simulated data from a bimodal distribution, which we censor and truncate, allowing us to compare the original density and the entire uncensored and untruncated sample to the fitted mixture of Erlangs. The second example illustrates the use of mixtures of Erlangs to represent right-censored unemployment durations. The third example fits a mixture of Erlang distributions to truncated claim size data and demonstrates how the fitted mixture can be used to analytically price reinsurance contracts. In the fourth example, we illustrate the effectiveness of the proposed algorithm on a small interval-censored dataset representing the time to cosmetic deterioration. The developed R program and the datasets discussed are available online. The R code contains the procedures discussed in Section 4. An illustration is provided for the first example we consider.
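The initial step of Section 4.1, used by each of the searches above, is easy to sketch (our own Python; the function name is illustrative):

```python
def initialize(data, M, s=1):
    # Section 4.1 initialization: shapes r_j = s*j, theta0 = max(d)/r_M,
    # alpha_j^0 = relative frequency of data in (r_{j-1} theta0, r_j theta0]
    shapes = [s * j for j in range(1, M + 1)]
    theta0 = max(data) / shapes[-1]
    edges = [0.0] + [r * theta0 for r in shapes]
    alphas = [sum(1 for d in data if edges[j] < d <= edges[j + 1]) / len(data)
              for j in range(M)]
    # shapes whose initial weight is zero would stay zero forever, so drop them
    kept = [(r, a) for r, a in zip(shapes, alphas) if a > 0]
    return [r for r, _ in kept], [a for _, a in kept], theta0
```

A larger spread factor s stretches the same M shapes over a wider grid, which is exactly what is varied when comparing starting values in Section 4.4.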

5.1 Simulated data

We generate a random sample of 5000 observations from the bimodal mixture of gamma distributions with density function given by

    f_u(x) = 0.4 f(x; r = 5, θ = 0.5) + 0.6 f(x; r = 10, θ = 1).   (18)

Next we truncate the data by rejecting all observations beneath the 5% sample quantile or above the 95% sample quantile. The remaining 4500 data points are subsequently right censored by generating 4500 observations from another mixture of gamma distributions with density function

    f_rc(x) = 0.4 f(x; r = 5, θ = 2/3) + 0.6 f(x; r = 9, θ = 1.25).

The resulting data set is composed of 2579 uncensored and 1921 right censored data points, and is used to calibrate the Erlang mixture, taking the lower and upper truncation into account. Using the automatic search from Section 4.4, we start from M = 10 Erlangs in the mixture and let the spread factor s used in the initial step range from 1 to 10. AIC is used to decide upon how many Erlangs to use in the mixture, and the right censored data points are treated as being observed at the initial step. The lowest AIC value was obtained with a mixture of 4 Erlangs. The final parameter estimates are in Table 1. This optimal choice of shapes was reached using spread factor s = 3.

Table 1: Parameter estimates (r_j, α_j and θ) of the fitted mixture of 4 Erlangs to the censored and truncated data generated starting from density (18).

A graphical comparison of the fitted distribution and the original data generated can be found in Figure 1. We compare the fitted mixture of Erlangs density to the true density (18) and a histogram of all 5000 generated data points, before truncation and censoring. The fitted mixture of Erlangs density almost perfectly coincides with the underlying true density.

Figure 1: Graphical comparison of the density of the fitted mixture of 4 Erlangs, the true underlying density (18) and the histogram of the generated data before censoring and truncation.
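The data-generating design of this example can be sketched as follows (our own script with an arbitrary seed, so the resulting censoring split only roughly matches the 2579/1921 reported above):

```python
import random

random.seed(2014)  # arbitrary seed; counts below vary with it

def draw_mix(p, g1, g2):
    # Draw from p * Gamma(g1) + (1 - p) * Gamma(g2); gammavariate(shape, scale)
    return random.gammavariate(*g1) if random.random() < p else random.gammavariate(*g2)

# 5000 draws from the bimodal density (18)
x = sorted(draw_mix(0.4, (5, 0.5), (10, 1.0)) for _ in range(5000))

# truncate at the 5% and 95% sample quantiles
t_l, t_u = x[249], x[4750]
kept = [v for v in x if t_l <= v <= t_u]

# right-censor with independent draws from the second gamma mixture;
# a censored point is recorded as (c_i, t_u), an observed one as (x_i, x_i)
sample = []
for v in kept:
    c = draw_mix(0.4, (5, 2 / 3), (9, 1.25))
    sample.append((v, v) if v <= c else (max(c, t_l), t_u))
```

Roughly half the retained points end up right censored, in line with the split reported in the text.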

5.2 Unemployment duration

We examine the economic data from the January Current Population Survey's Displaced Workers Supplements (DWS) for the years 1986, 1988, 1990, and 1992, first analyzed in McCall (1996). A thorough discussion of this dataset is available in Cameron and Trivedi (2005). The variable under consideration is unemployment duration (spell) or, more accurately, joblessness duration, measured in two-week intervals. All other covariates in the dataset are ignored in the analysis. Following Cameron and Trivedi (2005), a spell is considered complete if the person is re-employed at a full-time job (CENSOR1) and right-censored otherwise. This results in 1073 uncensored data points and 2270 right censored data points. The parameter estimates of the Erlang mixture, obtained by using the automatic search procedure starting from M = 10 Erlangs in the mixture with spread factor s used in the initial step ranging from 1 to 10, are given in Table 2. AIC is again used to decide upon the number of Erlangs in the mixture, and the right censored data points are treated as being observed at initialization. The lowest AIC value was obtained with a mixture of 4 Erlangs. This optimal choice of shapes was reached using spread factor s = 3.

Table 2: Parameter estimates (r_j, α_j and θ) of the fitted mixture of 4 Erlangs to the right-censored unemployment data.

Figure 2 compares the Kaplan-Meier survival curve (Kaplan and Meier, 1958), along with 95% confidence bounds, to the survival function of the estimated Erlang mixture. The fitted survival function provides a smooth fit of the data, closely resembling the non-parametric estimate.

Figure 2: Graphical comparison of the survival function of the fitted mixture of 4 Erlangs and the Kaplan-Meier estimator with 95% confidence bounds for the right-censored unemployment data.

5.3 Secura Belgian Re insurance data

The Secura Belgian Re dataset discussed in Beirlant et al. (2004) contains 371 automobile claims from 1988 until 2001, gathered from several European insurance companies. The data are uncensored, but left truncated at 1,200,000 since a claim is only reported to the reinsurer if the claim size is at least as large as 1,200,000 euro. The sizes of the claims are corrected, among others, for inflation. Based on these observations, the reinsurer wants to calibrate a model in order to price reinsurance contracts. The search procedure using AIC prefers a mixture of only two Erlangs, with shapes 5 and 15. The parameter estimates of this best-fitting mixture are shown in Table 3. In Figure 3 (left) we compare the histogram of the truncated data to the fitted truncated density. Figure 3 (right) illustrates that the truncated survival function of the mixture of two Erlangs perfectly coincides with the Kaplan-Meier estimate.

Table 3: Parameter estimates (r_j, α_j and θ) of the fitted mixture of 2 Erlangs to the left-truncated claim sizes in the Secura Belgian Re dataset.

Figure 3: Graphical comparison of the truncated density of the fitted mixture of 2 Erlangs and the histogram of the left-truncated claim sizes (left), and of the truncated survival function and the Kaplan-Meier estimator with 95% confidence bounds (right) for the Secura Belgian Re dataset.

Following Beirlant et al. (2004, p. 188), we use the calibrated Erlang mixture to price an excess-of-loss (XL) reinsurance contract, where the reinsurer pays for the claim amount in excess of a given limit. The net premium Π(R) of such a contract with retention level R > 1,200,000 is given by

    Π(R) = E( (X − R)_+ | X > 1,200,000 ),

where X denotes the claim size and (·)_+ = max(·, 0).
In case X follows a mixture of M Erlang distributions, where we assume without loss of generality that $r_i = i$ for $i = 1, \ldots, M$, the net premium is

$$\Pi(R) = \frac{\theta e^{-R/\theta}}{1 - F(1{,}200{,}000; \boldsymbol{\alpha}, \boldsymbol{r}, \theta)} \sum_{n=0}^{M-1} \frac{(R/\theta)^n}{n!} \sum_{k=n}^{M-1} A_k = \frac{\theta^2}{1 - F(1{,}200{,}000; \boldsymbol{\alpha}, \boldsymbol{r}, \theta)} \sum_{n=1}^{M} \Big( \sum_{k=n-1}^{M-1} A_k \Big) f(R; n, \theta), \qquad (19)$$

with $A_k = \sum_{j=k+1}^{M} \alpha_j$ for $k = 0, \ldots, M-1$. The derivation of this property can be reconstructed using Willmot and Woo (2007) or Klugman et al. (2013, p. 21). In Table 4, we compare the non-parametric, Hill and Generalized Pareto (GP) based estimates of $\Pi(R)$ for the Secura Belgian Re dataset from Table 6.1 in Beirlant et al. (2004, p. 191) to the estimates obtained using formula (19). The maximum claim size observed in the dataset equals 7,898,639, which is the only data point on which the non-parametric estimate of the net premium with retention level R = 7,500,000 is based. The non-parametric estimate corresponding to retention level R = 10,000,000 is hence zero. The fitted Erlang mixture allows us to estimate the net premium using intrinsically all data points, but postulates a lighter tail compared to the Pareto-type alternatives, since Erlang mixtures have an asymptotically exponential tail (Neuts (1981, p. 62)). Whereas the estimates based on the extreme value methodology are higher than the non-parametric ones, the estimates based on the Erlang mixture are lower. At the high end of the sample range the estimators differ strongly, as implied by the different tail behavior of the three approaches. The reinsurance actuary should carefully investigate the right tail behavior of the data in order to choose his approach.

Table 4: Non-parametric, Hill, GP and Mixed Erlang-based estimates for Π(R).
R | Non-Parametric | Hill | GP | Mixed Erlang
3,000,000 | … | … | … | …
3,500,000 | … | … | … | …
4,000,000 | 74,… | … | … | …
4,500,000 | 53,… | … | … | …
5,000,000 | 35,… | … | … | …
7,500,000 | 1,… | … | … | …
10,000,000 | 0 | … | … | …

Besides modeling the tail of the claim size distribution above a certain threshold, Beirlant et al. (2004, p. 198) also estimate a global statistical model to describe the whole range of all possible claim outcomes for the Secura Belgian Re dataset.
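Formula (19) and the excess-loss weights discussed further below are straightforward to implement. The sketch uses illustrative parameters (not the fitted Secura values of Table 3) and checks the single-Erlang(1) special case, for which the mixture is exponential and $\Pi(R) = \theta e^{-(R-t)/\theta}$ by memorylessness:

```python
import math

def erlang_sf(x, r, theta):
    """Erlang(r, theta) survival function."""
    z = x / theta
    return math.exp(-z) * sum(z**n / math.factorial(n) for n in range(r))

def erlang_pdf(x, r, theta):
    """Erlang(r, theta) density f(x; r, theta)."""
    return x**(r - 1) * math.exp(-x / theta) / (theta**r * math.factorial(r - 1))

def net_premium(R, t, alpha, theta):
    """Formula (19): E[(X - R)_+ | X > t] for a mixture with shapes 1..M,
    weights alpha and common scale theta (assumes R > t)."""
    M = len(alpha)
    A = [sum(alpha[k:]) for k in range(M)]      # A_k = sum_{j=k+1}^M alpha_j
    B = [sum(A[n:]) for n in range(M)]          # B_n = sum_{k=n}^{M-1} A_k
    z = R / theta
    stop_loss = theta * math.exp(-z) * sum(B[n] * z**n / math.factorial(n)
                                           for n in range(M))
    surv_t = sum(a * erlang_sf(t, j + 1, theta) for j, a in enumerate(alpha))
    return stop_loss / surv_t

def excess_weights(d, alpha, theta):
    """Weights of the excess-loss (residual lifetime) Erlang mixture at level d."""
    M = len(alpha)
    num = [sum(alpha[n + j - 1] * erlang_pdf(d, n + 1, theta)
               for n in range(M - j + 1))
           for j in range(1, M + 1)]
    total = sum(num)
    return [v / total for v in num]

# Illustrative parameters only.
alpha, theta = [0.2, 0.5, 0.3], 1.5

# Sanity check: exponential case, net premium theta * exp(-(R - t)/theta).
print(net_premium(3.0, 1.0, [1.0], 2.0), 2.0 * math.exp(-1.0))

# The mean excess at d equals theta * sum_j j * beta_j(d), which must agree
# with the unconditional stop-loss divided by the survival at d.
d = 2.5
beta = excess_weights(d, alpha, theta)
print(theta * sum((j + 1) * b for j, b in enumerate(beta)),
      net_premium(d, d, alpha, theta))
```

The second printed pair illustrates numerically that the excess-loss distribution is again a mixed Erlang with the same scale, as stated later in this section.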
This is needed when trying to estimate Π(R) for values of R smaller than the threshold above which the extreme value distribution is fit. Based on the mean excess function, the authors propose the use of a mixture of an exponential and a Pareto distribution. Instead of having to use such a body-tail approach explicitly, the implemented shape adjustment and reduction techniques have guided us to a mixture with two components, of which the first represents the body of the distribution and the second represents the tail. The fitting procedure for Erlang mixtures is able to make this choice implicitly, in a data-driven way, leading to a close representation of the data. In Table 5 we compare the estimated net premiums from Table 6.2 in Beirlant et al. (2004, p. 198) obtained using the Exp-Par model to the non-parametric and Mixed Erlang estimates.

Table 5: Non-parametric, Exp-Par and Mixed Erlang-based estimates for Π(R).
R | Non-Parametric | Exp-Par | Mixed Erlang
1,250,000 | … | … | …
1,500,000 | … | … | …
1,750,000 | … | … | …
2,000,000 | … | … | …
2,250,000 | … | … | …
2,500,000 | … | … | …619.0

Note that when R = 1,200,000, the net premium equals the mean excess loss $E[X - R \mid X > R]$, which is called the mean residual lifetime in a survival context. Klugman et al. (2013, p. 20) show that the distribution of the excess loss or residual lifetime is again a mixture of M Erlangs with the same scale θ and different weights, which we can compute analytically:

$$\beta_j = \frac{\sum_{n=0}^{M-j} \alpha_{n+j}\, f(R; n+1, \theta)}{\sum_{n=0}^{M-1} A_n\, f(R; n+1, \theta)} \qquad \text{for } j = 1, \ldots, M,$$

with $A_n = \sum_{j=n+1}^{M} \alpha_j$ as before.

5.4 Time to cosmetic deterioration

In order to illustrate the fitting procedure of Erlang mixtures in case of interval censoring, we consider a small dataset from a retrospective study to compare the cosmetic effects of radiotherapy alone versus radiotherapy and adjuvant chemotherapy on women with early breast cancer. The 94 subjects in the study were monitored over time with visits every 4-6 months but, as their recovery progressed, the interval between visits lengthened. This dataset is discussed, among others, in Finkelstein and Wolfe (1985) and Klein and Moeschberger (2003) and is available in the packages MLEcens and KMsurv in R. We examine the time to the first appearance of moderate or severe breast retraction, a cosmetic deterioration of the breast. As the patients were observed only at some random times, the exact times of breast retraction are known only to fall within the interval between visits (56 interval-censored observations) or after the last time the patient was seen (38 right-censored observations). In this illustration, we ignore the treatment indicator. AIC prefers a mixture of 3 Erlangs, of which the estimated parameters are shown in Table 6. A non-parametric estimator of the survival function for interval-censored data is given by Turnbull (1976).
The estimator is a modified version of the Kaplan-Meier estimator, based on an iterative procedure, and has no closed form. Figure 4 compares the survival function of the fitted Erlang mixture to Turnbull's estimate for the censored time to cosmetic deterioration. The fitted survival function of the Erlang mixture lies within the 95% confidence bounds at each time point and provides a smoother fit of the data compared to the non-parametric alternative.

Table 6: Parameter estimates of the fitted mixture of 3 Erlangs to the censored time to cosmetic deterioration.
r α θ
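For interval-censored data of this kind, the observed-data likelihood that the EM algorithm maximizes is built from cdf differences: an observation censored to the interval $(l_i, u_i]$ contributes $F(u_i) - F(l_i)$, and a right-censored observation contributes $1 - F(l_i)$. A small sketch with made-up intervals (not the breast-retraction data):

```python
import math

def erlang_cdf(x, r, theta):
    """Erlang(r, theta) cumulative distribution function."""
    z = x / theta
    return 1.0 - math.exp(-z) * sum(z**n / math.factorial(n) for n in range(r))

def mixture_cdf(x, shapes, weights, theta):
    return sum(a * erlang_cdf(x, r, theta) for r, a in zip(shapes, weights))

def loglik(intervals, shapes, weights, theta):
    """Observed-data log-likelihood: each observation is a pair (l, u);
    u = math.inf encodes right censoring at l."""
    ll = 0.0
    for l, u in intervals:
        if math.isinf(u):
            contrib = 1.0 - mixture_cdf(l, shapes, weights, theta)
        else:
            contrib = (mixture_cdf(u, shapes, weights, theta)
                       - mixture_cdf(l, shapes, weights, theta))
        ll += math.log(contrib)
    return ll

# Hypothetical observations: two interval-censored, one right-censored.
obs = [(4.0, 10.0), (8.0, 14.0), (20.0, math.inf)]
print(loglik(obs, shapes=[1, 3], weights=[0.4, 0.6], theta=5.0))
```

The E-step of the algorithm distributes each such censored contribution over the mixture components; this function only evaluates the objective being maximized.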

Figure 4: Graphical comparison of the survival function of the fitted mixture of 3 Erlangs and the Turnbull estimator of the survival function with 95% confidence bounds for the censored time to cosmetic deterioration.

6 Discussion

We extend the EM algorithm of Lee and Lin (2010) for fitting mixtures of Erlangs with a common scale parameter to censored and truncated data. The EM algorithm adapted to deal with censored and truncated data remains a simple iterative algorithm. The initialization of the parameters can be done in a similar way as in Lee and Lin (2010), based on the denseness property (Tijms, 1994, p. 163), and provides close starting values, making the algorithm converge fast. The shape adjustment procedure explores the parameter space in a clever way such that, when adjusting and reducing the shapes, the previous estimates for the scale and the weights provide a very close approximation to the maximum likelihood estimates corresponding to the new set of shapes, which greatly reduces the run time. Extending Lee and Lin (2010), we suggest the use of a spread factor to achieve a wider spread of the shapes at the initial step. We recommend comparing the resulting fits starting from different initial values, obtained by varying the spread factor and changing the initial number of Erlangs. We implement the fitting procedures in R and show how mixtures of Erlangs can be used to adequately represent any univariate distribution in a wide variety of applications where the data are allowed to be censored and truncated. The examples on several simulated and real datasets illustrate the effectiveness of our proposed algorithm and demonstrate the approximation strength of mixtures of Erlangs. Future research may explore incorporating regressor variables in the mixture of Erlangs with common scale, introducing the flexibility of this approach in a regression context.
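The denseness-based initialization with a spread factor can be sketched as follows. This is a loose reading with assumed details (shapes $r_j = s \cdot j$, scale $\max(x)/r_M$, weights taken as the empirical mass between consecutive points $\theta r_{j-1}$ and $\theta r_j$), not the authors' exact R implementation:

```python
def initialize(data, M, s):
    """Hypothetical denseness-based starting values for an M-component
    Erlang mixture with spread factor s: shapes s*j, scale max(x)/r_M,
    and weights from the empirical mass in each induced bin."""
    shapes = [s * j for j in range(1, M + 1)]
    theta = max(data) / shapes[-1]
    edges = [0.0] + [theta * r for r in shapes]
    n = len(data)
    weights = [sum(lo < x <= hi for x in data) / n
               for lo, hi in zip(edges[:-1], edges[1:])]
    return shapes, theta, weights

# Toy positive data; a larger spread factor stretches the largest shape,
# shrinking the initial common scale.
data = [0.3, 1.1, 2.4, 2.5, 3.9, 4.2, 7.7, 9.8]
shapes, theta, weights = initialize(data, M=4, s=2)
print(shapes, round(theta, 3), weights)
```

Running this for several values of s and M, then keeping the converged fit with the lowest AIC, mirrors the strategy recommended above.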
Adjusting the EM algorithm tailored to the class of multivariate mixtures of Erlangs, introduced by Lee and Lin (2012), to the case of censored and truncated data is another appealing extension.

Acknowledgements

The authors are grateful to Simon Lee and Gerda Claeskens for valuable comments and suggestions. Andrei Badescu and Sheldon Lin gratefully acknowledge the financial support received from the Natural Sciences and Engineering Research Council of Canada (NSERC). Katrien Antonio acknowledges financial support from NWO through a Veni 2009 grant. Katrien Antonio and Roel Verbelen acknowledge financial support from an FWO research project.

Appendix A Partial derivative of Q

In order to maximize $Q(\Theta; \Theta^{(k-1)})$ with respect to $\theta$, we set the first-order partial derivative at $\theta^{(k)}$ equal to zero:

$$\frac{\partial Q(\Theta; \Theta^{(k-1)})}{\partial \theta}\bigg|_{\theta=\theta^{(k)}} = \sum_{i \in U} \sum_{j=1}^{M} {}^{u}z_{ij}^{(k)} \left( \frac{x_i}{\theta^2} - \frac{r_j}{\theta} - \frac{\partial}{\partial \theta} \ln\big[F(t^u; r_j, \theta) - F(t^l; r_j, \theta)\big] \right)$$
$$+ \sum_{i \in C} \sum_{j=1}^{M} {}^{c}z_{ij}^{(k)} \left( \frac{E\big(X_i \mid Z_{ij}=1, l_i, u_i, t^l, t^u; \Theta^{(k-1)}\big)}{\theta^2} - \frac{r_j}{\theta} - \frac{\partial}{\partial \theta} \ln\big[F(t^u; r_j, \theta) - F(t^l; r_j, \theta)\big] \right) = 0 .$$

Writing the Erlang cumulative distribution function as in expression (3), $F(t; r, \theta) = \gamma(r, t/\theta)/(r-1)! = 1 - \sum_{n=0}^{r-1} e^{-t/\theta} (t/\theta)^n / n!$, with $\gamma$ the lower incomplete gamma function, its partial derivative with respect to the scale is

$$\frac{\partial}{\partial \theta} F(t; r, \theta) = -\frac{t^{r} e^{-t/\theta}}{\theta^{r+1} (r-1)!},$$

so that

$$\frac{\partial}{\partial \theta} \ln\big[F(t^u; r_j, \theta) - F(t^l; r_j, \theta)\big] = \frac{(t^l)^{r_j} e^{-t^l/\theta} - (t^u)^{r_j} e^{-t^u/\theta}}{\theta^{r_j+1} (r_j-1)! \big(F(t^u; r_j, \theta) - F(t^l; r_j, \theta)\big)} .$$

Substituting, multiplying by $\theta^2$ and dividing by $n$, with $\beta_j^{(k)} = \frac{1}{n}\big( \sum_{i \in U} {}^{u}z_{ij}^{(k)} + \sum_{i \in C} {}^{c}z_{ij}^{(k)} \big)$, yields

$$\frac{1}{n}\left( \sum_{i \in U} x_i + \sum_{i \in C} \sum_{j=1}^{M} {}^{c}z_{ij}^{(k)}\, E\big(X_i \mid Z_{ij}=1, l_i, u_i, t^l, t^u; \Theta^{(k-1)}\big) \right) - \theta \sum_{j=1}^{M} \beta_j^{(k)} r_j - \sum_{j=1}^{M} \beta_j^{(k)}\, \frac{(t^l)^{r_j} e^{-t^l/\theta} - (t^u)^{r_j} e^{-t^u/\theta}}{\theta^{r_j-1} (r_j-1)! \big(F(t^u; r_j, \theta) - F(t^l; r_j, \theta)\big)}\Bigg|_{\theta=\theta^{(k)}} = 0,$$

where we used expression (3) of the cumulative distribution function of an Erlang.
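The final equation above defines $\theta^{(k)}$ only implicitly, since $\theta$ also appears in the truncation correction (the last sum). One simple way to solve it, chosen here for illustration rather than taken from the authors' implementation, is a fixed-point iteration with placeholder values for the posterior weights and expected values:

```python
import math

def erlang_cdf(x, r, theta):
    if math.isinf(x):
        return 1.0
    z = x / theta
    return 1.0 - math.exp(-z) * sum(z**n / math.factorial(n) for n in range(r))

def m_step_theta(mean_term, shapes, beta, t_l, t_u, theta0, n_iter=100):
    """Fixed-point iteration for the scale update: solve
    mean_term - theta * sum_j beta_j r_j - T(theta) = 0, where T(theta) is
    the truncation correction from the first-order condition."""
    theta = theta0
    for _ in range(n_iter):
        T = 0.0
        for r, b in zip(shapes, beta):
            low = t_l**r * math.exp(-t_l / theta)
            up = 0.0 if math.isinf(t_u) else t_u**r * math.exp(-t_u / theta)
            denom = (theta**(r - 1) * math.factorial(r - 1)
                     * (erlang_cdf(t_u, r, theta) - erlang_cdf(t_l, r, theta)))
            T += b * (low - up) / denom
        theta = (mean_term - T) / sum(b * r for r, b in zip(shapes, beta))
    return theta

# No truncation: t_l = 0, t_u = inf, so T = 0 and theta = mean_term / sum_j beta_j r_j.
print(m_step_theta(6.0, [1, 3], [0.5, 0.5], 0.0, math.inf, theta0=1.0))  # 3.0

# Single Erlang(1) left-truncated at a: T = a, recovering theta = mean - a,
# the familiar MLE for a shifted exponential.
print(m_step_theta(5.0, [1], [1.0], 1.0, math.inf, theta0=1.0))  # 4.0
```

The two printed cases are the degenerate checks mentioned in the comments; with genuine two-sided truncation the correction term depends on $\theta$ and the iteration converges to the root of the first-order condition.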

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723.

Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., De Waal, D., and Ferro, C. (2004). Statistics of Extremes: Theory and Applications. Wiley Series in Probability and Statistics. Wiley.

Cameron, A. and Trivedi, P. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38.

Finkelstein, D. M. and Wolfe, R. A. (1985). A semiparametric model for regression analysis of interval-censored failure time data. Biometrics, 41(4):933-945.

Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457-481.

Klein, J. and Moeschberger, M. (2003). Survival Analysis: Techniques for Censored and Truncated Data. Statistics for Biology and Health. Springer, second edition.

Klugman, S. A., Panjer, H. H., and Willmot, G. E. (2013). Loss Models: Further Topics. John Wiley & Sons.

Lee, G. and Scott, C. (2012). EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Computational Statistics & Data Analysis, 56(9):2816-2829.

Lee, S. C. and Lin, X. S. (2012). Modeling dependent risks with multivariate Erlang mixtures. ASTIN Bulletin, 42(1):153-180.

Lee, S. C. and Lin, X. S. (2010). Modeling and evaluating insurance losses via mixtures of Erlang distributions. North American Actuarial Journal, 14(1):107-130.

McCall, B. P. (1996). Unemployment insurance rules, joblessness, and part-time work. Econometrica, 64(3):647-682.

McLachlan, G. and Jones, P. (1988). Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics, 44(2):571-578.

McLachlan, G. and Peel, D. (2001). Finite Mixture Models. Wiley.

McLachlan, G. J. and Krishnan, T. (2007). The EM Algorithm and Extensions, volume 382. Wiley-Interscience.

Neuts, M. F. (1981). Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. The Johns Hopkins University Press.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464.

Tijms, H. C. (1994). Stochastic Models: An Algorithmic Approach. Wiley.

Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B (Methodological), 38(3):290-295.

Willmot, G. E. and Lin, X. S. (2011). Risk modelling with the mixed Erlang distribution. Applied Stochastic Models in Business and Industry, 27(1):2-16.

Willmot, G. E. and Woo, J.-K. (2007). On the class of Erlang mixtures with risk theoretic applications. North American Actuarial Journal, 11(2):99-115.


Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide

More information

Variable selection for model-based clustering

Variable selection for model-based clustering Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition

More information

Research Statement. Zhongwen Liang

Research Statement. Zhongwen Liang Research Statement Zhongwen Liang My research is concentrated on theoretical and empirical econometrics, with the focus of developing statistical methods and tools to do the quantitative analysis of empirical

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS

PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Statistica Sinica 15(2005), 831-840 PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Florin Vaida University of California at San Diego Abstract: It is well known that the likelihood sequence of the EM algorithm

More information

Solutions to the Spring 2015 CAS Exam ST

Solutions to the Spring 2015 CAS Exam ST Solutions to the Spring 2015 CAS Exam ST (updated to include the CAS Final Answer Key of July 15) There were 25 questions in total, of equal value, on this 2.5 hour exam. There was a 10 minute reading

More information

Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively Hybrid Censoring Scheme

Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively Hybrid Censoring Scheme Applied Mathematical Sciences, Vol. 12, 2018, no. 18, 879-891 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ams.2018.8691 Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively

More information

ASTIN Colloquium 1-4 October 2012, Mexico City

ASTIN Colloquium 1-4 October 2012, Mexico City ASTIN Colloquium 1-4 October 2012, Mexico City Modelling and calibration for non-life underwriting risk: from empirical data to risk capital evaluation Gian Paolo Clemente Dept. Mathematics, Finance Mathematics

More information

Bayesian Predictive Modeling for Exponential-Pareto Composite Distribution

Bayesian Predictive Modeling for Exponential-Pareto Composite Distribution Bayesian Predictive Modeling for Exponential-Pareto Composite Distribution ABSTRCT Composite distributions have well-known applications in the insurance industry. In this article a composite Exponential-Pareto

More information

Multivariate Normal-Laplace Distribution and Processes

Multivariate Normal-Laplace Distribution and Processes CHAPTER 4 Multivariate Normal-Laplace Distribution and Processes The normal-laplace distribution, which results from the convolution of independent normal and Laplace random variables is introduced by

More information

Optimal Reinsurance Strategy with Bivariate Pareto Risks

Optimal Reinsurance Strategy with Bivariate Pareto Risks University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 2014 Optimal Reinsurance Strategy with Bivariate Pareto Risks Evelyn Susanne Gaus University of Wisconsin-Milwaukee Follow

More information

A Comparison: Some Approximations for the Aggregate Claims Distribution

A Comparison: Some Approximations for the Aggregate Claims Distribution A Comparison: ome Approximations for the Aggregate Claims Distribution K Ranee Thiagarajah ABTRACT everal approximations for the distribution of aggregate claims have been proposed in the literature. In

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters Communications for Statistical Applications and Methods 2017, Vol. 24, No. 5, 519 531 https://doi.org/10.5351/csam.2017.24.5.519 Print ISSN 2287-7843 / Online ISSN 2383-4757 Goodness-of-fit tests for randomly

More information

A THREE-PARAMETER WEIGHTED LINDLEY DISTRIBUTION AND ITS APPLICATIONS TO MODEL SURVIVAL TIME

A THREE-PARAMETER WEIGHTED LINDLEY DISTRIBUTION AND ITS APPLICATIONS TO MODEL SURVIVAL TIME STATISTICS IN TRANSITION new series, June 07 Vol. 8, No., pp. 9 30, DOI: 0.307/stattrans-06-07 A THREE-PARAMETER WEIGHTED LINDLEY DISTRIBUTION AND ITS APPLICATIONS TO MODEL SURVIVAL TIME Rama Shanker,

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Indirect estimation of a simultaneous limited dependent variable model for patient costs and outcome

Indirect estimation of a simultaneous limited dependent variable model for patient costs and outcome Indirect estimation of a simultaneous limited dependent variable model for patient costs and outcome Per Hjertstrand Research Institute of Industrial Economics (IFN), Stockholm, Sweden Per.Hjertstrand@ifn.se

More information

Expectation Maximization and Mixtures of Gaussians

Expectation Maximization and Mixtures of Gaussians Statistical Machine Learning Notes 10 Expectation Maximiation and Mixtures of Gaussians Instructor: Justin Domke Contents 1 Introduction 1 2 Preliminary: Jensen s Inequality 2 3 Expectation Maximiation

More information

Accounting for Missing Values in Score- Driven Time-Varying Parameter Models

Accounting for Missing Values in Score- Driven Time-Varying Parameter Models TI 2016-067/IV Tinbergen Institute Discussion Paper Accounting for Missing Values in Score- Driven Time-Varying Parameter Models André Lucas Anne Opschoor Julia Schaumburg Faculty of Economics and Business

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Relevance Vector Machines for Earthquake Response Spectra

Relevance Vector Machines for Earthquake Response Spectra 2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas

More information

Least Absolute Shrinkage is Equivalent to Quadratic Penalization

Least Absolute Shrinkage is Equivalent to Quadratic Penalization Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

EM Algorithm II. September 11, 2018

EM Algorithm II. September 11, 2018 EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data

More information

Correlation: Copulas and Conditioning

Correlation: Copulas and Conditioning Correlation: Copulas and Conditioning This note reviews two methods of simulating correlated variates: copula methods and conditional distributions, and the relationships between them. Particular emphasis

More information

Polynomial approximation of mutivariate aggregate claim amounts distribution

Polynomial approximation of mutivariate aggregate claim amounts distribution Polynomial approximation of mutivariate aggregate claim amounts distribution Applications to reinsurance P.O. Goffard Axa France - Mathematics Institute of Marseille I2M Aix-Marseille University 19 th

More information

ON THE TRUNCATED COMPOSITE WEIBULL-PARETO MODEL

ON THE TRUNCATED COMPOSITE WEIBULL-PARETO MODEL ON THE TRUNCATED COMPOSITE WEIBULL-PARETO MODEL SANDRA TEODORESCU and EUGENIA PANAITESCU The composite Weibull-Pareto model 2] was introduced as an alternative to the composite Lognormal-Pareto ] used

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

More information

Comonotonicity and Maximal Stop-Loss Premiums

Comonotonicity and Maximal Stop-Loss Premiums Comonotonicity and Maximal Stop-Loss Premiums Jan Dhaene Shaun Wang Virginia Young Marc J. Goovaerts November 8, 1999 Abstract In this paper, we investigate the relationship between comonotonicity and

More information

A Regression Model for the Copula Graphic Estimator

A Regression Model for the Copula Graphic Estimator Discussion Papers in Economics Discussion Paper No. 11/04 A Regression Model for the Copula Graphic Estimator S.M.S. Lo and R.A. Wilke April 2011 2011 DP 11/04 A Regression Model for the Copula Graphic

More information

Monotonicity and Aging Properties of Random Sums

Monotonicity and Aging Properties of Random Sums Monotonicity and Aging Properties of Random Sums Jun Cai and Gordon E. Willmot Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario Canada N2L 3G1 E-mail: jcai@uwaterloo.ca,

More information

Likelihood ratio testing for zero variance components in linear mixed models

Likelihood ratio testing for zero variance components in linear mixed models Likelihood ratio testing for zero variance components in linear mixed models Sonja Greven 1,3, Ciprian Crainiceanu 2, Annette Peters 3 and Helmut Küchenhoff 1 1 Department of Statistics, LMU Munich University,

More information