Modified slash Birnbaum-Saunders distribution

Similar documents
Multivariate Birnbaum-Saunders Distribution Based on Multivariate Skew Normal Distribution

An extension of the slash-elliptical distribution

p-birnbaum SAUNDERS DISTRIBUTION: APPLICATIONS TO RELIABILITY AND ELECTRONIC BANKING HABITS

Slashed Exponentiated Rayleigh Distribution

A CHARACTERIZATION OF THE GENERALIZED BIRNBAUM SAUNDERS DISTRIBUTION

An Extended Weighted Exponential Distribution

THE TRANSMUTED BIRNBAUM SAUNDERS DISTRIBUTION

On some mixture models based on the Birnbaum-Saunders distribution and associated inference

A NEW CLASS OF SKEW-NORMAL DISTRIBUTIONS

On Bivariate Birnbaum-Saunders Distribution

A New Two Sample Type-II Progressive Censoring Scheme

On the Hazard Function of Birnbaum Saunders Distribution and Associated Inference

Some Theoretical Properties and Parameter Estimation for the Two-Sided Length Biased Inverse Gaussian Distribution

Bivariate Weibull-power series class of distributions

Bivariate Sinh-Normal Distribution and A Related Model

Stat 710: Mathematical Statistics Lecture 12

An Extension of the Generalized Exponential Distribution

Different methods of estimation for generalized inverse Lindley distribution

INFERENCE FOR MULTIPLE LINEAR REGRESSION MODEL WITH EXTENDED SKEW NORMAL ERRORS

Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively Hybrid Censoring Scheme

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

STAT 6350 Analysis of Lifetime Data. Probability Plotting

PROPERTIES AND DATA MODELLING APPLICATIONS OF THE KUMARASWAMY GENERALIZED MARSHALL-OLKIN-G FAMILY OF DISTRIBUTIONS

Estimating the parameters of hidden binomial trials by the EM algorithm

A REPARAMETERIZED BIRNBAUM SAUNDERS DISTRIBUTION AND ITS MOMENTS, ESTIMATION AND APPLICATIONS

Statistical Estimation

Modelling location, scale and shape parameters of the birnbaumsaunders generalized t distribution

Introduction to Maximum Likelihood Estimation

On Sarhan-Balakrishnan Bivariate Distribution

COMPETING RISKS WEIBULL MODEL: PARAMETER ESTIMATES AND THEIR ACCURACY

The Log-generalized inverse Weibull Regression Model

Stat 5101 Lecture Notes

Estimation of Parameters of the Weibull Distribution Based on Progressively Censored Data

Step-Stress Models and Associated Inference

Inference for P(Y<X) in Exponentiated Gumbel Distribution

arxiv: v3 [stat.co] 22 Jun 2017

A MULTIVARIATE BIRNBAUM-SAUNDERS DISTRIBUTION BASED ON THE MULTIVARIATE SKEW NORMAL DISTRIBUTION

The formal relationship between analytic and bootstrap approaches to parametric inference

Multivariate exponential power distributions as mixtures of normal distributions with Bayesian applications

OPTIMAL B-ROBUST ESTIMATORS FOR THE PARAMETERS OF THE GENERALIZED HALF-NORMAL DISTRIBUTION

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Optimum Test Plan for 3-Step, Step-Stress Accelerated Life Tests

Estimation in an Exponentiated Half Logistic Distribution under Progressively Type-II Censoring

Biostat 2065 Analysis of Incomplete Data

The Inverse Weibull Inverse Exponential. Distribution with Application

Review and continuation from last week Properties of MLEs

Two Weighted Distributions Generated by Exponential Distribution

A Quasi Gamma Distribution

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Basic Computations in Statistical Inference

GARCH Models Estimation and Inference

INFERENCE FOR BIRNBAUM-SAUNDERS, LAPLACE AND SOME RELATED DISTRIBUTIONS UNDER CENSORED DATA

ADJUSTED PROFILE LIKELIHOODS FOR THE WEIBULL SHAPE PARAMETER

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Multistate Modeling and Applications

THE MODIFIED EXPONENTIAL DISTRIBUTION WITH APPLICATIONS. Department of Statistics, Malayer University, Iran

Hacettepe Journal of Mathematics and Statistics Volume 45 (5) (2016), Abstract

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

arxiv: v4 [stat.co] 18 Mar 2018

A Note on the Log-Alpha-Skew-Normal Model with Geochemical Applications.

The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study

Mathematical statistics

THE WEIBULL GENERALIZED FLEXIBLE WEIBULL EXTENSION DISTRIBUTION

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Multivariate Normal-Laplace Distribution and Processes

Robust Parameter Estimation in the Weibull and the Birnbaum-Saunders Distribution

A short introduction to INLA and R-INLA

Optimum Hybrid Censoring Scheme using Cost Function Approach

Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data

Econometrics I, Estimation

Point and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

On Five Parameter Beta Lomax Distribution

Computational Statistics and Data Analysis

Birnbaum-Saunders Distribution: A Review of Models, Analysis and Applications

Financial Econometrics and Volatility Models Copulas

The Gamma Modified Weibull Distribution

Alpha-Skew-Logistic Distribution

The Log-Beta Generalized Half-Normal Regression Model

The Geometric Inverse Burr Distribution: Model, Properties and Simulation

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Measuring nuisance parameter effects in Bayesian inference

PROBABILITY DISTRIBUTIONS. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

On Weighted Exponential Distribution and its Length Biased Version

The Gamma Generalized Pareto Distribution with Applications in Survival Analysis

Estimation of Quantiles

HANDBOOK OF APPLICABLE MATHEMATICS

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition

Some New Results on Information Properties of Mixture Distributions

The Kumaraswamy-G Poisson Family of Distributions

Modified maximum likelihood estimation of parameters in the log-logistic distribution under progressive Type II censored data with binomial removals

Parameter Estimation

A TWO-STAGE GROUP SAMPLING PLAN BASED ON TRUNCATED LIFE TESTS FOR A EXPONENTIATED FRÉCHET DISTRIBUTION

Math 494: Mathematical Statistics

Bivariate Degradation Modeling Based on Gamma Process

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Discriminating Between the Bivariate Generalized Exponential and Bivariate Weibull Distributions

Journal of Multivariate Analysis. Use of prior information in the consistent estimation of regression coefficients in measurement error models

Transcription:

Modified slash Birnbaum-Saunders distribution Jimmy Reyes, Filidor Vilca, Diego I. Gallardo and Héctor W. Gómez Abstract In this paper, we introduce an extension for the Birnbaum-Saunders (BS) distribution based on the modified slash (MS) distribution proposed by [12]. This new family of BS type distributions is obtained by replacing the usual normal distribution with the quotient of two independent random variables, one being a normal distribution in the numerator and the power of a exponential of parameter equal to two at the denominator. The resulting distribution is an extension of the BS distribution that has greater kurtosis values than the usual BS distribution and the slash Birnbaum-Saunders (SBS) distribution (see [7]). Moments and some properties are derived for the new distribution. Also, we draw inferences by the method of moments and maximum likelihood. A real data application is presented where the model fitting is implemented by using maximum likelihood estimation producing better results than the classic BS model and slash BS model. Keywords: Birnbaum-Saunders distribution, Modified slash distribution, Moments; Kurtosis, EM-algorithm. 2 AMS Classification: 6E5 1. Introduction The Birnbaum Saunders (BS) distribution (see [3] and [4]) was derived to model the material fatigue failure time process, and has been widely applied in reliability and fatigue studies. Extensive work has been done on the BS model with regard to its properties, inferences and applications. Generalizations of the BS distribution have been proposed by many authors; see for example, [6], [15], [7] and [2] among others, which allow obtaining a high degree of flexibility for this distribution. From an analytical point of view, the attention for the BS distribution is due to its attractive properties and its relationship with the normal distribution. Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta, Chile, Email: jimmy.reyes@uantof.cl Departamento de Estatística, IMECC, Universidade Estadual de Campinas, Campinas, Brasil, Email: fily@ime.unicamp.br Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta, Chile, Email: diego.gallardo@uantof.cl Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta, Chile, Email: hector.gomez@uantof.cl Corresponding Author.

2 A random variable T follows a Birnbaum-Saunders distribution with shape parameter α > and scale parameter β >, usually denoted by T BS(α, β). The random variable T can be represented as ( ) α (α ) 2 2 (1.1) T = β 2 Z + 2 Z + 1 where Z N(, 1). The resulting probability density function (pdf) of T is (1.2) f T (t; α, β) = ϕ(a t (α, β)) t 3 2 (t + β), t >, 2αβ 1 2 where ϕ( ) is the pdf of the N(, 1) distribution and a t (α, β) = ( t/β β/t)/α. The BS distribution proposed by [3] may not be suitable to represent data containing outlying observations due to its close dependence on the normal distribution. One way to overcome this problem, is to exploit the relationship between the BS and normal distributions to obtain a generalization of BS distributions based on the modified slash (MS) distribution in place of the normal one, which possesses heavier tails than the normal. The MS distribution was proposed recently by [12], which incorporates more kurtosis. This distribution is a type scale mixture of the normal distribution, in which the mixing distribution depends on the exponential distribution. The resulting distribution, called the MS distribution, has heavier tails than the normal and slash-normal distributions. Next, we present a brief review of the MS distribution and some of its properties. A random variable X has a MS distribution (see [12]) if it can be represented by (1.3) X = Z V 1 q where Z N(, 1) is independent of V exp(2). The pdf of X can be expressed as (1.4) f X (x) = 2q, v q e 2vq ϕ(xv) dv, x R. An interesting special case is obtained for q = 1, which is the canonic modified slash distribution, the distribution of the ratio of two independent random variables, one being the standard normal and the other an exp(2) distribution whose pdf can be written in a simple way as follows [ ] 2 x 2 1 2π 2ex 2 x Φ( 2 x ) if x, (1.5) f X (x) = (8π) 1 2 if x =, where Φ( ) denotes the cdf of the standard normal distribution. On the other hand, when q tends to, the resulting distribution is a normal one. The aim of this paper is to provide a generalized Birnbaum-Saunders distribution based on modified slash distribution. This extension is based mainly on the work of [12] and [7]. The resulting distribution, which we call modified slash Birnbaum-Saunders (MSBS) distribution, provides flexible thick-tailed distributions which can be used for robust estimation of parameters.

3 The rest of this paper is organized as follows.in Section 2, we define the BS distribution based on MS distributions and discuss some of its properties. Specifically, we derive its probability density function, and moments. In Section 3, we discuss estimation methods based on the modified moment estimators and the maximum likelihood estimation. In Section 4, we provide an illustrative example that displays the usefulness of the generalized BS distributions for fitting a real data set that has been analyzed before in the literature. Finally, some concluding remarks are made in Section 5. 2. Proposed distribution Analogous to the BS distribution of [2] and [7], we can define an extension of the Birnbaum-Saunders distribution based on the MS distribution by considering the following stochastic representation of T : ( ) α (α ) 2 2 (2.1) T = β 2 X + 2 X + 1 where X MS(, 1, q) given in (1.3). So, the random variable T is said to have a modified slash Birnbaum-Saunders (MSBS) distribution with parameters α, β and q, and will be denoted by T MSBS(α, β, q). The stochastic representation in (2.1) is useful for the simulation of data and also for implementing the EMalgorithm for ML estimation of parameters in MSBS models, as will be discussed later. 2.1. Density function. In this part we derive the pdf for the modified slash Birnbaum-Saunders distribution, which is based on the modified slash distribution 2.1. Proposition. Let T MSBS(α, β, q). Then, the pdf of T is (2.2) f T (t) = t 3/2 (t + β)q αβ 1/2 where ϕ( ) is the pdf of the N(, 1) distribution. v q e 2vq ϕ(v a t (α, β)) dv, t >, Proof. Let be X MS(, 1, q) with its pdf given in (1.4). The result comes easily from the stochastic representation in (2.1) and using the change of variable theorem. The next special case q = 1 is derived easily from Proposition 2.1. In fact, the pdf of T, given in (2.2), reduces to the following expression t 3/2 (t+β) ve 2v ϕ(va αβ 1/2 t (α, β)) dv, if t β, (2.3) f T (t) = (8π) 1/2 1 αβ, if t = β. On the other hand, when q tends to, we simply obtain the ordinary BS distribution, as will be seen in the next proposition. Figure 1 shows the pdf of the MSBS, SBS and BS distributions for different values of α and β, with q = 2.

4 Density..2.4.6.8 1. 1.2 MSBS(.5,1,2) SBS(.5,1,2) BS(.5,1) 1 2 3 4 t Figure 1. Plots of densities of the MSBS (black line), SBS (red line) and BS (blue line) distributions, for α =.5, β = 1 and q = 2 2.2. Properties of the MSBS distribution. 2.2. Proposition. Let T MSBS(α, β, q). Then, a) at MSBS(α, aβ, q) for all a > ; b) T 1 MSBS(α, β 1, q); c) lim q f T (t; α, β, q) = f T (t; α, β), where f T (.; α, β) is the pdf of a BS(α, β) distribution; d) The conditional distribution of T, given V = v, is the BS distribution. That is, T (V = v) BS ( α v,q, β ), where α v,q = α/v 1/q. Proof. Parts (a) and (b) are directly obtained using the change of variable theorem. Part (c) is obtained in the same way since the MS distribution tends to the normal distribution. Finally, Part (d) is obtained easily from (1.3) and (2.1). As is required in the study and development of any statistical analysis, the moments are important, especially in applied work. Some of the most important features and characteristics of a distribution can be studied through moments such as skewness and kurtosis parameters, which depend on the gamma function, Γ( ), defined for {x R : x, 1, 2,...}, which in turn are likely to be important in a number of applications. The following results are related to them.

5 2.3. Proposition. Let T MSBS(α, β, q) and B 2k;q = 2 2k q q Γ( q 2k q ), q > 2k, k = 1,..., r. If E[X 2r ] exists and is finite. Then, (2.4) (2.5) E[T r ] = β r r E[T r ] = y= ( 2r 2y 1 β 2r E[T r ], ) y s= (2(r+s y))! ( y s ) (α 2 ) 2(r+s y) E [ X 2(r+s y)] and where E[X 2(r+s y) ] = 2 2 (r+s y) (r+s y)! B 2(r+s y);q, y =, 1,..., r. [ [ ] ( ) ] T Proof. From (2.1), we can express E r α ( 2r β = E r 2 X + α 2 X) 2 + 1. So, by expanding the binomial expression in the expectation above, we have [ ] T r 2r ( ) [ ((α 2r ) ) ] 2 (k/2) (α ) 2r k E β r = E k 2 X + 1 2 X k= r ( ) y ( ) 2r y (α ) 2(r+s y) [ = E X 2(r+s y)], 2y s 2 y= s= (2(r+s y))! where X MS(, 1, q) and E[X 2(r+s y) ] = 2 2 (r+s y) (r+s y)! B 2(r+s y);q under as restriction defined; see [12]. Finally, since β 2 T 1 and T have the same distribution, E[β 2r T r ] = E[T r ]. Following [12], we can note that E[T r ] exists if and only if E[X 2r ] exists (finite moments). Thus, for instance E[T ] exists if and only if E[X 2 ] exists as well as Var(T ) exists if and only if E[X 4 ] exists. In these cases, E[X 2 ] = 2B 2;q if q > 2 and E[X 4 ] = 6B 4;q if q > 4. Even thought that B 2;q is defined for q 2, we can not affirm that E[T ] exists. The following results can be derived directly from Proposition 2.3 for 2.4. Corollary. Let T MSBS(α, β, q). Then, the mean and variance are respectively (2.6) (2.7) E[T ] = β ( 1 + α 2 B 2;q ), q > 2, Var(T ) = α 2 β 2{ 3α 2 B 4;q α 2 B 2 2;q + 2B 2;q }, q > 4. 2.5. Corollary. Let T MSBS(α, β, q) and µ r = E[T r ], r = 1,..., 4. Then, µ 1 = β ( 1 + α 2 B 2;q ), q > 2, µ 2 = β 2( 3α 4 B 4;q + 4α 2 B 2;q + 1 ), q > 4, µ 3 = β 3( 15α 6 B 6;q + 18α 4 B 4;q + 9α 2 B 2;q + 1 ), q > 6, µ 4 = β 4( 15α 8 B 8;q + 12α 6 B 6;q + 6α 4 B 4;q + 16α 2 B 2;q + 1 ), q > 8. 2.6. Corollary. The symmetry and kurtosis coefficients of T MSBS(α, β, q) are respectively given by;

6 β1 = µ 3 3µ 1 µ 2 + 2µ 3 1 (µ 2 µ 2 1 )3/2, q > 6 β 2 = µ 4 4µ 1 µ 3 + 6µ 2 1µ 2 3µ 4 1 (µ 2 µ 2 1 )2, q > 8. The following proposition shows that the MSBS distributions can be represented as a particular type of scale-mixture of the BS and the exp(2) distribution. 2.7. Proposition. Let V exp(2). Then, a) If T (V = v) BS(αv 1/q, β), then T MSBS(α, β, q); b) If U = V 2/q, then T (U = u) BS(αu 1/2, β) and the pdf of T can be expressed as f T (t) = t 3/2 (t + β)q 2αβ 1/2 u q+1 2 1 e 2uq/2 ϕ ( u a t (α, β) ) du, t >, where ϕ( ) is the pdf of the N(, 1) distribution. Proof. The results are obtained by using properties of the conditional distribution. In Part (a), it is easy to show that f T (t) = f T V (t v)h V (v) dv = t 3/2 (t + β)q αβ 1/2 v q e 2vq ϕ(v a t (α, β)) dv, t >. The result in Part (b) is obtained in the same way as Part (a). But, here it is important to note that U has a Weibull distribution. That is, U = V 2/q W eibull ( q/2, 2 (2/q)), and its pdf is h(u) = qu q/2 1 exp{ 2u q/2 }, u >. 3. Estimation In this section, we discuss estimation methods based on the modified moment (MM) estimators and the maximum likelihood (ML) estimators for the unknown model parameters based on a random sample T 1,..., T n from T MSBS(α, β, q). 3.1. Method of moments. Following the ideas of [8] and [1], we consider the modified moment estimation method. Next, we present the modified moment estimators for the parameters of α, β and q, which are based on E(T ), E(T 1 ) and E(T 2 ) and their statistics S = 1 n n T i, R = 1 n n T 1 i and W = 1 n n T i 2. The modified moment estimators are obtained by solving the following equations: S = E(T ), R = E(T 1 ) and W = E(T 2 ), that are valid for q > 4. The modified moment estimator for β is given by S (3.1) β M = R. On the other hand, the modified moment estimators for q, denoted by q M, is obtained as a solution of the equation [ W = S ( ) 2 B4, qm 3 SR 1 R B2, q 2 + 4 ] (3.2) SR 3. M

7 Using the modified moments estimator of q, q M modified moment estimator of α given by (3.3) α M = (SR) 1/2 1 B 2, qm. found in (3.2), we obtain the Thus, the modified moment estimators of α, β and q are α M, βm and q M, respectively. If q M is a consistent estimator, then α M and β M are also consistent estimators. As is not very clear to see that q M is consistent, we propose an alternative method to obtain a consistent estimator of q. Namely, by using the equations S = E(T ) and R = E(T 1 ), which are valid for q > 2, we have the following solutions for α and β, for q fixed (3.4) β M = S R and α (SR) M (q) = 1/2 1. B 2;q For q > 4, these estimators are consistent. A consistent estimator of q can be obtained by maximizing the generalized profile log-likelihood function l p (q) = l ( α M (q), β M (q), q) that depends just on q, where l( ) is the genuine log-likelihood function. The estimator (3.5) q M = argmax q l ( α M (q), β M, q) is consistent according to [14]. Thus, the resulting modified moment estimators of α, β and q are α M = α M ( q M ), β M and q M, respectively, that require just q > 2 for being computed. The modified moment estimators can be used effectively as initial values in the iterative procedure for computing the ML estimates. 3.2. Maximum likelihood estimators. Let T 1,..., T n be a random sample of size n from T MSBS(α, β, q). The log-likelihood function for θ = (α, β, q) can be expressed as l(θ) = 3 2 n log t i + n log(t i + β) + n log(q/α) n n 2 log β + log G(t i ) where G(t i ) = v q e 2vq ϕ ( a ti (α, β)v ) dv. The first derivatives of the log-likelihood function are l α = n W α (t i ) n α, l β = n W β (t i ) n n 2β + 1 t i + β and l q = n W q (t i ) + n q

8 where W η (t i ) = G η(t i ) G(t i ), with G η(t i ) = G(t i) η, η = α, β, q. The partial derivatives of G(t i ) can be written as G α (t i ) = 1 α G β (t i ) = G q (t i ) = v q+2 a 2 t i (α, β) exp { 2v q }ϕ ( a ti (α, β)v ) dv, ) β v (αa q+2 2 t i (α, β) + 2a ti (α, β) exp{ 2v q }ϕ ( a ti (α, β)v ) dv, v q ln v(1 2v q ) exp{ 2v q }ϕ(a ti (α, β)v) dv. The second derivatives of the log-likelihood function are reported in the Appendix. 3.3. ML Estimation using an EM-Algorithm. The EM-algorithm is a wellknown tool for ML estimation when unobserved (or missing) data or latent variables are present while modeling. This algorithm enables the computationally efficient determination of the ML estimates when iterative procedures are required. Looking at the stochastic representation of a modified slash distribution in (1.3), we noted that the scale factor V 1/q depend on the parameter q, so we are going to consider a re-parametrization in order to get the EM-algorithm in the MSBS model. Let U = V 2/q be the new mixing random variable. Then, the resulting stochastic representation for T can be expressed as ( α ( T = β 2 X + α ) 2 X) 2 2, (3.6) + 1 where X = U 1/2 Z, with Z N(, 1) independent of U W eibull ( q/2, 2 (2/q)), that has pdf that can be expressed as h(u) = qu q/2 1 exp{ 2u q/2 }, u >. Under the new parametrization, we have the conditional distribution of T, given U = u, follows the BS(α/ u, β) distribution. Consequently, the pdf of the T reduces to f T (t) = t 3/2 (t + β)q (3.7) 2αβ 1/2 where ϕ( ) is the pdf of N(, 1) distribution. t i u q+1 2 1 e 2uq/2 ϕ ( u a t (α, β) ) du, t >, Let T 1,..., T n be a random sample of size n of T MSBS(α, β, q). Here, the parameter vector is θ = (α, β, q), with θ Θ R 3 +. Let l c (θ t c ) and Q(θ θ) = E[l c (θ t c ) t, θ] denote the complete-data log-likelihood function and its expected value, respectively. Each iteration of the EM algorithm involves two steps. Note that, by combining Proposition 2.2 and the results in (3.6), the above setup can be represented through of a hierarchical representation given by (3.8) (3.9) T i (U i = u i ) U i ind BS ( α/ u i, β ), ind W eibull(q/2, 2 (2/q) ), i = 1,..., n. Let t = [t 1,..., t n ] and u = [u 1,..., u n ] be observed and unobserved data, respectively. The complete data t c = [t, u ] corresponds to the original data t augmented with u. We now detail the implementation of the ML estimation of parameters of MSBS distributions by using the EM-algorithm. In this part, the

9 hierarchical representation given in (3.8) and (3.9) is useful to obtain the complete log-likelihood function associated with t c, which can be expressed as l c (θ t c ) n log(α) n 2 log(β) 1 n [ ti 2α 2 u i β + β ] n 2 + log (t i + β) t i + l c (q t c ), where l c (q t c ) = n log(q) + (q/2 1) n log(u i) 2 n uq/2 i. Letting û i = E[U i t i, θ = θ], it follows that the conditional expectation of the complete log-likelihood function has the form n n Q(θ θ) n log(α) n 2 log(β) 1 2α 2 + Q(q θ), [ ti û i β + β ] 2 + t i log (t i + β) where Q(q θ) = n log(q)+(q/2 1)S 1n 2S 2n,q, with S 1n = n E[log(U i) t i ] and S 2n,q = n E[uq/2 i t i ]. As both quantities S 1n and S 2n,q have no explicit forms in the context of our model, and they have to be computed numerically. Thus to compute Q(q θ) we use a approach similar to that from [9]. Specifically, let {u r ; r = 1,..., R} be a sample randomly drawn from the conditional distribution U (T = t, θ = θ), so the quantity Q(q θ) can be approximated as follows: Q(q θ) 1 R R l c (q u r ). We then have the EM-algorithm for the ML estimation of the parameters of the MSBS distributions as follows: r=1 E-step. Given θ = θ (k) = ( α (k), β (k), q (k) ), compute û i (k), for i = 1,..., n. CM-step I: Update α (k) by maximizing Q(θ θ (k) ) over α, which leads to the expression: α 2(k+1) = S(k) (k) u β + β (k) R u (k) 2u (k), CM-step II: Obtain β (k+1) as the solution of β 2(k+1) β [ ] [ ] (k+1) K( β (k+1) ) + 2 u (k) R u (k) + R u u (k) K( β (k+1) ) + S u (k) =. CM-step III: Fix α = α (k+1) and β = β (k+1), update q (k) by optimizing where u (k) = 1 n q (k+1) = arg max q Q( α (k+1), β (k+1), q θ (k) ). n û (k) i, S u (k) = 1 n n û (k) i t i, and R (k) u = 1 1 n n ( ), û(k) i t i

1 with K(x) = { 1 n ( 1 ) } 1 n x+t i. The iterations are repeated until a suitable convergence rule is satisfied, say l( θ (k+1) ) l( θ (k) ) sufficiently small. Useful starting values required to implement this algorithm are those obtained under the normality assumption or by using the modified moment estimates α M, β M and q M. 3.1. Remark. 1) Note that when q tends to, the estimates of α and β in M-step reduce to those when the BS distribution is used; 2) Note that CM-Steps II requires an one-dimensional search for the root of β, respectively, which can be easily achieved by using the uniroot function built in R. On the other hand CM-Step III can be very slow. An alternative is to use the idea in [11], and it can be defined as: CML-step: Update q (k) by optimizing the following constrained actual loglikelihood function q (k+1) = arg max q l ( α (k+1), β (k+1), q). The corresponding standard errors (s.e.) are calculated from the observed information matrix, whose required derivatives are presented in Appendix. 4. Numerical applications 4.1. Simulated data. By using the representation given in (2.1), it is possible to generate random numbers for the MSBS(α, β, q) distribution, which leads to the following algorithm: Step 1 : Generate Z from N(, 1); Step 2 : Generate V from exp(2); Step 3 : Compute X = Z ; V q ( 1 ) α ( 2 Step 4 : Compute T = β 2 X + α 2 X) 2 + 1. So, it follows that if X MS(, 1, q), then T has the required MSBS(α, β, q) distribution. The main object is to study the behavior of the maximum likelihood estimates for parameters of the model. Table 1 shows results of simulations studies, illustrating the behavior of the MLEs for 1 generated samples of sizes 5 and 1 from a population distributed as MSBS(α, β, q), considering many cases for α, β and q parameters. For each generated sample, MLEs were computed numerically using the EM algorithm explained in Section 3.3. Means and standard errors (s.e.) are reported in Table 1. Note that the estimated bias for the all cases are acceptable and it can be observed that the bias becomes smaller as the sample size n increases, as one would expect.

11 Table 1. Empirical means and standard deviations for 1 replicates. q = 1 q = 2 q = 3 n = 5 n = 1 n = 5 n = 1 n = 5 n = 1 Parameter mean s.e. mean s.e. mean s.e. mean s.e. mean s.e. mean s.e. α =.3 α.363.11.355.69.327.79.323.52.315.65.31.45 β =.8 β.81.85.83.59.81.63.82.46.8.54.8.39 q 1.133.257 1.13.151 2.437 1.27 2.23.487 3.878 2.178 3.354 1.5 α =.3 α.357.113.352.69.331.78.325.55.315.66.31.46 β = 1.2 β 1.29.131 1.25.86 1.2.96 1.23.69 1.23.82 1.22.59 q 1.133.293 1.93.157 2.469 1.135 2.273.541 3.866 2.26 3.315.938 α =.4 α.453.144.451.95.437.17.431.7.417.88.414.62 β =.8 β.82.118.82.78.84.91.88.62.85.72.81.5 q 1.149.288 1.91.15 2.454 1.272 2.251.528 3.875 2.572 3.335.999 α =.4 α.435.143.43.97.433.11.431.74.418.88.415.61 β = 1.2 β 1.25.178 1.24.117 1.26.13 1.25.88 1.21.11 1.27.78 q 1.13.31 1.17.162 2.421.995 2.251.54 3.861 2.158 3.381 1.69

12 4.2. Real data. We illustrated the model with two data set collected by Department of Mines of the University of Atacama, Chile, representing neodymium and zinc measured in 86 samples of minerals. 4.2.1. Neodymium data set. A descriptive summary of the analyzed data sets is reported in Table 2, where b 1 and b 2 are sample skewness and kurtosis coefficients, respectively. Figure 2 shows the solution of the moment equation for q. Table 2. Summary statistics for data set n t S t b1 b 2 86 35.2 34.237 3.648 18.216 Here, one can note that there is one and only solution to the moment equation in the interval (2, ). 4 2 2 4 5 5 1 15 2 2 3 4 5 6 q 4. 4.5 5. 5.5 6. q Figure 2. Moment equation for q: the solution for q is obtained by the intersection of the both curves. Using the results presented in Section 3.1, see (3.4) and (3.5). We computed the modified moment estimates leading to the following values: α M =.537, β M = 27.211 and q M = 2.827, which were used as initial estimates for the maximum likelihood approach. For this data set, we used the Akaike information criterion (AIC, see [1]). Additionally, we present the statistics for two goodness of fit test: Cramer-Von-Mises (W ) and Anderson Darling (A); see [5] for details. In both test, a lower statistic suggest a model with a better fit. Based on the these statistics, the associated p-values, and AIC criteria, we conclude that the MSBS model provides the best fit to the Neodymium data set over the SBS and BS models, as can be seen in Table 3. Figure 3 shows the histogram for the data set and the adjusted distributions for the BS, SBS and MSBS models. The MSBS model provides a better fit than other fitting models. Finally, Figure 4 presents

13 qqplots for the three models, which also indicate good model fit for the MSBS model. Table 3. ML estimates and the s.e. values (in parentheses) estimated asymptotic standard errors. W and A values with their associated p-values (in parentheses) for three models Estimates BS SBS MSBS α.758 (.58).289 (.64).29 (.15) β 27.25 (2.71) 27.247 (1.592) 27.683 (5.983) q - 1.578 (.426) 2.9 (.57) AIC 757.191 612.48 61.44 W.43 (.71).1 (.586).86 (.659) A 2.178 (.74).564 (.682).488 (.759).35.3 BS SBS MSBS.12 BS SBS MSBS.25.1 Density.2.15.8.6.1.4.5.2.. 5 1 15 2 18 19 2 21 22 Concentration Concentration Figure 3. Histogram of the neodymium data set and estimated densities for the BS, SBS and MSBS models. 4.2.2. Zinc data set. A descriptive summary of the analyzed data sets is reported in Table 4. For this data set, the moment estimates of the parameters are given by α M =.958, β M = 52.31 and q M = 2.6466, and they can be used effectively as initial values in the iterative procedure for computing the ML estimates, for finding the ML estimates of the model parameters based on the EM-algorithm or Table 4. Summary statistics for data set n t S t b1 b 2 86 96.72 148.434 4.999 33.423

14 Sample Quantiles 5 1 15 2 25 3 Sample Quantiles 5 1 15 2 25 3 Sample Quantiles 5 1 15 2 25 3 5 15 25 (a)theoretical Quantiles 5 15 25 (b)theoretical Quantiles 5 15 25 (c)theoretical Quantiles Figure 4. qqplot: a) MSBS model, b) SBS model and c) BS model. by using the maximization procedure of Newton-Raphson. The results of fitting of BS, SBS and MSBS models are reported in Table 5, which shows the MSBS model provides a better fit than other models, base on W and A statistics and their associated p-values. Also, AIC selection criterion indicates that the MSBS model presents the best fit. The estimated density functions for the BS, SBS and Table 5. ML estimates, the SE values (in parentheses) are the estimated asymptotic standard errors. W and A values with their associated p-values (in parentheses) for three models Estimates BS SBS MSBS α 1.34 (.1).466 (.9).459 (.15) β 5.884 (5.87) 58.47 (5.282) 41.66 (5.983) q - 1.722 (.447) 2.144 (.57) AIC 973.558 816.496 812.652 W.631 (.19).194 (.28).179 (.312) A 3.531 (.15) 1.12 (.351).941 (.389) MSBS models and the estimated densities by ML estimates are presented in Figure 5, one case see the good performance of the MSBS model. 5. Discussion We introduced here an extension of Birnbaum-Saunders distributions based on MS distributions, proposed recently by [12]. This new class of distributions is quite flexible and more useful for modeling purposes than the SBS distributions proposed by [7]. We then pointed out some important characteristics and properties of this family of distributions. In particular, we discussed estimation methods based on the modified moment estimators, which can be used effectively as initial

15 cumulative distribution function 1..8.6.4.2. Empirical BS SBS MSBS cumulative distribution function Empirical BS SBS MSBS.98.985.99.995 1. 2 4 6 8 1 12 6 8 1 12 zinc measured zinc measured Figure 5. Empirical cdf with estimated MSBS cdf (blue line), estimated SBS cdf (red line) and estimated BS cdf (green line). values in the iterative procedure for computing the ML estimates, obtained using numerical procedures such as the Newton-Raphson procedure. Finally, we provide an illustration that displays the usefulness of the generalized BS distributions for fitting a real data set. The MSBS distributions based on MS distributions can be used along the same lines as the univariate BS distributions in the context of regression following the ideas of [13]. Acknowledgments We acknowledge two referees for comments and suggestions that substantially improved the presentation. The research of J. Reyes and H. W. Gómez were supported by Semillero UA-215 (Chile). The research of F. Vilca was supported by CNPq (Brasil). References [1] Akaike, H. A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, 716-723, 1974. [2] Balakrishnan, N., Leiva, V., Sanhueza, A. and Vilca, F. Estimation in the Birnbaum- Saunders distribution based on scale-mixture of normals and the EM-algorithm, Statistics and Operations Research Transactions, 33, 171-192, 29. [3] Birnbaum, Z.W. and Saunders, S.C. A New Family of Life Distributions, Journal of Applied Probability, 6, 319-327, 1969a. [4] Birnbaum, Z.W. and Saunders, S.C. Estimation for a family of life distributions with applications to fatigue, Journal of Applied Probability, 6, 328-347, 1969b. [5] Chen, G. and Balakrishnan, N. A general purpose approximate goodness-of-fit test, Journal of Quality Technology, 27, 154-161, 1995. [6] Díaz-García, J. A. and Leiva, V. A new family of life distributions based on elliptically contoured distributions, Journal of Statistical Planning and Inference, 128, 445-457, 25. [7] Gómez, H.W.; Olivares-Pacheco, J.F. and Bolfarine, H. An Extension of the Generalized Birnbaun-Saunders Distribution, Statistics and Probability Letters, 79, 331-338, 29.

16 [8] Ng, H.K.T., Kundu, D. and Balakrishnan, N. Modified moment estimation for the twoparameter Birnbaum-Saunders distribution, Comput. Statist. Data Anal., 43, 283-298, 23. [9] Lee, S.Y. and Xu, L. Influence analyses of nonlinear mixed-effects models, Comput. Statist. Data Anal., 45, 321-341, 24. [1] Leiva, V., Barros, M., Paula, G. A. and Sanhueza, A. Generalized Birnbaum-Saunders distribution applied to air pollutant concentration, Environmetrics, 19, 235-249, 28. [11] Lin, T.I., Lee, J.C. and Hsieh, W.J. Robust mixture modeling using the skew t distribution, Stat. Comput., 17, 81-92, 27. [12] Reyes, J., Gómez, H. W. and Bolfarine, H. Modified slash distribution, Statistics, 47, 929-941, 213. [13] Rieck, J.R. and Nedelman, J.R. A log-linear model for the Birnbaum-Saunders distribution, Technometrics, 33, 51-6, 1991. [14] Severini, T.A. Likelihood functions for inference in the presence of a nuisance parameter, Biometrika, 85, 57-522, 1998. [15] Vilca, F. and Leiva, V. A new fatigue life model based on family of skew-elliptical distributions, Communications in Statistics: Theory and Methods, 35, 229-244, 26. Appendix: Elements of Hessian matrix { } Let M i (v; α, β, q) = exp 2v q v2 2 a2 t (α, β). Then 2 l α 2 = n α 2 n 2 l β 2 = n 2β 2 + n 2 l q 2 = n q 2 n 2 l α β = n 2 l α q = n 2 l β q = n G αα (t i )G(t i ) {G α (t i )} 2 {G(t i )} 2 [(t i + β) 2 G ββ(t i )G(t i ) {G β (t i )} 2 ] G qq (t i )G(t i ) {G q (t i )} 2 {G(t i )} 2 G αβ G(t i ) G α (t i )G β (t i ) {G(t i )} 2 G αq G(t i ) G α (t i )G q (t i ) {G(t i )} 2 G βq G(t i ) G β (t i )G q (t i ) {G(t i )} 2, {G(t i )} 2

17 where G αα (t i ) = β2 2βt i + t 2 i v q+2 M α 6 β 2 t 2 i (v; α, β, q) [ v 2 (β 2 2βt i + t 2 i ) 3α 2 ] βt i dv i 1 G ββ (t i ) = v q+2 M 4α 4 β 4 t 2 i (v; α, β, q) [ v 2 (β 4 2β 2 t 2 i + t 4 i ) 4α 2 βt 3 ] i dv i G qq (t i ) = 1 G αβ (t i ) = 2α 5 β 3 t 2 i v q M i (v; α, β, q){log v} 2 ( 4v 2q 6v q + 1 ) dv v q+2 M i (v; α, β, q) [ 2α 2 βt i (β 2 t 2 i ) v 2 (β 4 2β 3 t i + 2βt 3 i t 4 i ) ] dv G αq(t i) = β2 2βt i + t 2 i v q+2 M i(v; α, β, q) log v (1 2v q ) dv α 3 βt i G βq (t i ) = t2 i β 2 v q+2 M α 2 β 2 i (v; α, β, q) log v (1 2v q ) dv. t i