arxiv: v1 [stat.me] 29 Jul 2017


Published in Statistica Neerlandica, 2016, vol. 70, no. 4, p.

A Skew-Normal Copula-Driven GLMM

Kalyan Das (1), Mohamad Elmasri (2) and Arusharka Sen (3)
(1) University of Calcutta, (2) McGill University and (3) Concordia University

Abstract: This paper presents a method for fitting copula-driven generalized linear mixed models. For added flexibility, the skew-normal copula is adopted for fitting. The correlation matrix of the skew-normal copula is used to capture the dependence structure within units, while the fixed and random effects coefficients are estimated through the mean of the copula. For estimation, a Monte Carlo expectation-maximization algorithm is developed. Simulations are shown alongside a real data example from the Framingham Heart Study.

Keywords: EM Algorithm, Gaussian Copula, Generalized Linear Mixed Models, Monte Carlo, Skew-Normal.

1 Introduction

The key component driving the development of linear mixed models is the ability of such models to handle data with correlated observations, that is, a data structure where predictors and response variables are measured at more than one level. Such structure is common with repeated observations, as in medical studies where patient characteristics are measured at several time points, not necessarily the same set for each patient. Fisher (1918) proposed the addition of a random effects term to the linear model, which introduced heteroscedasticity. As a result, the linear mixed model takes the form

$$Y_i = X_i\beta + D_i b_i + \epsilon_i, \quad i = 1,\dots,m, \tag{1.1}$$

where $Y_i$ is an $(n_i \times 1)$ vector of observed response variables for sample unit $i$, $i = 1,\dots,m$; $X_i$ is an $(n_i \times p)$ fixed effects design matrix with coefficient $\beta$ of dimension $(p \times 1)$; $D_i$ is an $(n_i \times q)$ random effects design matrix with coefficient $b_i$ of dimension $(q \times 1)$; and $\epsilon_i$ is an $(n_i \times 1)$ vector of random errors. Inference from the linear mixed model becomes slightly more tedious with the introduction of the random coefficient $b_i$. This requires an identifiability assumption of independence between $b_i$ and $\epsilon_i$.
A popular modeling assumption is then

$$b_i \overset{iid}{\sim} N_q(0, \Omega_b), \qquad \epsilon_i \overset{ind}{\sim} N_{n_i}(0, \psi_i), \tag{1.2}$$

where $\Omega_b = \Omega_b(\alpha)$ and $\psi_i = \psi_i(\gamma)$ are the associated dispersion matrices that capture possible variability among and within individuals, parametrized by $\alpha$ and $\gamma$. In many literature reviews, the extra restrictiveness associated with specifying the distribution functions of $b_i$ and $\epsilon_i$ is deemed unnecessary. Thereupon, Arellano-Valle et al. (2005) proposed the use of the skew-normal in lieu of

the normal distribution for both $b_i$ and $\epsilon_i$, in an attempt to capture any slight departures from normality. Moreover, they explicitly characterized the likelihood function of the resulting model, and fitted it by the constrained expectation maximization algorithm (CEM). Nevertheless, many researchers have discussed other techniques and models for inference, for instance the use of a mixture of normals as in Verbeke and Lesaffre (1996), semi-parametric models as in Zhang and Davidian (2001), non-parametric or smoothed non-parametric techniques in maximum likelihood estimation as in Newton and Zhang (1999), and the predictive recursion algorithm as in Tao et al. (1999).

This paper follows the Arellano-Valle et al. (2005) approach by modelling the dependence structure in hierarchical multivariate distributions via a copula-driven generalized linear mixed model. Given response variables $Y_{ij}$, $i = 1,\dots,m$, $j = 1,\dots,n_i$, we assume that $Y_i = (Y_{i1},\dots,Y_{in_i})$ follows an $n_i$-variate distribution with a predefined mean and covariance matrix. We model such a distribution by using an $n_i$-variate skew-normal copula $SN_{n_i}(\cdot)$, where the random effects are integrated in the mean structure of the copula. We chose the covariance matrix $\Sigma_i = \Sigma(\xi_i, t_i)$ to be of an autoregressive structure in order to include the time-variant parameters. Formally,

$$Y_i \mid b_i \sim F_{n_i}\big(\eta(X_i\beta + D_i b_i),\ \Sigma(\xi_i, t_i)\big), \tag{1.3}$$

where $X_i$, $\beta$, $b_i$, $D_i$ are as defined in (1.1) and (1.2), $\xi_i$ is the dispersion autoregressive time-variant parameter with respect to $t_i = (t_{i1},\dots,t_{in_i})$, and $\eta(\cdot)$ is a link function. $F_k(\eta, \Sigma)$ is a $k$-variate distribution function with mean $\eta$ and covariance $\Sigma$. Moreover, we assume the marginal densities of $Y_{ij} \mid b_i$ are a function of $\{x_{ij}, t_{ij}, D_i, b_i, \beta\}$ via the same link function $\eta$.

The rest of this paper is organized as follows. Section 2 introduces a specific characterization of the skew-normal distribution and the copula used in this paper. Section 3 introduces the model, and constructs the likelihood using a skew-normal copula within a GLM framework. Section 4 discusses the use of a numerical Monte Carlo EM algorithm to estimate parameters.
Section 5 illustrates simulation results under different models. In Section 6, a real data analysis is performed to illustrate the application of our study. Section 7 ends with a general discussion.

2 Skew-normal distribution and copula

For a better understanding, we begin this section with the definition of the multivariate skew-normal distribution considered throughout this paper.

Definition 2.1. An $n$-dimensional random vector $X \in \mathbb{R}^n$ follows a skew-normal distribution with location vector $\mu \in \mathbb{R}^n$, dispersion matrix $\Sigma$ (an $n \times n$ positive definite matrix) and a skewness vector $\lambda \in \mathbb{R}^n$, if its density function

is given by

$$sn_n(x \mid \mu, \Sigma, \lambda) = 2\,\phi_n(x \mid \mu, \Sigma)\,\Phi_1\big(\lambda^\top \Sigma^{-1/2}(x - \mu)\big), \quad x \in \mathbb{R}^n. \tag{2.1}$$

In the univariate case

$$sn_1(x \mid \mu, \sigma^2, \lambda) = 2\,\phi_1(x \mid \mu, \sigma^2)\,\Phi_1\Big(\lambda\,\frac{x - \mu}{\sigma}\Big), \tag{2.2}$$

with $-\infty < x, \mu < \infty$, $\mu, \sigma \in \mathbb{R}$, $0 < \sigma < \infty$. Here $\phi_n(\cdot \mid \mu, \Sigma)$ and $\Phi_n(\cdot \mid \mu, \Sigma)$ denote respectively an $n$-variate density and distribution function of a normal random variable with mean vector $\mu$ and covariance matrix $\Sigma$ ($\sigma^2$ in the univariate case). This notation is used throughout this paper. A special case is $\lambda = 0$, which reduces the skew-normal to the normal distribution.

The skew-normal characterization in (2.1) is attributed to Arellano-Valle and Genton (2005), and the one in (2.2) is attributed to Azzalini (1985) and expanded further by Azzalini and Dalla Valle (1996). Many authors have proposed different forms. However, for convenience, a variation of the characterization in (2.1) is the only one used in this paper. Azzalini and Dalla Valle (1996) proposed a simplified parametrization of $\lambda$ in (2.1), in terms of an arbitrary $n \times n$ positive definite matrix, written here as $\Delta$, as

$$\lambda = \frac{\Delta^{-1/2}\delta}{\sqrt{1 - \delta^\top \Delta^{-1}\delta}}, \tag{2.3}$$

where $\delta^\top \Delta^{-1}\delta < 1$ for some $\delta \in \mathbb{R}^n$. This characterization is used later to define the likelihood function.

2.1 Skew-normal copula

A principal part of constructing the copula is defining the marginal distribution of $Y_{ij} \mid b_i$. In (1.3), denote the marginal distribution and density function of $Y_{ij} \mid b_i$ by $F(y_{ij} \mid \theta_{ij})$ and $f(y_{ij} \mid \theta_{ij})$, where $\theta_i = (\theta_{i1},\dots,\theta_{in_i})$ are the parameters of interest. With the same notation as in (1.3), conditionally on $b_i$ define

$$Z_i = (Z_{i1},\dots,Z_{in_i}) \sim \text{Skew-}N_{n_i}(D_i b_i, \Sigma_i, \lambda_i),$$

where the $j$th marginal is

$$Z_{ij} \sim \text{Skew-}N_1\big((D_i b_i)_j,\ 1,\ \lambda^*_{ij}\big), \tag{2.4}$$

where $\lambda^*_{ij}$ is the univariate skewness parameter, which is not equivalent to the components of the skewness vector $\lambda_i = (\lambda_{i1},\dots,\lambda_{in_i})$; rather, it is derived using a linear transformation of the multivariate response variable, see Chapter

5 of Azzalini (2013) for a detailed review. Note that $(D_i b_i)_j$ is the $j$th element of the vector $D_i b_i$ and $\Sigma_i = \Sigma(\xi_i, t_i)$ is a correlation matrix, which has all its diagonal elements equal to 1. Since the random number $F(Y_{ij} \mid \theta_{ij}) \sim \text{uniform}(0,1)$, we link the two marginal distributions of $Z_{ij}$ and $Y_{ij}$ in such a way that for each observation $y_{ij}$ we have

$$z_{ij} = SN_1^{-1}\big[F(y_{ij} \mid \theta_{ij}) \mid (D_i b_i)_j,\ 1,\ \lambda^*_{ij}\big], \tag{2.5}$$

and

$$z_i = (z_{i1},\dots,z_{in_i}) = \Big(SN_1^{-1}\big[F(y_{i1} \mid \theta_{i1}) \mid \vartheta_{i1}\big],\ \dots,\ SN_1^{-1}\big[F(y_{in_i} \mid \theta_{in_i}) \mid \vartheta_{in_i}\big]\Big),$$

where $SN_k$ is a $k$-variate skew-normal distribution function. For presentation simplicity, $\vartheta_{ij} = \{(D_i b_i)_j, 1, \lambda^*_{ij}\}$ in the above equation. By the transformation in (2.5), we attempt to estimate the joint distribution of $Y_i \mid b_i$ using a copula as

$$F_{n_i}(y_i \mid \theta_i) = SN_{n_i}(z_i \mid D_i b_i, \Sigma_i, \lambda_i). \tag{2.6}$$

The corresponding density is then

$$f_{n_i}(y_i \mid \theta_i) = sn_{n_i}(z_i \mid D_i b_i, \Sigma_i, \lambda_i) \prod_{j=1}^{n_i} \frac{f(y_{ij} \mid \theta_{ij})}{sn_1\big(z_{ij} \mid (D_i b_i)_j, 1, \lambda^*_{ij}\big)}. \tag{2.7}$$

See Landsman (2009) for a good reference on skew elliptical copulas and Lambert and Vandenhende (2002) for copula-based longitudinal models.

3 Log-likelihood function

Despite the defined copula in (2.6) and (2.7), writing down the complete log-likelihood function is still difficult. The skew-normal density in (2.1) is defined partially by the normal distribution function, denoted $\Phi$. Therefore, we first show that the skew-normal copula in (2.1) can be simplified by conditioning on a latent random variable with a half-normal distribution. By Proposition 1 and Corollary 1 of Arellano-Valle et al. (2005), based on a characterization due to Henze (1986), we can rewrite the skew-normal distribution of $Z_i$ as follows:

$$Z_i \overset{d}{=} D_i b_i + \Sigma_i^{1/2}\delta_i v_i + \Sigma_i^{1/2}(I - \delta_i\delta_i^\top)^{1/2} X_i,$$

where $\overset{d}{=}$ means "distributed as", $v_i \sim HN_1(0,1)$ (HN = half-normal), $X_i \sim N_{n_i}(0, I)$ and $b_i \sim N_q(0, \Omega_b)$ are independent, and

$$\delta_i = \frac{\lambda_i}{\sqrt{1 + \lambda_i^\top \lambda_i}}.$$
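The half-normal representation above can be checked numerically in its simplest form. The following is a minimal sketch, assuming a single observation with unit scale (so that $\Sigma_i^{1/2} = 1$) and a fixed location in place of $D_i b_i$; the function names `sample_skew_normal` and `skew_normal_mean` and the seed are illustrative choices, not part of the paper. It uses the known mean of a unit-scale skew-normal, $\text{loc} + \delta\sqrt{2/\pi}$, as a sanity check.

```python
import math
import random


def sample_skew_normal(loc, lam, rng):
    """Draw one unit-scale skew-normal variate through the representation
    Z = loc + delta * V + sqrt(1 - delta^2) * X,
    with V half-normal(0, 1) and X standard normal, independent."""
    delta = lam / math.sqrt(1.0 + lam * lam)  # delta = lambda / sqrt(1 + lambda^2)
    v = abs(rng.gauss(0.0, 1.0))              # half-normal draw
    x = rng.gauss(0.0, 1.0)                   # independent standard normal draw
    return loc + delta * v + math.sqrt(1.0 - delta * delta) * x


def skew_normal_mean(loc, lam):
    """Theoretical mean of the unit-scale skew-normal: loc + delta * sqrt(2/pi)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return loc + delta * math.sqrt(2.0 / math.pi)


rng = random.Random(2017)  # arbitrary seed, for reproducibility only
draws = [sample_skew_normal(0.0, 1.0, rng) for _ in range(100_000)]
empirical_mean = sum(draws) / len(draws)
```

With `lam = 0` the representation collapses to an ordinary normal draw, which mirrors the remark that $\lambda = 0$ reduces the skew-normal to the normal distribution.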

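In other words,

$$Z_i \mid v_i, b_i \sim N_{n_i}\big(D_i b_i + \Sigma_i^{1/2}\delta_i v_i,\ \Sigma_i^{1/2}(I - \delta_i\delta_i^\top)\Sigma_i^{1/2}\big), \quad v_i \sim HN_1(0,1),\ b_i \sim N_q(0, \Omega_b). \tag{3.1}$$

Similarly, in the univariate case,

$$Z_{ij} \mid v_i, b_i \sim N_1\big((D_i b_i)_j + \delta^*_{ij} v_i,\ 1 - \delta^{*2}_{ij}\big), \tag{3.2}$$

where $v_i \sim HN_1(0,1)$, $b_i \sim N_q(0, \Omega_b)$ and $\delta^*_{ij} = \lambda^*_{ij} / \sqrt{1 + \lambda^{*2}_{ij}}$. The above reparametrization facilitates defining the posterior distribution of $b_i \mid z_i, v_i$, as given by the following proposition.

Proposition 3.1. Given the settings in (3.1), the conditional density function of $b_i \mid z_i, v_i$ is specified by

$$b_i \mid z_i, v_i \sim N_q\big(\tau_i^2 D_i^\top \Psi_i^{-1}(z_i - \Sigma_i^{1/2}\delta_i v_i),\ \tau_i^2\big), \tag{3.3}$$

where

$$\tau_i^2 = (\Omega_b^{-1} + D_i^\top \Psi_i^{-1} D_i)^{-1}, \qquad \Psi_i = \Sigma_i^{1/2}(I - \delta_i\delta_i^\top)\Sigma_i^{1/2}.$$

Moreover,

$$b_i \mid z_i \sim \text{Skew-}N_q\big(\tau_i^2 D_i^\top \Psi_i^{-1} z_i,\ \tau_i^2 + d_i d_i^\top,\ \lambda_{b_i}\big), \tag{3.4}$$

where

$$d_i = \tau_i^2 D_i^\top \Psi_i^{-1} \Sigma_i^{1/2}\delta_i, \qquad \lambda_{b_i} = (\tau_i^2 + d_i d_i^\top)^{-1/2} d_i\,\sqrt{1 + d_i^\top (\tau_i^2)^{-1} d_i}.$$

Note that $\lambda_{b_i}$ in (3.4) is completely specified; therefore, it does not increase the dimension of the estimable vector of parameters. The proof of Proposition 3.1 is essentially based on Bayes' theorem, where

$$f_{z_i \mid v_i} = \int f_{z_i \mid b_i, v_i}\, f_{b_i}\, db_i = \phi_{n_i}\big(z_i \mid \Sigma_i^{1/2}\delta_i v_i,\ \Psi_i + D_i \Omega_b D_i^\top\big). \tag{3.5}$$

Under general regularity conditions and by (2.7), the complete conditional log-likelihood is

$$l(\theta \mid y, x, b) = \sum_{i=1}^{m} l_i(\theta \mid y_i, x_i, b_i), \tag{3.6}$$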

where by the hierarchical representation in (3.2) and (3.1)

$$
\begin{aligned}
l_i(\theta \mid y_i, x_i, b_i) \propto{} & -\frac{1}{2}\log|\Psi_i| - \frac{1}{2}\big(z_i - D_i b_i - \Sigma_i^{1/2}\delta_i v_i\big)^\top \Psi_i^{-1} \big(z_i - D_i b_i - \Sigma_i^{1/2}\delta_i v_i\big) \\
& + \frac{1}{2}\sum_{j=1}^{n_i}\log\big(1 - \delta^{*2}_{ij}\big) + \frac{1}{2}\sum_{j=1}^{n_i}\frac{\big(z_{ij} - (D_i b_i)_j - \delta^*_{ij} v_i\big)^2}{1 - \delta^{*2}_{ij}} + \sum_{j=1}^{n_i}\log f(y_{ij} \mid \theta_{ij}),
\end{aligned}
\tag{3.7}
$$

given that $y = (y_1,\dots,y_m)$, $x = (x_1,\dots,x_m)$, $b = (b_1,\dots,b_m)$, and $|\Psi_i|$ denotes the determinant of $\Psi_i$. The signs of the two univariate sums follow from (2.7), where the univariate skew-normal densities enter in the denominator.

3.1 Autoregressive correlation matrix

To characterize the covariance matrix in a plausible manner, one needs to take into account different sources of random variation within observations. With multiple observations per unit, these sources generally fall into three categories: measurement error, random effect, and serial correlation. The first source is controlled during the fitting process. The random effect source of variation is accounted for within the model as a random intercept $b_i$. Therefore, we only consider integrating the serial correlation source of variation, and as noted earlier the covariance matrix $\Sigma_i$ presented in (1.3) is modeled as a function of time and a dispersion variable $\xi_i$. Assuming a homogeneous variance within units, $\sigma^2$, the correlations among the observations $Y_i$ of each unit are determined by the autocorrelation function $\rho(\cdot)$ as

$$\text{Cov}(Y_{ij}, Y_{ik}) = \sigma^2 \rho(|t_{ij} - t_{ik}|). \tag{3.8}$$

The simplest way to express the serial correlation above is to assume an explicit dependence of the current observation $Y_{ij}$ on previous observations $Y_{i(j-1)},\dots,Y_{i1}$, which could be modeled using an $n$-th order autoregressive model. For example, a first order autoregressive model is

$$y_{ij} = \alpha\, y_{i(j-1)} + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{iid}{\sim} N(0, \zeta). \tag{3.9}$$

Note that it would be difficult to give an explicit interpretation of the $\alpha$ parameter if the measurements are not equally spaced in time or when times of measurements are not common to all units. One way of solving this issue is to implement an exponential autocorrelation function $\rho(\cdot)$, where

$$\text{Cov}(Y_{ij}, Y_{ik}) = \sigma^2 e^{-\xi_i |t_{ij} - t_{ik}|}. \tag{3.10}$$
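As an illustration of the exponential autocorrelation function in (3.10), the sketch below builds the implied correlation matrix (the covariance divided by $\sigma^2$) for one unit. The helper name `exp_corr_matrix`, the time grids and the value of `xi` are assumptions chosen for the example. It is meant to show that unequally spaced times need no special handling, which is the motivation given above for preferring this form over (3.9).

```python
import math


def exp_corr_matrix(times, xi):
    """Correlation matrix with entries exp(-xi * |t_j - t_k|).
    The diagonal equals 1 because |t_j - t_j| = 0."""
    return [[math.exp(-xi * abs(tj - tk)) for tk in times] for tj in times]


# Equally spaced times: entries depend only on the lag |j - k|.
sigma_equal = exp_corr_matrix([1, 2, 3, 4, 5], xi=0.2)

# Unequally spaced times: the same formula applies unchanged.
sigma_unequal = exp_corr_matrix([0.0, 0.5, 2.0, 2.25], xi=0.2)
```

For small `xi` every entry approaches 1, so the matrix becomes close to singular; this is worth keeping in mind whenever the matrix has to be inverted.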

The correlation between two response variables is then

$$\text{Corr}(Y_{ij}, Y_{ik}) = e^{-\xi_i |t_{ij} - t_{ik}|}. \tag{3.11}$$

This correlation structure is used to construct the correlation coefficient matrix $\Sigma_i = \Sigma(\xi_i, t_i)$ in the copula structure and likelihood.

4 Monte Carlo based EM algorithm

The expectation-maximization (EM) algorithm (Dempster et al. (1977)) is an iterative approach for obtaining maximum likelihood estimates. It consists of two steps from which the name is derived: an expectation step (E-step) and a maximization step (M-step). Typically the likelihood of interest involves a set of observed data $x$ and unobserved latent data $u$, where the conditional distribution of $u$ given $x$ is known. At iteration $r$, the E-step computes the expectation of the log-likelihood function with respect to the conditional distribution of $u \mid x, \theta^{(r)}$. The M-step computes a new set of (provisional) parameter estimates $\theta^{(r+1)}$ that maximize the expectation from the E-step. These two steps alternate to find a set of parameters that maximize the likelihood function. Let $l(\theta \mid u, x)$ be the log-likelihood; then, for $r = 1, 2, \dots$, the alternating steps are as follows:

E-step: compute $Q(\theta \mid \theta^{(r)}) = E_{u \mid x, \theta^{(r)}}[l(\theta \mid u, x)]$;

M-step: find $\theta^{(r+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(r)})$.

Under certain regularity conditions discussed in Wu (1983), the log-likelihood function converges to a local or global maximum. The earliest detailed explanation and naming of the EM algorithm was published by Dempster et al. (1977), who generalized earlier attempts by Sundberg (1974) and sketched a convergence analysis for a wider class of problems. Meng and Rubin (1993) studied computational difficulties encountered in the M-step and proposed smaller maximization steps over the parameter space. They argued that instead of maximizing over the whole set of parameters, one can maximize sequentially over a subset of parameters while the other subset is held fixed. Such a modification is called a conditional maximization (CM) step.
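A second important advancement to the EM algorithm was proposed by Wei and Tanner (1990), and is called the Monte Carlo (MC) EM algorithm. By applying the law of large numbers to the E-step above, one can approximate $Q(\theta \mid \theta^{(r)})$ as

$$Q(\theta \mid \theta^{(r)}) \approx R^{-1} \sum_{t=1}^{R} l(\theta \mid u^{(t)}, x), \tag{4.1}$$

where $u^{(1)},\dots,u^{(R)}$ are draws from the conditional distribution of $u \mid x, \theta^{(r)}$ and $R$ is a relatively large sample size.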
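The Monte Carlo approximation in (4.1) can be illustrated on a toy problem where the exact expectation is available. The sketch below assumes a hypothetical complete-data log-likelihood $l(\theta \mid u) = -\tfrac{1}{2}(u - \theta)^2$ and a hypothetical normal conditional distribution for the latent $u$; neither is the model of this paper. The names `mc_q`, `toy_loglik`, `post_mean` and `post_sd` are introduced for illustration only.

```python
import random


def mc_q(theta, draws, loglik):
    """Monte Carlo E-step: average the complete-data log-likelihood
    over draws of the latent variable, as in (4.1)."""
    return sum(loglik(theta, u) for u in draws) / len(draws)


def toy_loglik(theta, u):
    """Toy complete-data log-likelihood (hypothetical, for illustration)."""
    return -0.5 * (u - theta) ** 2


rng = random.Random(0)
post_mean, post_sd = 1.5, 0.7  # assumed conditional law of u given the data
draws = [rng.gauss(post_mean, post_sd) for _ in range(50_000)]

theta = 0.0
q_mc = mc_q(theta, draws, toy_loglik)
# Exact value: E[-(u - theta)^2 / 2] = -((mean - theta)^2 + sd^2) / 2
q_exact = -0.5 * ((post_mean - theta) ** 2 + post_sd ** 2)

# In this toy case the M-step maximizer of the Monte Carlo Q is the
# sample mean of the draws, which approximates post_mean.
theta_next = sum(draws) / len(draws)
```

The gap between `q_mc` and `q_exact` shrinks at the usual $R^{-1/2}$ Monte Carlo rate, which is one reason to increase the number of draws as the iterations approach convergence.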

In relation to the results discussed in earlier sections, the unobserved latent random variable is $b_i$, whose conditional distribution $b_i \mid z_i, \theta$ is found to be skew-normal, as illustrated in Proposition 3.1. Therefore, let $\theta^{(r)}$ be a vector of parameter estimates in the $r$-th iteration; then the two MC-EM steps are:

MC E-step: for the $i$-th unit at the $(r+1)$-th EM iteration,

$$Q_i(\theta \mid \theta^{(r)}) = E_{b_i \mid z_i, \theta^{(r)}}\big[l_i(\theta \mid x_i, y_i, b_i)\big] \approx R_i^{-1} \sum_{j=1}^{R_i} l_i\big(\theta \mid x_i, y_i, b_i^{(j)}\big), \tag{4.2}$$

and

$$Q(\theta \mid \theta^{(r)}) = \sum_{i=1}^{m} Q_i(\theta \mid \theta^{(r)}),$$

where $b_i^{(j)}$ is the $j$-th draw generated from the distribution of $b_i \mid z_i, \theta^{(r)}$, and $R_i$ is the number of replications on the $i$-th unit.

M-step: solve the score equation $\frac{\partial}{\partial\theta} Q(\theta \mid \theta^{(r)}) = 0$.

It is important to mention the work of Wu (1983), which outlined a list of conditions ensuring the convergence of the EM algorithm, such as the boundedness of the log-likelihood, compactness of the parameter space, and the continuity of the expectation in the E-step with respect to the estimated parameter. The log-likelihood of the proposed model in (3.7) involves a term of the form $\log(|\Psi_i|)$, which could reach infinity and compromise the convergence of the EM algorithm. To follow the Wu (1983) conditions, the heuristic method of initiating the algorithm from different starting points is enforced in the MC-EM algorithm used in this paper. Similar heuristic methods were successfully used by Arellano-Valle et al. (2005). The following sections illustrate some numerical and real data results of the proposed model and algorithm.

4.1 An M-step for an exponential response

This subsection derives the likelihood and its partial derivatives when the response variable $Y_{ij} \mid x_{ij}, b_i$ follows an exponential distribution with mean function $\eta_{ij} = \exp(x_{ij}\beta + b_i)$ and density $f(y_{ij} \mid \eta_{ij}) = \eta_{ij}^{-1}\exp(-y_{ij}\eta_{ij}^{-1})$.
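For the exponential density just stated, the marginal part of the log-likelihood is $\sum_j \{-(x_{ij}\beta + b_i) - y_{ij}e^{-x_{ij}\beta - b_i}\}$, and differentiating with respect to $\beta$ gives the score $\sum_j x_{ij}\{y_{ij}e^{-x_{ij}\beta - b_i} - 1\}$. The sketch below checks that score against a central finite difference. It assumes a scalar covariate and a scalar $\beta$, and the data values are made up for the example.

```python
import math


def exp_loglik(beta, b, xs, ys):
    """Marginal exponential log-likelihood for one unit under the log-link:
    log f(y | eta) = -log(eta) - y / eta, with log(eta) = x * beta + b."""
    total = 0.0
    for x, y in zip(xs, ys):
        log_eta = x * beta + b
        total += -log_eta - y * math.exp(-log_eta)
    return total


def exp_score(beta, b, xs, ys):
    """Analytic derivative of exp_loglik with respect to beta."""
    return sum(x * (y * math.exp(-(x * beta + b)) - 1.0) for x, y in zip(xs, ys))


# Illustrative (made-up) covariates and responses for a single unit.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.4, 1.1, 0.9, 2.5, 6.0]
beta, b, h = 0.5, 0.2, 1e-6

numeric = (exp_loglik(beta + h, b, xs, ys) - exp_loglik(beta - h, b, xs, ys)) / (2 * h)
analytic = exp_score(beta, b, xs, ys)
```

Because the copula terms of the log-likelihood do not involve $\beta$ once $z_i$ is held fixed, only this marginal term contributes to the score for $\beta$ in that case.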

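From (3.7) the unit log-likelihood is

$$
\begin{aligned}
l_i(\theta \mid y_i, x_i, b_i) \propto{} & -\frac{1}{2}\log|\Psi_i| - \frac{1}{2}\big(z_i - b_i - \Sigma_i^{1/2}\delta_i v_i\big)^\top \Psi_i^{-1} \big(z_i - b_i - \Sigma_i^{1/2}\delta_i v_i\big) \\
& + \frac{1}{2}\sum_{j=1}^{n_i}\log\big(1 - \delta^{*2}_{ij}\big) + \frac{1}{2}\sum_{j=1}^{n_i}\frac{\big(z_{ij} - b_i - \delta^*_{ij} v_i\big)^2}{1 - \delta^{*2}_{ij}} - \sum_{j=1}^{n_i}\big\{y_{ij}e^{-x_{ij}\beta - b_i} + x_{ij}\beta + b_i\big\},
\end{aligned}
$$

where the parameters are as defined in (3.7). Therefore, the marginal partial derivatives become

$$\frac{\partial}{\partial\beta} l_i(\theta \mid y_i, x_i, b_i) = \sum_{j=1}^{n_i} x_{ij}\big\{y_{ij}e^{-x_{ij}\beta - b_i} - 1\big\},$$

$$\frac{\partial^2}{\partial\beta^2} l_i(\theta \mid y_i, x_i, b_i) = -\sum_{j=1}^{n_i} x_{ij}^2\, y_{ij}\, e^{-x_{ij}\beta - b_i},$$

$$I(\beta) = -\sum_{i=1}^{m} E\Big(\frac{\partial^2}{\partial\beta^2} l_i(\theta \mid y_i, x_i, b_i)\Big) = \sum_{i=1}^{m}\sum_{j=1}^{n_i} x_{ij}^2,$$

$$\hat\beta = -(X^\top X)^{-1} X^\top\big(\log(D^{-1}(Y) I) + b I\big),$$

where $I = (1, 1, \dots, 1)^\top$ and $D^{-1}$ is an inverse diagonal matrix. Similar results could be obtained using Gamma marginals with the canonical link function.

5 Simulation design and analysis

To assess the efficiency of the proposed likelihood and model, a univariate and a bivariate model setting are used to infer parameters. Under both settings the number of units is fixed to 200 and the number of observations $n_i$ is fixed to 5 for each unit. To generate the response variable $Y_i \mid b_i$, since the true parameters are known, we first generate the per-unit multivariate skew-normal variable $Z_i \mid b_i$ as in (2.1) with a specified skewness vector $\lambda_i$. Then, we use the inverse of the link between the marginal distributions of $Z_{ij}$ and $Y_{ij}$ defined in (2.5) to generate the per-unit multivariate response $Y_i \mid b_i$. The following two subsections discuss the settings specific to each model.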
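The generation recipe described above, first a skew-normal draw and then the inverse of the link in (2.5), can be sketched for a single observation. This is a simplified illustration under stated assumptions, not the simulation code of the paper: it handles only one marginal (so the within-unit dependence through $\Sigma_i$ is ignored), uses an exponential marginal with mean `eta`, and evaluates the univariate skew-normal distribution function by Simpson's rule. All function names are hypothetical.

```python
import math
import random


def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def skew_normal_cdf(z, loc, lam, steps=400):
    """Unit-scale skew-normal CDF by Simpson's rule applied to the
    density 2 * phi(t) * Phi(lam * t) on [-10, z - loc] (steps is even)."""
    lower, upper = -10.0, z - loc
    if upper <= lower:
        return 0.0
    h = (upper - lower) / steps
    total = 2.0 * norm_pdf(lower) * norm_cdf(lam * lower)
    total += 2.0 * norm_pdf(upper) * norm_cdf(lam * upper)
    for k in range(1, steps):
        t = lower + k * h
        weight = 4.0 if k % 2 else 2.0
        total += weight * 2.0 * norm_pdf(t) * norm_cdf(lam * t)
    return min(1.0, total * h / 3.0)


def sample_response(eta, loc, lam, rng):
    """Draw z from the skew-normal, map it to u = SN_1(z), then invert the
    exponential CDF: y = F^{-1}(u) = -eta * log(1 - u)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    z = loc + delta * abs(rng.gauss(0.0, 1.0)) \
        + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
    u = min(skew_normal_cdf(z, loc, lam), 1.0 - 1e-12)  # guard against log(0)
    return -eta * math.log(1.0 - u)


rng = random.Random(1)
eta = math.exp(1.0)  # illustrative mean for the exponential marginal
ys = [sample_response(eta, 0.0, 1.0, rng) for _ in range(2000)]
mean_y = sum(ys) / len(ys)
```

Since $u$ is uniform on $(0,1)$ when $z$ is drawn from the matching skew-normal, the generated responses follow the exponential marginal regardless of the skewness `lam`; the skewness only affects the joint behaviour once several observations per unit are generated together.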

5.1 Univariate model

Here we use a model with a fixed intercept $\alpha$ and a univariate random effect,

$$Y_i \mid b_i \sim F_{n_i}\big(\eta(\alpha + b_i),\ \Sigma(\xi_i)\big), \tag{5.1}$$

where $F_{n_i}$ is a multivariate distribution from the exponential family with link function $\eta$, as in Section 4.1. The fixed and random effects coefficients are set as $\alpha + b_i \sim N_1(3, 2)$ such that $E[\alpha + b_i] = 3$. The time difference per observation within each unit is set to a unit difference, that is, the elements of $\Sigma(\xi_i)$ are

$$
\begin{cases}
e^{-\xi_i |t_{ij} - t_{ik}|} = e^{-\xi_i |j - k|} & \text{if } j \neq k, \\
1 & \text{if } j = k,
\end{cases}
\tag{5.2}
$$

where $\xi_i = \xi = 0.2$. Finally, since we first simulate the skew-normal variable $Z_i \mid b_i$ to get the response $Y_i \mid b_i$, we set the skewness vector $\lambda_i = (1,\dots,1)$.

5.2 Bivariate model

This model investigates the convergence under extra variables, binary and categorical, which in some cases could represent a measurement deviation caused by certain events. We use a model structure similar to the one in Section 6 of Arellano-Valle et al. (2005),

$$Y_i \mid b_i \sim F_{n_i}\big(\eta(\alpha + t_{ij}\beta_1 + \zeta_{ij}\beta_2 + b_i),\ \Sigma(\xi_i)\big), \tag{5.3}$$

where $\beta_1 = 2$, $\beta_2 = 1$ and $t_{ij} = j - 3$ for $j = 1,\dots,5$. The categorical variable is $\zeta_{ij} = 1$ for $i \le 100$ and $\zeta_{ij} = 0$ otherwise. Similar to the univariate settings, we let $\alpha + b_i \sim N_1(1, 4)$ such that $E[\alpha + b_i] = 1$ and $Var[\alpha + b_i] = 4$. The time difference per observation within each unit is set to a unit difference as in (5.2), where $\xi_i = \xi = 0.2$, and the skewness vector is $\lambda_i = (1,\dots,1)$.

For each of the 100 simulations, we set the initial estimates to $\beta^{(0)} = 1$, $\lambda^{(0)} = 0.5$, $Var[\alpha + b] = 1$ and $\xi^{(0)} = 0.1$. Using the Monte Carlo EM algorithm, in each iteration we sample from $b_i^{(k)} \mid Z_i$, starting with 50 samples per unit and gradually increasing until convergence.

5.3 Exponential and gamma distributed response

This section illustrates simulation results of the proposed copula-driven GLMM using the derived likelihood and the proposed MC-EM algorithm, and compares it numerically to the ordinary normal copula, where the skewness vector $\lambda$ is set to 0. The final missing piece of the likelihood in (3.7) is the specification of the marginal distribution of the response variable.
Here we assume a response variable first from the exponential and then the gamma distribution with a log-link

function. For each simulation, 100 Monte Carlo data sets are generated under the univariate and bivariate settings discussed in the previous subsections. Tables 1 and 2 show the parameter estimates of the skew-normal copula on the left and the normal copula on the right, using exponential marginals, under the univariate and bivariate settings respectively. MC Mean and MC SD represent the Monte Carlo mean and standard deviation. MSE is the mean squared error between the Monte Carlo estimates and the true value of the parameter. EC represents the empirical coverage probability computed using the Fisher information matrix, assuming a 95% confidence interval. Note that in the bivariate model we calculate the EC for $\beta_1$ and $\beta_2$ using a 95% elliptical confidence interval. $\bar\lambda$ is the average skewness. Figures 1 and 2 depict the convergence approximation graphically, under both models respectively, for the skew-normal copula.

Figure 1: Univariate settings with exponential marginals: the true and estimated density of the response variable $Y$ on the log scale, in bold and dotted lines respectively. Panels: (a) a single replication; (b) 50 MC replications. Vertical axis: density.

Table 1: Parameter estimation under the univariate settings with exponential marginals. Columns: Parameters, True value, then MC Mean, MC SD, MSE and EC for the skew-normal copula, followed by MC Mean, MC SD, MSE and EC for the normal copula. Rows: $\alpha$, $E[\alpha + b]$, $Var[\alpha + b]$, $\xi$, $\bar\lambda$.

Figure 2: Bivariate settings with exponential marginals: the true and estimated density of the response variable $Y$ on the log scale, in bold and dotted lines respectively. Panels: (a) a single replication; (b) 100 MC replications. Vertical axis: density.

Table 2: Parameter estimation under the bivariate settings with exponential marginals. Columns: Parameters, True value, then MC Mean, MC SD, MSE and EC for the skew-normal copula, followed by MC Mean, MC SD, MSE and EC for the normal copula. Rows: $\alpha$, $\beta_1$, $\beta_2$, $E[\alpha + b]$, $Var[\alpha + b]$, $\xi$, $\bar\lambda$.

Similarly, Figure 3 and Table 3 show the simulation results of the bivariate model while assuming gamma marginals. Table 3 also shows the estimated parameters when using the normal copula instead. The shape parameter of the gamma marginal is fixed to $k = 3$ and a log-link function is used.

Figure 3: Bivariate settings with gamma marginals: the true and estimated density of the response variable $Y$ on the log scale, in bold and dotted lines respectively. Panels: (a) a single replication; (b) 100 MC replications. Vertical axis: density.

Table 3: Parameter estimation under the bivariate settings with gamma marginals. Columns: Parameters, True value, then MC Mean, MC SD, MSE and EC for the skew-normal copula, followed by MC Mean, MC SD, MSE and EC for the normal copula. Rows: $\alpha$, $\beta_1$, $\beta_2$, $E[\alpha + b]$, $Var[\alpha + b]$, $\xi$, $\bar\lambda$.

The results presented above suggest good inference results for the proposed model, since we are able to estimate the fixed parameters, the first and second moments of the random effects, and to some degree the autoregressive coefficient $\xi$. Nevertheless, we intentionally fixed the number of observations per

unit to 5, since it allows the use of a uniform $\lambda$ vector and an autoregressive parameter $\xi$ for all units. In this sense, we can estimate the uniform parameters by drawing information from all observations. The reduction of the number of parameters is critical, since otherwise one has more parameters than observations. In our examples, using uniform autoregressive and skewness parameters, we only needed to estimate the uniform parameters, while in general we have $m + 5m$ parameters.

For the case when the normal copula is used, the estimation results for the fixed effects parameter are largely similar to those of the proposed model. This result is evident from (2.7), since the choice of the copula is independent of the likelihood of the marginals. On the other hand, the estimation results for the random effect show systematic bias when compared to the results of the skew-normal model. This estimation bias arises from the fact that the skew-normal mean includes the skewness coefficient in its structure; thus it relates directly to the conditional distribution of $b_i \mid z_i, v_i$, as seen in Proposition 3.1. In the case of the correlation parameter $\xi$, the results are comparable, with smaller differences in the bivariate setting and somewhat larger ones in the univariate setting, arguably due to the heavier influence of the random effects on the likelihood in the latter.

It is worth mentioning that the product form of the density in (2.7) allowed the likelihood in (3.7) to be decomposed into three main parts. This in turn reduced the estimation of the fixed effects coefficient $\beta$ to the maximum-likelihood estimate obtained when assuming independent marginal densities. One is then able to compute the information matrix analytically, or by using methods as in Louis (1982) to obtain the observed information matrix. In this section we presented examples where the information matrix is readily available.
Nevertheless, we find it much more complex to calculate the observed information matrix for the dispersion $\xi$ and skewness $\lambda$ variables, since it requires differentiating the autoregressive correlation $\Sigma$ in (3.7) for the former and $\Psi$ for the latter. As a result, the coverage probabilities for both are left blank in the tables above.

The simulation was implemented in R using mainly the packages sn and mnormt, which are both maintained by Adelchi Azzalini. The sn package was used to sample from the skew-normal distribution and to fit the skew-normal parameters, mainly $\hat\Sigma$ and $\hat\lambda$. Consequently, we estimate the dispersion parameter $\xi$ by minimizing the $L_2$ norm between the empirical estimate $\hat\Sigma$ and the correlation matrix $\Sigma(\xi)$ constructed using (5.2), as

$$\hat\xi = \arg\min_{\xi > 0}\big\{\|\hat\Sigma - \Sigma(\xi)\|_2\big\}.$$

With respect to $\hat\xi$ we then realign $\hat\Sigma$ to $\Sigma(\hat\xi)$. Likewise, one could also use the general-purpose optimization function optim with the L-BFGS-B method, with a lower bound of $\tau > 0$ that is less than an upper bound of $\max\{\delta^\top\delta\}$, to avoid singularities in computing the inverse of the matrix $\Psi$ in (3.1) and (3.3). Note

that depending on the time measurement of observations $t$, the lower bound $\tau$ cannot be very small, otherwise one will arrive at an all-ones matrix $\Sigma$.

6 An application

As an illustration, we apply our methodology to the famous Framingham Heart Study, which consists of longitudinal data for a wide set of cohorts. This data has been analyzed earlier in Zhang and Davidian (2001) and Arellano-Valle et al. (2005). The primary objective is to model the change of cholesterol levels over time within patients. The data provides cholesterol levels of 200 randomly selected patients, measured at the beginning of the study and every two years for a total of 10 years. However, we only use the first 3 observations per patient, since it is the minimum number of visits seen in the data. The gender and age of those patients are also available. Since the normal linear mixed model analyzed by Zhang and Davidian (2001) is a particular case of a GLMM, we apply our methodology to a simpler mixed model under a more general distributional (copula based) setup. In view of the model proposed in Section 5, we consider the following model

$$Y_i \sim F_{n_i}\big(\alpha + \beta_1\,\text{sex}_i + \beta_2\,\text{age}_i + \beta_3\, t_i + b_i,\ \Sigma(\xi_i, t_i)\big), \tag{6.1}$$

where the $j$th component $y_{ij}$ of $Y_i$ is the cholesterol level at the $j$th time point for unit $i$ (the observations are normalized by 100), $t_{ij} = (\text{time} - 5)/10$ (time measured in years), $b_i$ is the unit specific random effect as in (3.2), and the correlation coefficients are defined as

$$\text{Corr}(Y_{ij}, Y_{ik}) = e^{-\xi |t_{ij} - t_i|}, \tag{6.2}$$

where $t_i$ is the time of the first visit. As in (2.5), the modeling is performed with gamma marginals and a log-link function.

Figure 4a represents a histogram of cholesterol levels of the 200 randomly selected patients, where dotted lines are the fitted model under the proposed settings. Figure 4b shows the same histogram with 100 MC replications of $b_i$.

Figure 4: Fitting of Framingham Heart Study cholesterol data with model (6.1) using gamma marginals with a log-link function. The shape parameter is set to $k = 3$. The solid lines are the fitted model, while the histogram shows the frequency distribution of cholesterol levels. Panels: (a) a single replication; (b) 100 MC replications. Horizontal axis: cholesterol levels.

Figure 5a represents the densities of the centralized observed skew-normal variable resulting from each of the 100 MC-EM runs, where the high positive skewness is evident. Figure 5b shows the density of the average centralized skew-normal variable as a solid line, versus the density of a zero-location skew-normal generated using the fitted parameters.

Figure 5: The left panel (a) shows the densities of the centralized observed skew-normal from each MC-EM run. In the right panel (b), the bold line is the average density of the results on the left, while the dotted line is the density of a zero-location skew-normal given the estimated parameters. Vertical axis: density.

Table 4 presents the parameter estimates and standard errors, which are calculated as $SE(\theta_{MLE}) = I(\theta_{MLE})^{-1/2}$, where $I$ is the Fisher information coefficient of the maximum likelihood estimate of parameter $\theta$. From the table, the estimated value of the correlation coefficient $\xi$ is close to zero; this does not automatically imply that the proposed autoregressive correlation structure is not adequate. The normalization of the time variable $t$ affects the magnitude of $\xi$. To see this better, the off-diagonal elements of the estimated correlation matrix $\Sigma(\hat\xi)$ suggest a strong autoregressive structure in the data despite the low value of $\hat\xi$.

Moreover, the $\beta_2$ and $\beta_3$ estimates are close to zero, suggesting that patients' age and time of observation are not predictors of cholesterol levels. Both $\beta_1$ and $Var[\alpha + b]$ seem relatively significant, emphasizing the importance of the patients' gender and the random effects coefficient. The average skewness variable $\bar\lambda$ suggests a highly skewed copula, as also indicated in Figure 5a. Nevertheless, given the number of observations, the model has many variables to estimate, which dampens the estimation accuracy. In this case, we are estimating 9 coefficients for around 200 observations.

Table 4: Fitting of Framingham Heart Study cholesterol data with model (6.1) using gamma marginals with a log-link function, with shape parameter $k = 3$. Columns: Parameters, Estimate, SE. Rows: $\alpha$, $\beta_1$, $\beta_2$, $\beta_3$, $E[\alpha + b]$, $Var[\alpha + b]$, $\xi$, $\bar\lambda$, log-likelihood, AIC, BIC.

Arellano-Valle et al. (2005) fitted the Framingham Heart Study cholesterol data under a mixture of Gaussian and skew-normal distributions for the random effects and residuals. In their model they used a bivariate random effect, while the model presented in (6.1) uses a univariate random effect. Moreover, Arellano-Valle et al. (2005) used a linear mixed model formulation, which differs from the copula formulation used here. Because of these differences, the average mean squared error of Arellano-Valle et al. (2005) surpasses the fit of the proposed model. Nonetheless, we believe the copula formulation allows more flexibility in modelling the response variable, given a robust estimation procedure. In addition, this is a first step towards estimating mixed models via a skew-normal copula, and future research is required to determine better fits and, most importantly, to integrate a random effects design matrix and improve the estimation of the skewness and autoregressive variables.

7 Discussion and future work

The current investigation is based on the development of a copula-driven GLMM, where the focus was on modeling the marginals in lieu of the joint distribution. Oftentimes marginal distributions from the exponential family do not necessarily lead to a multivariate distribution of the same form. Nonetheless, we feel that copula based general multivariate distributions may be of more interest to applied statisticians. Our proposal is intended to illustrate such typical situations. In regard to the methodology, the MCEM seems to be appealing, though computationally expensive. We feel that the estimation accuracy of the proposed model is pegged to the theoretical limitations of the EM algorithm, especially in large dimensions. Oftentimes, the MCEM algorithm converged to local maxima, and we feel that a post-EM optimization procedure, such as gradient

descent, might improve the fit. One can also avoid some of the computational burden by adopting MCMC in the Bayesian paradigm. In our subsequent investigations, we are planning to work with a Bayesian paradigm in a broader setup. More importantly, we are planning to integrate a design matrix for the random effects to extend it beyond the univariate case. To improve the accuracy, we are attempting different optimization techniques. For computational convenience, an autoregressive structure was used to model the correlation matrix, which is not always applicable in real data; thus, we are planning to investigate more flexible correlation models.

Acknowledgments: We would like to acknowledge the Associate Editor and all reviewers for their valuable comments.

References

R. Arellano-Valle and M.G. Genton. Fundamental skew distributions. Journal of Multivariate Analysis, 96:93-116, 2005.

R. Arellano-Valle, H. Bolfarine, and V. Lachos. Skew-normal linear mixed models. Journal of Data Science, 3, 2005.

A. Azzalini. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 1985.

A. Azzalini and A. Dalla Valle. The multivariate skew-normal distribution. Biometrika, 83, 1996.

Adelchi Azzalini. The skew-normal and related families, volume 3. Cambridge University Press, 2013.

A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1-38, 1977.

R.A. Fisher. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52, 1918.

N. Henze. A probabilistic representation of the skew-normal distribution. Scandinavian Journal of Statistics, 13(4), 1986.

Meelis Käärik, Anne Selart, and Ene Käärik. On parametrization of multivariate skew-normal distribution. Communications in Statistics-Theory and Methods, 44(9), 2015.

P. Lambert and F. Vandenhende. A copula-based model for multivariate non-normal longitudinal data: analysis of a dose titration safety study on a new antidepressant. Statistics in Medicine, 21(21), 2002.

Z. Landsman. Elliptical families and copulas: tilting and premium; capital allocation. Scandinavian Actuarial Journal, 2009(2):85-103, 2009.

Thomas A. Louis. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 1982.

X.L. Meng and D.B. Rubin. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80:267-278, 1993.

C. Meza, F. Osorio, and R. De la Cruz. Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Statistics and Computing, 22(1).

M.A. Newton and Y. Zhang. A recursive algorithm for non-parametric analysis with missing data. Biometrika, 1999.

K.J.F. Petersson, E. Hanze, R.M. Savic, and M.O. Karlsson. Semiparametric distributions with estimated shape parameters. Pharmaceutical Research, 26(9).

R. Sundberg. Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics, 1(2), 1974.

H. Tao, M. Palta, B.S. Yandell, and M.A. Newton. An estimation method for the semi-parametric mixed effects model. Biometrics, 55, 1999.

G. Verbeke and E. Lesaffre. A linear mixed-effects model with heterogeneity in the random-effects population. Journal of the American Statistical Association, 91, 1996.

G.C.G. Wei and M.A. Tanner. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85, 1990.

C.F.J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1):95-103, 1983.

J. Wu, X. Wang, and S.G. Walker. Bayesian nonparametric inference for a multivariate copula function. Methodology and Computing in Applied Probability, 16(3).

D. Zhang and M. Davidian. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57, 2001.
