Flexible Tweedie regression models for continuous data


arXiv: v1 [stat.ME] 12 Sep 2016

Wagner H. Bonat and Célestin C. Kokonendji

Abstract

Tweedie regression models provide a flexible family of distributions to deal with non-negative, highly right-skewed data as well as symmetric and heavy tailed data, and can handle continuous data with probability mass at zero. The estimation and inference of Tweedie regression models based on the maximum likelihood method are challenged by the presence of an infinite sum in the probability function and non-trivial restrictions on the power parameter space. In this paper, we propose two approaches for fitting Tweedie regression models, namely, quasi- and pseudo-likelihood. We discuss the asymptotic properties of the two approaches and perform simulation studies to compare our methods with the maximum likelihood method. In particular, we show that the quasi-likelihood method provides asymptotically efficient estimation for regression parameters. The computational implementation of the alternative methods is faster and easier than the orthodox maximum likelihood, relying on a simple Newton scoring algorithm. Simulation studies showed that the quasi- and pseudo-likelihood approaches present estimates, standard errors and coverage rates similar to the maximum likelihood method. Furthermore, the second-moment assumptions required by the quasi- and pseudo-likelihood methods enable us to extend the Tweedie regression models to the class of quasi-Tweedie regression models in Wedderburn's style. Moreover, they allow us to eliminate the non-trivial restriction on the power parameter space, and thus provide a flexible regression model to deal with continuous data. We provide an R implementation and illustrate the application of Tweedie regression models using three data sets.

Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
Department of Statistics, Paraná Federal University, Curitiba, Paraná, Brazil. wbonat@ufpr.br
Université de Franche-Comté, Laboratoire de Mathématiques de Besançon, Besançon, France. celestin.kokonendji@univ-fcomte.fr

1 Introduction

Statistical modelling is one of the most important areas of applied statistics, with applications in many fields of scientific research, such as sociology, economics, ecology, agronomy, insurance and medicine, to cite but a few. There exist many statistical modelling frameworks, but the class of Generalized Linear Models (GLM) (Nelder and Wedderburn; 1972) has been the most widely used over the last four decades. The success of this approach is due to its ability to deal with different types of response variables, such as binary, count and continuous, in a general framework with a powerful scheme for estimation and inference based on the likelihood paradigm. Special cases of the GLM class include the Gaussian linear model for continuous data, gamma and inverse Gaussian regression models for positive continuous data, and logistic and Poisson regression models for binary or binomial and count data, respectively. These models are linked, since they belong to the class of exponential dispersion models (Jørgensen; 1987, 1997), and share the property of being described by their first two moments, the mean and the variance. Furthermore, the variance function plays an important role in the context of exponential dispersion models, since it describes the relationship between the mean and the variance and characterizes the distribution (Jørgensen; 1997). Let $Y$ denote the response variable and assume that the probability density function of $Y$ belongs to the class of exponential dispersion models. Furthermore, assume that $E(Y) = \mu$ and $\mathrm{Var}(Y) = \phi V(\mu) = \phi\mu^p$; then $Y \sim Tw_p(\mu, \phi)$, where $Tw_p(\mu, \phi)$ denotes a Tweedie (Tweedie; 1984; Jørgensen; 1997) random variable with mean $\mu$ and variance $\phi\mu^p$, such that $\phi > 0$ and $p \in (-\infty, 0] \cup [1, \infty)$ are the dispersion and power parameters, respectively. The support of the distribution depends on the value of the power parameter. For $p \geq 2$, $1 < p < 2$ and $p = 0$ the support corresponds to the positive, non-negative and real values, respectively.
In these cases $\mu \in \Omega$, where $\Omega$ is the convex support (i.e. the interior of the closed convex hull of the corresponding distribution support). Finally, for $p < 0$ the support corresponds to the real values, but the expectation $\mu$ is positive. For practical data analysis, the Tweedie distribution is interesting, since it has as special cases the Gaussian ($p = 0$), Poisson ($p = 1$), non-central gamma ($p = 3/2$), gamma ($p = 2$) and inverse Gaussian ($p = 3$) distributions (Jørgensen; 1987, 1997). Another important case, often applied in the context of insurance data (Smyth and Jørgensen; 2002; Jørgensen and Paes De Souza; 1994), corresponds to the compound Poisson distribution, obtained when $1 < p < 2$. The compound Poisson distribution is a frequent choice for modelling non-negative, highly right-skewed data with probability mass at zero.
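The dependence of the support on $p$ can be summarized in a few lines. The Python helpers below are a hypothetical illustration that encodes only the rules stated above (support by range of $p$, the listed special cases, and $\mathrm{Var}(Y) = \phi\mu^p$); they do not evaluate any density.

```python
# Hypothetical helpers: support rules and variance function of Tw_p(mu, phi) as stated in the text
SPECIAL_CASES = {
    0.0: "Gaussian",
    1.0: "Poisson",
    1.5: "non-central gamma",
    2.0: "gamma",
    3.0: "inverse Gaussian",
}


def tweedie_support(p):
    """Support of Tw_p(mu, phi) for a given power parameter p."""
    if 0.0 < p < 1.0:
        raise ValueError("no Tweedie distribution exists for 0 < p < 1")
    if p <= 0.0:
        return "real"          # p = 0 (Gaussian) and p < 0, where mu is still positive
    if p < 2.0:
        return "non-negative"  # p = 1 (Poisson) and 1 < p < 2 (compound Poisson, mass at zero)
    return "positive"          # p >= 2, e.g. gamma (p = 2) and inverse Gaussian (p = 3)


def tweedie_variance(mu, phi, p):
    """Power variance function: Var(Y) = phi * mu**p."""
    tweedie_support(p)  # raises for the excluded range 0 < p < 1
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    return phi * mu ** p


print(tweedie_support(1.5), tweedie_variance(2.0, 0.5, 2.0))  # non-negative 2.0
```

The excluded interval $0 < p < 1$ is rejected explicitly, mirroring the restriction on the parameter space of the Tweedie family.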

The power parameter plays an important role in the context of Tweedie models, since it is an index which distinguishes between some important continuous distributions. The algorithms we shall propose in Section 3 allow us to estimate the power parameter, which works as an automatic distribution selection. Although the estimation of the regression parameters is less affected by the dispersion structure, the standard errors associated with the regression parameters are determined by the dispersion structure, which justifies dedicating attention to the estimation of the power and dispersion parameters. The orthodox approach is based on the likelihood paradigm, which in turn is an efficient estimation method. However, a particularity of the Tweedie distribution is that, outside the special cases, its probability density function cannot be written in closed form and requires numerical methods for its evaluation. Dunn and Smyth (2005, 2008) proposed methods to evaluate the density function of the Tweedie distribution, but these methods are computationally demanding and show different levels of accuracy in different regions of the parameter space. Furthermore, the parameter space associated with the power parameter presents non-trivial restrictions. Current software implementations (Dunn; 2013) are restricted to $p \geq 1$. These facts make the process of inference based on the likelihood paradigm difficult and sometimes slow. The main goal of this paper is to propose alternative methods for estimation and inference in Tweedie regression models. In particular, we discuss the quasi-likelihood (Jørgensen and Knudsen; 2004; Bonat and Jørgensen; 2016) and pseudo-likelihood (Gourieroux et al.; 1984) approaches. These methods are fast and computationally simple, because they employ only the first two moments, thereby avoiding the evaluation of the probability density function.
Moreover, the second-moment assumptions required by the quasi- and pseudo-likelihood methods allow us to extend the Tweedie regression models to the class of quasi-Tweedie regression models in the style of Wedderburn (1974). The weaker assumptions of the second-moment specification eliminate the restrictions on the parameter space of the power parameter. Hence, it is possible to estimate negative values, and values between zero and one, for the power parameter. In this way, we overcome the main restrictions of current software implementations and provide a flexible regression model to deal with continuous data. We present the theoretical development of the quasi- and pseudo-likelihood methods in the context of Tweedie regression models. In particular, we show analytically that the quasi-likelihood approach provides asymptotically efficient estimation for the regression parameters. We present efficient and stable fitting algorithms based on the two new approaches and provide an R implementation. We employed
simulation studies to compare the properties of our approaches with the maximum likelihood method in a finite sample scenario. We compare the approaches in terms of bias, efficiency and coverage rate of the confidence intervals. Furthermore, we explore the flexibility of Tweedie regression models to deal with heavy tailed distributions. Tweedie distributions are extensively used in statistical modelling, thereby motivating the study of their estimation in a more general framework. Applications include Lee and Whitmore (1993); Barndorff-Nielsen and Shephard (2001); Vinogradov (2004), who applied Tweedie distributions to describe the chaotic behaviour of stock price movements. Further applications include property and casualty insurance, where Jørgensen and Paes De Souza (1994) and Smyth and Jørgensen (2002) fit the Tweedie family to automobile insurance claims data. Tweedie distributions have also found applications in biology (Kendal; 2004; Kendall; 2007), fisheries research (Foster and Bravington; 2013; Hiroshi; 2008), genetics and medicine (Kendal et al.; 2000). Chen and Tang (2010) presented Bayesian semiparametric models based on the reproductive form of exponential dispersion models. Zhang (2013) discussed maximum likelihood and Bayesian estimation for Tweedie compound Poisson linear mixed models. For a recent application and further references see Bonat and Jørgensen (2016). The rest of the paper is organized as follows. In the next section, we provide some background on Tweedie regression models. Section 3 discusses the approaches to estimation and inference. Section 4 presents the main results from our simulation studies. Section 5 presents the application of Tweedie regression models to three data sets. The first one concerns daily precipitation in Curitiba, Paraná State, Brazil. This dataset illustrates the analysis of non-negative continuous data with probability mass at zero. The second data set corresponds to a cross-sectional study developed to study income dynamics in Australia.
This dataset shows the analysis of a positive, highly right-skewed response variable. The last data set illustrates the analysis of symmetric positive data, where current implementations have problems dealing with power parameter values smaller than 1. Finally, Section 6 reports some final remarks. The R implementation is available in the supplementary material.

2 Tweedie regression models

The Tweedie distribution belongs to the class of exponential dispersion models (EDM) (Jørgensen; 1987, 1997). Thus, for a random variable $Y$ which follows an EDM, the density function can be written as
$$f_Y(y; \mu, \phi, p) = a(y, \phi, p)\exp\{(y\psi - k(\psi))/\phi\},$$
where $\mu = E(Y) = k'(\psi)$ is the mean, $\phi > 0$ is the dispersion parameter, $\psi$ is the canonical parameter and $k(\psi)$ is the cumulant function. The function $a(y, \phi, p)$ cannot be written in closed form apart from the special cases cited. The variance is given by $\mathrm{Var}(Y) = \phi V(\mu)$, where $V(\mu) = k''(\psi)$ is called the variance function. Tweedie densities are characterized by power variance functions of the form $V(\mu) = \mu^p$, where $p \in (-\infty, 0] \cup [1, \infty)$ is the index determining the distribution. Although Tweedie densities are not known in closed form, their cumulant generating function is simple. It is given by $K(t) = \{k(\psi + \phi t) - k(\psi)\}/\phi$, where $k(\psi)$ is the cumulant function and
$$\psi = \begin{cases} \dfrac{\mu^{1-p}}{1-p}, & p \neq 1,\\[4pt] \log\mu, & p = 1, \end{cases} \qquad k(\psi) = \begin{cases} \dfrac{\mu^{2-p}}{2-p}, & p \neq 2,\\[4pt] \log\mu, & p = 2. \end{cases}$$
The remaining factor in the density, $a(y, \phi, p)$, needs to be evaluated numerically. Jørgensen (1997) presents two series expressions for evaluating the density, one for $1 < p < 2$ and one for $p > 2$. In the first case it can be shown that
$$P(Y = 0) = \exp\left\{-\frac{\mu^{2-p}}{\phi(2-p)}\right\}$$
and, for $y > 0$, that $a(y, \phi, p) = W(y, \phi, p)/y$, with $W(y, \phi, p) = \sum_{k=1}^{\infty} W_k$ and
$$W_k = \frac{y^{-k\alpha}(p-1)^{\alpha k}}{\phi^{k(1-\alpha)}(2-p)^k\, k!\, \Gamma(-k\alpha)},$$
where $\alpha = (2-p)/(1-p)$. A similar series expansion exists for $p > 2$; it is given by $a(y, \phi, p) = V(y, \phi, p)/(\pi y)$, with $V(y, \phi, p) = \sum_{k=1}^{\infty} V_k$ and
$$V_k = \frac{\Gamma(1+\alpha k)\,\phi^{k(\alpha-1)}(p-1)^{\alpha k}}{\Gamma(1+k)(p-2)^k\, y^{\alpha k}}(-1)^k\sin(-k\pi\alpha).$$
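To show how the series for $1 < p < 2$ is used in practice, here is a minimal Python sketch that returns $\log P(Y = 0)$ at $y = 0$ and, for $y > 0$, $\log\{W(y, \phi, p)/y\} + (y\psi - k(\psi))/\phi$. It truncates the series at a fixed number of terms and sums them on the log scale; this simplification is our own and is not the bounded-summation algorithm of Dunn and Smyth (2005) behind dtweedie.series. The function name `tweedie_cp_logpdf` and the truncation point `k_max` are hypothetical.

```python
import math


def tweedie_cp_logpdf(y, mu, phi, p, k_max=100):
    """Log-density (or log P(Y = 0)) of Tw_p(mu, phi) for 1 < p < 2 via the series W.

    Sketch only: the infinite sum is truncated at k_max terms, without adaptive bounds.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this series applies to 1 < p < 2")
    psi = mu ** (1.0 - p) / (1.0 - p)      # canonical parameter
    kappa = mu ** (2.0 - p) / (2.0 - p)    # cumulant function k(psi)
    if y == 0.0:
        return -kappa / phi                # log P(Y = 0)
    alpha = (2.0 - p) / (1.0 - p)          # negative when 1 < p < 2
    log_terms = [
        -k * alpha * math.log(y)
        + alpha * k * math.log(p - 1.0)
        - k * (1.0 - alpha) * math.log(phi)
        - k * math.log(2.0 - p)
        - math.lgamma(k + 1.0)
        - math.lgamma(-k * alpha)
        for k in range(1, k_max + 1)
    ]
    top = max(log_terms)                   # log-sum-exp for numerical stability
    log_w = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_w - math.log(y) + (y * psi - kappa) / phi


# Check: point mass at zero plus the continuous part should integrate to 1, with mean mu.
mu, phi, p = 2.0, 1.0, 1.5
h = 0.02
grid = [h * (i + 0.5) for i in range(2000)]  # midpoints of a grid on (0, 40)
dens = [math.exp(tweedie_cp_logpdf(y, mu, phi, p)) for y in grid]
p0 = math.exp(tweedie_cp_logpdf(0.0, mu, phi, p))
total = p0 + h * sum(dens)
mean = h * sum(y * f for y, f in zip(grid, dens))
print(round(p0, 4), round(total, 4), round(mean, 4))
```

The check uses $\mu = 2$, $\phi = 1$ and $p = 1.5$: adding the point mass at zero to a midpoint-rule integral of the continuous part, the total should be close to 1 and the mean close to $\mu$.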

Dunn and Smyth (2005) presented detailed studies of these series and an algorithm to evaluate the Tweedie density function based on series expansions. The algorithm is implemented in the package tweedie (Dunn; 2013) for the statistical software R (R Core Team; 2016) through the function dtweedie.series. Dunn and Smyth (2008) also studied two alternative methods to evaluate the density function of the Tweedie distributions, one based on the inversion of the cumulant generating function using Fourier inversion and the other on the saddlepoint approximation; for more details see Dunn (2013). In this paper, we used only the approach described in this section, i.e. the one based on series expansions.

We now turn to Tweedie regression models. Consider a cross-sectional dataset $(y_i, x_i)$, $i = 1, \ldots, n$, where the $y_i$'s are independent realizations of $Y_i$ according to $Y_i \sim Tw_p(\mu_i, \phi)$ and $g(\mu_i) = \eta_i = x_i^\top\beta$, where $x_i$ and $\beta$ are $(Q \times 1)$ vectors of known covariates and unknown regression parameters, respectively. It is straightforward to see that $E(Y_i) = \mu_i = g^{-1}(x_i^\top\beta)$ and $\mathrm{Var}(Y_i) = C_i = \phi\mu_i^p$. Hence, the model can be specified equivalently by its joint distribution or by its first two moments. The Tweedie regression model is parametrized by $\theta = (\beta^\top, \lambda^\top)^\top$, where $\lambda = (\delta, p)^\top$ and $\phi = \exp(\delta)$; we introduce the reparametrization $\phi = \exp(\delta)$ for computational convenience. Finally, in this paper we adopt the orthodox logarithm link function.

3 Estimation and Inference

This section is devoted to estimation and inference for Tweedie regression models. In what follows, we discuss the maximum likelihood, quasi-likelihood and pseudo-likelihood methods.

3.1 Maximum likelihood estimation

The maximum likelihood (ML) estimator for the parameter vector $\theta$, denoted by $\hat\theta_{ML}$, is obtained by maximizing the log-likelihood function
$$L(\theta) = \sum_{i=1}^n \log\{a(y_i; \lambda)\} + \frac{1}{\exp(\delta)}\sum_{i=1}^n \{y_i\psi_i - k(\psi_i)\}. \qquad (1)$$
As we shall show below, the vectors $\beta$ and $\lambda$ are orthogonal, hence it is sensible to discuss each of them separately. The score function for the regression parameters $\beta = (\beta_1, \ldots, \beta_Q)^\top$ is given by
$$U_\beta(\beta, \lambda) = \left(\frac{\partial L(\theta)}{\partial\beta_1}, \ldots, \frac{\partial L(\theta)}{\partial\beta_Q}\right)^\top,$$
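where
$$\frac{\partial L(\theta)}{\partial\beta_j} = \sum_{i=1}^n \frac{\partial L(\theta)}{\partial\psi_i}\frac{\partial\psi_i}{\partial\mu_i}\frac{\partial\mu_i}{\partial\eta_i}\frac{\partial\eta_i}{\partial\beta_j} = \sum_{i=1}^n \mu_i x_{ij}\left[\frac{1}{\exp(\delta)\mu_i^p}\right](y_i - \mu_i),$$
for $j = 1, \ldots, Q$. The entry $(j, k)$ of the $Q \times Q$ Fisher information matrix $F_\beta$ for the regression coefficients is given by
$$F_{\beta_{jk}} = -E\left\{\frac{\partial^2 L(\theta)}{\partial\beta_j\partial\beta_k}\right\} = \sum_{i=1}^n \mu_i x_{ij}\left[\frac{1}{\exp(\delta)\mu_i^p}\right]\mu_i x_{ik}. \qquad (2)$$
Similarly, the score function for the dispersion parameters $\lambda = (\delta, p)^\top$ is given by $U_\lambda(\lambda, \beta) = (\partial L(\theta)/\partial\delta, \partial L(\theta)/\partial p)^\top$, whose components are
$$\frac{\partial L(\theta)}{\partial\delta} = \sum_{i=1}^n \frac{\partial}{\partial\delta}\log a(y_i; \lambda) - \frac{1}{\exp(\delta)}\sum_{i=1}^n \{y_i\psi_i - k(\psi_i)\} \qquad (3)$$
and
$$\frac{\partial L(\theta)}{\partial p} = \sum_{i=1}^n \frac{\partial}{\partial p}\log a(y_i; \lambda) + \frac{1}{\exp(\delta)}\sum_{i=1}^n \left[\frac{\partial\psi_i}{\partial p}y_i - \frac{\partial k(\psi_i)}{\partial p}\right]. \qquad (4)$$
The entry $(j, k)$ of the $2 \times 2$ Fisher information matrix $F_\lambda$ for the dispersion parameters is given by
$$F_{\lambda_{jk}} = -E\left\{\frac{\partial^2 L(\theta)}{\partial\lambda_j\partial\lambda_k}\right\}. \qquad (5)$$
The derivatives in equations (3), (4) and (5) depend on derivatives of $a(y_i; \lambda)$, which is defined by an infinite sum, and cannot be expressed in closed form. Hence, numerical methods are required to approximate these derivatives. Let $\tilde U_\lambda$ and $\tilde F_\lambda$ denote the approximated score function and observed information matrix for the dispersion parameters, respectively. In this paper, we adopted the Richardson method (Fornberg and Sloan; 1994), as implemented in the R package numDeriv (Gilbert and Varadhan; 2015), for computing these approximations. Furthermore, the cross entries of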
the Fisher information matrix are given by
$$F_{\beta_j\delta} = -E\left\{\frac{\partial U_{\beta_j}(\beta, \lambda)}{\partial\delta}\right\} = -E\left\{\sum_{i=1}^n \mu_i x_{ij}\left[\frac{\partial}{\partial\delta}\frac{1}{\exp(\delta)\mu_i^p}\right](y_i - \mu_i)\right\} = 0$$
and
$$F_{\beta_j p} = -E\left\{\frac{\partial U_{\beta_j}(\beta, \lambda)}{\partial p}\right\} = -E\left\{\sum_{i=1}^n \mu_i x_{ij}\left[\frac{\partial}{\partial p}\frac{1}{\exp(\delta)\mu_i^p}\right](y_i - \mu_i)\right\} = 0,$$
since $E(Y_i - \mu_i) = 0$. Hence, the vectors $\beta$ and $\lambda$ are orthogonal. The joint Fisher information matrix for $\theta$ is given by
$$F_\theta = \begin{pmatrix} F_\beta & 0 \\ 0 & F_\lambda \end{pmatrix},$$
whose entries are defined by (2) and (5). Finally, the asymptotic distribution of $\hat\theta_{ML}$ is $\hat\theta_{ML} \sim N(\theta, F_\theta^{-1})$, where $F_\theta^{-1}$ denotes the inverse of the Fisher information matrix. In practice the block $F_\lambda$ is replaced by the approximation $\tilde F_\lambda$. In order to solve the system of equations $U_\beta = 0$ and $\tilde U_\lambda = 0$, we employ the two-step Newton scoring algorithm, defined by
$$\beta^{(i+1)} = \beta^{(i)} + F_\beta^{-1}U_\beta(\beta^{(i)}, \lambda^{(i)}), \qquad \lambda^{(i+1)} = \lambda^{(i)} + \tilde F_\lambda^{-1}\tilde U_\lambda(\beta^{(i+1)}, \lambda^{(i)}), \qquad (6)$$
which explicitly uses the orthogonality between $\beta$ and $\lambda$. The numerical evaluation of the derivatives required in equations (3), (4) and (5) can be inaccurate, mainly for $p$ close to 1, i.e. near the border of the parameter space. Thus, an alternative approach is to maximize the log-likelihood function in equation (1) directly, using a derivative-free algorithm such as the Nelder-Mead method (Nelder and Mead; 1965). A more computationally efficient approach is to use the Nelder-Mead algorithm to maximize only the profile log-likelihood for the dispersion parameters, which is obtained by inserting the first equation of the two-step Newton scoring algorithm (6) into the log-likelihood function (1). Note that, with this approach, each evaluation of the profile likelihood requires solving a maximization problem for the regression parameters. We implemented these three approaches to obtain the maximum likelihood estimator. The direct maximization of the log-likelihood function using the Nelder-Mead algorithm is slow, mainly for a large number of regression coefficients. The two-step Newton scoring algorithm presented many
convergence problems for small values of the power parameter. Finally, the profile likelihood approach is the fastest and most stable implementation. However, the profile likelihood approach presented problems in computing the standard errors associated with the dispersion estimates for $p$ close to 1. In this paper, we used only the approach based on the profile log-likelihood, but we also provide R code for the other two approaches.

3.2 Quasi-likelihood estimation

We shall now introduce quasi-likelihood estimation using terminology and results from Jørgensen and Knudsen (2004); Holst and Jørgensen (2015); Bonat and Jørgensen (2016). The quasi-likelihood approach adopted in this paper combines the quasi-score and Pearson estimating functions for the estimation of the regression and dispersion parameters, respectively. The approach is also discussed in the context of estimating functions; see Liang and Zeger (1995); Jørgensen and Knudsen (2004) for further details. The quasi-score function for $\beta$ has the following form,
$$U^q_\beta(\beta, \lambda) = \left(\sum_{i=1}^n \frac{\partial\mu_i}{\partial\beta_1}C_i^{-1}(y_i - \mu_i), \ldots, \sum_{i=1}^n \frac{\partial\mu_i}{\partial\beta_Q}C_i^{-1}(y_i - \mu_i)\right)^\top,$$
where $\partial\mu_i/\partial\beta_j = \mu_i x_{ij}$ for $j = 1, \ldots, Q$. The entry $(j, k)$ of the $Q \times Q$ sensitivity matrix for $U^q_\beta$ is given by
$$S_{\beta_{jk}} = E\left\{\frac{\partial}{\partial\beta_k}U^q_{\beta_j}(\beta, \lambda)\right\} = -\sum_{i=1}^n \mu_i x_{ij}\left[\frac{1}{\exp(\delta)\mu_i^p}\right]x_{ik}\mu_i. \qquad (7)$$
In a similar way, the entry $(j, k)$ of the $Q \times Q$ variability matrix for $U^q_\beta$ is given by
$$V_{\beta_{jk}} = \mathrm{Cov}\{U^q_{\beta_j}(\beta, \lambda), U^q_{\beta_k}(\beta, \lambda)\} = \sum_{i=1}^n \mu_i x_{ij}\left[\frac{1}{\exp(\delta)\mu_i^p}\right]x_{ik}\mu_i.$$
Following Jørgensen and Knudsen (2004); Bonat and Jørgensen (2016), the Pearson estimating function for the dispersion parameters has the following form,
$$U^q_\lambda(\lambda, \beta) = \left(\sum_{i=1}^n W_{i\delta}\left[(y_i - \mu_i)^2 - C_i\right], \sum_{i=1}^n W_{ip}\left[(y_i - \mu_i)^2 - C_i\right]\right)^\top,$$
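where $W_{i\delta} = -\partial C_i^{-1}/\partial\delta$ and $W_{ip} = -\partial C_i^{-1}/\partial p$. The Pearson estimating functions are unbiased estimating functions for $\lambda$ based on the squared residuals $(y_i - \mu_i)^2$, which have mean $C_i$. This is equivalent to treating the squared residual as a gamma variable, which is hence close in spirit to Perry's gamma regression method (Jørgensen et al.; 2011; Park and Cho; 2004). We shall now calculate the sensitivity matrix for the dispersion parameters. The entry $(j, k)$ of the $2 \times 2$ sensitivity matrix is given by
$$S_{\lambda_{jk}} = E\left\{\frac{\partial}{\partial\lambda_k}U^q_{\lambda_j}(\lambda, \beta)\right\} = -\sum_{i=1}^n W_{i\lambda_j}C_i W_{i\lambda_k}C_i,$$
where $\lambda_1$ and $\lambda_2$ denote either $\delta$ or $p$. Since $W_{i\delta}C_i = 1$ and $W_{ip}C_i = \log(\mu_i)$, this gives
$$S_\lambda = -\begin{pmatrix} n & \sum_{i=1}^n \log(\mu_i) \\ \sum_{i=1}^n \log(\mu_i) & \sum_{i=1}^n \log(\mu_i)^2 \end{pmatrix}. \qquad (8)$$
Similarly, the cross entries of the sensitivity matrix are given by
$$S_{\beta_j\lambda_k} = E\left\{\frac{\partial}{\partial\lambda_k}U^q_{\beta_j}(\beta, \lambda)\right\} = 0 \qquad (9)$$
and
$$S_{\lambda_j\beta_k} = E\left\{\frac{\partial}{\partial\beta_k}U^q_{\lambda_j}(\lambda, \beta)\right\} = -\sum_{i=1}^n W_{i\lambda_j}C_i W_{i\beta_k}C_i, \qquad (10)$$
where $W_{i\beta_k} = -\partial C_i^{-1}/\partial\beta_k$. Finally, the joint sensitivity matrix for the parameter vector $\theta$ is given by
$$S_\theta = \begin{pmatrix} S_\beta & 0 \\ S_{\lambda\beta} & S_\lambda \end{pmatrix},$$
whose entries are defined by (7), (8), (9) and (10). We shall now calculate the asymptotic variance of the quasi-likelihood estimators, denoted by $\hat\theta_{QL}$, as obtained from the inverse Godambe information matrix, whose general form for a parameter vector $\theta$ is $J_\theta^{-1} = S_\theta^{-1}V_\theta S_\theta^{-\top}$, where $-\top$ denotes inverse transpose. The variability matrix for $\theta$ has the form
$$V_\theta = \begin{pmatrix} V_\beta & V_{\beta\lambda} \\ V_{\lambda\beta} & V_\lambda \end{pmatrix}, \qquad (11)$$
where $V_{\lambda\beta} = V_{\beta\lambda}^\top$ and $V_\lambda$ depend on the third and fourth moments of $Y_i$, respectively. In order to avoid this dependence on higher-order moments, we propose to use the empirical versions of $V_\lambda$ and $V_{\lambda\beta}$, whose entries are given by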
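$$\tilde V_{\lambda_{jk}} = \sum_{i=1}^n U^q_{\lambda_j}(\lambda, \beta)_i\, U^q_{\lambda_k}(\lambda, \beta)_i \quad\text{and}\quad \tilde V_{\lambda_j\beta_k} = \sum_{i=1}^n U^q_{\lambda_j}(\lambda, \beta)_i\, U^q_{\beta_k}(\lambda, \beta)_i,$$
where the subscript $i$ denotes the contribution of the $i$-th observation. Finally, the asymptotic distribution of $\hat\theta_{QL}$ is given by $\hat\theta_{QL} \sim N(\theta, J_\theta^{-1})$. Using standard results for the inverse of a partitioned matrix, we may show that
$$J_\theta^{-1} = \begin{pmatrix} S_\beta^{-1}V_\beta S_\beta^{-1} & S_\beta^{-1}\left(V_{\lambda\beta}^\top - V_\beta S_\beta^{-1}S_{\lambda\beta}^\top\right)S_\lambda^{-1} \\ S_\lambda^{-1}\left(V_{\lambda\beta} - S_{\lambda\beta}S_\beta^{-1}V_\beta\right)S_\beta^{-1} & S_\lambda^{-1}\left(L + V_\lambda\right)S_\lambda^{-1} \end{pmatrix},$$
where $L = S_{\lambda\beta}S_\beta^{-1}\left(V_\beta S_\beta^{-1}S_{\lambda\beta}^\top - V_{\lambda\beta}^\top\right) - V_{\lambda\beta}S_\beta^{-1}S_{\lambda\beta}^\top$. Moreover, note that $S_\beta^{-1}V_\beta S_\beta^{-1} = V_\beta^{-1}$, since $S_\beta = -V_\beta$ by (7). Because $V_\beta$ coincides with the Fisher information $F_\beta$ in (2), this shows that, for known dispersion parameters, the asymptotic variance of the quasi-likelihood regression estimators reaches the Cramér-Rao lower bound; hence the quasi-likelihood approach provides asymptotically efficient estimators for the regression coefficients. Jørgensen and Knudsen (2004) proposed the modified chaser algorithm to solve the system of equations $U^q_\beta = 0$ and $U^q_\lambda = 0$, defined by
$$\beta^{(i+1)} = \beta^{(i)} - S_\beta^{-1}U^q_\beta(\beta^{(i)}, \lambda^{(i)}), \qquad \lambda^{(i+1)} = \lambda^{(i)} - S_\lambda^{-1}U^q_\lambda(\beta^{(i+1)}, \lambda^{(i)}).$$
The modified chaser algorithm uses the insensitivity property (9), which allows us to use two separate equations to update $\beta$ and $\lambda$.

3.3 Pseudo-likelihood estimation

We shall now present the pseudo-likelihood approach using terminology and results from Gourieroux et al. (1984). The pseudo-likelihood approach considers the properties of estimators obtained by maximizing a likelihood function associated with a family of probability distributions that does not necessarily contain the true distribution. In particular, for the estimation of Tweedie regression models, in this paper we adopted the Gaussian pseudo-likelihood, whose logarithm is given by
$$L^*(\theta) = -\frac{n}{2}\log(2\pi) - \frac{n\delta}{2} - \sum_{i=1}^n\left\{\frac{p}{2}\log\mu_i + \frac{(y_i - \mu_i)^2}{2\exp(\delta)\mu_i^p}\right\}. \qquad (12)$$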
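Returning to Section 3.2, the modified chaser algorithm can be made concrete with a short, dependency-free Python sketch. It assumes the log link adopted in this paper, $C_i = \exp(\delta)\mu_i^p$, the Pearson weights $W_{i\delta}$ and $W_{ip}$ defined above, and a moment-based starting value for $\delta$ that is our own choice; the names `fit_quasi_tweedie` and `solve` are hypothetical, and this is not the authors' R implementation. Because $\phi$ cancels in the $\beta$-step, only $p$ enters that update.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_quasi_tweedie(y, X, p=1.5, max_iter=100, tol=1e-8):
    """Modified chaser algorithm (sketch), log link and Var(Y_i) = exp(delta) * mu_i**p."""
    n, q = len(y), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (q - 1)
    mu = [math.exp(beta[0])] * n
    # moment-based start for delta (our choice): mean of (y_i - mu_i)^2 / mu_i^p
    delta = math.log(sum((yi - m) ** 2 / m ** p for yi, m in zip(y, mu)) / n)
    for _ in range(max_iter):
        # beta-step: beta - S_beta^{-1} U_beta; the common factor 1/phi cancels
        s = [[sum(x[j] * x[k] * m ** (2.0 - p) for x, m in zip(X, mu)) for k in range(q)]
             for j in range(q)]
        u = [sum(x[j] * m ** (1.0 - p) * (yi - m) for x, yi, m in zip(X, y, mu))
             for j in range(q)]
        step_b = solve(s, u)
        beta = [b + d for b, d in zip(beta, step_b)]
        eta = [sum(xj * bj for xj, bj in zip(x, beta)) for x in X]  # log(mu_i)
        mu = [math.exp(e) for e in eta]
        # lambda-step at the new beta, using W_idelta * C_i = 1 and W_ip * C_i = log(mu_i)
        z = [(yi - m) ** 2 / (math.exp(delta) * m ** p) - 1.0 for yi, m in zip(y, mu)]
        u_lam = [sum(z), sum(e * zi for e, zi in zip(eta, z))]
        minus_s_lam = [[float(n), sum(eta)], [sum(eta), sum(e * e for e in eta)]]  # -S_lambda, (8)
        step_l = solve(minus_s_lam, u_lam)
        delta, p = delta + step_l[0], p + step_l[1]
        if max(abs(v) for v in step_b + step_l) < tol:
            break
    return beta, math.exp(delta), p


random.seed(2016)
n, phi_true = 5000, 0.5
X = [[1.0, random.uniform(-1.0, 1.0)] for _ in range(n)]
# gamma responses: mean exp(1.5 x), variance phi * mean**2, i.e. a Tweedie model with p = 2
y = [random.gammavariate(1.0 / phi_true, phi_true * math.exp(1.5 * x[1])) for x in X]
beta_hat, phi_hat, p_hat = fit_quasi_tweedie(y, X)
print([round(b, 3) for b in beta_hat], round(phi_hat, 3), round(p_hat, 3))
```

In this usage example the responses are gamma distributed, so the estimate of $p$ should land near 2 and the estimate of $\phi$ near 0.5; the sample size, seed and covariate design are arbitrary illustrative choices.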

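The pseudo-score function for $\theta$ is given by
$$U_\theta(\beta, \lambda) = \left(\frac{\partial L^*(\theta)}{\partial\beta_1}, \ldots, \frac{\partial L^*(\theta)}{\partial\beta_Q}, \frac{\partial L^*(\theta)}{\partial\delta}, \frac{\partial L^*(\theta)}{\partial p}\right)^\top,$$
whose components have the following form:
$$\frac{\partial L^*(\theta)}{\partial\beta_j} = \sum_{i=1}^n\left\{-\frac{p}{2}x_{ij} + \frac{p(y_i - \mu_i)^2}{2\exp(\delta)\mu_i^p}x_{ij} + \frac{y_i - \mu_i}{\exp(\delta)\mu_i^{p-1}}x_{ij}\right\}, \qquad (13)$$
$$\frac{\partial L^*(\theta)}{\partial\delta} = -\frac{n}{2} + \frac{1}{2\exp(\delta)}\sum_{i=1}^n\frac{(y_i - \mu_i)^2}{\mu_i^p} \qquad (14)$$
and
$$\frac{\partial L^*(\theta)}{\partial p} = -\frac{1}{2}\sum_{i=1}^n\log(\mu_i) + \frac{1}{2\exp(\delta)}\sum_{i=1}^n\frac{\log(\mu_i)(y_i - \mu_i)^2}{\mu_i^p}. \qquad (15)$$
We note in passing that Equation (13) is an unbiased estimating function for $\beta_j$ based on the linear and squared residuals. Similarly, Equations (14) and (15) are unbiased estimating functions for $\delta$ and $p$ based on the squared residuals. Gourieroux et al. (1984) showed, under classical assumptions, that the pseudo-likelihood estimators, denoted by $\hat\theta_{PL}$ and obtained by maximizing Equation (12), converge almost surely to $\theta$. Furthermore, $\hat\theta_{PL}$ converges in distribution to $N(\theta, S_\theta^{-1}V_\theta S_\theta^{-1})$, where
$$S_\theta = E\left(\frac{\partial^2 L^*(\theta)}{\partial\theta\,\partial\theta^\top}\right) \quad\text{and}\quad V_\theta = E\left(U_\theta(\beta, \lambda)\,U_\theta(\beta, \lambda)^\top\right).$$
As with the variability matrix (11) in the context of quasi-likelihood estimation, the matrix $V_\theta$ depends on the third and fourth moments. Hence, we propose to use the empirical version of $V_\theta$, given by $\tilde V_\theta = \sum_{i=1}^n U_\theta(\theta)_i\,U_\theta(\theta)_i^\top$, where $U_\theta(\theta)_i$ denotes the contribution of the $i$-th observation to the pseudo-score. We shall now compute the components of $S_\theta$. First, note that the matrix $S_\theta$ can be partitioned as
$$S_\theta = \begin{pmatrix} S_\beta & S_{\beta\delta} & S_{\beta p} \\ S_{\delta\beta} & S_\delta & S_{\delta p} \\ S_{p\beta} & S_{p\delta} & S_p \end{pmatrix}.$$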
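A corresponding Python sketch of the pseudo-likelihood Newton scoring follows, under the same log-link assumption. It evaluates (12), the pseudo-score (13)-(15) and the expected second derivatives in $S_\theta$ (whose closed-form entries are listed next in the text), and updates $\beta$ and $\lambda$ jointly. Two ingredients are our own additions rather than part of the method described here: a few quasi-score warm-up steps for $\beta$ (at a constant mean, the $\delta$ and $p$columns of $S_\theta$ are collinear, so the first scoring step would be singular) and step-halving on $L^*(\theta)$. All function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def dot(x, b):
    return sum(xj * bj for xj, bj in zip(x, b))

def pl_loglik(y, X, beta, delta, p):
    """Gaussian pseudo log-likelihood, Equation (12), with a log link."""
    val = -0.5 * len(y) * (math.log(2.0 * math.pi) + delta)
    for x, yi in zip(X, y):
        eta = dot(x, beta)  # log(mu_i)
        val -= 0.5 * p * eta + (yi - math.exp(eta)) ** 2 / (2.0 * math.exp(delta + p * eta))
    return val

def pl_score_and_s(y, X, beta, delta, p):
    """Pseudo-score (13)-(15) and expected Hessian S_theta for theta = (beta, delta, p)."""
    q = len(beta)
    u = [0.0] * (q + 2)
    s = [[0.0] * (q + 2) for _ in range(q + 2)]
    for x, yi in zip(X, y):
        eta = dot(x, beta)
        mu = math.exp(eta)
        c = math.exp(delta + p * eta)            # C_i = exp(delta) * mu_i**p
        r = yi - mu
        g = -0.5 + 0.5 * r * r / c               # shared factor of (14) and (15)
        for j in range(q):
            u[j] += x[j] * (p * g + r * mu / c)  # Equation (13)
            for k in range(q):
                s[j][k] -= x[j] * x[k] * (0.5 * p * p + mu * mu / c)
            s[j][q] -= 0.5 * p * x[j]            # S_{beta_j delta}
            s[j][q + 1] -= 0.5 * p * eta * x[j]  # S_{beta_j p}
        u[q] += g                                # Equation (14)
        u[q + 1] += eta * g                      # Equation (15)
        s[q][q] -= 0.5
        s[q][q + 1] -= 0.5 * eta
        s[q + 1][q + 1] -= 0.5 * eta * eta
    for j in range(q + 2):                       # fill the symmetric lower triangle
        for k in range(j):
            s[j][k] = s[k][j]
    return u, s

def fit_pseudo_tweedie(y, X, p=1.5, max_iter=100, tol=1e-8):
    """Newton scoring theta <- theta - S_theta^{-1} U_theta, with step-halving (sketch)."""
    n, q = len(y), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (q - 1)
    for _ in range(5):  # warm-up quasi-score steps for beta at the starting p
        mu = [math.exp(dot(x, beta)) for x in X]
        a = [[sum(x[j] * x[k] * m ** (2.0 - p) for x, m in zip(X, mu)) for k in range(q)]
             for j in range(q)]
        b = [sum(x[j] * m ** (1.0 - p) * (yi - m) for x, yi, m in zip(X, y, mu))
             for j in range(q)]
        beta = [bj + sj for bj, sj in zip(beta, solve(a, b))]
    mu = [math.exp(dot(x, beta)) for x in X]
    delta = math.log(sum((yi - m) ** 2 / m ** p for yi, m in zip(y, mu)) / n)
    theta = beta + [delta, p]
    ll = pl_loglik(y, X, theta[:q], theta[q], theta[q + 1])
    for _ in range(max_iter):
        u, s = pl_score_and_s(y, X, theta[:q], theta[q], theta[q + 1])
        step = solve(s, u)
        t = 1.0
        while True:  # step-halving safeguard (not part of the text)
            cand = [th - t * d for th, d in zip(theta, step)]
            ll_new = pl_loglik(y, X, cand[:q], cand[q], cand[q + 1])
            if ll_new >= ll or t < 1e-6:
                break
            t *= 0.5
        theta, ll = cand, ll_new
        if t * max(abs(d) for d in step) < tol:
            break
    return theta[:q], math.exp(theta[q]), theta[q + 1]

random.seed(7)
n, phi_true = 3000, 0.5
X = [[1.0, random.uniform(-1.0, 1.0)] for _ in range(n)]
y = [random.gammavariate(1.0 / phi_true, phi_true * math.exp(1.5 * x[1])) for x in X]
beta_hat, phi_hat, p_hat = fit_pseudo_tweedie(y, X)
print([round(b, 3) for b in beta_hat], round(phi_hat, 3), round(p_hat, 3))
```

Because $S_\theta$ is negative definite, $-S_\theta^{-1}U_\theta$ is an ascent direction for $L^*(\theta)$, which is what makes the step-halving check meaningful; with the gamma data above, $p$ should again be estimated near 2.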

The entry $(j, k)$ of the $Q \times Q$ matrix $S_\beta$ is given by
$$S_{\beta_{jk}} = -\sum_{i=1}^n\left(\frac{p^2 x_{ij}x_{ik}}{2} + \frac{x_{ij}x_{ik}}{\exp(\delta)\mu_i^{p-2}}\right).$$
Similarly, the entries $S_\delta$ and $S_p$ are given by $S_\delta = -n/2$ and $S_p = -\sum_{i=1}^n \log(\mu_i)^2/2$, respectively. Furthermore, the cross entries have the form
$$S_{\beta_j\delta} = -\sum_{i=1}^n\frac{p\,x_{ij}}{2}, \qquad S_{\beta_j p} = -\sum_{i=1}^n\frac{p\log(\mu_i)x_{ij}}{2} \qquad\text{and}\qquad S_{\delta p} = -\sum_{i=1}^n\frac{\log(\mu_i)}{2}.$$
Finally, we propose the Newton scoring algorithm to solve the system of equations $U_\theta(\beta, \lambda) = 0$, defined by
$$\theta^{(i+1)} = \theta^{(i)} - S_\theta^{-1}U_\theta(\beta^{(i)}, \lambda^{(i)}).$$
In this case, $\beta$ and $\lambda$ must be updated together, since the cross entries of $S_\theta$ are not zero.

4 Simulation studies

In this section we present two simulation studies designed to (i) check the asymptotic properties of the maximum, quasi- and pseudo-likelihood estimators in a finite sample scenario and (ii) check the robustness of the Tweedie regression models under misspecification by heavy tailed distributions.

4.1 Fitting Tweedie regression models

In this section we present a simulation study that was conducted to compare the properties of the estimation methods. We evaluated the expected bias, consistency, coverage rate and efficiency of the maximum likelihood (ML), quasi-likelihood (Q) and pseudo-likelihood (P) estimators. We generated 1000 data sets considering four sample sizes: 100, 250, 500 and 1000. We considered five values of the power parameter, 0, 1.01, 1.5, 2 and 3, combined with three amounts of variation.
We used the average coefficient of variation to measure the amount of variation introduced in the data. We defined small, medium and large amounts of variation as data sets generated with coefficients of variation equal to 15%, 50% and 80%, respectively. The values of the power parameter were chosen to include non-standard situations, such as $p = 0$ and $p = 1.01$, where we expect the ML method not to work. The case $p = 2$ is also difficult for maximum likelihood estimation, since the probability density function must be evaluated using two different infinite sums, for $p < 2$ and $p > 2$. The cases $p = 1.5$ and $p = 3$ represent the standard compound Poisson and inverse Gaussian distributions, respectively. In these cases we expect the ML method to work well, so we have reliable results to compare with our two alternative approaches. All scenarios consider models with an intercept ($\beta_0 = 2$) and slopes ($\beta_1 = 0.8$, $\beta_2 = 1.5$). The covariates are a sequence from $-1$ to 1, representing a continuous covariate, and a factor with two levels (0 and 1), both of length equal to the sample size. For $p = 0$ the dispersion parameter values are $\phi = (75, 850, 2100)$, corresponding, respectively, to small (15%), medium (50%) and large (80%) variation. Similarly, for $p = 1.01$, $p = 1.5$, $p = 2$ and $p = 3$ the dispersion parameter values are $\phi = (1.5, 15, 40)$, $\phi = (0.2, 2, 5.3)$, $\phi = (0.023, 0.25, 0.65)$ and $\phi = (0.0003, \ , \ )$, respectively. Fig. 1 shows the expected bias plus and minus the expected standard error for the parameters of each model and scenario. The scales are standardized for each parameter by dividing the expected bias and the limits of the confidence intervals by the standard error obtained with the sample of size 100. The results in Fig. 1 show that, for the quasi- and pseudo-likelihood methods and all simulation scenarios, both the expected bias and the standard error tend to 0 as the sample size increases, which indicates the consistency and unbiasedness of our estimators.
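The summaries used in this section can be written down as small functions. The Python sketch below is one reading of the text, not the authors' code: it assumes Wald intervals of the form estimate ± z·SE with z = 1.96 (the 95% level here is our assumption), averages the standard errors across replicates, and uses hypothetical function names.

```python
import statistics


def summarize_replicates(estimates, std_errors, true_value, se_reference, z=1.96):
    """Summaries of one parameter across simulated data sets (sketch).

    std_bias, std_lower, std_upper: expected bias and bias -/+ expected standard error,
    divided by a reference standard error (e.g. the one obtained with n = 100).
    coverage: share of Wald intervals estimate +/- z * SE that contain true_value.
    """
    bias = statistics.mean(estimates) - true_value
    mean_se = statistics.mean(std_errors)
    covered = [abs(e - true_value) <= z * s for e, s in zip(estimates, std_errors)]
    return {
        "std_bias": bias / se_reference,
        "std_lower": (bias - mean_se) / se_reference,
        "std_upper": (bias + mean_se) / se_reference,
        "coverage": sum(covered) / len(covered),
    }


def empirical_efficiency(reference_estimates, alternative_estimates):
    """Ratio var(reference) / var(alternative), e.g. ML versus Q or P."""
    return statistics.variance(reference_estimates) / statistics.variance(alternative_estimates)


est = [0.9, 1.1, 1.0, 1.2]
summary = summarize_replicates(est, [0.1] * 4, true_value=1.0, se_reference=0.1)
print(summary["coverage"])  # 0.75
print(empirical_efficiency(est, [0.8, 1.2, 1.0, 1.4]))
```

Efficiency values below 1 mean the alternative estimator is more variable than the reference, which is how the losses reported below should be read.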
As expected, the maximum likelihood method did not work for $p = 0$ and $p = 1.01$ in the medium and large variation scenarios. In these cases, the algorithm failed for all simulated data sets. In the small variation cases the algorithm converged for 132 and 326 data sets for $p = 0$ and $p = 1.01$, respectively. In these scenarios, despite the large bias for the dispersion parameters, the regression coefficients were consistently estimated. Fig. 2 presents the coverage rate by estimation method, sample size and simulation scenario. The results presented in Fig. 2 show that, in general, for large samples the coverage rates are close to the nominal level for all parameters and simulation scenarios. The ML method presented a coverage rate of zero for the dispersion parameters when $p = 0$ and $p = 1.01$ in all simulation scenarios (not shown). The quasi-likelihood method presented coverage rates closer to the nominal level than the pseudo-likelihood method, mainly for the dispersion parameters and large values of the power parameter ($p \geq 1.5$). Regarding the estimation methods, as expected, the ML method presented coverage rates close to the nominal level for large values of the power parameter.

Figure 1: Expected bias and confidence interval on a standardized scale by estimation methods (maximum likelihood (ML), pseudo-likelihood (P) and quasi-likelihood (Q)), sample size and different values of the power and dispersion parameters ($p$; $\phi$). [Plot: one panel per parameter and ($p$; $\phi$) scenario; x-axis: sample size; y-axis: standardized scale.]

Figure 2: Coverage rate for each parameter ($\beta_0$, $\beta_1$, $\beta_2$, $\phi$, $p$) by estimation methods (maximum likelihood (ML), pseudo-likelihood (P) and quasi-likelihood (Q)), sample size and different values of the power and dispersion parameters ($p$; $\phi$). [Plot: x-axis: sample size; y-axis: coverage rate.]

The alternative approaches worked well in all simulation scenarios, including the cases where the ML method did not work. Finally, Fig. 3 presents the empirical efficiency of the quasi- and pseudo-likelihood estimators. The empirical efficiency was computed as the ratio between the variance of the ML estimator and the variance of the alternative estimators. We computed the efficiency only for the cases where $p \geq 1.5$, since in the other cases the ML method did not produce reliable results. The results in Fig. 3 show that, for the regression coefficients, both the Q and P approaches presented efficiency close to 1 in all simulation scenarios. Concerning the dispersion parameters, in the small variation scenario the Q and P estimators presented efficiency close to 1. However, as the variation increased these estimators lost efficiency; the worst scenario appears for $p = 1.5$ and large variation, where the efficiency presented values around 20%. In general, the P estimator is more efficient than the Q estimator for the dispersion and power parameters.

4.2 Robustness of Tweedie regression models

In this subsection we present a simulation study that was conducted to evaluate the robustness of the Tweedie regression models under model misspecification by heavy tailed distributions. We generated 1000 data sets considering four sample sizes, 100, 250, 500 and 1000, following two heavy tailed distributions, namely, the t-student and slash distributions. The parametrization adopted was the one implemented in the R package heavy (Osorio; 2016). For both distributions, we designed three simulation scenarios according to the amount of variation introduced in the data. We defined small, medium and large amounts of variation as data sets generated with dispersion parameter equal to 100, 500 and 1000, respectively. In order to simulate challenging data sets, we used 2 degrees of freedom. The mean structure was specified as in Subsection 4.1.
In the case of heavy tailed distributions, we expect negative values for the power parameter. Thus, we fitted the Tweedie regression models by using the quasi- and pseudo-likelihood approaches. In order to compute the empirical efficiency of the quasi- and pseudo-likelihood estimators, we fitted t-student regression models with the logarithm link function, as implemented in the package gamlss (family TF) (Rigby and Stasinopoulos; 2005). Despite the extensive literature on robust estimation methods, in this paper we adopted t-student regression models, since they are a frequent choice for the analysis of heavy tailed data (Huber and Ronchetti; 2009) and can be fitted using the orthodox maximum likelihood method. Furthermore, since there is no software available for fitting slash regression models with the logarithm link function, the t-student regression models were used as the basis of comparison for both the t-student and slash data sets.

Figure 3: Empirical efficiency for each parameter ($\beta_0$, $\beta_1$, $\beta_2$, $\phi$, $p$) by estimation methods (maximum likelihood (ML), pseudo-likelihood (P) and quasi-likelihood (Q)), sample size and different values of the power and dispersion parameters ($p$; $\phi$). [Plot: x-axis: sample size; y-axis: empirical efficiency.]

Fig. 4 shows the expected bias plus and minus the expected standard error for the regression parameters by estimation method, sample size and simulation scenario. The results presented in Fig. 4 show that the three estimation methods provide unbiased and consistent estimates of the regression parameters in all simulation scenarios. As expected, the standard errors associated with the regression parameters increase as the amount of variation introduced in the data increases. Fig. 5 presents the coverage rate by estimation method, sample size and simulation scenario. The empirical coverage rate presented values close to the nominal level of 95% for all estimation methods and simulation scenarios. The ML method presented coverage rates closer to the nominal level than the Q and P methods; however, the difference is no larger than 3%. The coverage rates of the Q and P methods were virtually the same for all regression parameters, sample sizes and simulation scenarios. Finally, Fig. 6 presents the empirical efficiency of the Q and P estimators for the regression parameters. The empirical efficiency was computed as the ratio between the variance of the ML estimator obtained by fitting the t-student regression models and the variance of the Q and P estimators obtained by fitting the Tweedie regression models. The empirical efficiency presented values close to 1 in the small variation scenarios; however, as the amount of variation increases, both the Q and P estimators lose efficiency. The losses were around 10% and 20% for the medium and large variation scenarios, respectively. The results are worse for large samples. The P estimator presents efficiency slightly closer to 1 than the Q estimator.

5 Data analyses

In this section we present three illustrative examples of Tweedie regression models. The data that are analysed and the programs that were used to analyse them can be obtained from: http://

[Figure 4: one panel per estimation method; rows β0, β1 and β2; Student-t and slash scenarios with sample sizes 100, 500 and 1000; x-axis: bias ± SE.]
Figure 4: Expected bias and confidence interval by estimation method (quasi-likelihood (QL), pseudo-likelihood (PL) and maximum likelihood (ML)), sample size and simulation scenario.

[Figure 5: empirical coverage rate by estimation method for the Student-t and slash scenarios with sample sizes 100, 500 and 1000.]
Figure 5: Coverage rate for regression parameters by estimation method (quasi-likelihood (QL), pseudo-likelihood (PL) and maximum likelihood (ML)), sample size and simulation scenario.

[Figure 6: empirical efficiency by estimation method for the Student-t and slash scenarios with sample sizes 100, 500 and 1000.]
Figure 6: Empirical efficiency for regression parameters by estimation method (quasi-likelihood (QL), pseudo-likelihood (PL) and maximum likelihood (ML)), sample size and simulation scenario.

[Figure 7 panels: (A) rainfall against days; (B) frequency against rainfall; (C) rainfall by year; (D) rainfall by season.]
Figure 7: Time series plot for Curitiba rainfall data with fitted values (A). Vertical black lines indicate January 1st. Histogram of daily rainfall for the whole period (B). Boxplots for year (C) and season (D).

5.1 Smoothing time series of rainfall in Curitiba, Paraná, Brazil

This example concerns daily rainfall data in Curitiba, Paraná State, Brazil. The data were collected for the period from 2010 to 2015, corresponding to 2191 days. The main goal is to smooth the time series to help us better see patterns or trends. The analysis of rainfall data is in general challenged by the presence of many zeroes and the highly right-skewed distribution of the data. The plots shown in Fig. 7 illustrate some of these features for the Curitiba rainfall data. In particular, Fig. 7(B) highlights the right-skewed distribution and the considerable proportion of exact 0s (51%). In order to smooth the Curitiba rainfall time series, we fitted a Tweedie regression model with a linear predictor expressed in terms of B-splines (de Boor; 1972). The natural basis regression smoothing framework was used to select the degree of smoothness (Wood; 2006). In this case, we found that 14 degrees of freedom were enough to smooth the time series. The models were fitted using the three estimation methods, namely maximum likelihood (ML), quasi-likelihood (QL) and pseudo-likelihood (PL). Table 1 presents estimates and standard errors for the dispersion and power parameters.

Table 1: Dispersion and power parameter estimates and standard errors (SE) by estimation method for the Curitiba rainfall data.
[Columns: Estimate and SE for each of ML, QL and PL; rows: dispersion and power parameters.]

The results in Table 1 show slightly different estimates for the dispersion and power parameters, depending on the estimation method used. However, the confidence intervals obtained by the QL and PL approaches contain the ML estimates. The standard errors obtained by the alternative approaches are larger than those obtained by ML. To evaluate the effect of the estimation methods on the regression coefficients, Fig. 8 shows estimates and confidence intervals for each regression coefficient by estimation method. The scales were standardized for each parameter by dividing the estimate and the limits of the confidence interval by the estimate obtained by the maximum likelihood method. The results in Fig. 8 show that the QL method presented estimates and confidence intervals more similar to ML than the PL method did. The relative average difference between the ML and QL estimates was 3.36%, whereas the relative average difference between the ML and PL estimates was 14.58%. Similarly, the confidence intervals obtained by the QL method were on average 3.33% wider than the corresponding ML intervals, whereas those obtained by the PL approach were 39.98% wider than the ML intervals. For all estimation methods, the power parameter estimates lie in the interval 1 < p < 2, suggesting a compound Poisson distribution, as expected, since the response variable is continuous with exact 0s. The fitted values and 95% confidence interval obtained by the quasi-likelihood method are shown in Fig. 7 above. The fitted values obtained by the ML and PL approaches were similar to those obtained by QL (not shown).
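For 1 < p < 2 the Tweedie distribution is a compound Poisson-gamma distribution: a Poisson number N of gamma summands, with Y = 0 exactly when N = 0. This is why a large share of exact zeros, such as the 51% in the rainfall data, is compatible with power parameter estimates in that interval. The Python sketch below uses the standard parametrization lambda = mu^(2-p) / (phi (2-p)), gamma shape (2-p)/(p-1) and gamma scale phi (p-1) mu^(p-1), which gives E(Y) = mu, Var(Y) = phi mu^p and P(Y = 0) = exp(-lambda). The function name and parameter values are hypothetical, chosen only so that about half of the draws are zero; this is not the authors' code.

```python
import numpy as np


def rtweedie_cpg(n, mu, phi, p, rng):
    """Draw n Tweedie variates with mean mu and variance phi * mu**p (1 < p < 2)
    through the compound Poisson-gamma representation."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma representation requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))  # Poisson rate; P(Y = 0) = exp(-lam)
    shape = (2.0 - p) / (p - 1.0)               # shape of each gamma summand
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # scale of each gamma summand
    counts = rng.poisson(lam, size=n)
    y = np.zeros(n)                             # N = 0 gives an exact zero
    positive = counts > 0
    # A sum of k iid Gamma(shape, scale) variables is Gamma(k * shape, scale).
    y[positive] = rng.gamma(shape * counts[positive], scale)
    return y


rng = np.random.default_rng(2010)
y = rtweedie_cpg(100_000, mu=3.0, phi=5.0, p=1.5, rng=rng)
print(np.mean(y == 0))    # close to exp(-sqrt(3) / 2.5), about 0.50
print(y.mean(), y.var())  # close to 3 and 5 * 3**1.5, about 26
```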
The smooth function captures the swings in the data and highlights the seasonal behaviour, with dry and wet months around the winter and summer seasons, respectively. In order to compare the computational times required by each approach for fitting the Tweedie regression model to this data set, we used the package rbenchmark (Kusnierczyk; 2012). The computations were done on a standard personal computer at 2.90 GHz with 8 GB RAM, using the R software (version […]) with ten replications. The results showed that the QL approach is 37 and 0.22 times faster than the ML and PL approaches, respectively.

[Figure 8: one row per regression coefficient, β0 to β13; x-axis: standardized scale; methods ML, PL and QL.]
Figure 8: Regression parameter estimates and 95% confidence intervals by estimation method for the Curitiba rainfall data.

5.2 Income dynamics in Australia

We consider some aspects of a cross-section study on earnings of 595 individuals for the year 1982 in Australia. The data set is available in the package AER (Kleiber and Zeileis; 2008) for the statistical software R. The response variable wage is known to be highly right-skewed. The data set has 12 covariates: experience, years of full-time work experience; weeks, weeks worked; occupation, factor with two levels (white-collar, blue-collar); industry, factor with two levels (no; yes) indicating whether the individual works in a manufacturing industry; south, factor with two levels (no; yes) indicating whether the individual resides in the south; smsa, factor with two levels (no; yes) indicating whether the individual resides in a standard metropolitan area; gender, factor indicating gender (male, female); union, factor with two levels (no; yes) indicating whether the individual's wage is set by a union contract; ethnicity, factor indicating ethnicity, African-American (afam) or not (other). The main goal of the investigation was to assess the effect of the covariates on the wage. We fitted the Tweedie regression model with a linear predictor composed of all covariates using the three estimation methods. Table 2 shows the estimates and standard errors for the regression, dispersion and power parameters. The results in Table 2 show that the ML and QL approaches strongly agree in terms of estimates and standard errors for the regression coefficients. The PL approach presents estimates slightly different from those of the ML and QL approaches. Regarding the dispersion parameters, despite slight differences in estimates and standard errors, the confidence intervals from the QL and PL approaches contain the ML estimates.
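The quasi-likelihood fits in this section rely on Newton scoring for the regression parameters. With a log link and variance function phi mu^p, the quasi-score is X^T diag(mu^(1-p)) (y - mu) / phi and its expected sensitivity is -X^T diag(mu^(2-p)) X / phi, so phi cancels from the update of beta. The Python sketch below implements this Fisher-scoring step with p held fixed. It is a minimal illustration under stated assumptions (log link, known p, an intercept in the first column of X), not the authors' R implementation, and the simulated gamma responses in the example are hypothetical.

```python
import numpy as np


def quasi_score_fit(X, y, p, beta0=None, tol=1e-8, max_iter=100):
    """Fisher scoring for a log-link quasi-Tweedie model with Var(Y) = phi * mu**p.

    p is treated as known; phi cancels from the update and is not needed here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta0 is None:
        beta = np.zeros(X.shape[1])
        beta[0] = np.log(y.mean())  # assumes column 0 of X is an intercept and mean(y) > 0
    else:
        beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (mu ** (1.0 - p) * (y - mu))     # phi times the quasi-score
        info = X.T @ (X * (mu ** (2.0 - p))[:, None])  # phi times minus the expected sensitivity
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical example: gamma responses (variance proportional to mu**2), fitted with p = 2.
rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
y = rng.gamma(shape=2.0, scale=np.exp(0.5 + 0.8 * x) / 2.0)
print(quasi_score_fit(X, y, p=2.0))  # roughly [0.5, 0.8]
```

Because phi drops out, beta can be updated on its own and the dispersion and power parameters handled in a separate step.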
Concerning the effect of the covariates, the ML and QL approaches agree that the covariates weeks and south are non-significant. The PL approach additionally indicated that the covariates industry and married are non-significant. Regarding the other covariates, the three approaches agree that they are significant. In order to compare the fit of the Tweedie regression model with more standard approaches, we also fitted the Gaussian, gamma and inverse Gaussian regression models to the income dynamics data set. The maximized values of the log-likelihood function were […], […] and […] for the Gaussian, gamma and inverse Gaussian models, respectively. Furthermore, the maximized value of the log-likelihood function for the Tweedie regression model was […], which shows the better fit of the Tweedie regression model, as expected. In terms of computational time for this data set, the QL approach was 45 and 0.15 times faster than the ML and PL approaches, respectively.

Table 2: Regression, dispersion and power parameter estimates and standard errors (SE) by estimation method for the income dynamics data.
[Columns: Estimate and SE for each of ML, QL and PL; rows: Intercept, experience, weeks, occupation, industry, south, smsa, married, gender, union, education, ethnicity, and the dispersion and power parameters.]

5.3 Gain in weight of rats

The third example concerns a standard Gaussian regression model. The goal of this example is to show that the quasi- and pseudo-likelihood approaches can estimate values of the power parameter between 0 and 1, where the maximum likelihood estimator does not exist. We used the weightgain data set available in the HSAUR package (Everitt and Hothorn; 2015). This data set corresponds to an experiment to study the gain in weight of rats fed on four different diets, distinguished by the amount of protein (low and high) and by the source of protein (beef and cereal). The data set has 40 observations. We fitted the Gaussian, gamma, inverse Gaussian and Tweedie regression models to the weightgain data set. For all models, the linear predictor was composed of the two main covariates source and type along with their interaction. The values of the maximized log-likelihood were […], […], […] and […] for the Gaussian, gamma, inverse Gaussian and Tweedie models, respectively. These results show that the Gaussian distribution provides the best fit for this data set, judging by the maximized log-likelihood value. In this case, the ML method is not able to indicate the best fit, owing to the non-trivial restriction on the power parameter space. Thus, we fitted the model using the QL and PL approaches. Table 3 presents the estimates and standard errors for the regression, dispersion and power parameters obtained by the ML, QL and PL approaches.

Table 3: Regression, dispersion and power parameter estimates and standard errors (SE) by estimation method for the gain in weight of rats data.
[Columns: Estimate and SE for each of ML, QL and PL; rows: Intercept, source, type, source:type, and the dispersion and power parameters.]

The results in Table 3 show that the three approaches strongly agree in terms of estimates and standard errors for the regression coefficients. The power parameter was estimated to be smaller than 1 by the QL and PL approaches, as expected, since the Gaussian distribution provides the best fit for these data. On the other hand, the maximum likelihood method estimated the power parameter close to 1, the border of the parameter space, which in this case yields a non-optimal model. All approaches presented large standard errors for the power and dispersion parameters. In terms of computation time, for this application the PL approach was 94 and 0.15 times faster than the ML and QL approaches, respectively.

6 Discussion

In this paper, we adopted the quasi- and pseudo-likelihood approaches for estimation and inference of Tweedie regression models. These approaches employ only second-moment assumptions, allowing us to extend the Tweedie regression models to the class of quasi-Tweedie regression models, which in turn offers robust and flexible models to deal with continuous data. Characteristics such as symmetry or asymmetry, heavy tails and excess 0s are easily handled because of the flexibility of the model class. These features indicate that the Tweedie model is a potentially useful tool for the modelling of continuous data. The main advantage in practical terms is that we have one model for virtually all kinds of continuous data. Thus, model selection is done automatically when fitting the model. The main advantages of the alternative estimation approaches in relation to the orthodox maximum likelihood method are their easy implementation and computational speed. Furthermore, by employing only second-moment assumptions, we eliminated the non-trivial restriction on the power parameter space, making the fitting algorithm simple and efficient. It also allows us to apply the Tweedie regression models to symmetric and heavy-tailed data, as in the case of Gaussian and Student-t data, where in general the power parameter is estimated as negative or close to 0. Another potential application of the Tweedie regression model is the analysis of left-skewed data, where we also expect negative values for the power parameter. The theoretical development in Section 3 showed that the quasi-likelihood approach has much in common with the orthodox maximum likelihood method. The quasi-score function employed in the context of quasi-likelihood estimation coincides with the score function for Tweedie distributions, and the same holds for all exponential dispersion models. The asymptotic variance of the quasi-likelihood estimators for the regression parameters coincides with the asymptotic variance of the maximum likelihood estimators in the case of known power and dispersion parameters.
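The gain-in-weight example relies on estimating p between 0 and 1, a region where no Tweedie distribution exists and hence no likelihood is available, but where the second-moment specification Var(Y) = phi mu^p is still well defined. The Python sketch below solves Pearson-type estimating equations for (phi, p) with the means mu held fixed, assuming weights W = dC^(-1)/d(phi, p) with C = phi mu^p, a common choice for Pearson estimating functions. Under this assumption the phi-equation gives phi(p) = n^(-1) sum (y_i - mu_i)^2 / mu_i^p, and substituting it into the p-equation leaves sum (log mu_i - mean(log mu)) (y_i - mu_i)^2 / mu_i^p = 0, solved here by bisection. The function name, bracket and simulated Gaussian data (p = 0.5, phi = 2) are hypothetical; this is an illustration, not the authors' code, and in a full fit this step would alternate with an update of beta.

```python
import numpy as np


def pearson_dispersion_power(y, mu, lower=-3.0, upper=5.0, tol=1e-10, max_iter=200):
    """Solve Pearson-type estimating equations for (phi, p), holding mu fixed.

    Assumes Var(Y_i) = phi * mu_i**p with mu_i > 0 and weights dC^{-1}/d(phi, p).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    r2 = (y - mu) ** 2                 # squared raw residuals
    log_mu = np.log(mu)
    centred = log_mu - log_mu.mean()

    def phi_of(p):
        # phi-equation: sum_i (r2_i - phi * mu_i**p) / mu_i**p = 0
        return float(np.mean(r2 / mu ** p))

    def g(p):
        # p-equation with phi profiled out
        return float(np.sum(centred * r2 / mu ** p))

    g_lower = g(lower)
    if g_lower * g(upper) > 0:
        raise ValueError("no sign change of the p-equation in [lower, upper]")
    for _ in range(max_iter):          # bisection on the profiled p-equation
        mid = 0.5 * (lower + upper)
        g_mid = g(mid)
        if g_lower * g_mid <= 0:
            upper = mid
        else:
            lower, g_lower = mid, g_mid
        if upper - lower < tol:
            break
    p_hat = 0.5 * (lower + upper)
    return phi_of(p_hat), p_hat


# Hypothetical Gaussian data whose variance follows phi * mu**p with p = 0.5.
rng = np.random.default_rng(7)
n = 20_000
x = rng.uniform(0.0, 2.0, size=n)
mu = np.exp(1.0 + 0.5 * x)
y = mu + np.sqrt(2.0 * mu ** 0.5) * rng.standard_normal(n)
phi_hat, p_hat = pearson_dispersion_power(y, mu)
print(round(phi_hat, 2), round(p_hat, 2))  # expected to be near 2 and 0.5
```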
Because these asymptotic variances coincide, the quasi-likelihood approach provides asymptotically efficient estimation of the regression parameters. Furthermore, the quasi-likelihood approach as used in this paper, combining the quasi-score and Pearson estimating functions, has the insensitivity property (see Eq. 9), which is analogous to the orthogonality property in the context of maximum likelihood estimation. The insensitivity property allows us to apply the two-step Newton scoring algorithm, using two separate equations to update the regression and dispersion parameters. A similar procedure can be used in the maximum likelihood framework, since the vectors β and λ are orthogonal. In the context of quasi-likelihood estimation, in this paper we used the unbiased Pearson estimating function for estimation of the power and dispersion parameters. The discussion of efficiency in that case is difficult, since we cannot obtain a closed form for the Fisher information matrix. The fact that the sensitivity and variability matrices associated with the dispersion parameters


More information

Metrics Performance Evaluation: Application to Face Recognition

Metrics Performance Evaluation: Application to Face Recognition Metrics Performance Evaluation: Alication to Face Recognition Naser Zaeri, Abeer AlSadeq, and Abdallah Cherri Electrical Engineering Det., Kuwait University, P.O. Box 5969, Safat 6, Kuwait {zaery, abeer,

More information

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Wolfgang POESSNECKER and Ulrich GROSS*

Wolfgang POESSNECKER and Ulrich GROSS* Proceedings of the Asian Thermohysical Proerties onference -4 August, 007, Fukuoka, Jaan Paer No. 0 A QUASI-STEADY YLINDER METHOD FOR THE SIMULTANEOUS DETERMINATION OF HEAT APAITY, THERMAL ONDUTIVITY AND

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Resonse) Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout

More information

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Asymptotically Optimal Simulation Allocation under Dependent Sampling Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee

More information

Adaptive estimation with change detection for streaming data

Adaptive estimation with change detection for streaming data Adative estimation with change detection for streaming data A thesis resented for the degree of Doctor of Philosohy of the University of London and the Diloma of Imerial College by Dean Adam Bodenham Deartment

More information

A PEAK FACTOR FOR PREDICTING NON-GAUSSIAN PEAK RESULTANT RESPONSE OF WIND-EXCITED TALL BUILDINGS

A PEAK FACTOR FOR PREDICTING NON-GAUSSIAN PEAK RESULTANT RESPONSE OF WIND-EXCITED TALL BUILDINGS The Seventh Asia-Pacific Conference on Wind Engineering, November 8-1, 009, Taiei, Taiwan A PEAK FACTOR FOR PREDICTING NON-GAUSSIAN PEAK RESULTANT RESPONSE OF WIND-EXCITED TALL BUILDINGS M.F. Huang 1,

More information

Extended Poisson Tweedie: Properties and regression models for count data

Extended Poisson Tweedie: Properties and regression models for count data Extended Poisson Tweedie: Properties and regression models for count data Wagner H. Bonat 1,2, Bent Jørgensen 2,Célestin C. Kokonendji 3, John Hinde 4 and Clarice G. B. Demétrio 5 1 Laboratory of Statistics

More information

INTRODUCING THE SHEAR-CAP MATERIAL CRITERION TO AN ICE RUBBLE LOAD MODEL

INTRODUCING THE SHEAR-CAP MATERIAL CRITERION TO AN ICE RUBBLE LOAD MODEL Symosium on Ice (26) INTRODUCING THE SHEAR-CAP MATERIAL CRITERION TO AN ICE RUBBLE LOAD MODEL Mohamed O. ElSeify and Thomas G. Brown University of Calgary, Calgary, Canada ABSTRACT Current ice rubble load

More information

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation 6.3 Modeling and Estimation of Full-Chi Leaage Current Considering Within-Die Correlation Khaled R. eloue, Navid Azizi, Farid N. Najm Deartment of ECE, University of Toronto,Toronto, Ontario, Canada {haled,nazizi,najm}@eecg.utoronto.ca

More information

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming Maximum Entroy and the Stress Distribution in Soft Disk Packings Above Jamming Yegang Wu and S. Teitel Deartment of Physics and Astronomy, University of ochester, ochester, New York 467, USA (Dated: August

More information

arxiv: v4 [math.st] 3 Jun 2016

arxiv: v4 [math.st] 3 Jun 2016 Electronic Journal of Statistics ISSN: 1935-7524 arxiv: math.pr/0000000 Bayesian Estimation Under Informative Samling Terrance D. Savitsky and Daniell Toth 2 Massachusetts Ave. N.E, Washington, D.C. 20212

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2 STA 25: Statistics Notes 7. Bayesian Aroach to Statistics Book chaters: 7.2 1 From calibrating a rocedure to quantifying uncertainty We saw that the central idea of classical testing is to rovide a rigorous

More information

An Analysis of Reliable Classifiers through ROC Isometrics

An Analysis of Reliable Classifiers through ROC Isometrics An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit

More information

Decoding First-Spike Latency: A Likelihood Approach. Rick L. Jenison

Decoding First-Spike Latency: A Likelihood Approach. Rick L. Jenison eurocomuting 38-40 (00) 39-48 Decoding First-Sike Latency: A Likelihood Aroach Rick L. Jenison Deartment of Psychology University of Wisconsin Madison WI 53706 enison@wavelet.sych.wisc.edu ABSTRACT First-sike

More information

arxiv: v3 [physics.data-an] 23 May 2011

arxiv: v3 [physics.data-an] 23 May 2011 Date: October, 8 arxiv:.7v [hysics.data-an] May -values for Model Evaluation F. Beaujean, A. Caldwell, D. Kollár, K. Kröninger Max-Planck-Institut für Physik, München, Germany CERN, Geneva, Switzerland

More information

Maxisets for μ-thresholding rules

Maxisets for μ-thresholding rules Test 008 7: 33 349 DOI 0.007/s749-006-0035-5 ORIGINAL PAPER Maxisets for μ-thresholding rules Florent Autin Received: 3 January 005 / Acceted: 8 June 006 / Published online: March 007 Sociedad de Estadística

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations PINAR KORKMAZ, BILGE E. S. AKGUL and KRISHNA V. PALEM Georgia Institute of

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables High-dimensional Ordinary Least-squares Projection for Screening Variables Xiangyu Wang and Chenlei Leng arxiv:1506.01782v1 [stat.me] 5 Jun 2015 Abstract Variable selection is a challenging issue in statistical

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

EXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES

EXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES EXACTLY ERIODIC SUBSACE DECOMOSITION BASED AROACH FOR IDENTIFYING TANDEM REEATS IN DNA SEUENCES Ravi Guta, Divya Sarthi, Ankush Mittal, and Kuldi Singh Deartment of Electronics & Comuter Engineering, Indian

More information

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling Scaling Multile Point Statistics or Non-Stationary Geostatistical Modeling Julián M. Ortiz, Steven Lyster and Clayton V. Deutsch Centre or Comutational Geostatistics Deartment o Civil & Environmental Engineering

More information

Research of power plant parameter based on the Principal Component Analysis method

Research of power plant parameter based on the Principal Component Analysis method Research of ower lant arameter based on the Princial Comonent Analysis method Yang Yang *a, Di Zhang b a b School of Engineering, Bohai University, Liaoning Jinzhou, 3; Liaoning Datang international Jinzhou

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit Chater 5 Statistical Inference 69 CHAPTER 5 STATISTICAL INFERENCE.0 Hyothesis Testing.0 Decision Errors 3.0 How a Hyothesis is Tested 4.0 Test for Goodness of Fit 5.0 Inferences about Two Means It ain't

More information

KEY ISSUES IN THE ANALYSIS OF PILES IN LIQUEFYING SOILS

KEY ISSUES IN THE ANALYSIS OF PILES IN LIQUEFYING SOILS 4 th International Conference on Earthquake Geotechnical Engineering June 2-28, 27 KEY ISSUES IN THE ANALYSIS OF PILES IN LIQUEFYING SOILS Misko CUBRINOVSKI 1, Hayden BOWEN 1 ABSTRACT Two methods for analysis

More information

Supplementary Materials for Robust Estimation of the False Discovery Rate

Supplementary Materials for Robust Estimation of the False Discovery Rate Sulementary Materials for Robust Estimation of the False Discovery Rate Stan Pounds and Cheng Cheng This sulemental contains roofs regarding theoretical roerties of the roosed method (Section S1), rovides

More information

An Outdoor Recreation Use Model with Applications to Evaluating Survey Estimators

An Outdoor Recreation Use Model with Applications to Evaluating Survey Estimators United States Deartment of Agriculture Forest Service Southern Research Station An Outdoor Recreation Use Model with Alications to Evaluating Survey Estimators Stanley J. Zarnoch, Donald B.K. English,

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

The power performance of fixed-t panel unit root tests allowing for structural breaks in their deterministic components

The power performance of fixed-t panel unit root tests allowing for structural breaks in their deterministic components ATHES UIVERSITY OF ECOOMICS AD BUSIESS DEPARTMET OF ECOOMICS WORKIG PAPER SERIES 23-203 The ower erformance of fixed-t anel unit root tests allowing for structural breaks in their deterministic comonents

More information

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation Uniformly best wavenumber aroximations by satial central difference oerators: An initial investigation Vitor Linders and Jan Nordström Abstract A characterisation theorem for best uniform wavenumber aroximations

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

1 Extremum Estimators

1 Extremum Estimators FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective

More information

Unsupervised Hyperspectral Image Analysis Using Independent Component Analysis (ICA)

Unsupervised Hyperspectral Image Analysis Using Independent Component Analysis (ICA) Unsuervised Hyersectral Image Analysis Using Indeendent Comonent Analysis (ICA) Shao-Shan Chiang Chein-I Chang Irving W. Ginsberg Remote Sensing Signal and Image Processing Laboratory Deartment of Comuter

More information

dn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential

dn i where we have used the Gibbs equation for the Gibbs energy and the definition of chemical potential Chem 467 Sulement to Lectures 33 Phase Equilibrium Chemical Potential Revisited We introduced the chemical otential as the conjugate variable to amount. Briefly reviewing, the total Gibbs energy of a system

More information

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution Outline for today Maximum likelihood estimation Rasmus Waageetersen Deartment of Mathematics Aalborg University Denmark October 30, 2007 the multivariate normal distribution linear and linear mixed models

More information

Projected Principal Component Analysis. Yuan Liao

Projected Principal Component Analysis. Yuan Liao Projected Princial Comonent Analysis Yuan Liao University of Maryland with Jianqing Fan and Weichen Wang January 3, 2015 High dimensional factor analysis and PCA Factor analysis and PCA are useful tools

More information

Adaptive Estimation of the Regression Discontinuity Model

Adaptive Estimation of the Regression Discontinuity Model Adative Estimation of the Regression Discontinuity Model Yixiao Sun Deartment of Economics Univeristy of California, San Diego La Jolla, CA 9293-58 Feburary 25 Email: yisun@ucsd.edu; Tel: 858-534-4692

More information

Asymptotic F Test in a GMM Framework with Cross Sectional Dependence

Asymptotic F Test in a GMM Framework with Cross Sectional Dependence Asymtotic F Test in a GMM Framework with Cross Sectional Deendence Yixiao Sun Deartment of Economics University of California, San Diego Min Seong Kim y Deartment of Economics Ryerson University First

More information