Diagnostics analysis for skew-normal linear regression models: applications to a quality of life dataset


Submitted to the Brazilian Journal of Probability and Statistics

Clécio da Silva Ferreira (a), Filidor Vilca (b) and Heleno Bolfarine (c)
(a) Universidade Federal de Juiz de Fora
(b) Universidade Estadual de Campinas
(c) Universidade de São Paulo

Abstract. The skew-normal distribution has been used successfully in various statistical applications. The main purpose of this paper is to consider local influence analysis, which is recognized as an important step of data analysis. Motivated by the simple form of the conditional expectation of the complete-data log-likelihood function used in the EM algorithm, diagnostic measures are derived from the case-deletion approach and the local influence approach inspired by Zhu et al. (2001) and Zhu and Lee (2001). Finally, the results obtained are applied to a dataset from a study conducted to evaluate quality of life (QOL) and to identify its associated factors in climacteric women with a history of breast cancer.

Keywords and phrases. Case-deletion; Local influence; Skew-normal distribution; EM algorithm.

1 Introduction

Linear symmetrical models have been investigated by various authors. For example, Lange et al. (1989) presented an approach to model Student-t distributions, while Galea-Rojas et al. (1997) and Liu (2000) discussed diagnostic methods for symmetrical linear regression models. Although the class of symmetric distributions is a better alternative than the normal distribution, it is not appropriate in situations where the sample distribution is asymmetrical. For example, Hill and Dixon (1982) discussed and presented examples with asymmetrical structures. From a practical viewpoint, many authors have transformed variables to achieve normality, and in many cases their methods are satisfactory. However, Azzalini and Capitanio (1999) pointed out some problems, mentioning, for example, that the transformed variables are more difficult to deal with and to interpret. A family of skew-normal distributions that accommodates practical values of skewness and kurtosis, and includes the normal distribution as a special case, was introduced by Azzalini (1985). Since then, many authors have considered these distributions in different areas, such as economics, oceanography, engineering and biomedical sciences, among others. Azzalini (2005) presented a discussion of skew-normal distributions with applications in regression models. Also,

Lachos et al. (2007) considered an application of diagnostics analysis in linear mixed models, and in the context of errors-in-variables models some results can be found in Lachos et al. (2008). Sahu et al. (2003) proposed a new skew-normal distribution, equivalent to that of Azzalini (1985), for which the EM algorithm used to obtain the maximum likelihood estimates has a simpler M-step. Consequently, some of its properties and applications can be easily derived, such as diagnostics analysis based on the conditional expectation of the complete-data log-likelihood; see Zhu and Lee (2001) and Zhu et al. (2001). Some applications of the skew-normal distribution of Sahu et al. (2003) can be found in Azevedo et al. (2011), Lee and McLachlan (2013) and Dagne (2016), among others.

After the model is fitted, diagnostics analysis is a key step in data analysis subsequent to parameter estimation. This procedure can be carried out by conducting an influence analysis to detect influential observations. There are two usual approaches to detect influential observations. The first is the case-deletion approach, which has received a great deal of attention since the paper by Cook (1977), which presented an intuitively appealing method. Since then, the Cook distance and the likelihood distance have been applied to many statistical models. This approach allows assessment of the individual impact of cases on the estimation process (see Cook and Weisberg, 1982). The second approach, which is a general statistical technique used to assess the stability of the estimation outputs with respect to the model inputs, is the local influence approach, due to Cook (1986). This method has received considerable attention recently in the statistical literature on regression models. Both approaches can be studied on the basis of the Q-function, considered by Zhu et al. (2001) to introduce case-deletion measures and by Zhu and Lee (2001) to study local influence diagnostics, as can be seen in Ferreira et al. (2015), Massuia et al. (2015) and Zeller et al. (2014). Although several diagnostic studies on regression models have appeared in the literature, no application of local influence has been considered for regression models under the skew-normal distribution of Sahu et al. (2003) with emphasis on the case-deletion approach and local influence. Based on the Q-function, which is closely related to the conditional expectation of the complete-data log-likelihood in the E-step of the EM algorithm, we develop a study of local influence following the approach of Zhu and Lee (2001), which produces results very similar to those obtained from Cook's method, but with simpler application. We also develop methods to obtain case-deletion measures by using the method of Zhu et al. (2001).

Following Sahu et al. (2003), we say that a random variable Y has a univariate skew-normal distribution with location parameter µ, scale parameter σ² and skewness parameter δ, if its

probability density function (pdf) is given by

f(y) = (2 / √(σ² + δ²)) φ((y − µ) / √(σ² + δ²)) Φ(δ (y − µ) / (σ √(σ² + δ²))),   (1.1)

where φ(·) and Φ(·) are the pdf and cumulative distribution function (cdf) of the N(0,1) distribution, respectively. This distribution is denoted by Y ∼ SN(µ, σ², δ). It is easy to see that for δ = 0 the pdf in (1.1) reduces to that of the normal distribution. The mean and variance of Y are

E(Y) = µ + √(2/π) δ   and   Var(Y) = σ² + (1 − 2/π) δ²,   (1.2)

respectively. The stochastic representation is given by Y =d µ + δ|X₀| + σX₁, where X₀ and X₁ are independent with distribution N(0,1). The notation =d means that both variables have the same distribution.

The paper is organized as follows. Section 2 introduces the skew-normal linear regression model and describes an EM algorithm for obtaining the maximum likelihood (ML) estimates. Section 3 provides a brief sketch of the case-deletion and local influence approaches for models with incomplete data, and also develops the methods pertinent to linear regression models under the skew-normal distribution. Moreover, the generalized leverage is discussed. Section 4 illustrates the method with the analysis of two examples, involving quality of life of women with breast cancer and simulation studies. Finally, some concluding remarks are made in Section 5.

2 The skew-normal linear regression model

In this section we present the linear regression model under the skew-normal distribution (SN-LRM). Specifically, we consider the linear regression model under the skew-normal distribution proposed by Sahu et al. (2003), as follows:

Y_j = x_j^T β + δ U_j + ε_j,   (2.1)

where x_j = (1, x_{j1}, ..., x_{jp})^T, β = (β₀, β₁, ..., β_p)^T, ε_j ∼ N(0, σ²) and U_j ∼ HN(0,1), j = 1, ..., n, all independent, where HN(0,1) denotes the standard half-normal distribution. Thus, Y_j ∼ SN(x_j^T β, σ², δ), with pdf given by

f(y_j; θ) = (2 / √(σ² + δ²)) φ((y_j − x_j^T β) / √(σ² + δ²)) Φ(δ (y_j − x_j^T β) / (σ √(σ² + δ²))),   (2.2)

where φ(·) and Φ(·) are as in (1.1). The mean and variance of Y_j are given by

E(Y_j) = x_j^T β + √(2/π) δ   and   Var(Y_j) = σ² + (1 − 2/π) δ²,   (2.3)

respectively.
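For illustration, the density (2.2) and the stochastic representation above can be evaluated and simulated as in the following minimal sketch (Python with NumPy and SciPy); the function names sn_pdf and sn_rvs and the parameter values are illustrative only and are not part of the original formulation.

```python
# Illustrative sketch: density and simulation for Y ~ SN(mu, sigma2, delta) in the
# parameterization of Sahu et al. (2003); not code from the paper.
import numpy as np
from scipy.stats import norm

def sn_pdf(y, mu, sigma2, delta):
    """Density (1.1)/(2.2): 2/s * phi((y-mu)/s) * Phi(delta*(y-mu)/(sigma*s)), s = sqrt(sigma2+delta^2)."""
    s = np.sqrt(sigma2 + delta**2)
    return 2.0 / s * norm.pdf((y - mu) / s) * norm.cdf(delta * (y - mu) / (np.sqrt(sigma2) * s))

def sn_rvs(n, mu, sigma2, delta, seed=None):
    """Draws via the stochastic representation Y = mu + delta*|X0| + sigma*X1, X0, X1 ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    return mu + delta * np.abs(rng.standard_normal(n)) + np.sqrt(sigma2) * rng.standard_normal(n)

# Check of the moments in (1.2): E(Y) = mu + sqrt(2/pi)*delta, Var(Y) = sigma2 + (1 - 2/pi)*delta^2.
mu, sigma2, delta = 1.0, 2.0, 1.5          # arbitrary illustrative values
y = sn_rvs(100_000, mu, sigma2, delta, seed=1)
print(y.mean(), mu + np.sqrt(2 / np.pi) * delta)
print(y.var(), sigma2 + (1 - 2 / np.pi) * delta**2)
```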

The log-likelihood function for θ = (β^T, σ², δ)^T, given y_1, ..., y_n, is

l(θ) = n log 2 − (n/2) log(2π) − (n/2) log(σ² + δ²) − (1 / (2(σ² + δ²))) Σ_{j=1}^n (y_j − x_j^T β)² + Σ_{j=1}^n log Φ(B_j),   (2.4)

where B_j = (δ / (σ √(σ² + δ²))) (y_j − x_j^T β). After some algebraic manipulation, the information matrix, denoted by I_F(θ) = [I_{γψ}], γ, ψ = β, σ², δ, can be obtained; its components are expressed in terms of the quantities a_{hk}(x) = E[Z^h W_Φ^k(xZ)], where W_Φ(u) = φ(u)/Φ(u), which are computed through the approximation given in Rodríguez and Branco (2007). Throughout, X = (x_1, ..., x_n)^T denotes the model matrix.

Next, we present the Q-function as an alternative to the observed-data log-likelihood function, which is associated with the conditional expectation of the complete-data log-likelihood function in the EM algorithm. The use of the Q-function will be useful to detect influential observations in regression models when the errors follow a skew-normal distribution, because it is significantly simpler. This idea is inspired by the works of Zhu and Lee (2001) and Zhu et al. (2001).

2.1 The Q-function and the EM algorithm

We implement the EM algorithm introduced by Dempster et al. (1977), in which the M-step involves only the complete-data Q-function, which in this case is computationally simple. The model (2.1) can be expressed by the hierarchical representation

Y_j | U_j = u_j ∼ N(x_j^T β + δ u_j, σ²)   and   U_j ∼ HN(0,1).   (2.5)

Hence U_j | Y_j = y_j ∼ HN(δ (y_j − x_j^T β)/(σ² + δ²), σ²/(σ² + δ²)). Now, letting y = (y_1, ..., y_n)^T and u = (u_1, ..., u_n)^T and treating u as missing data, the complete-data log-likelihood function

associated with y_c = (y^T, u^T)^T is given by l_c(θ | y_c) = Σ_{j=1}^n l_c(θ | y_j, u_j), where (without the additive constant)

l_c(θ | y_j, u_j) = −(1/2) log σ² − (1/(2σ²)) (y_j − x_j^T β)² + (δ/σ²) u_j (y_j − x_j^T β) − (δ²/(2σ²)) u_j².

Given the current estimate θ̂^(k) = (β̂^(k)T, σ̂²^(k), δ̂^(k))^T of θ at the kth iteration, the E-step calculates the conditional expectation of the complete-data log-likelihood function, or simply the Q-function,

Q(θ | θ̂^(k)) = Σ_{j=1}^n E[l_c(θ | y_j, U_j) | y_j, θ = θ̂^(k)] = Σ_{j=1}^n Q_j(θ | θ̂^(k)),   (2.6)

where

Q_j(θ | θ̂^(k)) = C − (1/2) log σ² − (1/(2σ²)) ǫ_j² + (δ/σ²) û_j^(k) ǫ_j − (δ²/(2σ²)) û²_j^(k),

with ǫ_j = y_j − x_j^T β and ǫ̂_j^(k) = y_j − x_j^T β̂^(k). The quantities û_j^(k) = E(U_j | y_j, θ = θ̂^(k)) and û²_j^(k) = E(U_j² | y_j, θ = θ̂^(k)) are obtained from properties of the half-normal distribution:

û_j^(k) = δ̂^(k) ǫ̂_j^(k) / (σ̂²^(k) + δ̂²^(k)) + (σ̂^(k) / (σ̂²^(k) + δ̂²^(k))^{1/2}) W_Φ(B̂_j^(k)),   (2.7)

û²_j^(k) = (δ̂^(k) ǫ̂_j^(k))² / (σ̂²^(k) + δ̂²^(k))² + σ̂²^(k) / (σ̂²^(k) + δ̂²^(k)) + (δ̂^(k) σ̂^(k) / (σ̂²^(k) + δ̂²^(k))^{3/2}) ǫ̂_j^(k) W_Φ(B̂_j^(k)),   (2.8)

where B̂_j^(k) is B_j in (2.4) evaluated at θ̂^(k), j = 1, ..., n, and W_Φ(u) = φ(u)/Φ(u). Thus, we have the following EM algorithm:

E-step: Given θ = θ̂^(k), compute û_j^(k) and û²_j^(k) for j = 1, ..., n, using (2.7) and (2.8).

M-step: Update θ̂^(k+1) by maximizing Q(θ | θ̂^(k)) over θ, which leads to the following closed-form expressions:

β̂^(k+1) = (X^T X)^{−1} X^T (y − δ̂^(k) û^(k)),

σ̂²^(k+1) = (1/n) [ ǫ̂^(k)T ǫ̂^(k) − 2 δ̂^(k) û^(k)T ǫ̂^(k) + (δ̂^(k))² Σ_{j=1}^n û²_j^(k) ],

δ̂^(k+1) = û^(k)T ǫ̂^(k) / Σ_{j=1}^n û²_j^(k),

where û^(k) = (û_1^(k), ..., û_n^(k))^T, ǫ̂^(k) = (ǫ̂_1^(k), ..., ǫ̂_n^(k))^T and X is the model matrix.
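A compact numerical sketch of this EM scheme is given below (Python with NumPy and SciPy). The routine names e_step and em_sn_lrm, the starting values and the simulated example data are illustrative choices and are not part of the original formulation.

```python
# Illustrative sketch of the EM algorithm of Section 2.1 for the SN-LRM.
import numpy as np
from scipy.stats import norm

def e_step(X, y, beta, sigma2, delta):
    """E-step quantities (2.7)-(2.8): u_hat = E(U_j | y_j), u2_hat = E(U_j^2 | y_j)."""
    eps = y - X @ beta
    s2 = sigma2 + delta**2
    B = delta * eps / np.sqrt(sigma2 * s2)                  # B_j of (2.4) at the current estimate
    W = norm.pdf(B) / norm.cdf(B)                           # W_Phi(u) = phi(u)/Phi(u)
    u_hat = delta * eps / s2 + np.sqrt(sigma2 / s2) * W
    u2_hat = (delta * eps / s2) ** 2 + sigma2 / s2 + delta * np.sqrt(sigma2) * eps / s2**1.5 * W
    return eps, u_hat, u2_hat

def em_sn_lrm(X, y, n_iter=2000, tol=1e-8):
    """EM for Y_j = x_j' beta + delta U_j + e_j with U_j ~ HN(0,1) and e_j ~ N(0, sigma2)."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]             # least-squares starting value
    sigma2, delta = np.var(y - X @ beta), 0.1
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)
    for _ in range(n_iter):
        eps, u_hat, u2_hat = e_step(X, y, beta, sigma2, delta)
        beta_new = XtX_inv_Xt @ (y - delta * u_hat)         # closed-form updates of the M-step
        sigma2_new = (eps @ eps - 2 * delta * u_hat @ eps + delta**2 * u2_hat.sum()) / n
        delta_new = (u_hat @ eps) / u2_hat.sum()
        done = np.max(np.abs(np.r_[beta_new - beta, sigma2_new - sigma2, delta_new - delta])) < tol
        beta, sigma2, delta = beta_new, sigma2_new, delta_new
        if done:
            break
    return beta, sigma2, delta

# Small illustrative example with simulated data (true values chosen arbitrarily).
rng = np.random.default_rng(0)
n, beta_true, sigma2_true, delta_true = 500, np.array([5.0, 2.0]), 1.0, 2.0
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
y = X @ beta_true + delta_true * np.abs(rng.standard_normal(n)) + np.sqrt(sigma2_true) * rng.standard_normal(n)
print(em_sn_lrm(X, y))
```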

3 Diagnostics analysis

Influence diagnostic techniques are used to identify anomalous observations that impact model fit or statistical inference for the assumed statistical model. In the literature, there are primarily two approaches to detect influential observations. The case-deletion approach of Cook (1977) is the most popular one for identifying influential observations. For assessing the impact of influential observations on parameter estimates, some metrics have been used to measure the distance between θ̂_[j] and θ̂, such as the likelihood distance and Cook's distance. The second approach is a general statistical technique, proposed by Cook (1986), used to assess the stability of the estimation outputs with respect to the model inputs. We study here the case-deletion measures and the local influence diagnostics for linear regression models on the basis of the Q-function, inspired by the results of Zhu et al. (2001), Zhu and Lee (2001) and Lee and Xu (2004). We first consider the case-deletion measures, and then the local influence measures based on some perturbation schemes and the generalized leverage.

3.1 Case-deletion measures

Case-deletion is a classic approach to study the effects of dropping the jth case from the dataset. Let y_c = (y^T, u^T)^T be the augmented dataset, where a quantity with a subscript [j] denotes the original one with the jth observation deleted. The complete-data log-likelihood function based on the data with the jth case deleted is denoted by l_c(θ | y_{c[j]}). Let θ̂_[j] = (β̂_[j]^T, σ̂²_[j], δ̂_[j])^T be the maximizer of the function Q_[j](θ | θ̂) = E[l_c(θ | Y_{c[j]}) | y_[j], θ = θ̂], where θ̂ = (β̂^T, σ̂², δ̂)^T is the ML estimate of θ, and the estimates θ̂_[j] are obtained by using the EM algorithm based on the remaining n − 1 observations, with θ̂ as the initial value. To assess the influence of the jth case on θ̂, we compare θ̂_[j] with θ̂. If θ̂_[j] is far from θ̂ in some sense, then the jth case is regarded as influential. As θ̂_[j] is needed for every case, the required computational effort can be quite heavy, especially when the sample size is large. Hence, to approximate the case-deletion estimate θ̂_[j], Zhu et al. (2001) proposed the following one-step approximation based on the Q-function:

θ̂¹_[j] = θ̂ + {−Q̈(θ̂ | θ̂)}^{−1} Q̇_[j](θ̂ | θ̂),   (3.1)

where Q̈(θ̂ | θ̂) = ∂²Q(θ | θ̂)/∂θ∂θ^T |_{θ=θ̂} and Q̇_[j](θ̂ | θ̂) = ∂Q_[j](θ | θ̂)/∂θ |_{θ=θ̂} are the Hessian matrix and the gradient vector evaluated at θ̂, respectively. The Hessian matrix is an essential element in the method developed by Zhu and Lee (2001) to obtain the measures for case-deletion diagnosis. To develop the case-deletion measures, we have to obtain the elements in (3.1), Q̇_[j](θ̂ | θ̂) and Q̈(θ̂ | θ̂), which can be obtained quite easily from (2.6). The vector Q̇_[j](θ̂ | θ̂) =

(Q̇_[j]β(θ̂ | θ̂)^T, Q̇_[j]σ²(θ̂ | θ̂), Q̇_[j]δ(θ̂ | θ̂))^T has its elements given by

Q̇_[j]β(θ̂ | θ̂) = (1/σ̂²) X_[j]^T (ǫ̂_[j] − δ̂ û_[j]),

Q̇_[j]σ²(θ̂ | θ̂) = (1/(2σ̂²)) [ −(n − 1) + (1/σ̂²) ( ǫ̂_[j]^T ǫ̂_[j] − 2 δ̂ û_[j]^T ǫ̂_[j] + δ̂² Σ_{i=1, i≠j}^n û²_i ) ],

Q̇_[j]δ(θ̂ | θ̂) = (1/σ̂²) ( û_[j]^T ǫ̂_[j] − δ̂ Σ_{i=1, i≠j}^n û²_i ).

Moreover, the Hessian matrix Q̈(θ̂ | θ̂) has nonzero blocks

Q̈_ββ(θ̂) = −(1/σ̂²) X^T X,   Q̈_βδ(θ̂) = −(1/σ̂²) X^T û,   Q̈_σ²σ²(θ̂) = −n/(2σ̂⁴),   Q̈_δδ(θ̂) = −(n/σ̂²) ū²,   (3.2)

with the remaining blocks involving σ² equal to zero when evaluated at θ̂, where û = (û_1, ..., û_n)^T and ū² denotes the mean of the components of the vector û² = (û²_1, ..., û²_n)^T; its inverse follows from the standard formulas for partitioned matrices.

Inspired by the classic case-deletion measures, Cook's distance and the likelihood displacement, Zhu et al. (2001) and Lee and Xu (2004) presented analogous measures based on the Q-function. These measures are:

(i) Generalized Cook distance: this measure is defined similarly to the usual Cook distance, which is based on the genuine likelihood,

GD_j = (θ̂_[j] − θ̂)^T {−Q̈(θ̂ | θ̂)} (θ̂_[j] − θ̂).   (3.3)

Upon substituting (3.1) into (3.3), we obtain the approximate distance

GD¹_j = Q̇_[j](θ̂ | θ̂)^T {−Q̈(θ̂ | θ̂)}^{−1} Q̇_[j](θ̂ | θ̂);

(ii) Q-distance: this measure of the influence of the jth case is based on the Q-displacement function, similar to the likelihood distance LD_j discussed by Cook (1977), and is defined as

QD_j = 2 { Q(θ̂ | θ̂) − Q(θ̂_[j] | θ̂) }.   (3.4)

We can calculate an approximation of QD_j by substituting (3.1) into (3.4), resulting in the following approximation QD¹_j of QD_j:

QD¹_j = 2 { Q(θ̂ | θ̂) − Q(θ̂¹_[j] | θ̂) }.
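The one-step estimate (3.1) and the approximate generalized Cook distance can be computed as in the sketch below, which continues the EM sketch of Section 2.1 (X, y, e_step and em_sn_lrm are reused). The derivatives of the Q-function are obtained here by numerical central differences rather than by the closed-form expressions above; this is an implementation choice, not the paper's.

```python
# Illustrative sketch: one-step case deletion (3.1) and approximate generalized Cook
# distance, with derivatives of the Q-function obtained numerically (our choice).
# Continues the EM sketch of Section 2.1: X, y, e_step and em_sn_lrm are reused.
import numpy as np

def q_contrib(theta, X, y, u_hat, u2_hat):
    """Per-observation contributions Q_j(theta | theta_hat) of (2.6), up to a constant."""
    p = X.shape[1]
    beta, sigma2, delta = theta[:p], theta[p], theta[p + 1]
    eps = y - X @ beta
    return (-0.5 * np.log(sigma2) - eps**2 / (2 * sigma2)
            + delta * u_hat * eps / sigma2 - delta**2 * u2_hat / (2 * sigma2))

def num_grad(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-4):
    return np.array([num_grad(lambda z: num_grad(f, z, h)[i], x, h) for i in range(x.size)])

beta, sigma2, delta = em_sn_lrm(X, y)                       # ML fit via the EM sketch
theta_hat = np.r_[beta, sigma2, delta]
_, u_hat, u2_hat = e_step(X, y, beta, sigma2, delta)        # E-step quantities at theta_hat

H = num_hess(lambda th: q_contrib(th, X, y, u_hat, u2_hat).sum(), theta_hat)
H_inv = np.linalg.inv(H)                                    # inverse Hessian of the Q-function

GD1 = np.empty(len(y))
for j in range(len(y)):
    keep = np.arange(len(y)) != j
    g_j = num_grad(lambda th: q_contrib(th, X[keep], y[keep], u_hat[keep], u2_hat[keep]).sum(),
                   theta_hat)                               # gradient of Q_[j] at theta_hat
    theta_onestep_j = theta_hat - H_inv @ g_j               # one-step estimate (3.1)
    GD1[j] = g_j @ (-H_inv) @ g_j                           # approximate generalized Cook distance
print("cases with largest GD:", np.argsort(GD1)[-3:])
```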

If the interest is in the influence of the jth case on some subset of the parameters, this can be assessed quite easily as follows. Let θ = (θ₁^T, θ₂^T)^T, where θ₁ = (β^T, σ²)^T is the parameter vector of the usual regression model and θ₂ = δ is the skewness parameter. From the expressions of Q̇_[j](θ̂ | θ̂) and Q̈(θ̂ | θ̂), the generalized Cook distance for the parameters (β, σ²) and δ can be defined as follows:

(1) Both β and σ² are the parameters of interest and δ is the nuisance parameter:

GD_j(β, σ²) = Q̇_[j]θ₁(θ̂ | θ̂)^T [I_{p+2}, 0] {−Q̈(θ̂ | θ̂)}^{−1} [I_{p+2}, 0]^T Q̇_[j]θ₁(θ̂ | θ̂),   (3.5)

where Q̇_[j]θ₁(θ̂ | θ̂) = (Q̇_[j]β(θ̂ | θ̂)^T, Q̇_[j]σ²(θ̂ | θ̂))^T, with θ₁ = (β^T, σ²)^T;

(2) δ is the parameter of interest and both β and σ² are nuisance parameters:

GD_j(δ) = Q̇_[j]δ(θ̂ | θ̂) b^T {−Q̈(θ̂ | θ̂)}^{−1} b Q̇_[j]δ(θ̂ | θ̂) = (Q̇_[j]δ(θ̂ | θ̂))² b^T {−Q̈(θ̂ | θ̂)}^{−1} b,   (3.6)

where Q̇_[j]δ(θ̂ | θ̂) is the third element of Q̇_[j](θ̂ | θ̂) and b is the (p + 3) × 1 vector with a one in the (p + 3)th position and zeros elsewhere.

3.2 The local influence approach

Consider a perturbation vector ω varying in an open region Ω ⊂ R^q. Let l_c(θ, ω | y_c), θ ∈ R^{p+3}, be the complete-data log-likelihood of the perturbed model. We assume there is an ω₀ such that l_c(θ, ω₀ | y_c) = l_c(θ | y_c) for all θ. Let θ̂_ω be the maximizer of the function Q(θ, ω | θ̂) = E[l_c(θ, ω | Y_c) | y, θ̂]. Then the influence graph is defined as α(ω) = (ω^T, f_Q(ω))^T, where f_Q(ω) is the Q-displacement function defined as

f_Q(ω) = 2 [ Q(θ̂ | θ̂) − Q(θ̂_ω | θ̂) ].

Following the approach developed in Zhu and Lee (2001), the normal curvature C_{f_Q,d} of α(ω) at ω₀ in the direction of a unit vector d can be used to summarize the local behavior of the Q-displacement function. It can be shown that

C_{f_Q,d}(θ) = −2 d^T Q̈_{ω₀} d = −2 d^T Δ_{ω₀}^T {Q̈(θ̂ | θ̂)}^{−1} Δ_{ω₀} d,

where Q̈(θ̂ | θ̂) = ∂²Q(θ | θ̂)/∂θ∂θ^T |_{θ=θ̂} and Δ_ω = ∂²Q(θ, ω | θ̂)/∂θ∂ω^T |_{θ=θ̂_ω}. As in Cook (1986), the matrix Q̈_{ω₀} = Δ_{ω₀}^T {Q̈(θ̂ | θ̂)}^{−1} Δ_{ω₀} is fundamental to detect influential observations. It may be of interest to assess the influence on a subset θ₁ of θ = (θ₁^T, θ₂^T)^T. For

example, one may be interested in θ₁ = β or θ₂ = δ. In such situations, the curvature in the direction d is given by

C_{f_Q,d}(θ₁) = −2 d^T Δ_{ω₀}^T ( {Q̈(θ̂ | θ̂)}^{−1} − B₂₂ ) Δ_{ω₀} d,   (3.7)

where B₂₂ is the block-diagonal matrix with blocks 0 and Q̈₂₂(θ̂ | θ̂)^{−1}, and Q̈₂₂(θ̂ | θ̂) is obtained from the partition of Q̈(θ̂ | θ̂) according to the partition of θ = (θ₁^T, θ₂^T)^T. A clear picture of −2Q̈_{ω₀} (a symmetric matrix) is given by its spectral decomposition

−2 Q̈_{ω₀} = Σ_{k=1}^n λ_k e_k e_k^T,

where (λ₁, e₁), ..., (λ_n, e_n) are the eigenvalue-eigenvector pairs of the matrix −2Q̈_{ω₀}, with λ₁ ≥ ... ≥ λ_q > λ_{q+1} = ... = λ_n = 0, and e₁, ..., e_n are elements of the associated orthonormal basis. Zhu and Lee (2001) proposed inspecting all eigenvectors corresponding to nonzero eigenvalues for more revealing information, but this can be computationally intensive for large n. Following Zhu and Lee (2001) and Lu and Song (2006), we consider an aggregated contribution vector of all eigenvectors corresponding to nonzero eigenvalues. Starting with some notation, let λ̃_k = λ_k / (λ₁ + ... + λ_q), e_k = (e_{k1}, ..., e_{kn})^T and

M(0)_l = Σ_{k=1}^q λ̃_k e²_{kl}.

Hence, the assessment of influential cases is based on {M(0)_l, l = 1, ..., n}, and one can obtain M(0)_l via B_{f_Q,u_l} = −2 u_l^T Q̈_{ω₀} u_l / tr[−2Q̈_{ω₀}], where u_l is a column vector in R^n with the lth entry equal to one and all other entries zero. Refer to Zhu and Lee (2001) for other theoretical properties of B_{f_Q,u_l}, such as invariance under reparameterization of θ. Additionally, Lee and Xu (2004) propose using 1/n + c·SM(0) as a benchmark to regard the lth case as influential, where c is an arbitrary constant (depending on the real application) and SM(0) is the standard deviation of {M(0)_l, l = 1, ..., n}. Finally, following the idea of Verbeke and Molenberghs (2000), we consider another important direction given by d = d_{kq}, a q × 1 vector with a one in the kth position and all other entries zero. In that case, the normal curvature is called the total local influence of the kth observation, which is given by

C_k = −2 d_{kq}^T Δ_{ω₀}^T {Q̈(θ̂ | θ̂)}^{−1} Δ_{ω₀} d_{kq},   k = 1, ..., q.   (3.8)

Moreover, the kth case is regarded as an influential observation if C_k is larger than the cutoff value c̄ = (1/q) Σ_{k=1}^q C_k. A numerical sketch of the computation of M(0) and its benchmark is given below.
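The sketch continues the previous ones (q_contrib, num_grad, num_hess, theta_hat, u_hat and u2_hat are reused). As a stand-in perturbation it weights the individual contributions Q_j, one of the schemes detailed in the next subsection, and the constant c = 3 mirrors the choice used in Section 4; the implementation is ours and is purely illustrative.

```python
# Illustrative sketch: aggregated contribution vector M(0) and benchmark of Section 3.2.
# The perturbed Q-function weights the Q_j contributions (case weights); it is one of the
# schemes of the next subsection.  Objects from the earlier sketches are reused.
import numpy as np

n = len(y)
omega0 = np.ones(n)                                        # null perturbation omega_0

def Q_pert(theta, omega):
    """Q(theta, omega | theta_hat) = sum_k omega_k Q_k(theta | theta_hat)."""
    return omega @ q_contrib(theta, X, y, u_hat, u2_hat)

# Delta_{omega_0}: since Q_pert is linear in omega, its kth column is the gradient of Q_k.
Delta = np.column_stack([num_grad(lambda th, k=k: q_contrib(th, X, y, u_hat, u2_hat)[k], theta_hat)
                         for k in range(n)])
H = num_hess(lambda th: Q_pert(th, omega0), theta_hat)     # Hessian of the Q-function at theta_hat

neg2Q_omega = -2.0 * Delta.T @ np.linalg.inv(H) @ Delta    # -2 * Q.._{omega_0}  (n x n, PSD)
lam, E = np.linalg.eigh(neg2Q_omega)
lam = np.clip(lam, 0.0, None)                              # discard tiny negative round-off values
M0 = (E**2) @ (lam / lam.sum())                            # M(0)_l = sum_k lam_k~ * e_{kl}^2
benchmark = 1.0 / n + 3.0 * M0.std()                       # 1/n + c*SM(0), with c = 3 as in Section 4
print("influential cases:", np.where(M0 > benchmark)[0])
```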

Now, we are ready to evaluate the matrix Δ_ω under three different perturbation schemes: case-weight perturbation, response perturbation and explanatory perturbation. In each scheme, Δ_ω has the partitioned form Δ_ω = (Δ_β^T, Δ_σ²^T, Δ_δ^T)^T.

(i) Case-weight perturbation. Let ω = (ω_1, ..., ω_n)^T be an n-dimensional vector with ω₀ = (1, ..., 1)^T ∈ R^n. Then, consider the following arbitrary allocation of weights for the Q-function, which may capture departures in general directions: Q(θ, ω | θ̂) = Σ_{k=1}^n ω_k Q_k(θ | θ̂). In this case, the kth column of the matrix Δ_ω is

( (1/σ̂²)(ǫ̂_k − δ̂ û_k) x_k^T,   −1/(2σ̂²) + (ǫ̂_k² − 2 δ̂ û_k ǫ̂_k + δ̂² û²_k)/(2σ̂⁴),   (û_k ǫ̂_k − δ̂ û²_k)/σ̂² )^T,   (3.9)

so that, in particular, Δ_β = (1/σ̂²) X^T [D(ǫ̂) − δ̂ D(û)], where D(a) denotes the diagonal matrix whose diagonal is the vector a.

(ii) Response perturbation. A perturbation of the response variables Y = (Y_1, ..., Y_n)^T is introduced by replacing Y_k by Y_{kω} = Y_k + ω_k S_y, where S_y is the standard deviation of Y. The perturbed Q-function is given by Q(θ, ω | θ̂), with Y_{kω} in place of Y_k and ω = (ω_1, ..., ω_n)^T. Under this perturbation scheme the null vector is ω₀ = 0, and the kth column of the matrix Δ_ω is

S_y ( (1/σ̂²) x_k^T,   (ǫ̂_k − δ̂ û_k)/σ̂⁴,   û_k/σ̂² )^T.   (3.10)

(iii) Explanatory perturbation. We are interested in perturbing a specific explanatory variable. Under this condition we have the perturbed explanatory variable x_{tω} = x_t + S_t ω, t ∈ {1, ..., p}, where S_t is the standard deviation of the explanatory variable x_t and ω₀ = 0. The perturbed Q-function is given by Q(θ, ω | θ̂), with x_{tω} in place of x_t. The kth column of the (p + 3) × n matrix Δ_ω is

( (S_t/σ̂²) [ (ǫ̂_k − δ̂ û_k) e_p(t) − β̂_t x_k ]^T,   −(β̂_t S_t/σ̂⁴)(ǫ̂_k − δ̂ û_k),   −(β̂_t S_t/σ̂²) û_k )^T,   (3.11)

where β̂_t is the tth element of β̂ and e_p(t) is a vector of the same dimension as x_j with a one in the position corresponding to the tth explanatory variable and zeros elsewhere.

3.3 Generalized leverage

Another concept that has been useful in the development of diagnostics in linear regression is leverage. The main idea behind the concept of leverage is that of evaluating the influence of y_j on its own predicted value (see Wei et al., 1998). This influence can be represented by the derivative ∂ŷ_j/∂y_j. Under the usual linear regression model, ∂ŷ_j/∂y_j = h_jj, the jth principal diagonal element of the projection matrix H = X(X^T X)^{−1} X^T. Inspired by the generalized leverage proposed by Wei et al. (1998), we consider a generalized leverage matrix for models with incomplete data defined by

GL(θ̂) = D_θ [−Q̈(θ̂ | θ̂)]^{−1} Q̇_{θ,y}(θ̂),   (3.12)

where D_θ = ∂µ(θ)/∂θ^T, Q̇_{θ,y}(θ̂) = ∂²Q(θ | θ̂)/∂θ∂y^T |_{θ=θ̂} and Q̈(θ̂ | θ̂) is the Hessian matrix. In the linear regression model under the skew-normal distribution, the expectation of Y is given by E(Y) = µ(θ) = Xβ + √(2/π) δ 1_n, so Ŷ = µ(θ̂) is the predicted response vector. In this case, we have that D_θ = [X, 0_n, √(2/π) 1_n], Q̇_{θ,y}(θ̂) is as in (3.10) without the term S_y, and Q̈(θ̂ | θ̂) is given in (3.2). We use c p*/n, with p* = tr(GL(θ̂)), as a benchmark to regard the jth case as a leverage point, where c is a selected constant (depending on the real application). So, observations with values GL_jj(θ̂) > c p*/n are considered leverage points.

4 Numerical application

In this section, a simulation study and a real example are presented to illustrate the performance of the developed method. First, we carry out a numerical illustration with simulated data. Then, we analyze real data to illustrate the usefulness of the proposed method.

4.1 Simulation study

In this section, we examine the performance of the developed method based on simulated data. We consider datasets of sizes n = 100 and n = 500 from the linear regression model defined in (4.1). That is,

y_j = β₀ + β₁ x_j + ǫ_j,   (4.1)

where the true values of the parameters (with β₀ = 5.0) are fixed in advance and x_j is generated from the uniform distribution U(0,1), j = 1, ..., n.

For each sample, we choose an observation and disturb it in the response variable. So, we consider three linear regression models: Model A, the model without perturbation; Model B, with y_B(ind) = a y(ind); and Model C, with y_C(ind) = b y(ind), where a > 1 and 0 < b < 1. For n = 100 the perturbed case is ind = 58, and for n = 500 it is ind = 38. Tables 1 and 2 show the maximum likelihood (ML) estimates of the parameters for the three models. In Model B, the estimate of β₀ is slightly lower than that of Model A, while the estimate of δ is slightly higher. However, as ŷ_j = x_j^T β̂ + √(2/π) δ̂, the fitted line is very similar to that of Model A. In Model C, the differences in the estimates of the parameters β₀, σ² and δ are more pronounced, but the fitted line is also similar to that of Model A, as can be seen in Figure 1(b).

Table 1: Simulated and adjusted values for model (4.1), with n = 100, with standard errors in parentheses (Models A, B and C).

Table 2: Simulated and adjusted values for model (4.1), with n = 500, with standard errors in parentheses (Models A, B and C).

Figure 1: Simulated data with n = 100 (a) and n = 500 (b). Scatterplot with adjusted line for Models A (solid line), B (dashed line) and C (dash-dot line).
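The perturbed datasets of Models B and C can be generated along the following lines; the values of β₁, σ², δ, a and b in this sketch are illustrative placeholders and are not the specific values used in the study.

```python
# Illustrative sketch of the simulation design of Section 4.1 (values of b1, sigma2,
# delta, a and b are placeholders chosen for illustration, not the ones used in the paper).
import numpy as np

rng = np.random.default_rng(2024)
n, ind = 100, 57                                  # n = 100; index 57 is case #58 (0-based indexing)
b0, b1, sigma2, delta = 5.0, 2.0, 1.0, 2.0        # b0 = 5 as in the text; the rest are placeholders
x = rng.uniform(0, 1, n)
e = delta * np.abs(rng.standard_normal(n)) + np.sqrt(sigma2) * rng.standard_normal(n)
y_A = b0 + b1 * x + e                             # Model A: no perturbation

a, b = 1.5, 0.5                                   # any a > 1 and 0 < b < 1
y_B, y_C = y_A.copy(), y_A.copy()
y_B[ind] = a * y_A[ind]                           # Model B: inflate the response of one case
y_C[ind] = b * y_A[ind]                           # Model C: deflate the response of one case
```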

Figure 2: Simulated data with n = 100. Model B: case-weight perturbation (a), response perturbation (b) and explanatory perturbation (c). Model C: case-weight perturbation (d), response perturbation (e) and explanatory perturbation (f).

Figure 3: Simulated data with n = 500. Model B: case-weight perturbation (a), response perturbation (b) and explanatory perturbation (c). Model C: case-weight perturbation (d), response perturbation (e) and explanatory perturbation (f).

Local influence analysis under the three perturbation schemes and the generalized leverage measures for the simulated data are described next. For n = 100, Model A yielded observations #49 and #99 as influential under case-weight perturbation, and #43, #49 and #99 under the response and explanatory perturbations. Three observations were classified as leverage points; see Figure 2. Model

B indicated observations #58 and #99 as influential for the three perturbation schemes; see Figures 2(a)-(c). For the three perturbation schemes under Model C, only observation #58 is detected; see Figures 2(d)-(f). In the same way, we have similar results for n = 500, with observation #38 detected as influential, as presented in Figures 3(a)-(c) and 3(e)-(f). The results do not change for larger sample sizes, and are omitted.

4.2 Real dataset

A dataset on the quality of life of women with breast cancer, obtained by the Center for Integral Attention to Women's Health (State University of Campinas, Brazil), was fitted with the skew-normal regression model and the diagnostic analysis approach was applied. The quality of life index was evaluated by the Medical Outcome Study 36-item Short-Form Health Survey (SF-36) questionnaire. These indexes condense eight components into two: a physical component summary and a mental component summary. Conde et al. (2005) analyzed this dataset, evaluating the factors associated with the quality of life of women with breast cancer. The response variable is the physical component summary of the quality of life index (pcs), and the explanatory variables are the indicator variable dizziness and the body mass index (bmi) of the individual. In this study, we discuss these data based on the skew-normal linear regression model

pcs_j = β₀ + β₁ dizziness_j + β₂ bmi_j + e_j,   j = 1, ..., 97,   (4.2)

where the e_j are independent with e_j ∼ SN(0, σ², δ).

4.2.1 Goodness-of-fit and estimation

The ML estimates of the parameters and their corresponding standard errors (calculated using the observed information matrix) are reported in Table 3, which shows that the estimates of β in both models are similar. In contrast, the estimates of the parameter σ² are different. We have constructed the QQ-plots and envelopes in Figure 4 (lines represent the 5th percentile, the mean and the 95th percentile of the simulated points for each observation) based on r_j² = (y_j − x_j^T β̂)²/(σ̂² + δ̂²), which has an approximate χ²₁ distribution. The variable r_j² also enables us to check the model and detect the presence of outlying observations. The simulated envelope plotted to validate the skew-normal regression model indicates that no points lie outside the confidence bounds (Figure 4(b)). On the other hand, the simulated envelope for the normal linear regression model (Figure 4(a)) indicates some exterior points. Figure 5 displays the scatter plot of the data and the adjusted lines for each status of dizziness, showing a negative relationship between pcs and bmi.
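The fit of model (4.2) and the likelihood-ratio test of H₀: δ = 0 discussed below can be reproduced along the following lines. The file name and the column names (pcs, dizziness, bmi) are hypothetical placeholders, and em_sn_lrm is the EM routine sketched in Section 2.1.

```python
# Illustrative sketch: fitting model (4.2) and testing H0: delta = 0.  The file and
# column names are hypothetical placeholders; em_sn_lrm comes from the EM sketch.
import numpy as np
import pandas as pd
from scipy.stats import norm, chi2

def sn_loglik(X, y, beta, sigma2, delta):
    """Observed-data log-likelihood (2.4)."""
    eps = y - X @ beta
    s2 = sigma2 + delta**2
    B = delta * eps / np.sqrt(sigma2 * s2)
    return np.sum(np.log(2) - 0.5 * np.log(2 * np.pi) - 0.5 * np.log(s2)
                  - eps**2 / (2 * s2) + norm.logcdf(B))

qol = pd.read_csv("qol_breast_cancer.csv")                  # hypothetical file with 97 rows
X = np.column_stack([np.ones(len(qol)), qol["dizziness"], qol["bmi"]])
y = qol["pcs"].to_numpy()

beta_sn, sigma2_sn, delta_sn = em_sn_lrm(X, y)              # skew-normal fit
l1 = sn_loglik(X, y, beta_sn, sigma2_sn, delta_sn)

beta_n = np.linalg.lstsq(X, y, rcond=None)[0]               # normal fit (delta = 0)
sigma2_n = np.mean((y - X @ beta_n) ** 2)
l0 = sn_loglik(X, y, beta_n, sigma2_n, 0.0)

LR = 2 * (l1 - l0)
print("LR =", LR, "p-value =", chi2.sf(LR, df=1))
```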

Figure 4: Simulated envelopes for the quality of life dataset. (a) Fit under the normal linear model and (b) fit under the skew-normal linear model.

Table 3: ML estimates for the quality of life dataset under the normal and skew-normal linear regression models (estimates, standard errors, log-likelihood and AIC).

Figure 5(b) shows the histogram of the residuals from model (4.2), indicating the presence of asymmetric data. Based on the normal approximation for the ML estimates, the 95% confidence interval for δ (CI: δ̂ ± 1.96 SE) is [6.56, 7.4], which does not include the value zero. That is, a skew-normal model is more suitable than a normal linear model. A similar result is obtained when testing the null hypothesis H₀: δ = 0 against the alternative H₁: δ ≠ 0 using the likelihood-ratio (LR) statistic, which has a χ²₁ distribution under H₀. In this case, LR = 2(l(θ̂) − l(θ̃)) = 3.88, with a p-value of approximately 0.049.

4.2.2 Influence diagnostics analysis

First, we identify outlying observations under the fitted model based on the case-deletion measures, the Q-distance and the generalized Cook distance. The results are reported in Figure 6. We note from these figures that observations #4 and #73 are potentially influential on the parameter estimates. These refer to women without dizziness with a low quality of life index. Observation #65 shows a slight influence and represents a woman with a high quality of life index. We also used the generalized Cook distance when the interest is in a subset

of the parameters. All results were similar, so they are not shown here to save space.

Figure 5: Quality of life dataset. (a) Scatter plot and adjusted lines (with and without dizziness), (b) histogram of the Pearson residuals.

Figure 6: Quality of life dataset. (a) Generalized Cook distance, (b) Q-distance.

Second, we conducted the local influence diagnostics analysis to detect outlying observations, using the perturbation schemes discussed previously and the generalized leverage. For the perturbation schemes we obtained the values of M(0), and the figures present the index plots of M(0). The horizontal lines delimit the Lee and Xu (2004) benchmark for M(0), with c = 3. Under the perturbation schemes, observations #4 and #73, together with one additional case, are potentially influential on the parameter estimates. These results are reported in Figure 7 and Figure 8(a). Note that observations #4 and #73 are potentially influential under all perturbation schemes; they correspond to the smallest values of pcs for women without dizziness. On the other hand, the additional case is influential only under case-weight perturbation and response perturbation; it is a woman with a small value of pcs and with dizziness. Moreover, we performed a diagnostics analysis based on the C_i measures defined in (3.8), and the influential observations were the same observations #4 and #73.

Figure 7: Quality of life dataset. Diagnostics for case-weight perturbation (a) and response perturbation (b) (benchmark with c = 3).

Figure 8: Quality of life dataset. (a) Diagnostics for the bmi variable perturbation (benchmark with c = 3) and (b) generalized leverage.

We then used the generalized leverage, and observation #5, together with two further cases, was indicated as influential, as can be seen in Figure 8(b). These observations are more distant from the main mass of the data (to the right); they are women without dizziness and with high bmi. It is important to note that these observations were not detected using the perturbation schemes, so the usefulness of the generalized leverage can be seen.
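For completeness, the generalized leverage (3.12) used above can be computed along the lines of the sketch below, again with numerical derivatives of the Q-function (an implementation choice of this sketch, not the paper's closed form); it reuses em_sn_lrm, e_step, q_contrib, num_grad and num_hess from the earlier sketches, together with the design matrix X and response y of the dataset at hand.

```python
# Illustrative sketch: generalized leverage GL(theta_hat) of (3.12), with the mixed
# derivative of the Q-function with respect to y obtained by finite differences.
# Reuses em_sn_lrm, e_step, q_contrib, num_grad, num_hess and the data X, y.
import numpy as np

beta, sigma2, delta = em_sn_lrm(X, y)                       # fit for the data at hand
theta_hat = np.r_[beta, sigma2, delta]
_, u_hat, u2_hat = e_step(X, y, beta, sigma2, delta)

n = len(y)
Q = lambda th, yy: q_contrib(th, X, yy, u_hat, u2_hat).sum()
H = num_hess(lambda th: Q(th, y), theta_hat)                # Hessian of the Q-function

# Mixed derivative d^2 Q / d theta d y_k, with the E-step quantities held fixed.
h = 1e-5
Q_theta_y = np.empty((len(theta_hat), n))
for k in range(n):
    e = np.zeros(n); e[k] = h
    Q_theta_y[:, k] = (num_grad(lambda th: Q(th, y + e), theta_hat)
                       - num_grad(lambda th: Q(th, y - e), theta_hat)) / (2 * h)

# D_theta = d mu(theta)/d theta' for mu(theta) = X beta + sqrt(2/pi) delta 1_n.
D_theta = np.column_stack([X, np.zeros(n), np.full(n, np.sqrt(2 / np.pi))])

GL = D_theta @ np.linalg.solve(-H, Q_theta_y)               # n x n generalized leverage matrix
gl = np.diag(GL)
c = 2.0                                                     # a selected constant, as in Section 3.3
print("leverage points:", np.where(gl > c * np.trace(GL) / n)[0])
```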

Figure 9: Quality of life dataset. Diagnostics of influence on β and δ (benchmark with c = 3): (a) case-weight perturbation, (b) response perturbation, (c) bmi variable perturbation and (d) C_i for case-weight perturbation.

Finally, we also discuss the influence of some observations on the estimates of β and δ based on C_{f_Q,d}(β) and C_{f_Q,d}(δ), as defined in (3.7). For the three perturbation schemes, Figures 9(a)-(c) show the scatter plots of C_{f_Q,d}(β) versus C_{f_Q,d}(δ). From these figures, we observe that #73 is potentially influential on both parameter estimates β and δ, regardless of the perturbation scheme. Moreover, Figure 9(d) shows that observation #3 has relatively large C_i(β) and C_i(δ); it corresponds to a woman without dizziness and high bmi.

5 Conclusion

In this paper, we have provided a diagnostics analysis for skew-normal regression models based on the approaches proposed by Zhu et al. (2001) and Zhu and Lee (2001). We have proposed a procedure for computing case-deletion measures and local influence diagnostics on the basis of the conditional expectation of the complete-data log-likelihood function used in the EM algorithm, and we have also considered a generalized leverage. Case-weight, response and explanatory perturbations are considered. Explicit expressions are obtained for the Hessian matrix Q̈(θ̂ | θ̂) and for the matrix Δ_ω under the different perturbation schemes. The generalized leverage

developed here is a necessary supplement to the results obtained from the perturbation schemes: this diagnostic measure helps to detect observations that are not detected by the perturbation schemes.

References

Azevedo, C. L. N., Bolfarine, H., Andrade, D. F., 2011. Bayesian inference for a skew-normal IRT model under the centred parameterization. Computational Statistics and Data Analysis 55.

Azzalini, A., 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12.

Azzalini, A., 2005. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 32.

Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society, Series B 61.

Conde, D. M., Pinto-Neto, A., Cabello, C., Santos-Sá, D., Costa-Paiva, C., Martinez, E., 2005. Quality of life in Brazilian breast cancer survivors age 45-65 years: associated factors. The Breast Journal 11.

Cook, R. D., 1977. Detection of influential observation in linear regression. Technometrics 19, 15-18.

Cook, R. D., 1986. Assessment of local influence. Journal of the Royal Statistical Society, Series B 48.

Cook, R. D., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman & Hall/CRC, Boca Raton, FL.

Dagne, G. A., 2016. Bayesian segmental growth mixture Tobit models with skew distributions. Computational Statistics 31.

Dempster, A., Laird, N., Rubin, D., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39 (1), 1-38.

Ferreira, C. S., Lachos, V. H., Bolfarine, H., 2015. Inference and diagnostics in skew scale mixtures of normal regression models. Journal of Statistical Computation and Simulation 85 (3).

Galea-Rojas, M., Paula, G. A., Bolfarine, H., 1997. Local influence in elliptical linear regression models. The Statistician 46.

Hill, M. A., Dixon, W. J., 1982. Robustness in real life: a study of clinical laboratory data. Biometrics 38.

Lachos, V. H., Bolfarine, H., Arellano-Valle, R. B., Montenegro, L. C., 2007. Likelihood based inference for multivariate skew-normal regression models. Communications in Statistics - Theory and Methods 36.

Lachos, V. H., Montenegro, L. C., Bolfarine, H., 2008. Inference and influence diagnostics for skew-normal null intercept measurement errors models. Journal of Statistical Computation and Simulation 78.

Lange, K. L., Little, R., Taylor, J., 1989. Robust statistical modeling using the t distribution. Journal of the American Statistical Association 84.

Lee, S. X., McLachlan, G. J., 2013. On mixtures of skew normal and skew t-distributions. Advances in Data Analysis and Classification 7.

Lee, S. Y., Xu, L., 2004. Influence analysis of nonlinear mixed-effects models. Computational Statistics and Data Analysis 45.

Liu, S. Z., 2000. On local influence for elliptical linear models. Statistical Papers 41.

Lu, B., Song, X.-Y., 2006. Local influence of multivariate probit latent variable models. Journal of Multivariate Analysis 97 (8).

Massuia, M. B., Cabral, C. R. B., Matos, L. A., Lachos, V. H., 2015. Influence diagnostics for Student-t censored linear regression models. Statistics: A Journal of Theoretical and Applied Statistics 49 (5).

Rodríguez, C. L. B., Branco, M. D., 2007. Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Brazilian Journal of Probability and Statistics.

Sahu, S. K., Dey, D. K., Branco, M. D., 2003. A new class of multivariate distributions with applications to Bayesian regression models. The Canadian Journal of Statistics 31, 129-150.

Verbeke, G., Molenberghs, G., 2000. Linear Mixed Models for Longitudinal Data. Springer, New York.

Wei, B. C., Qu, Y. Q., Fung, W. K., 1998. Generalized leverage and its applications. Scandinavian Journal of Statistics 25.

Zeller, C. B., Lachos, V. H., Vilca, F. V., 2014. Influence diagnostics for Grubbs's model with asymmetric heavy-tailed distributions. Statistical Papers 55.

Zhu, H., Lee, S., 2001. Local influence for incomplete-data models. Journal of the Royal Statistical Society, Series B 63, 111-126.

Zhu, H., Lee, S., Wei, B., Zhou, J., 2001. Case-deletion measures for models with incomplete data. Biometrika 88 (3), 727-737.

Clécio da Silva Ferreira
clecio.ferreira@ufjf.edu.br

Heleno Bolfarine
hbolfar@ime.usp.br

Filidor Vilca
fily@ime.unicamp.br


Diagnostics for Linear Models With Functional Responses

Diagnostics for Linear Models With Functional Responses Diagnostics for Linear Models With Functional Responses Qing Shen Edmunds.com Inc. 2401 Colorado Ave., Suite 250 Santa Monica, CA 90404 (shenqing26@hotmail.com) Hongquan Xu Department of Statistics University

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Mixture Models and EM

Mixture Models and EM Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

More information

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

STAT 4385 Topic 06: Model Diagnostics

STAT 4385 Topic 06: Model Diagnostics STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized

More information

Estimation: Problems & Solutions

Estimation: Problems & Solutions Estimation: Problems & Solutions Edps/Psych/Soc 587 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline 1. Introduction: Estimation

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

Box-Cox Transformations

Box-Cox Transformations Box-Cox Transformations Revised: 10/10/2017 Summary... 1 Data Input... 3 Analysis Summary... 3 Analysis Options... 5 Plot of Fitted Model... 6 MSE Comparison Plot... 8 MSE Comparison Table... 9 Skewness

More information

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014

Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2014 1 / 13 Chapter 8: Polynomials & Interactions

More information

Empirical likelihood for linear models in the presence of nuisance parameters

Empirical likelihood for linear models in the presence of nuisance parameters Empirical likelihood for linear models in the presence of nuisance parameters Mi-Ok Kim, Mai Zhou To cite this version: Mi-Ok Kim, Mai Zhou. Empirical likelihood for linear models in the presence of nuisance

More information

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA ABSTRACT Regression analysis is one of the most used statistical methodologies. It can be used to describe or predict causal

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

The Use of Copulas to Model Conditional Expectation for Multivariate Data

The Use of Copulas to Model Conditional Expectation for Multivariate Data The Use of Copulas to Model Conditional Expectation for Multivariate Data Kääri, Meelis; Selart, Anne; Kääri, Ene University of Tartu, Institute of Mathematical Statistics J. Liivi 2 50409 Tartu, Estonia

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

A Cautionary Note on Generalized Linear Models for Covariance of Unbalanced Longitudinal Data

A Cautionary Note on Generalized Linear Models for Covariance of Unbalanced Longitudinal Data A Cautionary Note on Generalized Linear Models for Covariance of Unbalanced Longitudinal Data Jianhua Z. Huang a, Min Chen b, Mehdi Maadooliat a, Mohsen Pourahmadi a a Department of Statistics, Texas A&M

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

SIMULTANEOUS CONFIDENCE BANDS FOR THE PTH PERCENTILE AND THE MEAN LIFETIME IN EXPONENTIAL AND WEIBULL REGRESSION MODELS. Ping Sa and S.J.

SIMULTANEOUS CONFIDENCE BANDS FOR THE PTH PERCENTILE AND THE MEAN LIFETIME IN EXPONENTIAL AND WEIBULL REGRESSION MODELS. Ping Sa and S.J. SIMULTANEOUS CONFIDENCE BANDS FOR THE PTH PERCENTILE AND THE MEAN LIFETIME IN EXPONENTIAL AND WEIBULL REGRESSION MODELS " # Ping Sa and S.J. Lee " Dept. of Mathematics and Statistics, U. of North Florida,

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

Chapter 14. Linear least squares

Chapter 14. Linear least squares Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given

More information