Analysis of 2^n factorial experiments with exponentially distributed response variable: a comparative study


H. V. Kulkarni^a and S. C. Patil (Birajdar)^b

a Department of Statistics, Shivaji University, Kolhapur, India. b Department of Statistics, Pad. Dr. D. Y. Patil A.C.S. College, Pimpri, Pune, India.

ABSTRACT

The use of generalized linear models (GLM) is suggested for the analysis of designed experiments involving non-normal response variables and small sample sizes. This paper assesses and compares the performance of GLM with the log transformation and ordinary least squares (OLS) approaches on the basis of coverage results and expected lengths of confidence intervals for the expected responses, for an exponentially distributed response variable. We also examine the power functions of the relevant tests of hypotheses for testing significance of the underlying factorial effects. Although the comparison is mainly simulation based, wherever possible a theoretical investigation is also carried out for saturated 2-level factorial experiments. The coverage results of the log transformation approach are uniformly superior to GLM for two replications (very small samples), the situation most frequently encountered in industrial experiments. We also suggest a modification to the GLM-based CI that shows a decisive improvement over GLM. The t-test under the GLM approach exhibits the best power function for testing significance of factorial effects, and the t-test under the log transformation approach performs equally well.

Keywords: confidence intervals; ordinary least squares; generalized linear models; log transformation; factorial experiments.

1. Introduction and Preliminaries

1.1 Introduction

The optimality properties of OLS depend on the assumptions of normality and homogeneity of variance. There are many situations where a non-normal response and nonhomogeneous variance are a fact. Such responses may be related non-linearly to the input factors, and the variance is not constant but is a function of the mean. Moreover, often only small or moderate sample sizes are available due to practical constraints. Traditionally in such situations, a

*Correspondence: kulkarnih@rediffmail.com

suitably transformed response variable (most frequently via a variance stabilizing transformation) is analyzed using OLS and the results are transformed back to the original scale. But this approach has many drawbacks; see, for example, Myers and Montgomery [3]. GLM, introduced by Nelder and Wedderburn, is used as an analog to the above mentioned traditional approaches. Myers and Montgomery [3] gave a tutorial on GLM, Lewis et al. [5] gave many examples of designed experiments with non-normal responses, and Lewis et al. [4] attempted a simulation study on the analysis of designed experiments involving non-normal response variables using GLM. The use of GLM is suggested for the analysis of designed experiments involving non-normal response variables and small sample sizes. All the above discussions concentrate mainly on estimating confidence intervals (CI) for the expected responses. The authors compared the above mentioned approaches on the basis of response estimation and prediction, limited to a sample of parameter combinations only; this does not seem adequate for assessing the overall performance of GLM. In industrial settings the response variable may sometimes have an exponential distribution. The emphasis of this paper is to assess and compare the performance of GLM with the transformation and OLS approaches on the basis of coverage results and expected lengths of CI (E(LOCI)) for the expected responses for an exponentially distributed response variable. We also focus on the power functions of the relevant tests of hypotheses for testing significance of the underlying factorial effects. Although the comparison is mainly simulation based, wherever possible a theoretical investigation is also carried out, covering a reasonable representative subset of the parameter space that reflects a variety of situations that could be encountered in practice. We also suggest a modification to the CI for GLM that shows a decisive improvement over the GLM approach at the prescribed confidence coefficient. For the transformation approach we consider the log transformation, which is the variance stabilizing transformation commonly used for lifetime data, and refer to it as the LOG approach henceforth.

1.2 Nature of the underlying design matrix

The setup discussed here is that of saturated 2-level factorial designs with r (> 1) replications. The number of parameters m equals the number of distinct treatment combinations k, in which case the maximum likelihood estimators (m.l.e.) under the GLM setup take a closed-form expression that enables a theoretical comparison of the three approaches. Note that the underlying model can be viewed as a regression model with design matrix (cf. Myers and Montgomery [3])

X_{n×k} = (A', A', ..., A')',   (1)

where n = rk and A is a k×k symmetric Hadamard matrix corresponding to a single replicate of the experiment. The matrix A being a symmetric Hadamard matrix, we have

A^{-1} = (1/k) A.   (2)

From (1) and the properties of the matrix A it follows that X'X = rk I_k, and XX' is an rk×rk matrix consisting of an r×r array of k×k submatrices, each equal to k I_k, I_k being the identity matrix of order k. Throughout this paper we assume that the response variable Y is exponentially distributed. Y_ij, i = 1, 2, ..., k, j = 1, 2, ..., r, are the observations on Y, W_ij = log(Y_ij) is the log-transformed response, Ȳ_i. = (1/r) Σ_{j=1}^r Y_ij, S²_y = (1/k) Σ_{i=1}^k S²_i where S²_i = (1/(r−1)) Σ_{j=1}^r (Y_ij − Ȳ_i.)², and W̄_i. and S²_w are defined similarly.
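These structural facts are easy to check numerically. The following sketch is ours, not the paper's (all function names are illustrative); it builds A by Sylvester's construction and verifies that X'X = rk I_k:

```python
# Build the design matrix X = (A', ..., A')' of equation (1) for a saturated
# 2-level factorial with k treatment combinations and r replicates, then
# verify the claim X'X = rk I_k from Section 1.2.

def kron(A, B):
    # Kronecker product of two matrices held as lists of lists
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def hadamard(k):
    # Symmetric k x k Hadamard matrix (k a power of 2), Sylvester's construction
    H = [[1]]
    while len(H) < k:
        H = kron([[1, 1], [1, -1]], H)
    return H

def design_matrix(k, r):
    # r vertically stacked copies of A
    A = hadamard(k)
    return [row[:] for _ in range(r) for row in A]

k, r = 4, 2
X = design_matrix(k, r)
XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
assert all(XtX[i][j] == (r * k if i == j else 0) for i in range(k) for j in range(k))
```

The same construction also confirms the symmetry of A, so that A^{-1} = (1/k) A as in (2).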

The remainder of the paper is organized as follows. Section 2 discusses the theoretical details of the three approaches: the underlying model, expressions for the estimated responses, confidence limits and the corresponding lengths of CI (LOCI) around the expected responses. We provide exact theoretical expressions for E(LOCI) and the expected coverage probability for the GLM approach; good theoretical approximations of the same are also provided for the OLS and LOG approaches. Section 3 reports the simulation procedure and the findings of Sections 2 and 3 regarding the performance of the above methods in terms of E(LOCI) and coverage results. The LOG approach outperforms GLM for two replications (very small samples), the situation most frequently encountered in industrial experiments. We propose a modification to the GLM-based CI that shows a decisive improvement over GLM. In Section 4 we compare the three approaches based on the power functions of tests for testing significance of factorial effects; the t-tests under the GLM and LOG approaches exhibit the best power functions.

2. Theoretical details of the three approaches

2.1 The underlying model and related formulae for the various approaches

Table 1 summarizes the underlying model, estimated responses, confidence limits and LOCI around the expected responses for the three approaches. Here Ŷ denotes the estimated response, µ_i denotes the expected response corresponding to the i-th regressor vector X_i, which is the i-th row of the coefficient matrix X, and σ²_y and σ²_w are the unknown constant variances of Y and W respectively. The components of the parameter vector β correspond to the various factorial effects. t_{(n−k),α/2} is the upper α/2 percentile of the

t-distribution with n − k degrees of freedom (d.f.) and Z_{α/2} is the upper α/2 percentile of the standard normal distribution. For illustration, the confidence coefficient 1 − α = 0.95 (α = .05) is used throughout the paper.

Table 1: Underlying model and related formulae for the three approaches.

OLS. Model: EY = Xβ, µ_i = X'_i β. Response distribution: Y_ij ~ N(µ_i, σ²_y). Estimated response: Ŷ_ij = Ȳ_i.. Confidence limits: Ȳ_i. ± t_{(n−k),α/2} √(S²_y/r). LOCI: 2 t_{(n−k),α/2} √(S²_y/r).

LOG. Model: W_ij = log(Y_ij), EW = Xβ, µ_i = X'_i β. Response distribution: W_ij ~ N(µ_i, σ²_w). Estimated response: Ŷ_ij = exp(W̄_i.) = (Π_j Y_ij)^{1/r} = GM. Confidence limits: exp(W̄_i. ± t_{(n−k),α/2} √(S²_w/r)). LOCI: ((v² − 1)/v) exp(W̄_i.), where v = exp(t_{(n−k),α/2} √(S²_w/r)).

GLM. Model: EY = exp(Xβ), µ_i = exp(X'_i β). Response distribution: Y_ij ~ Exponential(µ_i). Estimated response: Ŷ_ij = Ȳ_i.. Confidence limits: Ȳ_i. exp(±Z_{α/2}/√r). LOCI: Ȳ_i. (exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r)).

Note that for the OLS and LOG approaches the formulae in Table 1 follow easily from standard results on OLS and the properties of the Hadamard matrix. The estimated response for the LOG approach is nothing but the geometric mean (GM) of the relevant group of observations. Since GM ≤ AM, it follows that the LOG approach underestimates the expected responses. In some situations, for example when the response variable is the time to recurrence of a defective item, this could be a serious problem. For the GLM approach the model is

EY = µ = g^{-1}(η),   η = Xβ = g(µ),

where g(·) is an appropriate monotonic and differentiable link function. There exist two types of links, canonical and noncanonical. The distribution

of the response variable Y is assumed to be a member of the exponential family; we refer to McCullagh and Nelder [2] for more theoretical details of GLM. It is well known that for an exponentially distributed response Y the canonical link is the reciprocal link η_i = 1/µ_i. Due to some drawbacks of this link (for example, negative values of the estimated response may be encountered), we choose the noncanonical log link, which overcomes these drawbacks. For this link the underlying model is

E(Y_ij) = µ_i = exp(X'_i β),   i = 1, ..., k, j = 1, ..., r,   (3)

where X'_i is the i-th row of A in the design matrix (1) and µ corresponds to a single replicate. In Table 1, E(Y) = exp(Xβ). Parameter estimation in the GLM is performed by the method of maximum likelihood, which in general is an iterative process and may not yield closed-form expressions for the m.l.e. However, simple closed-form expressions are available here owing to the assumptions on the underlying design matrix in Section 1.2. Since the matrix A is nonsingular, β is just a reparameterization of µ, giving the estimated response in closed form: Ŷ_ij = µ̂_i = exp(X'_i β̂) = Ȳ_i.. Consequently β̂ = A^{-1} log(Ȳ), where Ȳ = (Ȳ_1., ..., Ȳ_k.)', is also available in closed form. The CI for the mean response µ_i at the point x_i is given by

exp(x'_i β̂ ± Z_{α/2} σ̂_i).

For the noncanonical link, σ̂_i = √((a(φ))² x'_i (X'ΔV̂ΔX)^{-1} x_i) (cf. Myers and Montgomery [3]), where V̂ is the diagonal matrix whose i-th diagonal element is an estimate of Var(Y_i) and Δ is the diagonal matrix whose i-th diagonal element Δ_i is the derivative of the canonical parameter (−1/µ_i) with respect to X'_i β. For the design matrix (1), noting that for an exponential response variable a(φ) = 1, Δ_i = exp(−X'_i β) and Var(Y_i) = exp(2 X'_i β), this simplifies to σ̂_i = 1/√r. A further simplification leads to the CI for the GLM approach, Ȳ_i. exp(±Z_{α/2}/√r), and the corresponding

LOCI is Ȳ_i. (exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r)), which are reported in the last column of Table 1. The expected length is

E(LOCI) = µ_i (exp(Z_{α/2}/√r) − exp(−Z_{α/2}/√r)),

which at α = .05 for r = 2 and 3 replications equals

E(LOCI) = 3.748 µ_i for r = 2 and 2.778 µ_i for r = 3.   (4)

Note that both OLS and GLM estimate the expected response by Ȳ_i., which is an unbiased estimator of µ_i, but as noted earlier the LOG approach underestimates µ_i.

2.2 Approximate expected lengths of CI for the OLS and LOG approaches

In this section we attempt to obtain approximate theoretical expressions for E(LOCI) for the OLS and LOG approaches.

i) E(LOCI) for the OLS approach. For the OLS approach, LOCI = 2 t_{(n−k),α/2} √(S²_y/r), where S²_y = (1/k) Σ_{i=1}^k S²_i and S²_i = (1/(r−1)) Σ_{j=1}^r (Y_ij − Ȳ_i.)². Noting that S²_i is an unbiased estimator of µ²_i, it follows that approximately

E(LOCI) ≈ 2 t_{(n−k),α/2} √(µ̄/r),

where µ̄ = (1/k) Σ_{i=1}^k µ²_i is the average of the µ²_i. At α = .05 this equals, for k = 4,

E(LOCI) ≈ 3.93 √µ̄ for r = 2 and 2.66 √µ̄ for r = 3.   (5)

ii) E(LOCI) for the LOG approach. For the LOG approach, as reported in Table 1, LOCI = ((v² − 1)/v) exp(W̄_i.), where v = exp(t_{(n−k),α/2} √(S²_w/r)). Since exp(W̄_i.) = (Π_j Y_ij)^{1/r} and E(Y_ij^{1/r}) = µ_i^{1/r} Γ(1 + 1/r), we have E(exp(W̄_i.)) = µ_i (Γ(1 + 1/r))^r. Although v and W̄_i. need not be independently distributed, replacing S²_w in v by its expected value we have approximately

E(LOCI) ≈ E((v² − 1)/v) E(exp(W̄_i.))

≈ (exp(t_{(n−k),α/2}/√r) − exp(−t_{(n−k),α/2}/√r)) µ_i (Γ(1 + 1/r))^r.

In particular, at α = .05 for k = 4 this gives

E(LOCI) ≈ 5.48 µ_i for r = 2 and 2.51 µ_i for r = 3.   (6)

Note that for the GLM and LOG approaches the expected lengths depend only on the particular µ_i under consideration, while for OLS they depend on all components of µ through µ̄ and are the same for all components of µ. In the next section we discuss the expected coverage probability of the three approaches.

2.3 Expected coverage probability for the three approaches

i) Coverage probability for the GLM approach. For the GLM approach, the desired coverage probability of the CI is

Pr(Ȳ_i. exp(−Z_{α/2}/√r) ≤ µ_i ≤ Ȳ_i. exp(Z_{α/2}/√r)) = Pr(r exp(−Z_{α/2}/√r) ≤ Σ_{j=1}^r Y_ij/µ_i ≤ r exp(Z_{α/2}/√r)).

Since Σ_{j=1}^r Y_ij/µ_i follows a Gamma(1, r) distribution, at α = .05 this equals

G(r exp(Z_{α/2}/√r), 1, r) − G(r exp(−Z_{α/2}/√r), 1, r),   (7)

where G(x, 1, r) is the cumulative distribution function at x of the Gamma(1, r) distribution. For r = 2 it is 0.9067 and for r = 3 it equals 0.9207.

ii) Lower bound (L.B.) for the coverage probability of the OLS approach. The desired coverage probability for the OLS approach is

Pr(Ȳ_i. − t_{(n−k),α/2} √(S²_y/r) ≤ µ_i ≤ Ȳ_i. + t_{(n−k),α/2} √(S²_y/r)).
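The GLM interval of Table 1 and the exact coverage expression (7) can be evaluated directly. The sketch below is ours, not the paper's (function names are illustrative); it uses the closed-form Gamma(1, r) CDF for integer shape r:

```python
import math

Z = 1.959964  # upper 2.5% point of the standard normal, i.e. alpha = .05

def glm_ci(obs, z=Z):
    # GLM-based CI of Table 1: Ybar_i. * exp(+/- z / sqrt(r)),
    # where obs holds the r replicate responses for one treatment combination
    r = len(obs)
    ybar = sum(obs) / r
    h = z / math.sqrt(r)
    return ybar * math.exp(-h), ybar * math.exp(h)

def gamma_cdf(x, r):
    # G(x, 1, r): CDF of a Gamma(1, r) variable (unit scale, integer shape r)
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(r))

def glm_coverage(r, z=Z):
    # exact coverage (7) of the GLM interval
    h = z / math.sqrt(r)
    return gamma_cdf(r * math.exp(h), r) - gamma_cdf(r * math.exp(-h), r)
```

For r = 2 and r = 3 this reproduces coverages of about 0.907 and 0.921, both well below the nominal 0.95.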

Note that for exponentially distributed observations Var(Ȳ_i.) = µ²_i/r. Replacing S²_y by E(S²_y) = (1/k) Σ_{i=1}^k µ²_i = µ̄, Chebyshev's inequality gives

Pr(|µ_i − Ȳ_i.| ≥ t_{(n−k),α/2} √(S²_y/r)) ≤ Var(Ȳ_i.) / (t²_{(n−k),α/2} (S²_y/r)) ≈ µ²_i / (t²_{(n−k),α/2} µ̄).

This gives an approximate L.B. for the coverage of the OLS approach,

L(µ) = 1 − µ²_i / (t²_{(n−k),α/2} µ̄).   (8)

The L.B. in (8) for the vectors µ used in the simulation study is listed in Tables 2 and 4.

iii) Lower bound for the coverage probability of the LOG approach. For the LOG approach, referring to Table 1, the desired coverage probability can be approximated as

Pr(W̄_i. − t_{(n−k),α/2} √(S²_w/r) ≤ log(µ_i) ≤ W̄_i. + t_{(n−k),α/2} √(S²_w/r)).

Noting that for the LOG approach Var(W̄_i.) = σ²_w/r and E(S²_w) = σ²_w (for the exponential distribution σ²_w = π²/6, and this common factor cancels), arguments similar to those for OLS lead to the approximate lower bound for the coverage of the LOG approach

L(µ) = 1 − 1/t²_{(n−k),α/2}.   (9)

Unfortunately, for fixed α, since n = rk, the L.B. in (9) is decreasing in the number of replications r and the number of treatment combinations k through the degrees of freedom n − k = (r − 1)k associated with the t-quantile. At α = .05 this equals 87.0% for r = 2, k = 4; 81.2% for r = 3, k = 4 as well as for r = 2, k = 8; and 77.8% for r = 3, k = 8. The next section attempts a comparison of the three approaches based on a simulation study.

3. Simulation study

3.1 Background and details of the simulation study

Noting the dependence of E(LOCI) for the three approaches on the expected responses µ rather than on the individual parameters β, we compare the above approaches for various values of µ. However, since E(LOCI) for the OLS approach depends on all components of the mean response vector µ, while those for the GLM and LOG approaches depend only on the particular µ_i under consideration, a direct theoretical comparison of the three approaches is not possible. Since the theoretical arguments alone are indecisive when attempting a choice among the three approaches, we base the comparison on an extensive simulation study. A small number of replications is common practice in industrial experiments; hence the simulation study is carried out for 2² and 2³ experiments with r = 2 and 3 replications for an exponentially distributed response Y. A few sets of µ = (µ_1, µ_2, µ_3, µ_4) were selected for the simulation study, and the values of µ are given in the first column of Tables 2 and 4. The coefficient matrix is as defined in (1). Observing that the coverage results and E(LOCI) for the OLS approach depend heavily on µ²_i and µ̄ (cf. equations (5) and (8)), sets of µ were chosen with the µ_i, the values of µ̄ = (1/k) Σ_{i=1}^k µ²_i, and the spread of the µ²_i around µ̄ varying over a range of patterns. For each selected µ we generated 5 vectors Y whose components are independent exponential variates with E(Y) = (µ', µ')' for r = 2 replications and (µ', µ', µ')' for r = 3 replications. For each of these vectors Y and for each component µ_i we computed the lower confidence limit L_i, the upper confidence limit U_i, and LOCI_i = U_i − L_i for the

three approaches. The simulated E(LOCI) for a particular µ_i is the average of these LOCI, and the simulated coverage is the percentage of the intervals that cover the true value of µ_i. Simulated coverage results and E(LOCI) for the three approaches with r = 2 and 3 replications for the 2² and 2³ experiments are reported in Tables 2 and 3 and Tables 4 and 5 respectively. In the next section we report observations based on Sections 2 and 3.

3.2 General observations regarding coverage results

The theoretical coverage results reported in Section 2.3 and the simulated coverage results reported in Table 2 for 2² and Table 4 for 2³ experiments with r = 2 and 3 replications reveal the following facts.

a) GLM approach: The simulated coverage results conform exactly with the theoretical ones given in (7). The coverage results are reasonable and well concentrated, but they are well below the desired level. They are an increasing function of the number of replications r but do not depend on the number of factors.

b) OLS approach: The simulated coverage results follow the L.B. given in (8). The coverage results are poor if µ²_i > µ̄, and good otherwise. Coverage results are very good, almost 100%, when µ²_i is considerably less than µ̄ (but in such cases E(LOCI) tends to be unduly large).

c) LOG approach: The simulated coverage results follow the L.B. in (9), though it is not sharp owing to the approximations employed in its computation. The coverage results are reasonable and well concentrated but, like GLM, well below the desired level. They are a decreasing function of the number of replications r and of the number of treatment combinations k; the decreasing pattern with respect to r and k is visible from columns 5 and 9 of Tables 2 and 4.
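The simulation procedure described above is easy to reproduce for the GLM interval. The following Monte Carlo sketch is ours, not the paper's (the sample size and seed are arbitrary choices); it recovers both the exact coverage in (7) and the expected length in (4):

```python
import math, random

def simulate_glm(mu_i, r, nsim=20000, z=1.959964, seed=1):
    # Monte Carlo coverage and E(LOCI) of the GLM interval for one cell mean mu_i:
    # draw r exponential replicates, form Ybar * exp(+/- z/sqrt(r)), and record
    # whether the interval covers mu_i and how long it is.
    rng = random.Random(seed)
    h = z / math.sqrt(r)
    cover, total = 0, 0.0
    for _ in range(nsim):
        ybar = sum(rng.expovariate(1.0 / mu_i) for _ in range(r)) / r
        lo, hi = ybar * math.exp(-h), ybar * math.exp(h)
        cover += lo <= mu_i <= hi
        total += hi - lo
    return cover / nsim, total / nsim

cov, elen = simulate_glm(2.0, 2)
# cov should be near 0.907 and elen near 3.748 * mu_i = 7.50, matching (7) and (4)
```

The same loop, with the interval formulas of Table 1 swapped in, reproduces the OLS and LOG entries of Tables 2-5.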

3.3 General observations regarding E(LOCI)

The theoretical results reported in Section 2.2 and the simulated E(LOCI) reported in Table 3 for 2² and Table 5 for 2³ experiments with r = 2 and 3 replications reveal the following facts.

a) GLM approach: The simulated E(LOCI) match exactly with those given in (4). The E(LOCI) depend only on the value of the particular component µ_i of µ and are an increasing function of that component. Moreover, they are a decreasing function of the number of replications r but do not depend on the number of factors.

b) OLS approach: The simulated E(LOCI) depend on all components of µ and are the same for all components of µ, but they are smaller than the ones obtained in (5). Large components of µ greatly inflate the E(LOCI) of the other components through µ̄. For example, for a 2² experiment with r = 2, the same value µ_i = 25.79 receives very different E(LOCI) under µ = (25.79, 2.7, …, .738) and under µ = (5.584, .4724, 25.79, .388), because the other components enter through µ̄. In exactly similar manner, for µ = (7.389, 7.389, 7.389, .353), the small component .353 pulls down µ̄ and hence the common E(LOCI), but results in poor coverage probability for the larger components. The problem basically arises because OLS uses a single pooled estimate S²_y of variability for all components. A very good approximation, E(LOCI) ≈ 3.2 √µ̄, is obtained through regression of the simulated E(LOCI) and is fairly consistent with the expression for E(LOCI) obtained in (5).

c) LOG approach: The simulated E(LOCI) for the LOG approach are larger than the ones obtained in (6). Like GLM they depend only on the value of the particular component µ_i of µ and are increasing in µ_i. Regression of the simulated E(LOCI) on µ_i for a 2² experiment with r = 2 gave the relation E(LOCI) = 3µ_i for µ_i ≤ 5, with a similar linear relation for µ_i > 5. This

is not very close to the expression obtained in (6), possibly because of the approximations used in its computation. Similar linear relationships were observed for r = 3 and for 2³ experiments.

3.4 Comparison of the three approaches

1. The coverage results of the OLS approach are the most unstable.

2. For both the GLM and LOG approaches the coverage results are well below the desired level. The coverage results for GLM are uniformly inferior to those of LOG for two replications and uniformly superior to those of LOG for three replications.

3. The E(LOCI) for GLM are uniformly smaller than those for LOG.

Thus the LOG approach outperforms GLM for two replications, which is the most typically encountered situation in industrial experiments, but all these observations are indecisive when attempting a choice between the GLM and LOG approaches. In the next section we suggest a modification to the CI for the GLM approach that shows a decisive improvement over GLM at the desired confidence coefficient.

3.5 Proposed modification to the GLM approach

The proposed modification to the GLM-based CI is deliberately designed to achieve coverage probability exactly equal to the desired confidence coefficient. For the given level α, a number δ can be selected so that

Pr(r exp(−δ Z_{α/2}/√r) ≤ Σ_{j=1}^r Y_ij/µ_i ≤ r exp(δ Z_{α/2}/√r)) = 1 − α.

It is found that at α = .05

δ = 1.248 for r = 2 and 1.157 for r = 3.   (10)
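δ can be recovered numerically from the Gamma(1, r) distribution. The following bisection sketch is ours, not the paper's (function names are illustrative), and relies only on the fact that the coverage is increasing in δ:

```python
import math

def gamma_cdf(x, r):
    # CDF of Gamma(1, r) (unit scale, integer shape r)
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(r))

def solve_delta(r, z=1.959964, target=0.95):
    # find delta so that Pr(r e^{-d z/sqrt r} <= sum_j Y_ij/mu_i <= r e^{d z/sqrt r})
    # equals target; the probability is increasing in d, so bisection applies
    def coverage(d):
        h = d * z / math.sqrt(r)
        return gamma_cdf(r * math.exp(h), r) - gamma_cdf(r * math.exp(-h), r)
    lo, hi = 1.0, 3.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if coverage(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)
```

This reproduces δ ≈ 1.248 for r = 2 and δ ≈ 1.157 for r = 3, the values quoted in (10).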

The values of δ can be well approximated as a function of r. The values of δ in (10) give the CI limits Ȳ_i. exp(±1.7296) for r = 2 and Ȳ_i. exp(±1.310) for r = 3. Consequently the E(LOCI) is

E(LOCI) = 5.463 µ_i for r = 2 and 3.436 µ_i for r = 3.   (11)

Although these expected widths are a little larger than those of the GLM approach, they are much smaller than those of the LOG approach and maintain coverage probability exactly equal to .95; hence the proposed modification to GLM outperforms the above methods and is recommended for use. This modification will be denoted by M-GLM henceforth. Panels (a) and (b) of Figures 1 and 2 display the box plots of the coverage results for the OLS, GLM, M-GLM and LOG approaches with r = 2 and r = 3 respectively, based on Tables 2 and 4. The box plots of the corresponding E(LOCI), based on Tables 3 and 5, are displayed in panels (c) and (d) of these figures.

4. Comparison of the three approaches based on the powers of the tests for significance of factorial effects

In this section we compare the three approaches based on the power functions of tests for testing significance of factorial effects.

4.1 The tests under consideration

Note that for the OLS and LOG approaches the regression coefficients in the respective models are scalar multiples of the factorial effects that are commonly used in the analysis of factorial experiments, so that testing significance of the regression coefficients using a t-test is equivalent to testing significance of the underlying factorial effects with the usual F-test. For the GLM approach two types of tests can be employed: the t-test and the deviance test.

In all three approaches the t statistic for the hypothesis H_0: β_i = 0 is given by

t = β̂_i / SE(β̂_i),

which carries n − k d.f., and H_0 is rejected if |t| > t_{(n−k),α/2}. Under OLS the β̂_i are linear functions of the observations of the type λ'y, and SE(β̂_i) = σ̂ √(λ'λ), where σ̂ is estimated by the square root of the residual mean square. For LOG a parallel procedure is adopted with Y replaced by the log-transformed variable W. For the GLM approach β̂_i is the m.l.e. of β_i, obtained in general through an iterative procedure (it has the closed-form expression given in Section 2.1 for the saturated model), and its standard error is the square root of the i-th diagonal element of the inverse information matrix, (a(φ))² (X'ΔV̂ΔX)^{-1}, whose diagonal elements for an exponential response under the saturated model of Section 2 all equal 1/n, so that SE(β̂_i) = 1/√n.

The deviance test for GLM is based on the difference between the deviances of two nested models: the full model and the reduced model that is obtained under H_0 by eliminating the i-th column of the coefficient matrix X. The deviance of a fitted model is defined as

D = 2 [ln L(saturated model) − ln L(fitted model)] a(φ).

For testing H_0: β_i = 0, the deviance statistic takes the form

F = [D(β_i | β_(i)) / (df_2 − df_1)] / [D(β) / df_2].

For an exponentially distributed response,

D(β) = −2 Σ_{i=1}^k Σ_{j=1}^r ln(Y_ij / Ȳ_i.),

D(β_i | β_(i)) = 2 Σ_{i=1}^k Σ_{j=1}^r [ (Y_ij − µ̂_i)/µ̂_i − ln(Y_ij / µ̂_i) ],

where β_(i) is the vector β with the i-th component removed and µ̂_i is the m.l.e. of µ_i for the reduced model. These four test procedures will henceforth be denoted t-OLS, t-LOG, t-GLM and DEV respectively.
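For the saturated model the t-GLM statistic is available in closed form, using β̂ = A^{-1} log(Ȳ) = (1/k) A log(Ȳ) and SE(β̂_i) = 1/√n from above. A minimal sketch (ours; function names are illustrative):

```python
import math

def hadamard(k):
    # symmetric k x k Hadamard matrix (k a power of 2), Sylvester's construction
    H = [[1]]
    while len(H) < k:
        H = [[a * b for a in ra for b in rb] for ra in [[1, 1], [1, -1]] for rb in H]
    return H

def glm_t_stats(y_rows):
    # y_rows[i] holds the r replicates for treatment combination i.
    # Closed-form saturated-model GLM fit:
    #   beta_hat = (1/k) A log(Ybar),  SE(beta_hat_i) = 1/sqrt(n),
    #   t_i = beta_hat_i * sqrt(n)
    k, r = len(y_rows), len(y_rows[0])
    n = r * k
    A = hadamard(k)
    logy = [math.log(sum(obs) / len(obs)) for obs in y_rows]
    beta = [sum(A[i][j] * logy[j] for j in range(k)) / k for i in range(k)]
    return beta, [b * math.sqrt(n) for b in beta]
```

If the cell means satisfy Ȳ = exp(Aβ) exactly, the true β is recovered, since A² = kI.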

4.2 The power functions

The power functions of the above four test procedures were simulated for a 2² factorial experiment with r = 2 and 3 replications for each of the three hypotheses H_1: β_1 = 0, H_2: β_2 = 0 and H_3: β_12 = 0, which correspond respectively to the two main effects and the interaction between them. The power functions for the three hypotheses were exactly the same; hence only those for H_1: β_1 = 0 are presented here. Noting that the models under OLS and GLM are different, and that the coefficient β_i represents the respective effect on a linear scale for OLS and on a logarithmic scale for the GLM and LOG approaches, the power functions were generated under the following three situations.

(1) Fixing the remaining coefficients (β_0, β_2, β_12) at two settings and varying β_1 under the GLM model for a 2² experiment with r = 2 and 3 replications. For this, 5 simulations of Y ~ exponential(µ) were generated from µ = exp(Xβ), with β_1 varying systematically between −4 and 4. The power is simulated as the proportion of times the respective test rejects H_1 among these simulations. The corresponding power functions are plotted in Figures 3(a) and 3(b) for two replications and in Figures 3(c) and 3(d) for three replications.

(2) Fixing (β_0, β_2, β_12) at two settings and varying β_1 under the OLS model as above. The corresponding power functions are plotted in Figures 4(a) and 4(b) for two replications and in Figures 4(c) and 4(d) for three replications. Here the component β_0 had to be adjusted so as to ensure that µ = Xβ is non-negative.

(3) For a few sets of µ considered in the first column of Table 2, taking β = A^{-1} log(µ) for GLM and varying β_1 systematically as above. The power functions of these tests are plotted in Figures 5(a) and 5(b) respectively for

µ = [25.793, 2.7, …, .738] and µ = [.9375, .875, .625, .25] for two replications, and similarly in 5(c) and 5(d) for three replications.

4.3 Observations based on power functions and recommendations

From all the simulated power functions it is clear that:

(1) t-LOG and t-GLM have the best power functions among the four tests and small Type-I error rates, with GLM a little superior to LOG.

(2) The power functions of t-OLS are very small while its Type-I error rates are large.

(3) The deviance test is considerably less powerful but has small Type-I error rates.

These observations suggest that the t-test under LOG is equally as good as that under GLM, and hence both are recommended for testing significance of factorial effects. OLS has the worst performance with respect to expected coverage, length of CI and power over the entire parameter space and hence should be strictly avoided. Further study of unsaturated models is needed and is in progress.

REFERENCES

[1] P. Feigl and M. Zelen, Estimation of exponential survival probabilities with concomitant information, Biometrics 21 (1965).

[2] P. McCullagh and J. Nelder, Generalized Linear Models (2nd ed.), Chapman and Hall, London (1989).

[3] R. Myers and D. Montgomery, A Tutorial on Generalized Linear Models, Journal of Quality Technology 29(3) (1997).

[4] S. Lewis, D. Montgomery and R. Myers, CI Coverage for Designed Experiments Analyzed with GLMs, Journal of Quality Technology 33(3) (2001).

[5] S. Lewis, D. Montgomery and R. Myers, Examples of Designed Experiments with Nonnormal Responses, Journal of Quality Technology 33(3) (2001).

Table 2: Simulated coverage results for the OLS, GLM, M-GLM and LOG approaches for a 2² factorial experiment with r = 2, 3 replications. Columns: µ; then, for each of r = 2 and r = 3: L.B. Cov_OLS, OLS, GLM, M-GLM, LOG.

Table 3: Simulated E(LOCI) for the OLS, GLM, M-GLM and LOG approaches for 2² factorial experiments with r = 2, 3 replications. Columns: µ; then, for each of r = 2 and r = 3: OLS, GLM, M-GLM, LOG.

Table 4: Simulated coverage results for the OLS, GLM, M-GLM and LOG approaches for 2³ factorial experiments with r = 2, 3 replications. Columns: µ; then, for each of r = 2 and r = 3: L.B. Cov_OLS, OLS, GLM, M-GLM, LOG.

Table 5: Simulated E(LOCI) for the OLS, GLM, M-GLM and LOG approaches for 2³ factorial experiments with r = 2, 3 replications. Columns: µ; then, for each of r = 2 and r = 3: OLS, GLM, M-GLM, LOG.

Fig. 1: Box plots of the coverage results and E(LOCI) for 2² factorial experiments for the OLS, GLM, M-GLM and LOG approaches. Panels: (a) coverage for r = 2; (b) coverage for r = 3; (c) E(LOCI) for r = 2; (d) E(LOCI) for r = 3.

Fig. 2: Box plots of the coverage results and E(LOCI) for 2³ factorial experiments for the OLS, GLM, M-GLM and LOG approaches. Panels: (a) coverage for r = 2; (b) coverage for r = 3; (c) E(LOCI) for r = 2; (d) E(LOCI) for r = 3.

Figure 3: Power functions of the four tests (t-OLS, t-LOG, t-GLM, DEV) for H_1: β_1 = 0 for various settings of (β_0, β_2, β_12) when β_1 is systematically varied under the GLM model. Panels (a), (b): two settings with r = 2; panels (c), (d): the same settings with r = 3.

Figure 4: Power functions of the four tests for H_1: β_1 = 0 for various settings of (β_0, β_2, β_12) when β_1 is systematically varied under the OLS model. Panels (a), (c): the first setting with r = 2 and r = 3.

Figure 4 (continued). Panels (b), (d): the second setting of the remaining coefficients with r = 2 and r = 3.

Figure 5: Power functions of the four tests (t-OLS, t-LOG, t-GLM, DEV) for testing H_1: β_1 = 0 for effects β = A^{-1} log(µ), with µ selected from Table 2 and β_1 varied systematically. Panels (a), (b): two choices of µ with r = 2; panels (c), (d): the same choices with r = 3.

Table 6: Design matrix and survival times of leukemia patients. Columns: x_1, x_2, x_1 x_2, survival time Y.

Table 7: LOCI for the real data set. Columns: Y; then, for each of the OLS, GLM, M-GLM and LOG approaches: LL, UL, LOCI.

Related publication: S. C. Patil (Birajdar), Analysis of 2^n Factorial Experiments with Exponentially Distributed Response Variable, Applied Mathematical Sciences, Vol. 5 (2011), no. 10, 459-476.


More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic. Serik Sagitov, Chalmers and GU, February, 08 Solutions chapter Matlab commands: x = data matrix boxplot(x) anova(x) anova(x) Problem.3 Consider one-way ANOVA test statistic For I = and = n, put F = MS

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Introduction to Generalized Linear Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Introduction (motivation

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

LECTURE 5 HYPOTHESIS TESTING

LECTURE 5 HYPOTHESIS TESTING October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida First Year Examination Department of Statistics, University of Florida August 20, 2009, 8:00 am - 2:00 noon Instructions:. You have four hours to answer questions in this examination. 2. You must show

More information

Lecture 8. Poisson models for counts

Lecture 8. Poisson models for counts Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Homework 1 Solutions

Homework 1 Solutions 36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports

More information

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples. Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

STAT 705 Chapter 16: One-way ANOVA

STAT 705 Chapter 16: One-way ANOVA STAT 705 Chapter 16: One-way ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 21 What is ANOVA? Analysis of variance (ANOVA) models are regression

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 11, 2012 Outline Heteroskedasticity

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1 Count models A classical, theoretical argument for the Poisson distribution is the approximation Binom(n, p) Pois(λ) for large n and small p and λ = np. This can be extended considerably to n approx Z

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests

Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests Throughout this chapter we consider a sample X taken from a population indexed by θ Θ R k. Instead of estimating the unknown parameter, we

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March Presented by: Tanya D. Havlicek, ACAS, MAAA ANTITRUST Notice The Casualty Actuarial Society is committed

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

NEW APPROXIMATE INFERENTIAL METHODS FOR THE RELIABILITY PARAMETER IN A STRESS-STRENGTH MODEL: THE NORMAL CASE

NEW APPROXIMATE INFERENTIAL METHODS FOR THE RELIABILITY PARAMETER IN A STRESS-STRENGTH MODEL: THE NORMAL CASE Communications in Statistics-Theory and Methods 33 (4) 1715-1731 NEW APPROXIMATE INFERENTIAL METODS FOR TE RELIABILITY PARAMETER IN A STRESS-STRENGT MODEL: TE NORMAL CASE uizhen Guo and K. Krishnamoorthy

More information

1 One-way Analysis of Variance

1 One-way Analysis of Variance 1 One-way Analysis of Variance Suppose that a random sample of q individuals receives treatment T i, i = 1,,... p. Let Y ij be the response from the jth individual to be treated with the ith treatment

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Week 14 Comparing k(> 2) Populations

Week 14 Comparing k(> 2) Populations Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.

More information

Mean. Pranab K. Mitra and Bimal K. Sinha. Department of Mathematics and Statistics, University Of Maryland, Baltimore County

Mean. Pranab K. Mitra and Bimal K. Sinha. Department of Mathematics and Statistics, University Of Maryland, Baltimore County A Generalized p-value Approach to Inference on Common Mean Pranab K. Mitra and Bimal K. Sinha Department of Mathematics and Statistics, University Of Maryland, Baltimore County 1000 Hilltop Circle, Baltimore,

More information

Generalized Linear Models and Extensions

Generalized Linear Models and Extensions Session 1 1 Generalized Linear Models and Extensions Clarice Garcia Borges Demétrio ESALQ/USP Piracicaba, SP, Brasil March 2013 email: Clarice.demetrio@usp.br Session 1 2 Course Outline Session 1 - Generalized

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Multiple Regression Analysis

Multiple Regression Analysis Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple

More information

A Bivariate Weibull Regression Model

A Bivariate Weibull Regression Model c Heldermann Verlag Economic Quality Control ISSN 0940-5151 Vol 20 (2005), No. 1, 1 A Bivariate Weibull Regression Model David D. Hanagal Abstract: In this paper, we propose a new bivariate Weibull regression

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares ST 430/514 Recall the linear regression equation E(Y ) = β 0 + β 1 x 1 + β 2 x 2 + + β k x k We have estimated the parameters β 0, β 1, β 2,..., β k by minimizing the sum of squared

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

A test for improved forecasting performance at higher lead times

A test for improved forecasting performance at higher lead times A test for improved forecasting performance at higher lead times John Haywood and Granville Tunnicliffe Wilson September 3 Abstract Tiao and Xu (1993) proposed a test of whether a time series model, estimated

More information

Generalized linear models

Generalized linear models Generalized linear models Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Generalized linear models Boston College, Spring 2016 1 / 1 Introduction

More information

DELTA METHOD and RESERVING

DELTA METHOD and RESERVING XXXVI th ASTIN COLLOQUIUM Zurich, 4 6 September 2005 DELTA METHOD and RESERVING C.PARTRAT, Lyon 1 university (ISFA) N.PEY, AXA Canada J.SCHILLING, GIE AXA Introduction Presentation of methods based on

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION

ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION ON COMBINING CORRELATED ESTIMATORS OF THE COMMON MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION K. KRISHNAMOORTHY 1 and YONG LU Department of Mathematics, University of Louisiana at Lafayette Lafayette, LA

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Lecture 16 Solving GLMs via IRWLS

Lecture 16 Solving GLMs via IRWLS Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example

More information

Chapter 12. Analysis of variance

Chapter 12. Analysis of variance Serik Sagitov, Chalmers and GU, January 9, 016 Chapter 1. Analysis of variance Chapter 11: I = samples independent samples paired samples Chapter 1: I 3 samples of equal size J one-way layout two-way layout

More information