
Bringing Order to Outlier Diagnostics in Regression Models

D. R. Jensen and D. E. Ramirez
Virginia Polytechnic Institute and State University and University of Virginia
http://www.math.virginia.edu/~der/home.html

0.1 INTRODUCTION

Consider the model $\{Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \varepsilon_i\ (1 \le i \le n)\}$ relating a response $Y_i$ to fixed regressors $\{X_{i1}, X_{i2}, \ldots, X_{ik}\}$ through $p = k+1$ unknown parameters $\beta = [\beta_0, \beta_1, \ldots, \beta_k]'$.

THE FULL MODEL $Y = X_0\beta + \varepsilon$ is partitioned according to the proposed outliers $\{Y_i : i \in I\}$ of cardinality $r$:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}, \qquad Y_1\ ((n-r)\times 1),\quad Y_2\ (r\times 1),$$
with least-squares estimator $\hat\beta$ $(p\times 1)$. All matrices are assumed to be of full rank.

THE REDUCED MODEL $Y_1 = X\beta + \varepsilon_1$, with the proposed outliers deleted, has least-squares estimator $\hat\beta_I$.
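The partitioned fit can be illustrated with a short numerical sketch. The simulated data, the index set, and the use of numpy are my own assumptions; only the roles of $X_0$, $X$, $Z$, $\hat\beta$, $\hat\beta_I$, $s^2$, and $s_I^2$ follow the text.

```python
# Minimal sketch of the full and reduced least-squares fits (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, k, r = 31, 3, 2                     # n cases, k regressors, r proposed outliers
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # full design, p = k + 1 columns
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)

I = [9, 28]                            # indices of the proposed outliers (cardinality r)
keep = np.setdiff1d(np.arange(n), I)
X, Z = X0[keep], X0[I]                 # partition of the design: rows kept vs. rows in I
Y1, Y2 = Y[keep], Y[I]

# Full-model least-squares estimator beta_hat and residual variance s^2
beta_hat, *_ = np.linalg.lstsq(X0, Y, rcond=None)
s2 = np.sum((Y - X0 @ beta_hat) ** 2) / (n - p)

# Reduced-model estimator beta_hat_I (proposed outliers deleted) and s_I^2
beta_hat_I, *_ = np.linalg.lstsq(X, Y1, rcond=None)
s2_I = np.sum((Y1 - X @ beta_hat_I) ** 2) / (n - r - p)
print(beta_hat, beta_hat_I, s2, s2_I)
```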

0.2 STANDARD OUTLIER DIAGNOSTICS

0.2.1 GROUP I DIAGNOSTICS (t-like statistics)

STUDENTIZED DELETED RESIDUAL TEST (Gold Standard)
$$\frac{y_i - \hat y_{(i)i}}{s_{(i)}\sqrt{1 + x_i(X'X)^{-1}x_i'}} = \frac{y_i - \hat y_{(i)i}}{s_{(i)}\sqrt{1 + h_{ii}/(1 - h_{ii})}} \sim t(n-p-1) \quad (1)$$

THE EXTERNALLY STUDENTIZED RESIDUALS (R-Student statistic), easy to compute:
$$t_i = \frac{y_i - \hat y_i}{s_{(i)}\sqrt{1 - x_i(X_0'X_0)^{-1}x_i'}} = \frac{y_i - \hat y_i}{s_{(i)}\sqrt{1 - h_{ii}}} \sim t(n-p-1) \quad (2)$$

DFFITS
$$\mathrm{DFFITS}_i = \frac{\hat y_i - \hat y_{(i)i}}{s_{(i)}\sqrt{h_{ii}}} = t_i\sqrt{\frac{h_{ii}}{1-h_{ii}}}$$
(tabulated at leverages $h_{ii} = 0.25, \ldots, 0.95$ in the original).
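A minimal sketch of the Group I quantities for a single case follows; the simulated data, names, and use of numpy/scipy are assumptions. It computes the leverage $h_{ii}$, the R-Student value $t_i$ of (2) from a deleted refit, and DFFITS$_i$.

```python
# Group I diagnostics for one case i: leverage, R-Student, DFFITS (hedged sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 31, 3
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)

H = X0 @ np.linalg.solve(X0.T @ X0, X0.T)    # hat matrix H = X0 (X0'X0)^{-1} X0'
h = np.diag(H)                               # leverages h_ii
e = Y - H @ Y                                # ordinary residuals y_i - yhat_i

i = 9                                        # case under scrutiny
keep = np.delete(np.arange(n), i)
X = X0[keep]
b_i = np.linalg.solve(X.T @ X, X.T @ Y[keep])            # fit with case i deleted
s2_i = np.sum((Y[keep] - X @ b_i) ** 2) / (n - p - 1)    # s_(i)^2

t_i = e[i] / np.sqrt(s2_i * (1 - h[i]))                  # R-Student, eq. (2)
dffits_i = t_i * np.sqrt(h[i] / (1 - h[i]))              # DFFITS_i
p_val = 2 * stats.t.sf(abs(t_i), df=n - p - 1)           # reference distribution t(n-p-1)
print(h[i], t_i, dffits_i, p_val)
```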

DFBETAS
$$\mathrm{DFBETAS}_{(i)j} = \frac{\hat\beta_j - \hat\beta_{(i)j}}{s_{(i)}\sqrt{(X_0'X_0)^{-1}_{jj}}} = t_i\,\frac{[(X_0'X_0)^{-1}X_0']_{ji}}{\sqrt{(X_0'X_0)^{-1}_{jj}\,(1-h_{ii})}}$$

USEFUL RELATIONSHIPS
$$\hat\beta - \hat\beta_{(i)} = \frac{(X_0'X_0)^{-1}x_i'(y_i - \hat y_i)}{1-h_{ii}}, \qquad \frac{(y_i - \hat y_i)^2}{1-h_{ii}} = (n-p)s^2 - (n-p-1)s_{(i)}^2 \quad (3)$$

STANDARD RECOMMENDATION (Myers (1990)): "As we indicated earlier, these two measures (DFFITS and DFBETAS) are t-like. Surely any analyst who is familiar with the concept of a standard error knows that if DFBETAS$_{(i)j}$ exceeds 2.0 in magnitude, the influence of the data point is unquestioned."

Note: $\hat Y - \hat Y_{(i)}$ and $\hat\beta - \hat\beta_{(i)}$ are OUTLIER measures, while $h_{ii}$ and $1/(1-h_{ii})$ are measures of influence.

CONCLUSION: All Group I diagnostics yield the same p-values: Jensen and Ramirez (1996) #1 and 2, Jensen (1998, 2000) #2, 3 and 3, and LaMotte (1999) #1, 2, 3 and 3.

0.2.2 GROUP II DIAGNOSTICS (heuristics based on $s_I^2/s^2$)

The ratio $s_I^2/s^2$ is the Neyman-Pearson likelihood ratio test statistic.

R-FISHER parallels the R-Student statistic (Jensen (1999, 2001)) (Gold Standard):
$$F_I = \frac{e_I'(I_r - Z(X_0'X_0)^{-1}Z')^{-1}e_I}{r s_I^2} = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2} = \frac{e_I'(I_r - H_{II})^{-1}e_I}{r s_I^2} \sim F_{r,\,n-p-r} \quad (4)$$
where
$$X_0(X_0'X_0)^{-1}X_0' = H = \begin{bmatrix} H_{00} & H_{0I} \\ H_{I0} & H_{II} \end{bmatrix}.$$

THE MEAN SHIFT OUTLIER MODEL is based on
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} 0 \\ I_r \end{bmatrix}\delta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}.$$
Outliers can be checked jointly with model building. Outliers are tested under the null hypothesis $H_0: \delta = 0$ with
$$F_I = \frac{(\mathrm{RSS}(H_0) - \mathrm{RSS}(H_a))/r}{\mathrm{RSS}(H_a)/(n-p-r)} = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2} \sim F_{r,\,n-p-r} \quad (5)$$
with $F_{(i)} = t_i^2$ when $r = 1$.
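The equivalence of the two expressions for $F_I$ in (4)-(5) can be checked directly by fitting the mean-shift model with one indicator column per case in $I$. A hedged sketch, with simulated data and helper names that are assumptions rather than anything from the text:

```python
# Mean-shift F test for a subset I of r cases, checked against the s^2 / s_I^2 form.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k, r = 31, 3, 2
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)
I = [9, 28]

def rss(A, y):
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ b) ** 2)

# H_a: mean-shift model with one indicator column per proposed outlier
D = np.zeros((n, r))
D[I, np.arange(r)] = 1.0
rss0, rssa = rss(X0, Y), rss(np.column_stack([X0, D]), Y)
F_I = ((rss0 - rssa) / r) / (rssa / (n - p - r))                 # eq. (5)

s2, s2_I = rss0 / (n - p), rssa / (n - p - r)                    # RSS(H_a) equals the deleted-case RSS
F_I_alt = ((n - p) * s2 - (n - p - r) * s2_I) / (r * s2_I)       # eq. (4); identical to F_I
p_val = stats.f.sf(F_I, r, n - p - r)
print(F_I, F_I_alt, p_val)
```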

$\mathrm{OUT}_I$ of Barnett and Lewis (1994) ($r \ge 1$):
$$\mathrm{OUT}_I = 1 - \frac{s_I^2}{s^2} \quad (6)$$
$$-\frac{r}{n-p-r} \le \mathrm{OUT}_I = \frac{r(F_I - 1)}{rF_I + (n-p-r)} \le 1$$

COVRATIO ($r = 1$):
$$\mathrm{CR}_{(i)} = \frac{|s_{(i)}^2(X'X)^{-1}|}{|s^2(X_0'X_0)^{-1}|} = \frac{1}{1-h_{ii}}\left(\frac{s_{(i)}^2}{s^2}\right)^p \quad (7)$$
$$0 < \mathrm{CR}_{(i)} = \frac{1}{1-h_{ii}}\left(\frac{n-p}{t_i^2+(n-p-1)}\right)^p \le \frac{1}{1-h_{ii}}\left(\frac{n-p}{n-p-1}\right)^p \quad (8)$$
(tabulated at leverages $h_{ii} = 0.25, \ldots, 0.95$ in the original).

$\mathrm{AP}_i$ of Andrews and Pregibon (1978) ($r = 1$):
$$\mathrm{AP}_i = 1 - \frac{(n-p-1)\, s_{(i)}^2\, |X'X|}{(n-p)\, s^2\, |X_0'X_0|} \quad (9)$$
$$h_{ii} \le \mathrm{AP}_i = \frac{t_i^2 + (n-p-1)h_{ii}}{t_i^2 + (n-p-1)} \le 1$$
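As a numerical spot-check of the single-case closed forms above (including the determinant form of $\mathrm{AP}_i$ as reconstructed here), the sketch below compares the deletion-based definitions with the $t_i$, $h_{ii}$ expressions. The simulated data and variable names are illustrative assumptions only.

```python
# Spot-check of the r = 1 Group II identities: OUT_i, COVRATIO, AP_i.
import numpy as np

rng = np.random.default_rng(7)
n, k = 31, 3
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)

H = X0 @ np.linalg.solve(X0.T @ X0, X0.T)
h = np.diag(H)
e = Y - H @ Y
s2 = e @ e / (n - p)

i = 9
keep = np.delete(np.arange(n), i)
X = X0[keep]
b_i = np.linalg.solve(X.T @ X, X.T @ Y[keep])
s2_i = np.sum((Y[keep] - X @ b_i) ** 2) / (n - p - 1)
t_i = e[i] / np.sqrt(s2_i * (1 - h[i]))

out_i = 1 - s2_i / s2                                         # eq. (6) with r = 1
cr_i = (np.linalg.det(s2_i * np.linalg.inv(X.T @ X))
        / np.linalg.det(s2 * np.linalg.inv(X0.T @ X0)))       # eq. (7), determinant form
cr_alt = (s2_i / s2) ** p / (1 - h[i])                        # closed form in s_(i)^2 and h_ii
ap_i = 1 - ((n - p - 1) * s2_i * np.linalg.det(X.T @ X)
            / ((n - p) * s2 * np.linalg.det(X0.T @ X0)))      # eq. (9)
ap_alt = (t_i ** 2 + (n - p - 1) * h[i]) / (t_i ** 2 + (n - p - 1))
print(np.isclose(cr_i, cr_alt), np.isclose(ap_i, ap_alt), out_i)
```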

FVARATIO of Belsley, Kuh and Welsch (1980) ($r = 1$):
$$\mathrm{FV}_i = \frac{s_{(i)}^2}{s^2(1-h_{ii})} \quad (10)$$
$$0 < \mathrm{FV}_i = \frac{(n-p)/(t_i^2 + n-p-1)}{1-h_{ii}} \le \frac{(n-p)/(n-p-1)}{1-h_{ii}}$$
(tabulated at leverages $h_{ii} = 0.25, \ldots, 0.95$ in the original).

USEFUL RELATIONSHIPS
$$\hat\beta - \hat\beta_I = (X_0'X_0)^{-1}Z'(I - H_{II})^{-1}(Y_I - \hat Y_I)$$
$$(Y_I - \hat Y_I)'(I - H_{II})^{-1}(Y_I - \hat Y_I) = (n-p)s^2 - (n-p-r)s_I^2$$

CONCLUSION: All Group II statistics yield the same p-values: Jensen (1999, 2001) #4, 6, 7, 9, 10; Jensen and Ramirez (1998) #5; and LaMotte (1999) #7.
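The two subset relationships just listed can be verified numerically. A minimal sketch, assuming simulated data and numpy; here $H_{II}$ and $e_I = Y_I - \hat Y_I$ denote the leverage block and the full-model residuals for the subset $I$, as in the text.

```python
# Numerical check of the subset-deletion relationships for beta_hat - beta_hat_I
# and for e_I' (I - H_II)^{-1} e_I.
import numpy as np

rng = np.random.default_rng(3)
n, k, r = 31, 3, 2
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)
I = [9, 28]
keep = np.setdiff1d(np.arange(n), I)
X, Z = X0[keep], X0[I]

G0 = np.linalg.inv(X0.T @ X0)
beta_hat = G0 @ X0.T @ Y
beta_hat_I = np.linalg.solve(X.T @ X, X.T @ Y[keep])
s2 = np.sum((Y - X0 @ beta_hat) ** 2) / (n - p)
s2_I = np.sum((Y[keep] - X @ beta_hat_I) ** 2) / (n - p - r)

H_II = Z @ G0 @ Z.T                        # leverage block for the subset I
e_I = Y[I] - Z @ beta_hat                  # full-model residuals at the subset rows
M = np.linalg.inv(np.eye(r) - H_II)

lhs1, rhs1 = beta_hat - beta_hat_I, G0 @ Z.T @ M @ e_I
lhs2, rhs2 = e_I @ M @ e_I, (n - p) * s2 - (n - p - r) * s2_I
print(np.allclose(lhs1, rhs1), np.isclose(lhs2, rhs2))
```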

0.2.3 GROUP III DIAGNOSTICS (Distance measures)

COOK'S $D_I$ STATISTICS (1977) take the form
$$D_I(\hat\beta, M, c\hat\sigma^2) = \frac{(\hat\beta_I - \hat\beta)'M(\hat\beta_I - \hat\beta)}{c\hat\sigma^2}$$
where $M$ $(p \times p)$ is non-negative definite, $\hat\sigma^2$ is some estimator of the variance $\sigma^2$, and $c$ is a user-defined constant.

Standard Choices

$C_i$ of Cook (1977): $\hat\sigma^2 = s^2$, $c = p$, and $M = X_0'X_0 = \sigma^2[\mathrm{cov}(\hat\beta)]^{-1}$:
$$C_i = \frac{(\hat\beta_{(i)} - \hat\beta)'X_0'X_0(\hat\beta_{(i)} - \hat\beta)}{p s^2} = \frac{\|\hat Y_{(i)} - \hat Y\|^2}{p s^2} \quad (11)$$
$$C_i = \frac{(n-p)\,h_{ii}\,t_i^2}{p(1-h_{ii})\,(t_i^2 + (n-p-1))}$$
(tabulated at leverages $h_{ii} = 0.25, \ldots, 0.95$ in the original).

$\mathrm{WK}_i$ of Welsch and Kuh (1977): $\hat\sigma^2 = s_{(i)}^2$, $c = p$, and $M = X_0'X_0 = \sigma^2[\mathrm{cov}(\hat\beta)]^{-1}$:
$$\mathrm{WK}_i = \frac{(\hat\beta_{(i)} - \hat\beta)'X_0'X_0(\hat\beta_{(i)} - \hat\beta)}{p s_{(i)}^2} = \frac{\|\hat Y_{(i)} - \hat Y\|^2}{p s_{(i)}^2} = \frac{h_{ii}\,t_i^2}{p(1-h_{ii})} \quad (12)$$
(tabulated at leverages $h_{ii} = 0.25, \ldots, 0.95$ in the original).

Our recommendation:
$$D_I(\hat\beta, X_0'X_0, r s_I^2) = \frac{(\hat\beta_I - \hat\beta)'X_0'X_0(\hat\beta_I - \hat\beta)}{r s_I^2} \quad (13)$$
$$\sim F_r(t; \gamma_1^2, \ldots, \gamma_r^2; n-p-r) \quad (14)$$
where $\{\gamma_1^2 \ge \cdots \ge \gamma_r^2 > 0\}$ are the ordered eigenvalues of $Z(X'X)^{-1}Z'$.

STANDARD RECOMMENDATION: "Cook's D ... is an F-like statistic with degrees of freedom $p$ and $n-p$. The critical value has been suggested by Cook to be $F_{p,n-p}(0.50)$."

$W_i$ of Welsch (1982): $\hat\sigma^2 = s_{(i)}^2$, $c = p$, and $M = X'X = \sigma^2[\mathrm{cov}(\hat\beta_I)]^{-1}$:
$$W_i = \frac{(\hat\beta_{(i)} - \hat\beta)'X'X(\hat\beta_{(i)} - \hat\beta)}{p s_{(i)}^2} = \frac{h_{ii}\,t_i^2}{p} \quad (15)$$
(tabulated at leverages $h_{ii} = 0.25, \ldots, 0.95$ in the original).

Our recommendation: $\hat\sigma^2 = s_I^2$, $c = r$, and $M = X'X = \sigma^2[\mathrm{cov}(\hat\beta_I)]^{-1}$:
$$D_I(\hat\beta, X'X, r s_I^2) = \frac{(\hat\beta_I - \hat\beta)'X'X(\hat\beta_I - \hat\beta)}{r s_I^2} \sim F_r(t; \lambda_1, \ldots, \lambda_r; n-p-r)$$
where $\{\lambda_1 \ge \cdots \ge \lambda_r > 0\}$ are the ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$, the canonical subset leverages.
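A short sketch of the two recommended subset statistics and of the weights that index their reference distributions, assuming simulated data and numpy; it also confirms numerically that the canonical subset leverages satisfy $\lambda_i = \gamma_i^2/(1+\gamma_i^2)$, a relation recorded in the notation that follows.

```python
# Subset Cook-type statistics (13) and the X'X version, with the eigenvalue weights
# gamma^2 and lambda of their generalized F reference distributions (hedged sketch).
import numpy as np

rng = np.random.default_rng(4)
n, k, r = 31, 3, 2
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)
I = [9, 28]
keep = np.setdiff1d(np.arange(n), I)
X, Z = X0[keep], X0[I]

beta_hat = np.linalg.solve(X0.T @ X0, X0.T @ Y)
beta_hat_I = np.linalg.solve(X.T @ X, X.T @ Y[keep])
s2_I = np.sum((Y[keep] - X @ beta_hat_I) ** 2) / (n - p - r)
d = beta_hat_I - beta_hat

D_full = d @ (X0.T @ X0) @ d / (r * s2_I)       # D_I(beta_hat, X0'X0, r s_I^2), eq. (13)
D_red  = d @ (X.T @ X)  @ d / (r * s2_I)        # D_I(beta_hat, X'X,  r s_I^2)

gamma2 = np.linalg.eigvalsh(Z @ np.linalg.inv(X.T @ X) @ Z.T)    # weights for (14)
lam    = np.linalg.eigvalsh(Z @ np.linalg.inv(X0.T @ X0) @ Z.T)  # canonical subset leverages
print(D_full, D_red)
print(np.allclose(np.sort(lam), np.sort(gamma2 / (1 + gamma2)))) # lambda_i = gamma_i^2/(1+gamma_i^2)
```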

$D_I$ of Jensen and Ramirez (1998) (Gold Standard), the Normalized Cook Statistic: $\hat\sigma^2 = s_I^2$, $c = r$, and $M = ((X'X)^{-1} - (X_0'X_0)^{-1})^{+} = \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^{+}$:
$$D_I = \frac{(\hat\beta_I - \hat\beta)'((X'X)^{-1} - (X_0'X_0)^{-1})^{+}(\hat\beta_I - \hat\beta)}{r s_I^2} = F_I \sim F_{r,\,n-p-r},$$
with $D_i = t_i^2$ when $r = 1$.

CONCLUSION: All Group III statistics yield the same p-values when $r = 1$: Jensen (1999, 2001) #11, 12, and 15; Jensen and Ramirez (1996) #11; and LaMotte (1999) #11. For $r > 1$ the p-values are different. Their distribution is now known.

Notation: $\{\gamma_1^2 \ge \cdots \ge \gamma_r^2\}$ are the ordered eigenvalues of $Z(X'X)^{-1}Z'$; $\{\lambda_1 \ge \cdots \ge \lambda_r\}$ are the ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$; and $0 < \lambda_i = \gamma_i^2/(1+\gamma_i^2) < 1$ for $1 \le i \le r$, with $\lambda_1 = h_{ii}$ when $r = 1$ (the original tabulates $\gamma_i^2$ at leverages $h_{ii} = 0.25, \ldots, 0.95$).
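The equality $D_I = F_I$ for the normalized Cook statistic can be checked with a Moore-Penrose inverse. A hedged sketch with simulated data; the names and the use of numpy/scipy are assumptions.

```python
# Normalized Cook statistic with M = ((X'X)^{-1} - (X0'X0)^{-1})^+ versus F_I.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k, r = 31, 3, 2
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X0.shape[1]
Y = X0 @ rng.normal(size=p) + rng.normal(size=n)
I = [9, 28]
keep = np.setdiff1d(np.arange(n), I)
X = X0[keep]

beta_hat = np.linalg.solve(X0.T @ X0, X0.T @ Y)
beta_hat_I = np.linalg.solve(X.T @ X, X.T @ Y[keep])
s2 = np.sum((Y - X0 @ beta_hat) ** 2) / (n - p)
s2_I = np.sum((Y[keep] - X @ beta_hat_I) ** 2) / (n - p - r)

M = np.linalg.pinv(np.linalg.inv(X.T @ X) - np.linalg.inv(X0.T @ X0))  # Moore-Penrose inverse
d = beta_hat_I - beta_hat
D_I = d @ M @ d / (r * s2_I)                                   # normalized Cook statistic
F_I = ((n - p) * s2 - (n - p - r) * s2_I) / (r * s2_I)         # R-Fisher statistic (4)
print(np.isclose(D_I, F_I), stats.f.sf(D_I, r, n - p - r))     # equal, referred to F(r, n-p-r)
```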

SINGLE OUTLIER THEOREM (Jensen and Ramirez 1996): When $r = 1$, the following tests are all equivalent.
$$\frac{y_i - \hat y_{(i)}}{s_{(i)}\sqrt{1 + x_i(X'X)^{-1}x_i'}} \sim t(n-p-1)$$
$$\frac{y_i - \hat y_i}{s_{(i)}\sqrt{1 - x_i(X_0'X_0)^{-1}x_i'}} \sim t(n-p-1)$$
$$F_i \sim F(1, n-p-1)$$
$$D_i(\hat\beta, X_0'X_0, s_{(i)}^2) \sim \gamma_1^2\,F(1, n-p-1)$$
$$D_i(\hat\beta, X'X, s_{(i)}^2) \sim \lambda_1\,F(1, n-p-1)$$
$$D_i(\hat\beta, \sigma^2[\mathrm{cov}(\hat\beta_i - \hat\beta)]^{+}, s_{(i)}^2) \sim F(1, n-p-1)$$

GENERALIZED F DISTRIBUTION (Jensen and Ramirez, 1998)

Suppose that the elements of $U = [U_1, \ldots, U_r]'$ are independent $\{N_1(\omega_i, 1);\ 1 \le i \le r\}$ random variables; let $\{\alpha_1, \ldots, \alpha_r\}$ be non-increasing positive weights; and identify $T = \alpha_1 U_1^2 + \cdots + \alpha_r U_r^2$. If $\mathcal{L}(V) = \chi^2(\nu)$ independently of $U$, then the cdf of
$$W = \frac{T/r}{V/\nu}$$
is denoted by $F_r(t; \alpha_1, \ldots, \alpha_r; \omega; \nu)$.

THEOREM (Ramirez and Jensen (1991)). For $\omega = 0$, the pdf of the central generalized F distribution $W = (T/r)/(V/\nu)$ has the representation in terms of central F distributions
$$\sum_{i=0}^{\infty} c_i\,\frac{r}{(r+2i)\,\delta}\,f_F\!\left(\frac{r\,w}{(r+2i)\,\delta};\ r+2i,\ \nu\right) \quad (16)$$
with $f_F$ the pdf of $F(\nu_1, \nu_2)$, and whose coefficients $\{c_i\}$ are defined recursively with $0 < \delta < \alpha_r$.

Ramirez (2000) gave Fortran code for computing p-values for $F_r(t; \alpha_1, \ldots, \alpha_r; \nu)$. Dunkl and Ramirez (2001) have developed a fast algorithm for computing the p-values of $W$, without having to integrate the density function, in terms of Lauricella $F_D^{(r)}$ functions.

STOCHASTIC BOUNDS (Jensen and Ramirez (1991)) for $F_r$:
$$F_r(t; \alpha_1; \nu) \le F_r(t; \alpha_1, \ldots, \alpha_r; \nu) \le F_r(t; \bar\alpha; \nu)$$
with $\bar\alpha$ the geometric mean of $\{\alpha_1, \ldots, \alpha_r\}$.
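The series (16) can be evaluated numerically. The sketch below is not the Fortran implementation of Ramirez (2000); it assembles the mixture using Ruben's standard recursion for the coefficients $\{c_i\}$, which I am supplying as an assumption since the text only notes that the $c_i$ are defined recursively, and it uses scipy's central F cdf. With equal weights the generalized F reduces to an ordinary $F(r, \nu)$, which gives a quick check.

```python
# Hedged sketch: cdf of the central generalized F, F_r(t; a_1,...,a_r; nu), via the
# mixture representation (16) with Ruben-type coefficients (assumed recursion).
import numpy as np
from scipy import stats

def gen_f_cdf(t, alphas, nu, n_terms=200):
    alphas = np.asarray(alphas, dtype=float)
    r = alphas.size
    delta = 0.9 * alphas.min()                 # any 0 < delta < alpha_r works
    # Ruben coefficients: c_0 = prod sqrt(delta/alpha_j),
    # c_i = (1/(2i)) * sum_{k=0}^{i-1} g_{i-k} c_k, with g_m = sum_j (1 - delta/alpha_j)^m
    ratios = 1.0 - delta / alphas
    c = np.zeros(n_terms)
    c[0] = np.prod(np.sqrt(delta / alphas))
    g = np.array([np.sum(ratios ** m) for m in range(1, n_terms + 1)])
    for i in range(1, n_terms):
        c[i] = np.sum(g[i - 1 - np.arange(i)] * c[:i]) / (2 * i)
    # Mixture of central F cdfs, i.e. (16) integrated term by term:
    dfs = r + 2 * np.arange(n_terms)
    return float(np.sum(c * stats.f.cdf(r * t / (dfs * delta), dfs, nu)))

# Equal weights: the generalized F collapses to an ordinary F(2, 20) cdf.
print(gen_f_cdf(2.5, [1.0, 1.0], nu=20), stats.f.cdf(2.5, 2, 20))
```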

THEOREM (Jensen and Ramirez (200x)). For $\omega \ne 0$, the pdf of the non-central generalized F distribution $W = (T/r)/(V/\nu)$ has the representation in terms of central F distributions
$$\sum_{i=0}^{\infty} c_i\,\frac{r}{(r+2i)\,\delta}\,f_F\!\left(\frac{r\,w}{(r+2i)\,\delta};\ r+2i,\ \nu\right)$$
with $f_F$ the pdf of $F(\nu_1, \nu_2)$, and whose coefficients $\{c_i\}$ are defined recursively with $0 < \delta < \alpha_r$, where the $\{c_i\}$ are now functions of $\omega$.

NORMAL THEORY RESULTS (Jensen and Ramirez (1998)): If $\mathcal{L}(Y) = N_n(X_0\beta, \sigma^2 I_n)$, then with $\nu = n-p-r$:
$$D_I(\hat\beta, X_0'X_0, r s_I^2) \sim F_r(t; \gamma_1^2, \ldots, \gamma_r^2; n-p-r),$$
where $\{\gamma_1^2 \ge \cdots \ge \gamma_r^2\}$ are the ordered eigenvalues of $Z(X'X)^{-1}Z'$;
$$D_I(\hat\beta, X'X, r s_I^2) \sim F_r(t; \lambda_1, \ldots, \lambda_r; n-p-r),$$
where $\{\lambda_1 \ge \cdots \ge \lambda_r\}$ are the ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$ and $\lambda_i = \gamma_i^2/(1+\gamma_i^2)$, $1 \le i \le r$; and
$$D_I(\hat\beta, \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^{+}, r s_I^2) \sim F_r(t; 1, \ldots, 1; n-p-r).$$

JOINT OUTLIER RESULTS: When $r > 1$, outliers can be tested with
$$F_I \sim F_r(t; 1, \ldots, 1; n-p-r)$$
$$D_I(\hat\beta, \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^{+}, r s_I^2) = D_I \sim F_r(t; 1, \ldots, 1; n-p-r)$$
$$D_I(\hat\beta, X_0'X_0, r s_I^2) \sim F_r(t; \gamma_1^2, \ldots, \gamma_r^2; n-p-r) \quad (17)$$
$$D_I(\hat\beta, X'X, r s_I^2) \sim F_r(t; \lambda_1, \ldots, \lambda_r; n-p-r) \quad (18)$$

The mean shift statistic $F_I$ and the normalized Cook statistic $D_I$ yield the same p-values. For the Cook-like statistics, we prefer (18) over (17) for computational reasons, since the number of terms required in the series expansion for the cdf of $(T/r)/(V/\nu)$ is smaller. This number increases with the condition number of the weights, which satisfy $\gamma_1^2/\gamma_r^2 \ge \lambda_1/\lambda_r$.
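The computational preference for (18) can be illustrated by comparing the spread of the two weight sets on a simulated design; a larger spread forces more terms in the series for the cdf, and $\gamma_1^2/\gamma_r^2$ is never smaller than $\lambda_1/\lambda_r$. The data and names in this small sketch are assumptions.

```python
# Condition numbers of the weights behind (17) and (18) on an illustrative design.
import numpy as np

rng = np.random.default_rng(6)
n, k, r = 31, 3, 2
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
I = [9, 28]
keep = np.setdiff1d(np.arange(n), I)
X, Z = X0[keep], X0[I]

gamma2 = np.linalg.eigvalsh(Z @ np.linalg.inv(X.T @ X) @ Z.T)    # weights in (17)
lam = np.linalg.eigvalsh(Z @ np.linalg.inv(X0.T @ X0) @ Z.T)     # weights in (18)
print(gamma2.max() / gamma2.min(), lam.max() / lam.min())        # the first ratio is never smaller
```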

0.3 EXAMPLE

We use the Drill Data Set from Cook and Weisberg (1982, p. 149) with $n = 31$ and $k = 10$. With $r = 1$, we find the influential rows (with $p < .025$) to be rows 9, 28, and 31. The p-values are (necessarily) the same using $D_i(\hat\beta, X_0'X_0, s_{(i)}^2)$, $D_i = D_i(\hat\beta, X'X, s_{(i)}^2)$, $D_i(\hat\beta, \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^{+}, s_{(i)}^2)$, or $t_i$ (R-Student).

[Table: Row, $s_{(i)}^2$, $D_i$, $\lambda_i$, p-value, and $D_i/\lambda_i$ for rows 9, 28, and 31; the numerical entries did not survive transcription.]

With $r = 2$, we find 39 influential pairs (with $p < .01$, using the average of the p-value bounds) using $D_I(\hat\beta, X_0'X_0, 2s_I^2)$, 35 pairs using $D_I(\hat\beta, X'X, 2s_I^2)$, and 16 pairs (directly) using $D_I(\hat\beta, \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^{+}, 2s_I^2)$. All contain rows from {9, 28, 31} except for the pair (5, 26). For this pair, using $D_I(\hat\beta, X'X, 2s_I^2)$, we have:

[Table: Rows, $s_I^2$, $D_I$, $\lambda_1$, $\lambda_2$, LB, p, UB for the pair (5, 26); the numerical entries did not survive transcription.]

The p-values for this pair, using $D_I(\hat\beta, X_0'X_0, 2s_I^2)$, $D_I(\hat\beta, X'X, 2s_I^2)$, and $D_I(\hat\beta, \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^{+}, 2s_I^2)$:

[Table: Test, p-value, N for the three statistics; the numerical entries did not survive transcription.]

Other Heuristics

We now consider joint outliers for the Intercountry Life-Cycle Savings Data from Belsley et al. (1980, p. 41). We consider only pairs of observations, $I = \{i_1, i_2\}$; the extension to larger subsets is immediate. Belsley et al. (1980) used a number of different empirical procedures to identify 15 pairs of observations that may be influential. These procedures include MDFFIT (MD), COVRATIO (CR), RESRATIO (RR), MEWDFFIT (ME), Wilks' $\Lambda$ statistic (WL), and the Andrews-Pregibon statistic (AP). Table I shows which pairs of observations are of note under each of these criteria. The last column records the p-values from $D_I(\hat\beta, X_0'X_0, 2s_I^2)$. These empirical procedures do not agree with each other, nor do they detect the joint outliers which we are able to determine using $D_I$ as an omnibus test of model fit under the mean shift and variance shift outlier model. Outliers are often influential; however, not all influential subsets are outliers (see Barnett and Lewis (1994, p. 317)).

Table I: Multiple-Row Influence Using Empirical Procedures

[Table I lists the 15 selected pairs of rows, including (34,46), (7,46), and (6,44) and several further pairs involving rows 23, 46, and 49, flags each pair under the criteria MD, CR, RR, ME, WL, and AP, and records the p-value from $D_I(\hat\beta, X_0'X_0, 2s_I^2)$; among the surviving p-values are .081, .329, .438, and .699. Most individual entries did not survive transcription.]

MDFFIT (MD) (the numerator of $D_I(\hat\beta, X'X, 2s_I^2)$):
$$(\hat\beta_I - \hat\beta)'X'X(\hat\beta_I - \hat\beta)$$

COVRATIO (CR):
$$\mathrm{CR}_{(i)} = \frac{|s_{(i)}^2(X'X)^{-1}|}{|s^2(X_0'X_0)^{-1}|}$$

RESRATIO (RR) (the same as the mean-shift statistic):
$$F_I = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2}$$

MEWDFFIT (ME):
$$\mathrm{ME} = \sum_{i,j \in I} \frac{h_{ij}\,e_i e_j}{(1-h_i)(1-h_j)}$$

Wilks' $\Lambda$ statistic (WL):
$$\mathrm{WL} = 1 - \frac{n}{r(n-r)}\,\mathbf{1}'Z(Z'Z)^{-1}Z'\mathbf{1}$$

Andrews and Pregibon (AP):
$$\mathrm{AP}_I = 1 - \frac{(n-p-r)\,s_I^2\,|X'X|}{(n-p)\,s^2\,|X_0'X_0|}$$

0.4 R-Fisher distribution

Assumptions A:
A1. $E(\varepsilon) = 0 \in \mathbb{R}^{n-r}$ and $E(\varepsilon_I) = \delta \in \mathbb{R}^{r}$;
A2. $V(\varepsilon_0) = \mathrm{Diag}(\sigma^2 I_{n-r}, \sigma_1^2 I_r)$; and
A3. $\mathcal{L}(\varepsilon, \varepsilon_I - \delta) = N_n(0, \mathrm{Diag}(\sigma^2 I_{n-r}, \sigma_1^2 I_r))$.

This model allows for shifts in both location and scale at the design points in $Z$:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} 0 \\ I_r \end{bmatrix}\delta + \begin{bmatrix} \varepsilon \\ \varepsilon_I - \delta \end{bmatrix}.$$

With $\delta = 0$ and $\kappa = \sigma_1^2/\sigma^2 = 1$ (Jensen (1999, 2001)):
$$F_I = \frac{e_I'(I_r - Z(X_0'X_0)^{-1}Z')^{-1}e_I}{r s_I^2} = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2} = \frac{e_I'(I_r - H_{II})^{-1}e_I}{r s_I^2} \sim F_{r,\,n-p-r}.$$

For the general model with $\delta \ne 0$ and $\kappa = \sigma_1^2/\sigma^2 > 1$, we have:

THEOREM (Jensen (1999) for $\kappa = 1$ and Jensen and Ramirez (200x) for $\kappa \ne 1$). Consider the R-Fisher diagnostic $F_I$ under Assumptions A; let $\{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0\}$ comprise the canonical subset leverages (the eigenvalues of $H_{II}$); set $\kappa = \sigma_1^2/\sigma^2 \ge 1$; let $\theta = Q'\delta$, where $Q$ is such that $Q'H_{II}Q = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_r)$; and identify $\nu = n-p-r$.

i) The cdf of $F_I$ is given by $F_r(w; \alpha_1, \ldots, \alpha_r; \omega_1, \ldots, \omega_r; \nu)$, with weights $\{\alpha_i = \kappa - (\kappa-1)\lambda_i;\ 1 \le i \le r\}$ satisfying $\{\alpha_r \ge \cdots \ge \alpha_1 \ge 1\}$, and with location parameters $\{\omega_i = \theta_i/(\sigma[\kappa + \lambda_i/(1-\lambda_i)]^{1/2});\ 1 \le i \le r\}$.

ii) Bounds for the cdf in terms of Fisher's noncentral distribution are given by
$$F(w/\alpha_r;\ r, \nu, \lambda) \le F_r(w; \alpha_1, \ldots, \alpha_r; \omega_1, \ldots, \omega_r; \nu) \le F(w/\bar\alpha;\ r, \nu, \lambda) \quad (19)$$
where $\lambda = \sum_{i=1}^{r} \theta_i^2/(\sigma^2[\kappa + \lambda_i/(1-\lambda_i)])$ and $\bar\alpha = (\alpha_1 \cdots \alpha_r)^{1/r}$ is the geometric mean.
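The bounds (19) translate directly into bounds on the power of the 5% test based on $F_I$. The sketch below is illustrative only: the canonical leverages, shifts, and $\kappa$ are assumed values rather than the BOQ quantities, and scipy's noncentral F is used for the Fisher cdf.

```python
# Power bounds for F_I implied by (19) under a joint shift in location (theta) and scale (kappa).
import numpy as np
from scipy import stats

r, nu = 2, 16                      # subset size and nu = n - p - r (assumed)
lam_lev = np.array([0.55, 0.35])   # canonical subset leverages lambda_1 >= lambda_2 (assumed)
kappa = 2.0                        # scale shift sigma_1^2 / sigma^2 (assumed)
theta_over_sigma = np.array([2.0, 1.5])   # theta_i / sigma (assumed location shifts)

alpha = kappa - (kappa - 1.0) * lam_lev                     # weights, 1 <= alpha_1 <= ... <= alpha_r
omega = theta_over_sigma / np.sqrt(kappa + lam_lev / (1 - lam_lev))
nc = np.sum(omega ** 2)                                     # noncentrality lambda in (19)
alpha_bar = np.prod(alpha) ** (1.0 / r)                     # geometric mean of the weights
alpha_max = alpha.max()                                     # alpha_r, the largest weight

crit = stats.f.ppf(0.95, r, nu)                             # 5% critical value under the null
power_lo = stats.ncf.sf(crit / alpha_bar, r, nu, nc)        # from the upper cdf bound in (19)
power_hi = stats.ncf.sf(crit / alpha_max, r, nu, nc)        # from the lower cdf bound in (19)
print(power_lo, power_hi)
```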

0.4.1 Case Study

Data (Myers (1990)) regarding the administration of Bachelor Officers Quarters (BOQ) were reported for sites at $n = 25$ naval installations. Monthly man-hours ($Y$) were related linearly to average daily occupancy ($X_1$), monthly number of check-ins ($X_2$), weekly hours of service desk operation ($X_3$), size of common use area ($X_4$), number of building wings ($X_5$), operational berthing capacity ($X_6$), and number of rooms ($X_7$). The data are reported in Myers (1990, p. 218 ff.), together with detailed analyses using single-case deletion diagnostics. Subset deletion diagnostics are not reported there. We focus on the sites {15, 20, 21, 23, 24}, having individual leverages {0.5576, ...} (the remaining values did not survive transcription) and individual R-Student values exceeding the widely used ±2 rule. The pairs of sites selected are $S_1 = \{20, 21\}$, $S_2 = \{15, 20\}$, and $S_3 = \{23, 24\}$, reflecting smaller, intermediate, and larger individual leverages.

[Table II (low, medium, and high leverage pairs): power of the diagnostic $F_I$ for $S_1$, $S_2$, and $S_3$ at selected shifts in location (in units of $\sigma$) and in scale $\kappa$. The row labels and several entries did not survive transcription; the surviving powers increase from .3606 to .9426 for $S_1$ and from .2779 to .8826 for $S_2$, but only from .0734 to .1613 for the high-leverage pair $S_3$ (parenthesized integers as in the original).]

0.4.2 Conclusions

The single-case deletion diagnostics of Groups I, II, and III are all equivalent. Moreover, the Group II subset deletion diagnostics and the Group III diagnostic $D_I$ are all equivalent to the R-Fisher diagnostic $F_I$. These subset deletion diagnostics have been studied under outliers arising from shifts in both location and scale. In these circumstances the distribution of $F_I$ has been shown to be a noncentral generalized F distribution. We have given series expansions for these distributions, as well as global error bounds for their partial sums. Bounds for the cdfs of noncentral generalized F distributions have also been given in terms of noncentral Fisher distributions.

The noncentral generalized F distributions have been used to compute the power of the diagnostic $F_I$, and thus of all equivalent diagnostics, under shifts in location and scale at selected subsets of the BOQ data. These case studies demonstrate that subsets with large canonical leverages tend to be associated with tests having low power, so that shifts in both location and scale may be masked at points of high leverage.

References

[1] Andrews, D. F. and Pregibon, D. (1978). Finding outliers that matter. J. Royal Statist. Soc. B 40.
[2] Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. John Wiley and Sons, New York.
[3] Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley and Sons, New York.
[4] Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall.
[5] Dunkl, C. F. and Ramirez, D. E. (2001). Computation of the generalized F distribution. Australian and New Zealand Journal of Statistics 43.
[6] Jensen, D. R. and Ramirez, D. E. (1996). Computing the cdf of Cook's D_I statistic. In Proceedings of the 12th Symposium in Computational Statistics, Barcelona, Spain, 1996, Prat, A. and Ripoll, eds.
[7] LaMotte, L. R. (1999). Collapsibility hypotheses and diagnostic bounds in regression analysis. Metrika 50.
[8] Jensen, D. R. (1998). The use of standardized diagnostics in regression. Research Report R079, Department of Statistics, Virginia Polytechnic Institute and State University, October 1998.
[9] Jensen, D. R. (1999). Properties of selected subset diagnostics in regression. Research Report R080, Department of Statistics, Virginia Polytechnic Institute and State University, March 1999.
[10] Jensen, D. R. (2000). The use of standardized diagnostics in regression. Metrika 52.
[11] Jensen, D. R. (2001). Properties of selected subset diagnostics in regression. Statist. Probab. Letters 51.
[12] Jensen, D. R. and Ramirez, D. E. (1991). Misspecified T² tests, I. Location and scale. Comm. Stat. Theory Meth. A 20, No. 1.
[13] Jensen, D. R. and Ramirez, D. E. (1998). Some exact properties of Cook's D_I. In Handbook of Statistics, Vol. 16, Balakrishnan, N. and Rao, C. R., eds. Elsevier Science Publishers B. V.
[14] Jensen, D. R. and Ramirez, D. E. (200x). Detecting mean-shift outliers via distances. Journal of Statistical Computation and Simulation.
[15] Jensen, D. R. and Ramirez, D. E. (200x). Detecting shifts in location and scale in regression. South African Statistical Journal.
[16] Myers, R. H. (1990). Classical and Modern Regression with Applications, Second ed. PWS-Kent Publishing Company, Boston, MA.
[17] Ramirez, D. E. (2000). The generalized F distribution. Journal of Statistical Software 5(1).
[18] Ramirez, D. E. and Jensen, D. R. (1991). Misspecified T² tests, II. Series expansions. Commun. Statist. Simula. 20.
