Bringing Order to Outlier Diagnostics in Regression Models

D. R. Jensen and D. E. Ramirez
Virginia Polytechnic Institute and State University and University of Virginia
http://www.math.virginia.edu/~der/home.html

0.1 INTRODUCTION

Consider the model $\{Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \varepsilon_i\ (1 \le i \le n)\}$ relating a response $Y_i$ to fixed regressors $\{X_{i1}, X_{i2}, \ldots, X_{ik}\}$ through $p = k + 1$ unknown parameters $\beta = [\beta_0, \beta_1, \ldots, \beta_k]'$.

THE FULL MODEL $Y = X_0\beta + \varepsilon$ is partitioned based on the proposed outliers $\{Y_i : i \in I\}$ of cardinality $r$:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}, \qquad Y_1\ ((n-r)\times 1),\ Y_2\ (r\times 1),$$
with least-squares estimator $\hat\beta\ (p\times 1)$. All matrices are assumed to be of full rank.

THE REDUCED MODEL $Y_1 = X\beta + \varepsilon_1$, with the proposed outliers deleted, has least-squares estimator $\hat\beta_I$.
0.2 STANDARD OUTLIER DIAGNOSTICS

GROUP I DIAGNOSTICS (t-like statistics)

STUDENTIZED DELETED RESIDUAL TEST (Gold Standard)
$$\frac{y_i - \hat y_{(i)i}}{s_{(i)}\sqrt{1 + x_i(X'X)^{-1}x_i'}} = \frac{y_i - \hat y_{(i)i}}{s_{(i)}\sqrt{1 + h_{ii}/(1 - h_{ii})}} \sim t(n - p - 1) \quad (1)$$

THE EXTERNALLY STUDENTIZED RESIDUALS (R-Student statistic), easy to compute:
$$t_i = \frac{y_i - \hat y_i}{s_{(i)}\sqrt{1 - x_i(X_0'X_0)^{-1}x_i'}} = \frac{y_i - \hat y_i}{s_{(i)}\sqrt{1 - h_{ii}}} \sim t(n - p - 1) \quad (2)$$

DFFITS
$$\mathrm{DFFITS}_i = \frac{\hat y_i - \hat y_{(i)i}}{s_{(i)}\sqrt{h_{ii}}} = t_i\sqrt{\frac{h_{ii}}{1 - h_{ii}}}$$
[The slide tabulates $\mathrm{DFFITS}_i$ at selected leverages up to $h_{ii} = 0.95$; the numerical entries were lost in transcription.]
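The Group I quantities above are easy to compute directly from the hat matrix. The following sketch (Python with synthetic data; the design, coefficients, and seed are illustrative assumptions, not the slides' example) computes R-Student via the deleted-variance identity and the DFFITS relation:

```python
import numpy as np

# Synthetic data (illustrative only -- not from the slides)
rng = np.random.default_rng(0)
n, k = 30, 3
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # full design X_0, p = k + 1
p = k + 1
y = X0 @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

H = X0 @ np.linalg.solve(X0.T @ X0, X0.T)  # hat matrix H
h = np.diag(H)                             # leverages h_ii
e = y - H @ y                              # ordinary residuals
s2 = e @ e / (n - p)                       # s^2

# deleted variances via (n-p-1) s_(i)^2 = (n-p) s^2 - e_i^2 / (1 - h_ii)
s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)

t_stat = e / np.sqrt(s2_del * (1 - h))     # R-Student, equation (2)
dffits = t_stat * np.sqrt(h / (1 - h))     # DFFITS via the identity above
print(t_stat[:3])
```

Refitting with row $i$ deleted and studentizing the prediction error, as in (1), reproduces `t_stat[i]` exactly, which is the content of the single-outlier equivalence discussed later.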
DFBETAS
$$\mathrm{DFBETAS}_{(i)j} = \frac{\hat\beta_j - \hat\beta_{(i)j}}{s_{(i)}\sqrt{[(X_0'X_0)^{-1}]_{jj}}} = t_i\,\frac{[(X_0'X_0)^{-1}X_0']_{ji}}{\sqrt{[(X_0'X_0)^{-1}]_{jj}\,(1 - h_{ii})}}$$

USEFUL RELATIONSHIPS
$$\hat\beta - \hat\beta_{(i)} = \frac{(X_0'X_0)^{-1}x_i'(y_i - \hat y_i)}{1 - h_{ii}}, \qquad \frac{(y_i - \hat y_i)^2}{1 - h_{ii}} = (n-p)s^2 - (n-p-1)s_{(i)}^2 \quad (3)$$

STANDARD RECOMMENDATION (Myers (1990)): "As we indicated earlier, these two measures (DFFITS and DFBETAS) are t-like. Surely any analyst who is familiar with the concept of a standard error knows that if DFBETAS$_{(i)j}$ exceeds 2.0 in magnitude, the influence of the data point is unquestioned."

Note: $\hat Y - \hat Y_{(i)}$ or $\hat\beta - \hat\beta_{(i)}$ are OUTLIER measures, while $h_{ii}$ and $1/(1 - h_{ii})$ are measures of influence.

CONCLUSION: All Group I diagnostics yield the same p-values: Jensen and Ramirez (1996) # 1 and 2; Jensen (1998, 2000) # 2, 3 and 3; and LaMotte (1999) # 1, 2, 3 and 3.
0.2.2 GROUP II DIAGNOSTICS (heuristics based on $s_I^2/s^2$)

The ratio $s_I^2/s^2$ is the Neyman–Pearson likelihood ratio test statistic.

R-FISHER parallels the R-Student statistic (Jensen (1999, 2001)) (Gold Standard)
$$F_I = \frac{e_I'(I_r - Z(X_0'X_0)^{-1}Z')^{-1}e_I}{rs_I^2} = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2} = \frac{e_I'(I_r - H_{II})^{-1}e_I}{rs_I^2} \sim F_{r,\,n-p-r} \quad (4)$$
where
$$X_0(X_0'X_0)^{-1}X_0' = H = \begin{bmatrix} H_{00} & H_{0I} \\ H_{I0} & H_{II} \end{bmatrix}.$$

MEAN SHIFT OUTLIER MODEL is based on
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} 0 \\ I_r \end{bmatrix}\delta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}.$$
Outliers can be checked jointly with model building. Outliers are tested based on the null hypothesis $H_0: \delta = 0$ with
$$F_I = \frac{(RSS(H_0) - RSS(H_a))/r}{RSS(H_a)/(n-p-r)} = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2} \sim F_{r,\,n-p-r} \quad (5)$$
with $F_{(i)} = t_i^2$.
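The mean-shift construction in (5) translates directly into code: append one indicator column per proposed outlier and compare residual sums of squares. A sketch (Python, synthetic data; the subset I, coefficients, and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 25, 3
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X0 @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
y[4] += 3.0                                 # plant a mean shift at row 4

def rss(A, b):
    """Residual sum of squares from a least-squares fit of b on A."""
    resid = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return resid @ resid

I = [4, 7]                                  # proposed outlier subset, r = 2
r = len(I)
D = np.zeros((n, r))
D[I, range(r)] = 1.0                        # mean-shift indicator columns [0; I_r]
rss0 = rss(X0, y)                           # RSS under H0: delta = 0
rssA = rss(np.hstack([X0, D]), y)           # RSS under the mean-shift alternative
F_I = ((rss0 - rssA) / r) / (rssA / (n - p - r))
print(F_I)
```

For r = 1 the same computation reduces to $F_{(i)} = t_i^2$, tying the Group II statistic back to Group I.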
OUT$_I$ ($r \ge 1$) of Barnett and Lewis (1984)
$$\mathrm{OUT}_I = 1 - \frac{s_I^2}{s^2}, \qquad -\frac{r}{n-p-r} \le \mathrm{OUT}_I = \frac{r(F_I - 1)}{rF_I + (n-p-r)} \le 1 \quad (6)$$

COVRATIO ($r = 1$)
$$CR_{(i)} = \frac{|s_{(i)}^2(X'X)^{-1}|}{|s^2(X_0'X_0)^{-1}|} = \frac{1}{1 - h_{ii}}\left(\frac{n-p}{t_i^2 + (n-p-1)}\right)^p \quad (7)$$
$$0 < CR_i \le \frac{1}{1 - h_{ii}}\left(\frac{n-p}{n-p-1}\right)^p \quad (8)$$
[The slide tabulates $CR_i$ at selected leverages up to $h_{ii} = 0.95$; entries lost in transcription.]

AP$_i$ of Andrews and Pregibon (1978) ($r = 1$)
$$AP_i = 1 - \frac{(n-p-1)\,s_{(i)}^2\,|X'X|}{(n-p)\,s^2\,|X_0'X_0|} \quad (9)$$
$$h_{ii} \le AP_i = \frac{t_i^2 + (n-p-1)h_{ii}}{t_i^2 + (n-p-1)} \le 1$$
FVARATIO of Belsley, Kuh and Welsch (1980) ($r = 1$)
$$FV_i = \frac{s_{(i)}^2}{(1 - h_{ii})\,s^2} \quad (10)$$
$$0 < FV_i = \frac{(n-p)/(t_i^2 + n - p - 1)}{1 - h_{ii}} \le \frac{(n-p)/(n-p-1)}{1 - h_{ii}}$$
[The slide tabulates $FV_i$ at $h_{ii} = 0.25$ through $h_{ii} = 0.95$; entries lost in transcription.]

USEFUL RELATIONSHIPS
$$\hat\beta - \hat\beta_I = (X_0'X_0)^{-1}Z'(I - H_{II})^{-1}(Y_I - \hat Y_I)$$
$$(Y_I - \hat Y_I)'(I - H_{II})^{-1}(Y_I - \hat Y_I) = (n-p)s^2 - (n-p-r)s_I^2$$

CONCLUSION: All Group II statistics yield the same p-values: Jensen (1999, 2001) # 4, 6, 7, 9, 10; Jensen and Ramirez (1998) # 5; and LaMotte (1999) # 7.
0.2.3 GROUP III DIAGNOSTICS (Distance measures)

COOK'S $D_I$ STATISTICS (1977) take the form
$$D_I(\hat\beta, M, c\hat\sigma^2) = \frac{(\hat\beta_I - \hat\beta)'M(\hat\beta_I - \hat\beta)}{c\hat\sigma^2}$$
where $M\ (p \times p)$ is non-negative definite, $\hat\sigma^2$ is some estimator for the variance $\sigma^2$, and $c$ is a user-defined constant.

Standard Choices

$C_i$ of Cook (1977): $\hat\sigma^2 = s^2$, $c = p$, and $M = X_0'X_0 = \sigma^2[\mathrm{cov}(\hat\beta)]^{-1}$
$$C_i = \frac{(\hat\beta_{(i)} - \hat\beta)'X_0'X_0(\hat\beta_{(i)} - \hat\beta)}{ps^2} = \frac{\|\hat Y_{(i)} - \hat Y\|^2}{ps^2} \quad (11)$$
$$C_i = \frac{(n-p)h_{ii}}{p(1 - h_{ii})}\cdot\frac{t_i^2}{t_i^2 + (n-p-1)}$$
[The slide tabulates $C_i$ at $h_{ii} = 0.25$ through $h_{ii} = 0.95$; entries lost in transcription.]

$WK_i$ of Welsch and Kuh (1977): $\hat\sigma^2 = s_{(i)}^2$, $c = p$, and $M = X_0'X_0 = \sigma^2[\mathrm{cov}(\hat\beta)]^{-1}$
$$WK_i = \frac{(\hat\beta_{(i)} - \hat\beta)'X_0'X_0(\hat\beta_{(i)} - \hat\beta)}{ps_{(i)}^2} = \frac{\|\hat Y_{(i)} - \hat Y\|^2}{ps_{(i)}^2} = \frac{t_i^2\,h_{ii}}{p(1 - h_{ii})} \quad (12)$$
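Cook's $C_i$ in (11) can be sanity-checked numerically by refitting without row $i$ for the definitional form and comparing against the closed form in $t_i$ and $h_{ii}$. A sketch (Python, synthetic data; the design and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X0 @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

XtX = X0.T @ X0
H = X0 @ np.linalg.solve(XtX, X0.T)
h = np.diag(H)
e = y - H @ y
s2 = e @ e / (n - p)
bhat = np.linalg.solve(XtX, X0.T @ y)

i = 0
mask = np.arange(n) != i
b_del = np.linalg.lstsq(X0[mask], y[mask], rcond=None)[0]  # beta-hat_(i)
d = b_del - bhat
C_def = d @ XtX @ d / (p * s2)             # definition, equation (11)

# closed form via R-Student t_i and leverage h_ii
s2_del = ((n - p) * s2 - e[i]**2 / (1 - h[i])) / (n - p - 1)
t2 = e[i]**2 / (s2_del * (1 - h[i]))
C_closed = (n - p) * h[i] / (p * (1 - h[i])) * t2 / (t2 + n - p - 1)
print(C_def, C_closed)
```

The two values agree to machine precision, which is exactly the algebra behind the "t-like" closed forms on this slide.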
Our recommendation
$$D_I(\hat\beta, X_0'X_0, rs_I^2) = \frac{(\hat\beta_I - \hat\beta)'X_0'X_0(\hat\beta_I - \hat\beta)}{rs_I^2} \sim F_r(t; \gamma_1^2, \ldots, \gamma_r^2; n-p-r) \quad (13,\ 14)$$
where $\{\gamma_1^2 \ge \cdots \ge \gamma_r^2 > 0\}$ are the ordered eigenvalues of $Z(X'X)^{-1}Z'$.

STANDARD RECOMMENDATION: "Cook's D ... is an F-like statistic with degrees of freedom p and n − p. The critical value has been suggested by Cook to be $F_{p,n-p}(0.50)$."

$W_i$ of Welsch (1982): $\hat\sigma^2 = s_{(i)}^2$, $c = p$, and $M = X'X = \sigma^2[\mathrm{cov}(\hat\beta_I)]^{-1}$
$$W_i = \frac{(\hat\beta_{(i)} - \hat\beta)'X'X(\hat\beta_{(i)} - \hat\beta)}{ps_{(i)}^2} = \frac{t_i^2\,h_{ii}}{p} \quad (15)$$

Our recommendation: $\hat\sigma^2 = s_I^2$, $c = r$, and $M = X'X = \sigma^2[\mathrm{cov}(\hat\beta_I)]^{-1}$
$$D_I(\hat\beta, X'X, rs_I^2) = \frac{(\hat\beta_I - \hat\beta)'X'X(\hat\beta_I - \hat\beta)}{rs_I^2} \sim F_r(t; \lambda_1, \ldots, \lambda_r; n-p-r)$$
where $\{\lambda_1 \ge \cdots \ge \lambda_r > 0\}$ are the ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$, the canonical subset leverages.
$D_I$ of Jensen and Ramirez (1998) (Gold Standard), the Normalized Cook Statistic: $\hat\sigma^2 = s_I^2$, $c = r$, and $M = ((X'X)^{-1} - (X_0'X_0)^{-1})^+ = \sigma^2[\mathrm{cov}(\hat\beta_I - \hat\beta)]^+$
$$D_I = \frac{(\hat\beta_I - \hat\beta)'((X'X)^{-1} - (X_0'X_0)^{-1})^+(\hat\beta_I - \hat\beta)}{rs_I^2} = F_I \sim F_{r,\,n-p-r}, \qquad D_{(i)} = t_i^2.$$

CONCLUSION: All Group III statistics yield the same p-values when r = 1: Jensen (1999, 2001) # 11, 12, and 15; Jensen and Ramirez (1996) # 11; and LaMotte (1999) # 11. For r > 1, the p-values are different. Their distribution is now known.

Notation: $\{\gamma_1^2 \ge \cdots \ge \gamma_r^2\}$ are the ordered eigenvalues of $Z(X'X)^{-1}Z'$; $\{\lambda_1 \ge \cdots \ge \lambda_r\}$ are the ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$; and
$$0 < \lambda_i = \gamma_i^2/(1 + \gamma_i^2) = h_{ii} < 1, \quad 1 \le i \le r.$$
[The slide tabulates $\gamma_i^2$ at $h_{ii} = 0.25$ through $h_{ii} = 0.95$; entries lost in transcription.]
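The identity $D_I = F_I$ for the normalized Cook statistic can be verified numerically by forming $M$ with a Moore–Penrose pseudoinverse. A sketch (Python, synthetic data; the subset I and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 3
X0 = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X0 @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=n)

I = np.array([3, 11])                       # proposed outlier subset, r = 2
r = len(I)
mask = np.ones(n, dtype=bool); mask[I] = False
X, Y1 = X0[mask], y[mask]                   # reduced model with rows I deleted

bhat = np.linalg.lstsq(X0, y, rcond=None)[0]
bI = np.linalg.lstsq(X, Y1, rcond=None)[0]
s2 = np.sum((y - X0 @ bhat)**2) / (n - p)
s2I = np.sum((Y1 - X @ bI)**2) / (n - p - r)

# M = ((X'X)^{-1} - (X_0'X_0)^{-1})^+, a rank-r nonnegative definite matrix
M = np.linalg.pinv(np.linalg.inv(X.T @ X) - np.linalg.inv(X0.T @ X0))
D_I = (bI - bhat) @ M @ (bI - bhat) / (r * s2I)
F_I = ((n - p) * s2 - (n - p - r) * s2I) / (r * s2I)
print(D_I, F_I)
```

The pseudoinverse is needed because $(X'X)^{-1} - (X_0'X_0)^{-1}$ has rank r < p, so an ordinary inverse does not exist.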
SINGLE OUTLIER THEOREM (Jensen and Ramirez 1996): When r = 1, the following tests are all equivalent:
$$\frac{y_i - \hat y_{(i)}}{s_{(i)}\sqrt{1 + x_i(X'X)^{-1}x_i'}} \sim t(n-p-1)$$
$$\frac{y_i - \hat y_i}{s_{(i)}\sqrt{1 - x_i(X_0'X_0)^{-1}x_i'}} \sim t(n-p-1)$$
$$F_i \sim F(1, n-p-1)$$
$$D_i(\hat\beta, X_0'X_0, s_{(i)}^2) \sim \gamma_1^2\,F(1, n-p-1)$$
$$D_i(\hat\beta, X'X, s_{(i)}^2) \sim \lambda_1\,F(1, n-p-1)$$
$$D_i(\hat\beta, \sigma^2\mathrm{cov}(\hat\beta_{(i)} - \hat\beta)^+, s_{(i)}^2) \sim F(1, n-p-1)$$
GENERALIZED F DISTRIBUTION (Jensen and Ramirez, 1998)

Suppose that the elements of $U = [U_1, \ldots, U_r]'$ are independent $\{N_1(\omega_i, 1);\ 1 \le i \le r\}$ random variables; let $\{\alpha_1, \ldots, \alpha_r\}$ be non-increasing positive weights; and identify $T = \alpha_1 U_1^2 + \cdots + \alpha_r U_r^2$. If $\mathcal{L}(V) = \chi^2(\nu)$ independently of $U$, then the cdf of
$$W = \frac{T/r}{V/\nu}$$
is denoted by $F_r(t; \alpha_1, \ldots, \alpha_r; \omega; \nu)$.

THEOREM (Ramirez and Jensen (1991)): For $\omega = 0$, the pdf of the central generalized F distribution $(T/r)/(V/\nu) = W$ has the representation in terms of central F distributions
$$\sum_{i=0}^{\infty} c_i\,\frac{r}{(r+2i)\,\delta}\, f_F\!\left(\frac{rw}{(r+2i)\,\delta};\ r+2i,\ \nu\right) \quad (16)$$
with $f_F$ as the pdf of $F(\nu_1, \nu_2)$, and whose coefficients $\{c_i\}$ are defined recursively with $0 < \delta < \alpha_r$.

Ramirez (2000) gave the Fortran code for computing p-values for $F_r(t; \alpha_1, \ldots, \alpha_r; \nu)$. Dunkl and Ramirez (2001) have developed a fast algorithm for computing the p-values for W, without having to integrate the density function, in terms of Lauricella $F_D^{(r)}$ functions.

STOCHASTIC BOUNDS (Jensen and Ramirez (1991)) for $F_r$ are
$$F_r(t; \alpha_1; \nu) \le F_r(t; \alpha_1, \ldots, \alpha_r; \nu) \le F_r(t; \alpha^*; \nu)$$
with $\alpha^*$ the geometric mean of $\{\alpha_1, \ldots, \alpha_r\}$.
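The stochastic bounds lend themselves to a quick Monte Carlo check: with all weights equal to a constant $\alpha$, W is distributed as $\alpha\,F(r, \nu)$, so $F_r(t; \alpha; \nu) = F_{r,\nu}(t/\alpha)$. A sketch (Python with SciPy; the weights, evaluation point t, and sample size are arbitrary assumptions):

```python
import numpy as np
from scipy.stats import f as fisher

rng = np.random.default_rng(4)
alpha = np.array([2.0, 1.5, 1.1])           # non-increasing positive weights
r, nu = len(alpha), 20
N = 200_000

U = rng.normal(size=(N, r))                 # central case: omega = 0
V = rng.chisquare(nu, size=N)
W = (U**2 @ alpha / r) / (V / nu)           # W = (T/r)/(V/nu)

t = 2.5
emp = np.mean(W <= t)                       # Monte Carlo cdf of W at t
lower = fisher.cdf(t / alpha[0], r, nu)     # all weights set to alpha_1 (largest)
geo = alpha.prod() ** (1 / r)               # geometric mean alpha*
upper = fisher.cdf(t / geo, r, nu)
print(lower, emp, upper)
```

With unequal weights the simulated cdf falls strictly between the two Fisher bounds, which is what makes the bounds useful for bracketing p-values cheaply.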
THEOREM (Jensen and Ramirez (200x)): For $\omega \ne 0$, the pdf of the non-central generalized F distribution $(T/r)/(V/\nu) = W$ has the representation in terms of central F distributions
$$\sum_{i=0}^{\infty} c_i\,\frac{r}{(r+2i)\,\delta}\, f_F\!\left(\frac{rw}{(r+2i)\,\delta};\ r+2i,\ \nu\right)$$
with $f_F$ as the pdf of $F(\nu_1, \nu_2)$, and whose coefficients $\{c_i\}$ are defined recursively with $0 < \delta < \alpha_r$, where the $\{c_i\}$ are functions of $\omega$.

NORMAL THEORY RESULTS (Jensen and Ramirez (1998)): If $\mathcal{L}(Y) = N_n(X_0\beta, \sigma^2 I_n)$, then with $\nu = n - p - r$:
$$D_I(\hat\beta, X_0'X_0, rs_I^2) \sim F_r(t; \gamma_1^2, \ldots, \gamma_r^2; n-p-r)$$
where $\{\gamma_1^2 \ge \cdots \ge \gamma_r^2\}$ are the ordered eigenvalues of $Z(X'X)^{-1}Z'$;
$$D_I(\hat\beta, X'X, rs_I^2) \sim F_r(t; \lambda_1, \ldots, \lambda_r; n-p-r)$$
where $\{\lambda_1 \ge \cdots \ge \lambda_r\}$ are the ordered eigenvalues of $Z(X_0'X_0)^{-1}Z'$ and $\lambda_i = \gamma_i^2/(1 + \gamma_i^2) = h_{ii}$, $1 \le i \le r$; and
$$D_I(\hat\beta, \sigma^2\mathrm{cov}(\hat\beta_I - \hat\beta)^+, rs_I^2) \sim F_r(t; 1, \ldots, 1; n-p-r).$$
JOINT OUTLIER RESULTS: When r > 1, outliers can be tested with
$$F_I \sim F_r(t; 1, \ldots, 1; n-p-r)$$
$$D_I(\hat\beta, \sigma^2\mathrm{cov}(\hat\beta_I - \hat\beta)^+, rs_I^2) = D_I \sim F_r(t; 1, \ldots, 1; n-p-r)$$
$$D_I(\hat\beta, X_0'X_0, rs_I^2) \sim F_r(t; \gamma_1^2, \ldots, \gamma_r^2; n-p-r) \quad (17)$$
$$D_I(\hat\beta, X'X, rs_I^2) \sim F_r(t; \lambda_1, \ldots, \lambda_r; n-p-r) \quad (18)$$

The mean shift statistic $F_I$ and the normalized Cook statistic $D_I$ yield the same p-values. For the Cook-like statistics, we prefer (18) over (17) for computational reasons, since the number of terms required in the series expansion for the cdf of $(T/r)/(V/\nu)$ is smaller. This number increases with the condition numbers, which satisfy $\gamma_1^2/\gamma_r^2 \ge \lambda_1/\lambda_r$.
0.3 EXAMPLE

We use the Drill Data Set from Cook and Weisberg (1982, p. 149) with n = 31 and k = 10. With r = 1, we find the influential rows (with p < .025) to be rows 9, 28, and 31. The p-values are (necessarily) the same using $D_i(\hat\beta, X_0'X_0, s_{(i)}^2)$, $D_i = D_i(\hat\beta, X'X, s_{(i)}^2)$, $D_i(\hat\beta, \sigma^2\mathrm{cov}(\hat\beta_I - \hat\beta)^+, s_{(i)}^2)$, or $t_i$ (R-Student).

[Slide table with columns Row, $s_{(i)}^2$, $D_i$, $\lambda_i$, p-value, and $D_i/\lambda_i$; the numerical entries were lost in transcription.]

With r = 2, we find 39 influential pairs (with p < .01 and using the average of the p-bounds) using $D_I(\hat\beta, X_0'X_0, 2s_I^2)$, 35 pairs using $D_I(\hat\beta, X'X, 2s_I^2)$, and 16 pairs (directly) using $D_I(\hat\beta, \sigma^2\mathrm{cov}(\hat\beta_I - \hat\beta)^+, 2s_I^2)$. All contain rows from {9, 28, 31} except for the pair (5, 26). For this pair, using $D_I(\hat\beta, X'X, 2s_I^2)$, we have:

[Slide table with columns Rows, $s_I^2$, $D_I$, $\lambda_1$, $\lambda_2$, LB, p, and UB for the pair (5, 26); entries lost in transcription.]

The p-values for this pair, using $D_I(\hat\beta, X_0'X_0, 2s_I^2)$, $D_I(\hat\beta, X'X, 2s_I^2)$, and $D_I(\hat\beta, \sigma^2\mathrm{cov}(\hat\beta_I - \hat\beta)^+, 2s_I^2)$:

[Slide table with columns Test, p-value, and N; entries lost in transcription.]

Other Heuristics

We now consider joint outliers for the Intercountry Life-Cycle Savings Data from Belsley et al. (1980, p. 41). We will only consider pairs of observations, with I = {$i_1$, $i_2$}; the extension to larger subsets is immediate. Belsley et al. (1980) used a number of different empirical procedures to identify 15 pairs of observations that may be influential. These procedures include MDFFIT (MD), COVRATIO (CR), RESRATIO (RR), MEWDFFIT (ME), Wilks' Λ statistic (WL), and the Andrews and Pregibon statistic (AP). Table I shows which pairs of observations are of note based on these criteria. The last column records the p-values from $D_I(\hat\beta, X_0'X_0, 2s_I^2)$. These empirical procedures do not agree with each other, nor do they detect the joint outliers which we are able to determine using $D_I$ as an omnibus test of model fit under the mean shift and variance shift outlier model. Outliers are often influential; however, not all influential subsets are outliers (see Barnett and Lewis (1994, p. 317)).
Table I: Multiple-Row Influence Using Empirical Procedures, with columns Rows, MD, CR, RR, ME, WL, AP, and p-value. [Most of the table was garbled in transcription; among the legible fragments are the pairs (34,46), (7,46), and (6,44), and p-values .081, .329, .438, and .699.]

MDFFIT (MD) (the numerator of $D_I(\hat\beta, X'X, 2s_I^2)$):
$$(\hat\beta_I - \hat\beta)'X'X(\hat\beta_I - \hat\beta)$$

COVRATIO (CR):
$$CR_{(i)} = \frac{|s_{(i)}^2(X'X)^{-1}|}{|s^2(X_0'X_0)^{-1}|}$$

RESRATIO (RR) (the same as the mean-shift statistic):
$$F_I = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2}$$
MEWDFFIT (ME):
$$ME = \sum_{i,j \in I} \frac{h_{ij}\, e_i e_j}{(1 - h_i)(1 - h_j)}$$

Wilks' Λ statistic (WL):
$$WL = 1 - \frac{n}{r(n-r)}\,\mathbf{1}'Z(Z'Z)^{-1}Z'\mathbf{1}$$

Andrews and Pregibon (AP):
$$AP_I = 1 - \frac{(n-p-r)\,s_I^2\,|X'X|}{(n-p)\,s^2\,|X_0'X_0|}$$
0.4 R-Fisher distribution

Assumptions A:
A1. $E(\varepsilon) = 0 \in \mathbb{R}^{n-r}$ and $E(\varepsilon_I) = \delta \in \mathbb{R}^r$;
A2. $V(\varepsilon_0) = \mathrm{Diag}(\sigma^2 I, \sigma_1^2 I)$; and
A3. $\mathcal{L}(\varepsilon, \varepsilon_I - \delta) = N_n(0, \mathrm{Diag}(\sigma^2 I, \sigma_1^2 I))$.

This model allows for shifts in both location and scale at design points in Z:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X \\ Z \end{bmatrix}\beta + \begin{bmatrix} 0 \\ I_r \end{bmatrix}\delta + \begin{bmatrix} \varepsilon \\ \varepsilon_I \end{bmatrix}.$$

With $\delta = 0$ and $\kappa = \sigma_1^2/\sigma^2 = 1$ (Jensen (1999, 2001)):
$$F_I = \frac{e_I'(I_r - Z(X_0'X_0)^{-1}Z')^{-1}e_I}{rs_I^2} = \frac{((n-p)s^2 - (n-p-r)s_I^2)/r}{s_I^2} = \frac{e_I'(I_r - H_{II})^{-1}e_I}{rs_I^2} \sim F_{r,\,n-p-r}.$$
For the general model with $\delta \ne 0$ and $\kappa = \sigma_1^2/\sigma^2 > 1$, we have:

THEOREM (Jensen (1999) for $\kappa = 1$ and Jensen and Ramirez (200x) for $\kappa \ne 1$): Consider the R-Fisher diagnostic $F_I$ under Assumptions A; let $\{\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_r > 0\}$ comprise the canonical subset leverages (the eigenvalues of $H_{II}$); set $\kappa = \sigma_1^2/\sigma^2 \ge 1$; let $\theta = Q'\delta$, such that $Q'H_{II}Q = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_r)$; and identify $\nu = n - p - r$.

i) The cdf of $F_I$ is given by $F_s(w; \alpha^0; \omega^0; \nu)$, with weights $\{\alpha_i = \kappa - (\kappa - 1)\lambda_i;\ 1 \le i \le r\}$ satisfying $\{\alpha_r \ge \ldots \ge \alpha_1 \ge 1\}$, and with location parameters $\{\omega_i = \theta_i/(\sigma[\kappa + \lambda_i/(1 - \lambda_i)]^{1/2});\ 1 \le i \le r\}$.

ii) Bounds for the cdfs in terms of Fisher's distribution are given by
$$F(w/\alpha_r;\ r, \nu, \lambda) \le F_s(w; \alpha^0; \omega^0; \nu) \le F(w/\alpha^*;\ r, \nu, \lambda) \quad (19)$$
where $\lambda = \sum_{i=1}^r \theta_i^2/(\sigma^2[\kappa + \lambda_i/(1 - \lambda_i)])$ and $\alpha^* = (\alpha_1 \cdots \alpha_r)^{1/r}$ is the geometric mean.
0.4.1 Case Study

Data (Myers (1990)) regarding the administration of Bachelor Officers Quarters (BOQ) were reported for sites at n = 25 naval installations. Monthly man-hours (Y) were related linearly to average daily occupancy ($X_1$), monthly number of check-ins ($X_2$), weekly service desk operation in hours ($X_3$), size of common use area ($X_4$), number of building wings ($X_5$), operational berthing capacity ($X_6$), and number of rooms ($X_7$). The data are reported in Myers (1990), p. 218 ff., together with detailed analyses using single-case deletion diagnostics. Subset deletion diagnostics are not reported there.

We focus on sites {15, 20, 21, 23, 24}, having individual leverages {0.5576, …} [the remaining four values were lost in transcription], their individual R-Student values exceeding the widely used ±2 rule. Pairs of sites selected are $S_1$ = {20, 21}, $S_2$ = {15, 20}, and $S_3$ = {23, 24}, reflecting smaller, intermediate, and larger individual leverages.
[Table II, "Low, Medium, and High Leverages", reports values for the subsets $S_1$, $S_2$, and $S_3$ as functions of $\kappa$; the numerical entries were garbled in transcription.]
0.4.2 Conclusions

The single-case deletion diagnostics of Groups I, II, and III are all equivalent. Moreover, the Group II subset deletion diagnostics and the Group III diagnostic $D_I$ are all equivalent to the R-Fisher diagnostic $F_I$. These subset deletion diagnostics have been studied under outliers arising from shifts in both location and scale. In these circumstances the distribution of $F_I$ has been shown to be a noncentral generalized F distribution. We have given series expansions for these distributions, as well as global error bounds for their partial sums. Bounds for the cdfs of noncentral generalized F distributions have also been given in terms of the noncentral Fisher distributions.

The noncentral generalized F distributions have been used to compute the power for the diagnostic $F_I$, and thus for all equivalent diagnostics, under shifts in location and scale at selected subsets of the BOQ data. These case studies demonstrate that subsets with large canonical leverages tend to associate with tests having low power, so that shifts in both location and scale may be masked at points of high leverage.
References

[1] Andrews, D. F. and Pregibon, D. (1978), Finding outliers that matter, J. Royal Statist. Soc. B 40.
[2] Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. John Wiley and Sons, New York.
[3] Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley and Sons, New York.
[4] Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall.
[5] Dunkl, C. F. and Ramirez, D. E. (2001), Computation of the generalized F distribution, Australian and New Zealand Journal of Statistics 43.
[6] Jensen, D. R. and Ramirez, D. E. (1996), Computing the cdf of Cook's D_I statistic, in Proceedings of the 12th Symposium in Computational Statistics, Barcelona, Spain, 1996, Prat, A. and Ripoll, eds.
[7] LaMotte, L. R. (1999), Collapsibility hypotheses and diagnostic bounds in regression analysis, Metrika 50.
[8] Jensen, D. R. (1998), The use of standardized diagnostics in regression, Research Report R079, Department of Statistics, Virginia Polytechnic Institute and State University, October 1998.
[9] Jensen, D. R. (1999), Properties of selected subset diagnostics in regression, Research Report R080, Department of Statistics, Virginia Polytechnic Institute and State University, March 1999.
[10] Jensen, D. R. (2000), The use of standardized diagnostics in regression, Metrika 52.
[11] Jensen, D. R. (2001), Properties of selected subset diagnostics in regression, Statist. Probab. Letters 51.
[12] Jensen, D. R. and Ramirez, D. E. (1991), Misspecified T² tests, I. Location and scale, Comm. Stat. Theory Meth. A 20(1).
[13] Jensen, D. R. and Ramirez, D. E. (1998), Some exact properties of Cook's D_I, in Handbook of Statistics, Vol. 16, Balakrishnan, N. and Rao, C. R., eds., Elsevier Science Publishers B. V.
[14] Jensen, D. R. and Ramirez, D. E. (200x), Detecting mean-shift outliers via distances, Journal of Statistical Computation and Simulation.
[15] Jensen, D. R. and Ramirez, D. E. (200x), Detecting shifts in location and scale in regression, South African Statistical Journal.
[16] Myers, R. H. (1990). Classical and Modern Regression with Applications, Second ed. PWS-Kent Publishing Company, Boston, MA.
[17] Ramirez, D. E. (2000), The generalized F distribution, Journal of Statistical Software 5(1).
[18] Ramirez, D. E. and Jensen, D. R. (1991). Misspecified T² tests. II. Series expansions. Commun. Statist. Simula. 20.
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationLecture 3: Multiple Regression
Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationML and REML Variance Component Estimation. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 58
ML and REML Variance Component Estimation Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 58 Suppose y = Xβ + ε, where ε N(0, Σ) for some positive definite, symmetric matrix Σ.
More informationCointegration Lecture I: Introduction
1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationA. H. Abuzaid a, A. G. Hussin b & I. B. Mohamed a a Institute of Mathematical Sciences, University of Malaya, 50603,
This article was downloaded by: [University of Malaya] On: 09 June 2013, At: 19:32 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office:
More informationPattern correlation matrices and their properties
Linear Algebra and its Applications 327 (2001) 105 114 www.elsevier.com/locate/laa Pattern correlation matrices and their properties Andrew L. Rukhin Department of Mathematics and Statistics, University
More informationPackage svydiags. June 4, 2015
Type Package Package svydiags June 4, 2015 Title Linear Regression Model Diagnostics for Survey Data Version 0.1 Date 2015-01-21 Author Richard Valliant Maintainer Richard Valliant Description
More informationMa 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA
Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA March 6, 2017 KC Border Linear Regression II March 6, 2017 1 / 44 1 OLS estimator 2 Restricted regression 3 Errors in variables 4
More informationApplied linear statistical models: An overview
Applied linear statistical models: An overview Gunnar Stefansson 1 Dept. of Mathematics Univ. Iceland August 27, 2010 Outline Some basics Course: Applied linear statistical models This lecture: A description
More informationLinear Models and Estimation by Least Squares
Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:
More informationResponse surface designs using the generalized variance inflation factors
STATISTICS RESEARCH ARTICLE Response surface designs using the generalized variance inflation factors Diarmuid O Driscoll and Donald E Ramirez 2 * Received: 22 December 204 Accepted: 5 May 205 Published:
More informationGeneralized Linear Models: An Introduction
Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,
More information3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.
7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of
More informationLECTURE 5 HYPOTHESIS TESTING
October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More informationThe Effect of a Single Point on Correlation and Slope
Rochester Institute of Technology RIT Scholar Works Articles 1990 The Effect of a Single Point on Correlation and Slope David L. Farnsworth Rochester Institute of Technology This work is licensed under
More informationTESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST
Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department
More informationCanonical Correlation Analysis of Longitudinal Data
Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate
More informationRegression Analysis. y t = β 1 x t1 + β 2 x t2 + β k x tk + ϵ t, t = 1,..., T,
Regression Analysis The multiple linear regression model with k explanatory variables assumes that the tth observation of the dependent or endogenous variable y t is described by the linear relationship
More informationBeam Example: Identifying Influential Observations using the Hat Matrix
Math 3080. Treibergs Beam Example: Identifying Influential Observations using the Hat Matrix Name: Example March 22, 204 This R c program explores influential observations and their detection using the
More informationToutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates
Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Sonderforschungsbereich 386, Paper 24 (2) Online unter: http://epub.ub.uni-muenchen.de/
More informationMATH Notebook 4 Spring 2018
MATH448001 Notebook 4 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH448001 Notebook 4 3 4.1 Simple Linear Model.................................
More informationSummary of Chapters 7-9
Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two
More informationGARCH Models Estimation and Inference
GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in
More informationBasic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = The errors are uncorrelated with common variance:
8. PROPERTIES OF LEAST SQUARES ESTIMATES 1 Basic Distributional Assumptions of the Linear Model: 1. The errors are unbiased: E[ε] = 0. 2. The errors are uncorrelated with common variance: These assumptions
More informationSingular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model
International Journal of Statistics and Applications 04, 4(): 4-33 DOI: 0.593/j.statistics.04040.07 Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model
More informationMLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project
MLR Model Checking Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More information8. Hypothesis Testing
FE661 - Statistical Methods for Financial Engineering 8. Hypothesis Testing Jitkomut Songsiri introduction Wald test likelihood-based tests significance test for linear regression 8-1 Introduction elements
More information4 Multiple Linear Regression
4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ
More informationThe regression model with one fixed regressor cont d
The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8
More informationGeneral Linear Test of a General Linear Hypothesis. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 35
General Linear Test of a General Linear Hypothesis Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 35 Suppose the NTGMM holds so that y = Xβ + ε, where ε N(0, σ 2 I). opyright
More informationLINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved.
LINEAR REGRESSION LINEAR REGRESSION REGRESSION AND OTHER MODELS Type of Response Type of Predictors Categorical Continuous Continuous and Categorical Continuous Analysis of Variance (ANOVA) Ordinary Least
More informationSimple Linear Regression
Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.
More informationRegression: Lecture 2
Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and
More information3 Multiple Linear Regression
3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is
More informationGeneralized Linear Models
York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear
More informationLinear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77
Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical
More informationReference: Davidson and MacKinnon Ch 2. In particular page
RNy, econ460 autumn 03 Lecture note Reference: Davidson and MacKinnon Ch. In particular page 57-8. Projection matrices The matrix M I X(X X) X () is often called the residual maker. That nickname is easy
More information