The Algorithm for Multiple Outliers Detection Against Masking and Swamping Effects
Int. J. Contemp. Math. Sciences, Vol. 3, 2008, no. 17

Jung-Tsung Chiang
Department of Business Administration, Ling Tung University, Taiwan
No. 1 Ling-Tung Rd., Taichung City, Taiwan
jungtsung@mail2000.com.tw

Abstract

Gentleman and Wilk's method for detecting k outliers finds the subgroup of $n-k$ observations that has the minimum sum of squared residuals (Min: SSE). The method proposed here replaces that criterion with the minimum prediction error sum of squares (Min: PRESS). A fast algorithm for finding the best construction data set (the one containing all good observations) starts from the absolute Jackknife residuals. The entire data set is divided into two groups: the first is a clean group used to compute the predicted function, and the second, which contains the outliers, is examined through the ADPs (absolute deviations of predictions). A simulation study and two famous examples are presented.

Keywords: OLS method, Masking effect, Swamping effect, Jackknife residuals, Cook's Distance, Data Structure

1 Introduction

The OLS method is widely used in linear models to detect outliers. Outliers in a sample arise either from (1) errors of measurement or (2) intrinsic variability (mean shift, inflation of variances, or others). In case (1), the outliers should be excluded from the sample or corrected. In case (2), further methods and work are called for where possible.

Several authors have studied multiple-outlier detection. Gentleman and Wilk (1975) [3] proposed the Deletion Method to identify a subset of k outliers: observations are deleted sequentially, each deletion being the one that produces the largest reduction in the residual sum of squares. This is equivalent to finding the subset of $n-k$ observations of the original data set with the minimum residual sum of squared errors (MSSE).
The computation is substantial, however, and may be infeasible for data with a large number of observations. Marasinghe (1985) [8] therefore introduced a new test statistic, $F_k$, for detecting multiple outliers in linear regression models; the statistic is derived from a sequential testing procedure. Paul and Fung (1991) [10] studied a generalized extreme studentized residual (GESR) procedure, which controls the type I error and uses a two-phase procedure to identify outliers with high leverage values. Hadi and Simonoff (1993) [5] offered adjusted residuals as criteria for outlier detection and introduced two algorithms for detecting multiple outliers. In both methods, the approximate MSSE subsets were tested against the no-outlier hypothesis, using a combination of single linkage clustering (Hartigan 1981) [6] and back-stepping (Rosner 1975 [11]; Simonoff 1984a [13]) to avoid the masking and swamping effects. More recently, Wei and Fung (1999) [16] proposed the mean-shift outlier model for general weighted regression.

In this study, we try to give an explicit and clean definition of outliers based on a best construction data set, and to find a fast algorithm that identifies outliers via data-splitting. A reasonable data split is required for this. The difficulty lies in establishing an effective factor space from which the least squares equation, and hence the interpolation rule, is obtained. In addition, the ordinary least squares (OLS) method must be modified, since the distribution of the outliers differs from that of the entire clean data set. Moreover, influential points have an impact on the OLS predicted function, and the fit may suffer from masking and swamping effects (see Chiang, 2007 [2]). Throughout the article we focus on the case in which the expected values of the outliers are shifted (mean-shift models). Two famous examples and a simulation study are also presented.
2 The Formulation of the Methods for Multiple Outliers

We consider the full rank linear model:

\[ Y = X\beta + \epsilon \tag{1} \]

where $X$ is a known $n \times p$ $(n \ge p+1)$ full rank matrix, $\beta$ is an unknown $p \times 1$ vector, and $\epsilon$ is an $n \times 1$ error vector distributed as $N(0, \sigma^2 I_n)$. The least squares residuals $e$ and their variance are

\[ e = Y - \hat{Y} = (I - H)Y, \qquad \mathrm{Var}(e) = (I - H)\sigma^2, \]

where the hat matrix $H = X(X'X)^{-1}X'$ is symmetric and idempotent. Now delete the ith observation and use the remaining $n-1$ observations to calculate the fitted value of the ith case, $\hat{Y}_{i(i)}$. The difference between the observed value $Y_i$ and $\hat{Y}_{i(i)}$ is called the deleted residual of the ith case, denoted by $d_i = Y_i - \hat{Y}_{i(i)}$. The Jackknife residual $r_i^*$ is defined as

\[ r_i^* = \frac{e_i}{s_{(i)}\sqrt{1 - h_{ii}}}, \quad i = 1, 2, \ldots, n \tag{2} \]

where

\[ s_{(i)}^2 = \frac{Y_{(i)}'(I - H_{(i)})Y_{(i)}}{n - p - 1}, \]

and $h_{ii} = X_i (X'X)^{-1} X_i'$ is the ith diagonal element of $H$. The index $(i)$ in brackets indicates that the ith observation has been deleted. If $\mathrm{rank}(X_{(i)}) = p$ and $\epsilon \sim N(0, \sigma^2 I_n)$, then the Jackknife residuals $r_i^*$ $(i = 1, 2, \ldots, n)$ are $t_{n-p-1}$ distributed (Beckman et al., 1974 [1]).

Next, the prediction error sum of squares is defined as $\mathrm{PRESS} = \sum_{i=1}^{n} d_i^2$, where $d_i = Y_i - \hat{Y}_{i(i)}$ is the deleted residual of the ith case. Since $d_i = e_i / (1 - h_{ii})$,

\[ \mathrm{PRESS} = \sum_{i=1}^{n} \frac{e_i^2}{(1 - h_{ii})^2} \tag{3} \]

PRESS treats each observation in turn as a new one, predicted from a fit that excludes it. It is equivalent to the cross-validation criterion CV(1) (Stone, 1973 [14]). A model with a small PRESS is considered well fitted. So a modified criterion for identifying a single outlier is given by

\[ \text{Min}: \mathrm{PRESS}_{G_i}, \quad i = 1, 2, \ldots, n \tag{4} \]

where $G_i$ is the subgroup without the ith observation. For a large sample, PRESS and SSE are asymptotically equivalent (McQuarrie, 1999 [9]). Thus, criterion (4) is also equivalent to Grubbs' test (Grubbs, 1950 [4]).

Here, we consider a data set $M$ of size $n$ containing $k$ outliers; the entire clean data set $C$, with $n_c = n - k$ good observations, is a subset of $M$. Suppose that all good observations in $C$ come from the target distribution of the linear model, i.e., $Y_c \sim N(X_c\beta, \sigma^2 I_{n_c})$. The OLS estimator $\hat{\beta}_c = (X_c'X_c)^{-1}X_c'Y_c$ is invariant if $k$ new observations $(X_{n_c+1}, \hat{Y}_{n_c+1}), \ldots, (X_{n_c+k}, \hat{Y}_{n_c+k})$ are added to the data set $C$. That is,

\[ \hat{\beta}_c = (X'X)^{-1}X'\tilde{Y} \tag{5} \]

where $X$ is an $n \times p$ matrix, $\tilde{Y} = (Y_1, \ldots, Y_{n_c}, \hat{Y}_{n_c+1}, \ldots, \hat{Y}_{n_c+k})'$, and $\hat{Y}_{n_c+j} = X_{n_c+j}\hat{\beta}_c$, $j = 1, \ldots, k$. The new data set $B = (X, \tilde{Y})$ is associated with the residuals $(e_1, e_2, \ldots, e_{n_c}, 0, \ldots, 0)$.
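As a minimal sketch, assuming NumPy and a design matrix that already contains the intercept column, the leverages $h_{ii}$, the deleted residuals $d_i$, the Jackknife residuals of equation (2) and the PRESS of equation (3) can all be obtained from a single fit, without refitting the model n times. The function name `leave_one_out_diagnostics` and the small data set are illustrative and do not come from the paper.

```python
import numpy as np


def leave_one_out_diagnostics(X, y):
    """Return leverages h_ii, deleted residuals d_i, Jackknife residuals r_i* and PRESS.

    X is an (n, p) full-rank design matrix (intercept column included), y has length n.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    # h_ii = X_i (X'X)^{-1} X_i', the diagonal of the hat matrix H
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # Ordinary residuals e = (I - H) Y
    e = y - X @ (xtx_inv @ (X.T @ y))
    # Deleted residuals d_i = e_i / (1 - h_ii); PRESS is the sum of their squares
    d = e / (1.0 - h)
    press = float(d @ d)
    # s_(i)^2 = (SSE - e_i^2 / (1 - h_ii)) / (n - p - 1): residual variance without case i
    s2_deleted = (e @ e - e**2 / (1.0 - h)) / (n - p - 1)
    r_star = e / np.sqrt(s2_deleted * (1.0 - h))
    return h, d, r_star, press


# Made-up data for illustration only: the response of case 5 is shifted upward.
x = np.arange(8.0)
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 9.0, 7.2, 7.9])
h, d, r_star, press = leave_one_out_diagnostics(X, y)
print("case with largest |r*|:", int(np.argmax(np.abs(r_star))))
print("PRESS:", round(press, 3))
```

The closed forms avoid the explicit leave-one-out loop; the same quantities could be checked against n separate fits when in doubt.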
Next, let the original data set $M$ be expressed as

\[ \{(X_1, Y_1), \ldots, (X_{n_c}, Y_{n_c}), (X_{n_c+1}, \hat{Y}_{n_c+1} + d_{n_c+1}), \ldots, (X_{n_c+k}, \hat{Y}_{n_c+k} + d_{n_c+k})\} \]

where $|d_j|$ is greater than $2\sigma Z_{1-\alpha/2}$, $j = n_c+1, \ldots, n_c+k$, and the last $k$ observations of $M$ are the outliers. Obviously, $Y_M = \tilde{Y} + \Delta Y$ with $\Delta Y = (0, \ldots, 0, d_{n_c+1}, \ldots, d_{n_c+k})'$. The OLS residuals of the original data set $M$ are

\begin{align*}
e &= (I - H)Y_M \\
  &= (I - H)(\tilde{Y} + \Delta Y) \\
  &= (I - H)\tilde{Y} + (I - H)\Delta Y \\
  &= (e_1, \ldots, e_{n_c}, 0, \ldots, 0)' + \Delta Y - H\Delta Y \\
  &= (e_1, \ldots, e_{n_c}, d_{n_c+1}, \ldots, d_{n_c+k})' - H\Delta Y \\
  &= e^* - H\Delta Y
\end{align*}

where $H$ is the $n \times n$ hat matrix and $e^*$ is the $n \times 1$ vector calculated from the clean data set $C$. Consequently, the studentized residuals $r_i$ corresponding to $e_i$ are

\[ r_i = \frac{e_i - \sum_{j=n_c+1}^{n} h_{ij} d_j}{s\sqrt{1 - h_{ii}}}, \quad i = 1, 2, \ldots, n_c \tag{6} \]

and

\[ r_i = \frac{d_i - \sum_{j=n_c+1}^{n} h_{ij} d_j}{s\sqrt{1 - h_{ii}}}, \quad i = n_c+1, \ldots, n_c+k \tag{7} \]

which shows that:

(1) For $i = 1, 2, \ldots, n_c$, some $r_i$ may exceed the critical points. The swamping effect appears in these cases; that is, good observations are incorrectly regarded as outliers.

(2) For $i = n_c+1, \ldots, n_c+k$, some $r_i$ may fall below the critical points. The masking effect appears in these cases; that is, outliers are incorrectly regarded as good observations (inliers).

As argued above, the masking and swamping effects depend on the locations $h_{ii}$, the correlations $h_{ij}$, the signs of the $d_i$, and the permutations of these outliers. Basically, the $k$ outliers can be regarded as perturbation effects on a clean data set $C$.

In the linear model $Y = X\beta + \epsilon \sim N(X\beta, \sigma^2)$, suppose that the real function $Y = X\beta$ is known. Then the entire subset of $k$ outliers satisfies the condition

\[ \text{Max}: \sum_{j=1}^{k} \epsilon_{I_j}^2, \quad j = 1, 2, \ldots, k \tag{8} \]

where $I_j$ is an arbitrary subset of size $k$ of the sample. This is also equivalent to

\[ \text{Min}: \sum_{i \notin I_j}^{n} \epsilon_i^2 \tag{9} \]

Basically, if the real function is known, only one entire subset of $k$ outliers in a sample satisfies the above definition. In practice, two deleted subsets, say $I_1$ and $I_2$, may tie, with $SSE_{(I_1)} = SSE_{(I_2)} = \min_i SSE_{(I_i)}$. In that case a good choice between $I_1$ and $I_2$ as the entire subset of $k$ outliers is the one with the smaller $\mathrm{Tr}(X'X)^{-1}$, since it gives a shorter confidence interval for $\hat{\beta}_c$. An ideal criterion for a best clean data set is to minimize the predictive sum of squared errors:

\[ \text{Min}: \mathrm{PRESS}_{G_i}, \quad i = 1, 2, \ldots, C(n, n-k) \tag{10} \]

where $G_i$ is an arbitrary subgroup with $n-k$ observations.

Now, if the outliers come from mean shifts of $k$ observations, the mean-shift model is expressed as

\[ Y = X\beta + D + \epsilon \tag{11} \]

where $D = (0, \ldots, 0, d_{n-k+1}, \ldots, d_n)'$ and $|d_i| > 2Z_{1-\alpha/2}\,\sigma$. The cases $\{(X_i, Y_i)\}_{i=n-k+1}^{n}$ are the $k$ outliers, with mean shifts $\{d_i\}_{i=n-k+1}^{n}$. The best clean data set, denoted by $C$, is $\{(X_i, Y_i)\}_{i=1}^{n_c}$, where $n_c = n-k$ or less. The Predictive Confidence Interval $I_p$ computed from the clean data set $C$ is

\[ X_i\hat{\beta}_c \pm C_\alpha\, s_c \sqrt{1 + X_i (X_c'X_c)^{-1} X_i'} \tag{12} \]

where $s_c = \sqrt{MSE_c}$ and $X = (X_c, X_i)$ is $X_c$ augmented with the row $X_i$. Since $1 + X_i (X_c'X_c)^{-1} X_i' = 1/(1 - h_{ii})$, with $h_{ii}$ the leverage of case $i$ in the augmented matrix $X$ (see p. 127 of Applied Linear Regression by Weisberg, 1985 [15]), $I_p$ can be written as

\[ X_i\hat{\beta}_c \pm \frac{C_\alpha\, s_c}{\sqrt{1 - h_{ii}}} \tag{13} \]

where the approximate critical points $C_\alpha = t(1 - \alpha/2n_c,\ n_c - p - 1)$ are calculated from the upper bound of the Bonferroni inequality and the t-distribution of the Jackknife residuals. If needed, an adjusted critical value based on $t(1 - \alpha/2n_c,\ n_c - p - 1)$ could be used instead [the exact adjustment, which involves PRESS and SSE, is not recoverable from this transcription]. The choice depends on the data structure. So, the subset $I$ of $k$ outliers can be obtained from

\[ Y_i \notin I_p, \quad i = 1, \ldots, n \tag{14} \]

Only the cases $i = n-k+1, \ldots, n$ are then correctly identified as outlying. Besides, each $C \cup \{i\}$, $i \in I$, forms a new data set with only one outlier, and the absolute Jackknife residual $|r_i^*|$ of that observation is greater than the critical value $C_\alpha$. Note that the size $n_c = n-k$ of the ideal subset $C$ of good observations may be reduced to $n-k-1$, $n-k-2$ or $n-k-3$. This depends on the researcher's subjective reading of the residual plots and, for a large sample, on the condition $\sum_{i=1}^{n_c} \mathrm{sign}(e_i) \approx 0$ [a term in $h_{ii}$ also appears in this expression in the source, but its position is not recoverable].

3 The Algorithm, Famous Examples, and Simulation Studies

Outliers in a linear model can be detected through data splitting.
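The data-splitting rule of equations (12) and (14) can be sketched as follows, assuming NumPy: fit the model on a construction set taken to be clean, then flag every held-out case whose response falls outside its prediction interval $I_p$. The function name, the made-up data and the value `c_alpha=3.5` are illustrative assumptions; in the paper $C_\alpha$ is a Bonferroni t quantile, which the caller would have to supply.

```python
import numpy as np


def flag_outside_prediction_interval(X, y, construction_idx, c_alpha):
    """Return indices of held-out cases whose Y_i lies outside I_p (equation (12)).

    construction_idx: indices assumed to form the clean construction set C.
    c_alpha: critical value C_alpha chosen by the caller.
    """
    n, p = X.shape
    in_c = np.zeros(n, dtype=bool)
    in_c[construction_idx] = True
    Xc, yc = X[in_c], y[in_c]
    xtx_inv = np.linalg.inv(Xc.T @ Xc)
    beta_c = xtx_inv @ (Xc.T @ yc)
    resid_c = yc - Xc @ beta_c
    s_c = np.sqrt(resid_c @ resid_c / (in_c.sum() - p))  # s_c = sqrt(MSE_c)
    Xv, yv = X[~in_c], y[~in_c]
    # Half-width C_alpha * s_c * sqrt(1 + X_i (X_c'X_c)^{-1} X_i') per held-out case
    leverage_term = np.einsum("ij,jk,ik->i", Xv, xtx_inv, Xv)
    half_width = c_alpha * s_c * np.sqrt(1.0 + leverage_term)
    outside = np.abs(yv - Xv @ beta_c) > half_width
    return np.flatnonzero(~in_c)[outside]


# Made-up data: Y = 1 + X plus small alternating noise; cases 10 and 40 are shifted by 5.
x = np.linspace(0.0, 10.0, 51)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + x + 0.1 * (-1.0) ** np.arange(51)
y[[10, 40]] += 5.0
held_out = [10, 25, 40]                      # case 25 is a good observation
construction = np.setdiff1d(np.arange(51), held_out)
print(flag_outside_prediction_interval(X, y, construction, c_alpha=3.5))
```

The sketch only works when the construction set really is clean; finding such a set is the harder part of the problem.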
If an entire data set of size $n$ contains $k$ outliers, then all $C(n, n-k)$ partitions of the entire data set are considered as construction data sets $G_i$ $(i = 1, 2, \ldots, C(n, n-k))$. A best construction data set with $n-k$ clean observations is based on:

\[ \text{Min}: \mathrm{PRESS}_{G_i}, \quad i = 1, 2, \ldots, C(n, n-k) \]

However, this search involves a great deal of computational effort. So, a fast data-splitting algorithm is proposed below.

k = 1, one outlier

A single outlier is easy to identify from the maximum of the Jackknife residuals; that is, the leave-one-out procedure is the best way to find it. In practice $k$ is unknown, so we take another way of data-splitting to find the subset of the $k$ most likely outliers. The Jackknife residuals $r_i^*$ and Cook's Distance $D_i$ are used to pick the construction data set at the first stage.

Algorithm: k unknown

Step 1: Use all observations to calculate the Jackknife residual $r_i^*$ $(i = 1, \ldots, n)$ of each observation.

Step 2: Arrange the absolute values $|r_i^*|$ in ascending order, $|r^*_{(1)}|, |r^*_{(2)}|, \ldots, |r^*_{(n)}|$, where $|r^*_{(1)}| = \min\{|r_i^*|\}_{i=1}^{n}$ and $|r^*_{(n)}| = \max\{|r_i^*|\}_{i=1}^{n}$.

Step 3: Choose the observations $\{r^*_{(i)}\}_{i=1}^{[qn]}$ $(0.8 \le q \le 0.95)$ as a construction data set $C_1$; the remaining observations form a validation data set $V_1$.

Step 4: Confirm that $C_1$ is clean from the plot of its Jackknife residuals: the maximum absolute Jackknife residual of $C_1$ should be less than the critical point $C_\alpha$.

Step 5: Calculate the ADP (absolute deviation of prediction) for the $n_v$ observations of $V_1$, that is, $|Y_{i,V_1} - \hat{Y}_{i,C_1}|$, $i = 1, 2, \ldots, n_v$. Next, move the observations of $V_1$ with the smaller ADPs into $C_1$ (try $ADP < C_\alpha\sqrt{MSE_c}$) to get a new construction data set $C_2$. The remaining observations are denoted by $V_2$. The new predicted value $\hat{Y}_{i,C_2}$ is then computed from the new data set $C_2$.

Step 6: In a similar way, repeat the procedure, moving the observations with smaller ADPs into $C_2$ to get a new set $C_3$, and so on. Steps 4 and 5 are applied repeatedly to obtain the best construction data set of size $n-k$.
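The steps above can be sketched in code as follows, assuming NumPy. This is an illustration under stated assumptions rather than the author's implementation: the names `jackknife_residuals` and `split_outliers` are hypothetical, the critical value `c_alpha` is a fixed number passed in by the caller instead of the Bonferroni t quantile recomputed for each $n_c$, and the plot-based confirmation of Step 4 is left to the analyst.

```python
import numpy as np


def jackknife_residuals(X, y):
    """Jackknife (externally studentized) residuals r_i* from a single OLS fit."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    e = y - X @ (xtx_inv @ (X.T @ y))
    s2_deleted = (e @ e - e**2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_deleted * (1.0 - h))


def split_outliers(X, y, q=0.8, c_alpha=3.5, max_rounds=50):
    """Return the indices left outside the final construction set (candidate outliers)."""
    n, p = X.shape
    # Steps 1-3: the [q n] cases with the smallest |r_i*| form the first construction set
    order = np.argsort(np.abs(jackknife_residuals(X, y)))
    in_c = np.zeros(n, dtype=bool)
    in_c[order[: int(np.floor(q * n))]] = True
    for _ in range(max_rounds):
        # Fit on the current construction set only
        Xc, yc = X[in_c], y[in_c]
        beta_c, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        resid_c = yc - Xc @ beta_c
        s_c = np.sqrt(resid_c @ resid_c / (in_c.sum() - p))
        # Step 5: ADP = |Y_i - Yhat_i| using the construction-set fit
        adp = np.abs(y - X @ beta_c)
        move = ~in_c & (adp < c_alpha * s_c)
        if not move.any():
            break  # Step 6 ends when no validation case can be moved
        in_c |= move
    return np.flatnonzero(~in_c)


# Made-up data: Y = 1 + X plus small alternating noise; cases 10 and 40 are shifted by 5.
x = np.linspace(0.0, 10.0, 51)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + x + 0.1 * (-1.0) ** np.arange(51)
y[[10, 40]] += 5.0
print(split_outliers(X, y))
```

Moving all qualifying cases in one round, rather than a few at a time, is a simplification chosen for brevity; a more cautious variant would move only the case with the smallest ADP per round and recheck the Jackknife residuals of the enlarged set.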
The $k$ observations remaining outside the final construction data set are declared outliers, since they do not belong to $I_p$. Besides, if Cook's Distance $D_i$ replaces $r^*_{(i)}$ in Steps 1, 2 and 3, the outcome may be slightly different. The relation between $D_i$ and $r_i^*$ is

\[ D_i = \frac{(n-p)\, h_{ii}\, (r_i^*)^2}{p\,(1 - h_{ii})\,(n - p - 1 + (r_i^*)^2)} \tag{15} \]

which is derived from the definition of Cook's Distance and the relation between $r$ and $r^*$. Some of these points are elaborated in the following subsections.

3.1 Two Famous Examples

(1) Hawkins, Bradu and Kass's Constructed Data

The data set is reproduced from Hawkins, Bradu and Kass [7] (see Table 1). Table 2 and Table 3 give the Jackknife residuals and related statistics (see also Fig 1 to Fig 3) used to find the best construction subset. The data contain outliers at cases 1-10, but they go undetected owing to the masking effect: their Jackknife residuals lie between 1.18 and [value lost in extraction]. The good observations 11 and 12 are incorrectly identified as outliers owing to the swamping effect; their Jackknife residuals are 4.03 and 5.29, respectively. Besides, the OLS-based methods applied to the entire data set fail to identify the outliers, except for Hadi's algorithm. We propose instead the following fast procedure:

Step 1: Use the original data $C = \{1, 2, \ldots, 75\}$ to calculate the Jackknife residuals of all observations.

Step 2: Delete the subset $V_1 = \{1, 2, \ldots, 14, 44\}$, since these cases have the 15 largest absolute Jackknife residuals. The new construction data set $C_1 = C \setminus V_1$ is clean according to the plot of Jackknife residuals, and the corresponding predicted function $\hat{Y}$ is linear in $X_1$, $X_2$ and $X_3$ [fitted coefficients lost in extraction].

Step 3: Compute the ADPs of $V_1$, i.e., $\{|Y_{i,V_1} - \hat{Y}_{i,C_1}|\}$, and move cases 11-14 and 44 back to $C_1$, since their ADPs are less than $C_\alpha\sqrt{MSE_{c_1}}$. The construction data set $C_2 = C_1 \cup \{11\text{-}14, 44\}$ then gives a new predicted function $\hat{Y}$ in $X_1$, $X_2$ and $X_3$ [fitted coefficients lost in extraction].

Step 4: $C_2$ is a good clean data set according to the plot of residuals (see Fig 4).

Step 5: Examining the data sets $C_3, C_4, \ldots, C_{10}$, each subset contains exactly one outlier according to the Jackknife residuals.
It is worth noting that observations 1-10 are significantly outlying, since their values of $r^*$ [values lost in extraction] are much greater than the critical point $C_\alpha$. Thus we declare that observations 1-10 are outliers in the original data set. We also analyzed the Hadi and Simonoff data set in the same way, with satisfactory results.

(2) Hadi and Simonoff Artificial Data
The data set was created by Hadi and Simonoff (1993) [5] (see Table 4). First, they picked two predictors $X_1$ and $X_2$ distributed as uniform (0, 15), and used the regression function $Y = X_1 + X_2 + \epsilon$, where $\epsilon \sim N(0, 1)$. Second, the quantity 4 was added to observations 1-3 to form the mean-shift model $Y = X_1 + X_2 + 4 + \epsilon$. The outliers 1-3 go unidentified because of the masking effect (see Table 5), so the OLS method is inadequate for outlier detection here. Alternative methods such as LMS and LTS (Rousseeuw, 1984) [12] also fail to identify them. A modified method proposed by Hadi and Simonoff detected the three outliers successfully, but it is somewhat cumbersome. In this example, we obtain a well-fitted regression function $\hat{Y}$ in $X_1$ and $X_2$ [fitted coefficients lost in extraction] based on a clean data set, and observations 1, 2 and 3 are the outliers. The masking effect appears in the entire data set because the three outliers lie on the same side and their leverage values are very close together. Section 2 discusses this in detail; the mean-shift value of 4 corresponds to the situation of this example.

3.2 Simulation Studies for Samples with Two and Three Outliers

Example 1: $Y = 1 + X + \epsilon$, $\epsilon \sim N(0, 1)$

The data set of size 51 is generated from the linear model $Y = 1 + X + \epsilon$, where $X_i \sim U(0, 10)$ and $\epsilon \sim N(0, 1)$. Two outliers are planted at observations 1 and 20 by adding the quantity 5 to the null data set (see Table 6).

Example 2: $Y = 2 + X_1 + X_2 + \epsilon$, $\epsilon \sim N(0, 1)$

The data set of size 60 is generated from the linear model $Y = 2 + X_1 + X_2 + \epsilon$, where $X_{1i} \sim U(0, 10)$, $X_{2i} \sim U(0, 15)$ and $\epsilon \sim N(0, 1)$. Three outliers are planted at observations 1, 2 and 3 by adding the quantity 4.5 to the null data set (see Table 7).

In both simulation studies the outliers are detected satisfactorily.

4 Conclusions

The OLS method is widely used in linear regression models.
The masking and swamping effects are still unavoidable if the data set contains a few high leverage points. In general, the data structure provides much information on this topic, and the Jackknife residuals and Cook's Distance are good indicators. In Section 2, we have shown that the locations of the outliers, the signs of their residuals and the permutations of the outliers are the main factors behind the masking and swamping effects. Next, Gentleman and Wilk's method for multiple outliers, i.e., Min: $SSE_{G_i}$, can be modified to Min: $\mathrm{PRESS}_{G_i}$. In this study, a useful and fast algorithm for finding multiple outliers is based on data-splitting and Jackknife residuals, and the two famous examples in Section 3 illustrate that it is much simpler than GESR, the Multistage Procedure and Hadi's algorithms. The diagnosis of a single outlier in linear models can thus be extended to the detection of multiple outliers, so that the masking and swamping effects are no longer a problem.

Acknowledgements

The author would like to thank his advisor Dr. Kenny Ye, Prof. Nancy Mendell, and Prof. Hongshik Ahn for their assistance in revising the paper. This paper is part of his doctoral dissertation, August 2002, AMS Dept. of SUNY-Stony Brook, USA.

References

[1] R.J. Beckman and H.J. Trussell, The distribution of an arbitrary studentized residual and effects of updating in multiple regression, J. Amer. Statistic. Assn., 69 (1974).

[2] J.T. Chiang, The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models, Int. Journal of Contemp. Math. Sciences, Vol. 2, 2007, no. 7, 297-307.

[3] J.F. Gentleman and M.B. Wilk, Detecting Outliers: II. Supplementing the direct analysis of residuals, Biometrics, 31 (1975a).

[4] F.E. Grubbs, Sample Criteria for Testing Outlying Observations, Annals of Mathematical Statistics, 21 (1950).

[5] A.S. Hadi and J.S. Simonoff, Procedures for the Identification of Multiple Outliers in Linear Models, Journal of the American Statistical Association, 88 (1993), Issue 424.

[6] J.A. Hartigan, Consistency of Single Linkage for High-Density Clusters, Journal of the American Statistical Association, 76 (1981).

[7] D.M. Hawkins, D. Bradu and G.V. Kass, Location of several outliers in multiple-regression data using elemental sets, Technometrics, 26 (1984).

[8] M.G. Marasinghe, A Multistage Procedure for Detecting Several Outliers in Linear Regression, Technometrics, 27 (1985).

[9] A.D.R. McQuarrie and C.-L. Tsai, Regression and Time Series Model Selection, World Scientific Publishing Co. Pte. Ltd., 1999.

[10] S.R. Paul and Y. Fung, A Generalized Extreme Studentized Residual Multiple-Outlier-Detection Procedure in Linear Regression, Technometrics, 33 (1991).

[11] B. Rosner, On the Detection of Many Outliers, Technometrics, 17 (1975).

[12] P.J. Rousseeuw, Least median of squares regression, Journal of the American Statistical Association, 79 (1984).

[13] J.S. Simonoff, The Calculation of Outlier Detection Statistics, Communications in Statistics, Part B - Simulation and Computation, 13 (1984a).

[14] M. Stone, Cross-validatory choice and Assessment of Statistical Predictions, Journal of the Royal Statistical Society, Ser. B, 36 (1973).

[15] S. Weisberg, Applied Linear Regression, John Wiley and Sons, Inc., 1985.

[16] W.H. Wei and W.K. Fung, The mean-shift outlier model in general weighted regression and its applications, Computational Statistics and Data Analysis, 30 (1999).
Table 1: Hawkins, Bradu and Kass's Constructed Data. Columns: Obs, Y, X1, X2, X3 (two column blocks side by side). [Table values are not preserved in this transcription.]

Table 2: The Cook's Distance $D_i$ and related statistics in the Hawkins data. Columns: Obs, $D_i$, $r^*$, $h_{ii}$, CVR, DFFITS. [Table values are not preserved in this transcription.]
*The observations with the fifteen largest $D_i$ are 1-14 and 43, slightly different from those with the fifteen largest $|r_i^*|$, which are 1-14 and 44. The fifteen cases are deleted at the first stage.

Table 3: Obtaining the Best Construction Data Set step by step on the Hawkins et al. data set. [Entries marked [...] are lost in this transcription; the lost values of Max $|r^*|$ are omitted and only the observation at which the maximum occurs is shown.]

V_i  | Observations in V_i | Obs. with Max |r*| of C_i | Construction data set C_i
V_1  | 1-14, 44            | (53)                      | C_1 = C \ V_1
V_2  | 1-10                | (53)                      | C_2 = C \ V_2
V_3  | [...]               | (10)                      | C_3 = C \ V_3
V_4  | 1-8, [...]          | (9)                       | C_4 = C \ V_4
V_5  | 1-7, 9, [...]       | (8)                       | C_5 = C \ V_5
V_6  | 1-6, [...]          | (7)                       | C_6 = C \ V_6
V_7  | 1-5, [...]          | (6)                       | C_7 = C \ V_7
V_8  | 1-4, [...]          | (5)                       | C_8 = C \ V_8
V_9  | 1-3, [...]          | (4)                       | C_9 = C \ V_9
V_10 | [...]               | (3)                       | C_10 = C \ V_10
V_11 | 1, [...]            | (2)                       | C_11 = C \ V_11
V_12 | [...]               | (1)                       | C_12 = C \ V_12
V_13 | [...]               | (12)                      | C (the original data set)

*In the third column, the number in parentheses is the observation at which the maximum occurs. The best construction data set, of size 65, is $C_2 = \{11, 12, \ldots, 75\}$.

Table 4: Hadi and Simonoff's Artificial Data. Columns: Obs, Y, X1, X2. [Table values are not preserved in this transcription.]

Table 5: Deleting Subsets based on Jackknife residuals $r^*$. Columns: Obs, six columns of residuals, Outlier. [Residual values are not preserved in this transcription.] The Outlier column reads Yes for the first three observations and No for the remaining 22.
*The best construction data set is $C \setminus \{1, 2, 3\}$, and observations 1, 2 and 3 are the outliers in the Hadi and Simonoff data set.

Table 6: The Simulation Data Set with two outliers. Columns: Obs, Y, X (two column blocks side by side). [Table values are not preserved in this transcription.]
The sample of size 51 was generated with R (R-project), version R 1.4.1.

Table 7: The Simulation Data Set with three outliers. Columns: Obs, Y, X1, X2 (two column blocks side by side). [Table values are not preserved in this transcription.]
The sample of size 60 was generated with R (R-project), version R 1.4.1.
Figure 1: The scatter plot of Jackknife residuals (vertical axis "jack", horizontal axis "Index") of the Hawkins et al. data set shows that the linear model is not well fitted to the entire data set.
*Several outliers have perturbation effects on this data set.

Figure 2: Index plot of the leverage measure (diagonal of the hat matrix against i) for the Hawkins et al. data set.
*The horizontal line is at the mean of the h values, and the segments join pairs of points by a line. Obs 14 is a high leverage point.

Figure 3: The scatter plot of absolute Jackknife residuals vs. Cook's Distance using the Hawkins data (panel title: "The observations of larger values of Cook Distance").
*Note that $D_i = \frac{(n-p)\, h_{ii}\, (r_i^*)^2}{p\,(1 - h_{ii})\,(n - p - 1 + (r_i^*)^2)}$ is not a one-to-one correspondence if we map $(h_{ii}, r_i^*)$ to $D_i$.

Figure 4: Index plot of residuals (Residuals against Index). The scatter plot of residuals shows that the linear model is well fitted for the subset $C_2$.

Received: December 9, 2007
Construction and analysis of Es 2 efficient supersaturated designs Yufeng Liu a Shiling Ruan b Angela M. Dean b, a Department of Statistics and Operations Research, Carolina Center for Genome Sciences,
More information, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1
Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationLeverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.
Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationSTATISTICS 479 Exam II (100 points)
Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the
More informationModel Selection. Frank Wood. December 10, 2009
Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide
More informationDetecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points
Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Ettore Marubini (1), Annalisa Orenti (1) Background: Identification and assessment of outliers, have
More informationImproved Feasible Solution Algorithms for. High Breakdown Estimation. Douglas M. Hawkins. David J. Olive. Department of Applied Statistics
Improved Feasible Solution Algorithms for High Breakdown Estimation Douglas M. Hawkins David J. Olive Department of Applied Statistics University of Minnesota St Paul, MN 55108 Abstract High breakdown
More informationMultivariate Regression (Chapter 10)
Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate
More informationA Note on Visualizing Response Transformations in Regression
Southern Illinois University Carbondale OpenSIUC Articles and Preprints Department of Mathematics 11-2001 A Note on Visualizing Response Transformations in Regression R. Dennis Cook University of Minnesota
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More information1 Least Squares Estimation - multiple regression.
Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering
More informationDetection of single influential points in OLS regression model building
Analytica Chimica Acta 439 (2001) 169 191 Tutorial Detection of single influential points in OLS regression model building Milan Meloun a,,jiří Militký b a Department of Analytical Chemistry, Faculty of
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationRegression diagnostics
Regression diagnostics Leiden University Leiden, 30 April 2018 Outline 1 Error assumptions Introduction Variance Normality 2 Residual vs error Outliers Influential observations Introduction Errors and
More informationChapter 10 Building the Regression Model II: Diagnostics
Chapter 10 Building the Regression Model II: Diagnostics 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 41 10.1 Model Adequacy for a Predictor Variable-Added
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationLinear Regression Models
Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least
More informationTentative solutions TMA4255 Applied Statistics 16 May, 2015
Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More informationRobust model selection criteria for robust S and LT S estimators
Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often
More informationThe Effect of a Single Point on Correlation and Slope
Rochester Institute of Technology RIT Scholar Works Articles 1990 The Effect of a Single Point on Correlation and Slope David L. Farnsworth Rochester Institute of Technology This work is licensed under
More informationLecture One: A Quick Review/Overview on Regular Linear Regression Models
Lecture One: A Quick Review/Overview on Regular Linear Regression Models Outline The topics to be covered include: Model Specification Estimation(LS estimators and MLEs) Hypothesis Testing and Model Diagnostics
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationLINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises
LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on
More information3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.
7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of
More informationMulticollinearity and A Ridge Parameter Estimation Approach
Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com
More information((n r) 1) (r 1) ε 1 ε 2. X Z β+
Bringing Order to Outlier Diagnostics in Regression Models D.R.JensenandD.E.Ramirez Virginia Polytechnic Institute and State University and University of Virginia der@virginia.edu http://www.math.virginia.edu/
More informationInference in Regression Analysis
Inference in Regression Analysis Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 4, Slide 1 Today: Normal Error Regression Model Y i = β 0 + β 1 X i + ǫ i Y i value
More informationSTATISTICS 110/201 PRACTICE FINAL EXAM
STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationLecture 4: Regression Analysis
Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.
More informationOn Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness
Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu
More informationAMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression
AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number
More informationRegression Diagnostics
Diag 1 / 78 Regression Diagnostics Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2015 Diag 2 / 78 Outline 1 Introduction 2
More informationA Note on UMPI F Tests
A Note on UMPI F Tests Ronald Christensen Professor of Statistics Department of Mathematics and Statistics University of New Mexico May 22, 2015 Abstract We examine the transformations necessary for establishing
More informationChapter 14. Linear least squares
Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationUsing Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity and Outliers Problems
Modern Applied Science; Vol. 9, No. ; 05 ISSN 9-844 E-ISSN 9-85 Published by Canadian Center of Science and Education Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity
More informationSTAT5044: Regression and Anova. Inyoung Kim
STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationPrediction of Bike Rental using Model Reuse Strategy
Prediction of Bike Rental using Model Reuse Strategy Arun Bala Subramaniyan and Rong Pan School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, USA. {bsarun, rong.pan}@asu.edu
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationBusiness Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'
Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where
More informationRegression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr
Regression Model Specification in R/Splus and Model Diagnostics By Daniel B. Carr Note 1: See 10 for a summary of diagnostics 2: Books have been written on model diagnostics. These discuss diagnostics
More informationBox-Cox Transformations
Box-Cox Transformations Revised: 10/10/2017 Summary... 1 Data Input... 3 Analysis Summary... 3 Analysis Options... 5 Plot of Fitted Model... 6 MSE Comparison Plot... 8 MSE Comparison Table... 9 Skewness
More informationLinear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77
Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical
More informationPractical High Breakdown Regression
Practical High Breakdown Regression David J. Olive and Douglas M. Hawkins Southern Illinois University and University of Minnesota February 8, 2011 Abstract This paper shows that practical high breakdown
More informationApplied linear statistical models: An overview
Applied linear statistical models: An overview Gunnar Stefansson 1 Dept. of Mathematics Univ. Iceland August 27, 2010 Outline Some basics Course: Applied linear statistical models This lecture: A description
More informationRemedial Measures for Multiple Linear Regression Models
Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline
More informationMulticollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear
More informationDiagnostic Procedures
Diagnostic Procedures Joseph W. McKean Western Michigan University Simon J. Sheather Texas A&M University Abstract Diagnostic procedures are used to check the quality of a fit of a model, to verify the
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationWiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.
Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46
BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics
More informationVariable Selection and Model Building
LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression
More informationRegression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics
Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns
More information