# Regression Graphics

R. D. Cook, Department of Applied Statistics, University of Minnesota, St. Paul, MN 55108


## Abstract

This article, which is based on an Interface tutorial, presents an overview of regression graphics, along with an annotated bibliography. The intent is to discuss basic ideas and issues without delving into methodological or theoretical details, and to provide a guide to the literature.

*This article was delivered in 1998 at the 30th Symposium on the Interface between Statistics and Computer Science, and was subsequently published in the Proceedings of the 30th Interface.*

## 1 Introduction

One of the themes for this Interface conference is dimension reduction, which is also a leitmotif of statistics. For instance, starting with a sample z_1, ..., z_n from a univariate normal population with mean μ and variance 1, we know the sample mean z̄ is sufficient for μ. This means that we can replace the original n-dimensional sample with the one-dimensional mean z̄ without loss of information on μ. In the same spirit, dimension reduction without loss of information is a key theme of regression graphics: the goal is to reduce the dimension of the predictor vector x without loss of information on the regression. We call this *sufficient dimension reduction*, borrowing terminology from classical statistics. Sufficient dimension reduction leads naturally to *sufficient summary plots*, which contain all of the information on the regression that is available from the sample.

Many graphically oriented methods are based on the idea of dimension reduction, but few formally incorporate the notion of sufficient dimension reduction. For example, projection pursuit searches for interesting low-dimensional projections by using a user-selected projection index that quantifies "interestingness" (Huber 1985). Such methods may produce interesting results in a largely exploratory context, but one may be left wondering what they have to do with overarching statistical issues like regression.

We assume throughout that the scalar response y and the p × 1 vector of predictors x have a joint distribution. This assumption is intended to focus the discussion and is not crucial for the fundamental ideas. The general goal of the regression is to infer about the conditional distribution of y | x as far as possible with the available data: How does the conditional distribution of y | x change with the value assumed by x? It is assumed that the data {y_i, x_i}, i = 1, ..., n, are iid observations on (y, x) with finite moments.

The notation U ⫫ V means that the random vectors U and V are independent; similarly, U ⫫ V | Z means that U and V are independent given any value for the random vector Z. Subspaces are denoted by S, and S(B) is the subspace of R^t spanned by the columns of the t × u matrix B. Finally, P_B denotes the projection operator for S(B) with respect to the usual inner product, and Q_B = I − P_B.

## 2 The Central Subspace

Let B denote a fixed p × q matrix, q ≤ p, such that

y ⫫ x | B^T x.    (1)

This statement is equivalent to saying that the distribution of y | x is the same as that of y | B^T x for all values of x in its marginal sample space. It implies that the p × 1 predictor vector x can be replaced by the q × 1 predictor vector B^T x without loss of regression information, and thus represents a potentially useful reduction in the dimension of the predictor vector. Such a B always exists, because (1) is trivially true when B = I_p, the p × p identity matrix.

Statement (1) holds if and only if y ⫫ x | P_{S(B)} x. Thus, (1) is appropriately viewed as a statement about S(B), which is called a *dimension-reduction subspace* for the regression of y on x (Li 1991, Cook 1994a). The idea of a dimension-reduction subspace is useful because it represents a sufficient reduction in the dimension of the predictor vector. Clearly, knowledge of the smallest dimension-reduction subspace would be useful for parsimoniously characterizing how the distribution of y | x changes with the value of x.
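As a minimal numerical sketch of statement (1) (an illustration, not from the article; the mean function μ(b^T x) = sin(b^T x) and the vector b below are hypothetical choices), the conditional distribution of y given x is unchanged when x moves within the orthogonal complement of a dimension-reduction subspace S(b):

```python
import numpy as np

# Hypothetical illustration: with mean function mu(b'x) = sin(b'x), the
# subspace S(b) is a dimension-reduction subspace, so shifting x by any
# vector orthogonal to b leaves the conditional distribution of y unchanged.
rng = np.random.default_rng(0)
b = np.array([1.0, -2.0, 0.5])   # basis for a 1D dimension-reduction subspace

def mean_fn(x):
    return np.sin(x @ b)         # depends on x only through b'x

x1 = rng.normal(size=3)
v = np.array([2.0, 1.0, 0.0])    # b @ v = 2 - 2 + 0 = 0, so v is orthogonal to b
x2 = x1 + v                      # x2 differs from x1, but b'x2 = b'x1

print(np.isclose(mean_fn(x1), mean_fn(x2)))   # prints: True
```

Here y | x1 and y | x2 have the same distribution even though x1 ≠ x2, which is exactly the reduction from the p × 1 predictor x to the lower-dimensional b^T x.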

Let S_{y|x} denote the intersection of all dimension-reduction subspaces. While S_{y|x} is always a subspace, it is not necessarily a dimension-reduction subspace. Nevertheless, S_{y|x} is a dimension-reduction subspace under various reasonable conditions (Cook 1994a, 1996, 1998b). In this article, S_{y|x} is assumed to be a dimension-reduction subspace and, following Cook (1994b, 1996, 1998b), is called the central dimension-reduction subspace, or simply the *central subspace*. The dimension d = dim(S_{y|x}) is the *structural dimension* of the regression; we will identify regressions as having 0D, 1D, ..., dD structure.

The central subspace, which is taken as the inferential object for the regression, is the smallest dimension-reduction subspace such that y ⫫ x | η^T x, where the columns of the matrix η form a basis for the subspace. In effect, the central subspace is a super-parameter that will be used to characterize the regression of y on x. If S_{y|x} were known, the minimal sufficient summary plot of y versus η^T x could then be used to guide subsequent analysis. If an estimated basis η̂ of S_{y|x} were available, then the summary plot of y versus η̂^T x could be used similarly.

### 2.1 Structural Dimension

The simplest regressions have structural dimension d = 0, in which case y ⫫ x and a histogram or density estimate based on y_1, ..., y_n is a minimal sufficient summary plot. The idea of 0D structure may seem too limiting to be of much practical value, but it can be quite important when using residuals in model checking. Letting r denote the residual from a fit of a target model in the population, the correctness of the model can be checked by looking for information in the data to contradict 0D structure for the regression of r on x. Sample residuals r̂ need to be used in practice.

Many standard regression models have 1D structure. For example, letting ε be a random error with mean 0, variance 1 and ε ⫫ x, the following additive-error models each have 1D structure provided β ≠ 0:

y | x = β_0 + β^T x + σε    (2)

y | x = μ(β^T x) + σε    (3)

y | x = μ(β^T x) + σ(β^T x)ε    (4)

y^(λ) | x = μ(β^T x) + σ(β^T x)ε    (5)

Model (2) is just the standard linear model; it has 1D structure because y ⫫ x | β^T x. Models (3) and (4) allow for a nonlinear mean function and a nonconstant standard deviation function, but still have 1D structure. Model (5) allows for a transformation of the response. Generalized linear models with link functions that correspond to those in (2)-(4) also have 1D structure. In each case, the vector β spans the central subspace. Clearly, the class of regressions with 1D structure is quite large.

In regressions with 1D structure and central subspace S_{y|x} = S(η), a 2D plot of y versus η^T x is a minimal sufficient summary plot. In practice, S_{y|x} would need to be estimated. The following two models each have 2D structure provided the nonzero vectors β_1 and β_2 are not collinear:

y | x = μ(β_1^T x, β_2^T x) + σε

y | x = μ(β_1^T x) + σ(β_2^T x)ε

In this case, S_{y|x} = S(β_1, β_2) and a 3D plot of y over S_{y|x} is a sufficient summary plot. Again, an estimate of S_{y|x} would be required in practice. In effect, the structural dimension can be viewed as an index of the complexity of the regression: regressions with 2D structure are usually more difficult to summarize than those with 1D structure, for example.

### 2.2 Issues

There are now two general issues: how can we estimate the central subspace, including its dimension, and what might we gain in practice by doing so? We consider the second issue first, using an example in the next section to contrast the results of a traditional analysis with those based on an estimate of the central subspace. Estimation methods are discussed in Section 4.

## 3 Naphthalene Data

Box and Draper (1987, p. 392) describe the results of an experiment to investigate the effects of three process variables (predictors) in the vapor phase oxidation of naphthalene. The response variable y is the percentage mole conversion of naphthalene to naphthoquinone, and the three predictors are x_1 = logarithm of the air to naphthalene ratio (L/pg), x_2 = logarithm of contact time (seconds) and x_3 = bed temperature. There are 80 observations.

### 3.1 Traditional analysis

Box and Draper suggested an analysis based on a full second-order response surface model in the three predictors. In the ordinary least squares fit of this model, the magnitude of each of the 10 estimated regression coefficients is at least 4.9 times its standard error, indicating that the response surface may be quite complicated. It is not always easy to develop a good scientific understanding of such fits. In addition, the standard plot of residuals versus fitted values in Figure 1 shows curvature, so a more complicated model may be necessary. Further analysis from this starting point may also be complicated by at least three influential observations that are

[Figure 3: Estimated sufficient summary plot for the naphthalene data, plotting y against an estimated linear combination of x_1, x_2 and x_3.]

[Figure 4: Analysis paradigm incorporating central subspaces. Assumptions, model and data support estimation and inference about S_{y|x} through sufficient summary plots; diagnostics for the fitted model are based on S_{r|x}.]

Given the residual r from the fitted model in the lower right box of Figure 4, we can use the central subspace S_{r|x} for the regression of r on x to guide diagnostic considerations (Cook 1998b).

## 4 Estimating the Central Subspace

There are both numerical and graphical methods for estimating subspaces of S_{y|x}. All methods presently require that the conditional expectation E(x | η^T x) is linear when p > 2; we will refer to this as the *linearity condition*. In addition, methods based on second moments work best when the conditional covariance matrix var(x | η^T x) is constant. Both the linearity condition and the constant variance condition apply to the marginal distribution of the predictors and do not involve the response. Hall and Li (1993) show that the linearity condition will hold to a reasonable approximation in many problems. These conditions might also be induced by using predictor transformations and predictor weighting (Cook and Nachtsheim 1994). We next consider graphical estimates of the central subspace using a procedure called graphical regression (GREG).

### 4.1 Graphical Regression

#### Two predictors

With p = 2 predictors, S_{y|x} can be estimated visually from a 3D plot with y on the vertical axis and (x_1, x_2) on the horizontal axes. Visual assessment of 0D structure can be based on the result that y ⫫ x if and only if y ⫫ b^T x for all b in R^2 (Cook 1996, 1998b). As the 3D plot of y versus (x_1, x_2) is rotated about its vertical axis, we see 2D plots of y versus all linear combinations b^T x. Thus, if no clear dependence is seen in any 2D plot while rotating, we can infer that y ⫫ x. If dependence is seen in a 2D plot, then the structural dimension must be 1 or 2.
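The rotation idea can be mimicked numerically. The sketch below is an illustration, not the article's method: absolute correlation is used as a crude, linear-only stand-in for visual detection of dependence, and the data are simulated.

```python
import numpy as np

# Scan 2D views y versus b'x over a grid of directions b, as a rough numerical
# analogue of rotating the 3D plot about its vertical axis. Absolute correlation
# only detects linear dependence, so this is a crude stand-in for the eye.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 2))

def max_abs_corr(y, x, n_dirs=180):
    best = 0.0
    for t in np.linspace(0.0, np.pi, n_dirs, endpoint=False):
        b = np.array([np.cos(t), np.sin(t)])      # unit vector at angle t
        best = max(best, abs(np.corrcoef(y, x @ b)[0, 1]))
    return best

y_0d = rng.normal(size=n)                         # 0D: y independent of x
y_1d = x @ np.array([1.0, 1.0]) + 0.3 * rng.normal(size=n)   # 1D structure

print(max_abs_corr(y_0d, x))   # small: no 2D view shows (linear) dependence
print(max_abs_corr(y_1d, x))   # close to 1: the direction (1, 1) predicts y
```

A small maximum across all directions is consistent with 0D structure; a clearly nonzero maximum implies structural dimension at least 1, after which the 1D-versus-2D question remains.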
Visual methods for deciding between 1D and 2D, and for estimating S_{y|x} when d = 1, were discussed in detail by Cook (1992a, 1994a, 1998b), Cook and Weisberg (1994a), and Cook and Wetzel (1993).

#### More than two predictors

The central subspace can be estimated when p > 2 by assessing the structural dimension in a series of 3D plots. Here is the general idea. Let x_1 and x_2 be two of the predictors, and collect all of the remaining predictors into the (p − 2) × 1 vector x_3. We seek to determine whether there are constants c_1 and c_2 so that

y ⫫ x | (c_1 x_1 + c_2 x_2, x_3).    (6)

If (6) is true and c_1 = c_2 = 0, then x_1 and x_2 can be deleted from the regression without loss of information, reducing the dimension of the predictor vector by 2. If (6) is true and either c_1 ≠ 0 or c_2 ≠ 0, then we can replace x_1 and x_2 with the single predictor x_{12} = c_1 x_1 + c_2 x_2 without loss of information, reducing the dimension of the predictor vector by 1; we would then have a new regression problem with predictors (x_{12}, x_3). If (6) is false for all values of (c_1, c_2), then the structural dimension is at least 2, and we could select two other predictors to combine.

This methodology was developed by Cook (1992a, 1994a, 1996, 1998b), Cook and Weisberg (1994a), and Cook and Wetzel (1993), where it is shown that various 3D plots, including 3D added-variable plots, can be used to study the possibilities associated with (6) and to estimate (c_1, c_2) visually. Details are available from Cook (1998b). The summary plot of Figure 3 was produced by this means (Cook 1998b, Section 9.1): it required a visual analysis of two 3D plots and took only a minute or two to construct. Graphical regression is not time intensive with a modest number of predictors. Even with many predictors, it often takes less time
than more traditional methods, which may require that a battery of numerical and graphical diagnostics be applied after each fit. Graphical regression has the advantage that nothing is hidden, since all decisions are based only on visual inspection. In effect, diagnostic considerations are an integral part of the graphical regression process; outliers and influential observations, for example, are usually easy to identify.

### 4.2 Numerical methods

Several numerical methods are available for estimating the central subspace. We begin with standard methods and then turn to more recent developments.

#### Standard fitting methods and residual plots

Consider summarizing the regression by minimizing a convex objective function of a linear kernel a + b^T x:

(â, b̂) = arg min_{a,b} (1/n) Σ_{i=1}^n L(a + b^T x_i, y_i),    (7)

where L(·, ·) is a convex function of its first argument. This class of objective functions includes most of the usual methods, including ordinary least squares with L = (y_i − a − b^T x_i)^2, Huber's M-estimate, and various maximum likelihood estimates based on generalized linear models. This setup is being used only as a summarization device and does not imply the assumption of a model. Next, let

(α, β) = arg min_{a,b} E(L(a + b^T x, y))

denote the population version of (â, b̂). Then, under the linearity condition, β is in the central subspace (Li and Duan 1989, Cook 1994a). Under some regularity conditions, b̂ converges to β at the usual root-n rate.

There are many important implications of this result. First, if dim(S_{y|x}) = 1 and the linearity condition holds, then a plot of y versus b̂^T x is an estimated sufficient summary plot, and is often all that is needed to carry on the analysis. Second, the result is probably a reason for the success of standard methods like ordinary least squares. Third, the result can be used to facilitate other graphical methods; for example, it can be used to facilitate the analysis of the 3D plots encountered during a GREG analysis. Finally, under the linearity condition, this result can be used to show that S_{r|x} ⊆ S_{y|x} (Cook 1994a, 1998b). Consequently, the analysis of residual plots provides information that is directly relevant to the regression of y on x. On the other hand, if the linearity condition does not hold, it is probable that S_{r|x} is not contained in S_{y|x}, and thus that a residual analysis will be complicated by irrelevant information (the part of S_{r|x} that is not in S_{y|x}). For example, if dim(S_{y|x}) = 1 and the linearity condition fails, then we may well have dim(S_{r|x}) = 2. In this case the residual regression is intrinsically more complicated than the original regression, and a residual analysis may do more harm than good.

The objective function (7) can also be used with quadratic kernels a + b^T x + x^T B x, where B is a p × p symmetric matrix of quadratic coefficients. When the predictors are normally distributed, S(B̂) converges to a subspace that is contained in the central subspace (Cook 1992b; 1998b, Section 8.2).

#### Inverse regression methods

In addition to the standard fitting methods discussed in the previous section, there are at least three other numerical methods that can be used to estimate subspaces of the central subspace: sliced inverse regression, SIR (Li 1991); principal Hessian directions, pHd (Li 1992, Cook 1998a); and sliced average variance estimation, SAVE (Cook and Weisberg 1991, Cook and Lee 1998). All require the linearity condition. The constant variance condition facilitates analyses with all three methods but is not essential for progress; without it, the subspaces estimated will likely still be dimension-reduction subspaces, but they may not be central. All of these methods can produce useful results, but recent work (Cook and Lee 1998) indicates that SAVE is the most comprehensive.

SAVE, SIR, pHd, the standard fitting methods mentioned in the previous section, and other methods not discussed here are all based on the following general procedure. First, let z = Σ^{-1/2}(x − E(x)) denote the standardized version of x, where Σ = var(x). Then S_{y|x} = Σ^{-1/2} S_{y|z}, so there is no loss of generality in working on the z scale, because any basis for S_{y|z} can be back-transformed to a basis for S_{y|x}. Suppose we can find a consistent estimate M̂ of a p × k procedure-specific population matrix M with the property that S(M) ⊆ S_{y|z}. Then inference on at least a part of S_{y|z} can be based on M̂. SAVE, SIR, pHd and other numerical procedures differ on M and the method of estimating it. For example, for the ordinary least squares option of (7), set k = 1 and M = cov(y, z).

Let ŝ_1 ≥ ŝ_2 ≥ ... ≥ ŝ_q denote the singular values of M̂, and let û_1, ..., û_q denote the corresponding left singular vectors, where q = min(p, k). Assuming that d = dim S(M) is known,

Ŝ(M) = S(û_1, ..., û_d)

is a consistent estimate of S(M). For use in practice, d will typically need to be replaced with an estimate d̂, based either on inferring which of the singular values ŝ_j estimate 0, or on using GREG to infer about the regression of y on the q linearly transformed predictors û_j^T z, j = 1, ..., q.

## References

[1] Box, G. E. P. (1980). Sampling and Bayes inference in scientific modeling and robustness (with discussion). Journal of the Royal Statistical Society, Ser. A, 142.

[2] Box, G. E. P. and Draper, N. (1987). Empirical Model Building and Response Surfaces. New York: Wiley.

[3] Bura, E. (1997). Dimension reduction via parametric inverse regression. In Y. Dodge (ed.), L1 Statistical Procedures and Related Topics, IMS Lecture Notes, Vol. 31. Investigates a parametric version of SIR [30].

[4] Chen, C.-H. and Li, K. C. (1998). Can SIR be as popular as multiple linear regression? Statistica Sinica, 8. Explores applied aspects of SIR [30], including standard errors for SIR directions.

[5] Cheng, C.-S. and Li, K. C. (1995). A study of the method of principal Hessian directions for analysis of data from designed experiments. Statistica Sinica, 5.

[6] Chiaromonte, F. and Cook, R. D. (1997). On foundations of regression graphics. University of Minnesota, School of Statistics Technical Report 616. Consolidates and extends a theory for regression graphics, including graphical regression (GREG), based on dimension-reduction subspaces.

[7] Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19.

[8] Cook, R. D. (1992a). Graphical regression. In Dodge, Y. and Whittaker, J. (eds), Computational Statistics, Vol. 1. New York: Springer-Verlag, 11–22. Contains introductory ideas for graphical regression (GREG), a paradigm for a complete regression analysis based on visual estimation of dimension-reduction subspaces.

[9] Cook, R. D. (1992b). Regression plotting based on quadratic predictors. In Dodge, Y. (ed), L1-Statistical Analysis and Related Methods. New York: North-Holland. Proposes numerical estimates of the central subspace based on fitting quadratic kernels.

[10] Cook, R. D. (1993). Exploring partial residual plots. Technometrics, 35. Uses selected foundations of regression graphics developed in [11] to reinterpret and extend partial residual plots, leading to a new class of plots called CERES plots.

[11] Cook, R. D. (1994a). On the interpretation of regression plots. Journal of the American Statistical Association, 89. Basic ideas for regression graphics, including minimum dimension-reduction subspaces, the interpretation of 3D plots, GREG, and direct sums of subspaces for constructing a sufficient summary plot; extends [8].

[12] Cook, R. D. (1994b). Using dimension-reduction subspaces to identify important inputs in models of physical systems. In 1994 Proceedings of the Section on Physical and Engineering Sciences. Washington: American Statistical Association. Case study of graphics in a regression with 80 predictors using GREG [8, 11], SIR [30], pHd [31] and CERES plots [10]; introduces central dimension-reduction subspaces.

[13] Cook, R. D. (1995). Graphics for studying the net effects of regression predictors. Statistica Sinica, 5. Net-effect plots for studying the contribution of a predictor to a regression problem; generalizes added-variable plots.

[14] Cook, R. D. (1996). Graphics for regressions with a binary response. Journal of the American Statistical Association, 91. Adapts the foundations of regression graphics, including GREG [8, 11], and develops graphical methodology for regressions with binary response variables; includes results on the existence of central dimension-reduction subspaces.

[15] Cook, R. D. (1998a). Principal Hessian directions revisited (with discussion). Journal of the American Statistical Association, 93. Rephrases the foundations for pHd [31] in terms of central dimension-reduction subspaces and proposes modifications of the methodology.

[16] Cook, R. D. (1998b). Regression Graphics: Ideas for Studying Regressions Through Graphics. New York: Wiley (to appear in Fall 1998). A comprehensive treatment of regression graphics suitable for Ph.D. students and others who may want theory along with methodology and application.

[17] Cook, R. D. and Croos-Dabrera, R. (1998). Partial residual plots in generalized linear models. Journal of the American Statistical Association, 93. Extends [10] to generalized linear models.

[18] Cook, R. D. and Lee, H. (1998). Dimension reduction in regressions with a binary response. Submitted. Develops and compares dimension-reduction methodology (SIR [30], pHd [31], SAVE [21], and a new method based on the difference of the conditional covariance matrices of the predictors given the response) in regressions with a binary response.

[19] Cook, R. D. and Nachtsheim, C. J. (1994). Reweighting to achieve elliptically contoured covariates in regression. Journal of the American Statistical Association, 89. Methodology for weighting the observations so the predictors satisfy the linearity condition needed to ensure that certain estimates fall in the central subspace asymptotically.

[20] Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

[21] Cook, R. D. and Weisberg, S. (1991). Discussion of Li [30]. Journal of the American Statistical Association, 86. Introduces sliced average variance estimation, SAVE, which can be used to estimate the central dimension-reduction subspace.

[22] Cook, R. D. and Weisberg, S. (1994a). An Introduction to Regression Graphics. New York: Wiley. Integrates selected graphical ideas at the introductory level. Comes with the R-code, a computer program for regression graphics written in LISP-STAT [34]; the book serves as the manual for the R-code, which works under Macintosh, Windows and Unix systems.

[23] Cook, R. D. and Weisberg, S. (1994b). Transforming a response variable for linearity. Biometrika, 81. Proposes inverse response plots as a means of estimating response transformations graphically.

[24] Cook, R. D. and Weisberg, S. (1997). Graphics for assessing the adequacy of regression models. Journal of the American Statistical Association, 92. Proposes general graphical methods for investigating the adequacy of almost any regression model, including linear regression, nonlinear regression, generalized linear models and additive models.

[25] Cook, R. D. and Wetzel, N. (1993). Exploring regression structure with graphics (invited with discussion). TEST, 2, 1–57. Develops practical aspects of GREG.

[26] Ferré, L. (1998). Determining the dimension of sliced inverse regression and related methods. Journal of the American Statistical Association, 93. Introduces model selection methods for estimating the dimension in the context of SIR.

[27] Hall, P. and Li, K. C. (1993). On almost linearity of low dimensional projections from high dimensional data. The Annals of Statistics, 21. Shows that the linearity condition will be reasonable in many problems.

[28] Hsing, T. and Carroll, R. J. (1992). An asymptotic theory of sliced inverse regression. Annals of Statistics, 20. Develops the asymptotic theory for the two-slice version of SIR [30].

[29] Huber, P. (1985). Projection pursuit (with discussion). Annals of Statistics, 13.

[30] Li, K. C. (1991). Sliced inverse regression for dimension reduction (with discussion). Journal of the American Statistical Association, 86. Basis for sliced inverse regression, SIR.

[31] Li, K. C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. Journal of the American Statistical Association, 87. Basis for pHd.

[32] Li, K. C. (1997). Nonlinear confounding in high-dimensional regression. Annals of Statistics, 25.

[33] Li, K. C. and Duan, N. (1989). Regression analysis under link violation. Annals of Statistics, 17. Shows conditions under which usual estimation methods (OLS, M-estimates, ...) produce estimates that fall in a one-dimensional dimension-reduction subspace asymptotically.

[34] Tierney, L. (1990). LISP-STAT. New York: Wiley. Resource for the parent statistical language of the R-code.

[35] Zhu, L. X. and Fang, K. T. (1996). Asymptotics for kernel estimates of sliced inverse regression. Annals of Statistics, 24. Uses smoothing rather than slicing to estimate the kernel matrix for SIR [30].

[36] Zhu, L. X. and Ng, K. W. (1995). Asymptotics of sliced inverse regression. Statistica Sinica, 5.


### A Introduction to Matrix Algebra and the Multivariate Normal Distribution

A Introduction to Matrix Algebra and the Multivariate Normal Distribution PRE 905: Multivariate Analysis Spring 2014 Lecture 6 PRE 905: Lecture 7 Matrix Algebra and the MVN Distribution Today s Class An

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

### Robustness of Principal Components

PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

### Variable Selection and Model Building

LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression

### Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

### Factor Analysis (10/2/13)

STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

### Linear Regression Models

Linear Regression Models Model Description and Model Parameters Modelling is a central theme in these notes. The idea is to develop and continuously improve a library of predictive models for hazards,

### * Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

### AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

### Econometrics - 30C00200

Econometrics - 30C00200 Lecture 11: Heteroskedasticity Antti Saastamoinen VATT Institute for Economic Research Fall 2015 30C00200 Lecture 11: Heteroskedasticity 12.10.2015 Aalto University School of Business

### Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

### D-optimal Designs for Factorial Experiments under Generalized Linear Models

D-optimal Designs for Factorial Experiments under Generalized Linear Models Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago Joint research with Abhyuday

### Tutorial 2: Power and Sample Size for the Paired Sample t-test

Tutorial 2: Power and Sample Size for the Paired Sample t-test Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function of sample size, variability,

### Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

### Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting Andreas Groll 1 and Gerhard Tutz 2 1 Department of Statistics, University of Munich, Akademiestrasse 1, D-80799, Munich,

### UNIVERSITY OF NORTH ALABAMA MA 110 FINITE MATHEMATICS

MA 110 FINITE MATHEMATICS Course Description. This course is intended to give an overview of topics in finite mathematics together with their applications and is taken primarily by students who are not

### STAT Checking Model Assumptions

STAT 704 --- Checking Model Assumptions Recall we assumed the following in our model: (1) The regression relationship between the response and the predictor(s) specified in the model is appropriate (2)

### Assumptions in Regression Modeling

Fall Semester, 2001 Statistics 621 Lecture 2 Robert Stine 1 Assumptions in Regression Modeling Preliminaries Preparing for class Read the casebook prior to class Pace in class is too fast to absorb without

### Multicollinearity and A Ridge Parameter Estimation Approach

Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com

### Weighted Principal Support Vector Machines for Sufficient Dimension Reduction in Binary Classification

s for Sufficient Dimension Reduction in Binary Classification A joint work with Seung Jun Shin, Hao Helen Zhang and Yufeng Liu Outline Introduction 1 Introduction 2 3 4 5 Outline Introduction 1 Introduction

### Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates

Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Sonderforschungsbereich 386, Paper 24 (2) Online unter: http://epub.ub.uni-muenchen.de/

### Linear Algebra and Eigenproblems

Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

### Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra

September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing

### UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

### A Brief Introduction to Proofs

A Brief Introduction to Proofs William J. Turner October, 010 1 Introduction Proofs are perhaps the very heart of mathematics. Unlike the other sciences, mathematics adds a final step to the familiar scientific

### 10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

### IDENTIFYING MULTIPLE OUTLIERS IN LINEAR REGRESSION : ROBUST FIT AND CLUSTERING APPROACH

SESSION X : THEORY OF DEFORMATION ANALYSIS II IDENTIFYING MULTIPLE OUTLIERS IN LINEAR REGRESSION : ROBUST FIT AND CLUSTERING APPROACH Robiah Adnan 2 Halim Setan 3 Mohd Nor Mohamad Faculty of Science, Universiti

### Introduction to Matrix Algebra and the Multivariate Normal Distribution

Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

### DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,

### Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

### Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology. Jeffrey R. Edwards University of North Carolina

Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology Jeffrey R. Edwards University of North Carolina 1 Outline I. Types of Difference Scores II. Questions Difference

### 1 ** The performance objectives highlighted in italics have been identified as core to an Algebra II course.

Strand One: Number Sense and Operations Every student should understand and use all concepts and skills from the pervious grade levels. The standards are designed so that new learning builds on preceding

### The Bayesian Approach to Multi-equation Econometric Model Estimation

Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

### Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model

204 Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model S. A. Ibrahim 1 ; W. B. Yahya 2 1 Department of Physical Sciences, Al-Hikmah University, Ilorin, Nigeria. e-mail:

### Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,

### Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

### Using Mplus individual residual plots for. diagnostics and model evaluation in SEM

Using Mplus individual residual plots for diagnostics and model evaluation in SEM Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 20 October 31, 2017 1 Introduction A variety of plots are available

### Tangent spaces, normals and extrema

Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

### Dublin City Schools Mathematics Graded Course of Study Algebra I Philosophy

Philosophy The Dublin City Schools Mathematics Program is designed to set clear and consistent expectations in order to help support children with the development of mathematical understanding. We believe

### BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture

### What to do if Assumptions are Violated?

What to do if Assumptions are Violated? Abandon simple linear regression for something else (usually more complicated). Some examples of alternative models: weighted least square appropriate model if the

### Math 110, Spring 2015: Midterm Solutions

Math 11, Spring 215: Midterm Solutions These are not intended as model answers ; in many cases far more explanation is provided than would be necessary to receive full credit. The goal here is to make

### Math 290, Midterm II-key

Math 290, Midterm II-key Name (Print): (first) Signature: (last) The following rules apply: There are a total of 20 points on this 50 minutes exam. This contains 7 pages (including this cover page) and

### Residuals and model diagnostics

Residuals and model diagnostics Patrick Breheny November 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/42 Introduction Residuals Many assumptions go into regression models, and the Cox proportional

### Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

### The Bayes classifier

The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

### Regression Model Building

Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

### A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

### Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances

METRON - International Journal of Statistics 2008, vol. LXVI, n. 3, pp. 285-298 SHALABH HELGE TOUTENBURG CHRISTIAN HEUMANN Mean squared error matrix comparison of least aquares and Stein-rule estimators

### Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

### WA State Common Core Standards - Mathematics

Number & Quantity The Real Number System Extend the properties of exponents to rational exponents. 1. Explain how the definition of the meaning of rational exponents follows from extending the properties

### 11. Generalized Linear Models: An Introduction

Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

### Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

### Remedial Measures Wrap-Up and Transformations Box Cox

Remedial Measures Wrap-Up and Transformations Box Cox Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Last Class Graphical procedures for determining appropriateness

### Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

### FLORIDA STANDARDS TO BOOK CORRELATION

FLORIDA STANDARDS TO BOOK CORRELATION Florida Standards (MAFS.912) Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in

### Chapter 9. Correlation and Regression

Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

### Probability and Statistics

Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

### x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

### Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr

Regression Model Specification in R/Splus and Model Diagnostics By Daniel B. Carr Note 1: See 10 for a summary of diagnostics 2: Books have been written on model diagnostics. These discuss diagnostics

### Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

### II. DIFFERENTIABLE MANIFOLDS. Washington Mio CENTER FOR APPLIED VISION AND IMAGING SCIENCES

II. DIFFERENTIABLE MANIFOLDS Washington Mio Anuj Srivastava and Xiuwen Liu (Illustrations by D. Badlyans) CENTER FOR APPLIED VISION AND IMAGING SCIENCES Florida State University WHY MANIFOLDS? Non-linearity

### Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

### The General Linear Model (GLM)

he General Linear Model (GLM) Klaas Enno Stephan ranslational Neuromodeling Unit (NU) Institute for Biomedical Engineering University of Zurich & EH Zurich Wellcome rust Centre for Neuroimaging Institute

### A Primer in Econometric Theory

A Primer in Econometric Theory Lecture 1: Vector Spaces John Stachurski Lectures by Akshay Shanker May 5, 2017 1/104 Overview Linear algebra is an important foundation for mathematics and, in particular,