Introduction to Regression


1 PSU Summer School in Statistics Regression Slide 1 Introduction to Regression Chad M. Schafer Department of Statistics & Data Science, Carnegie Mellon University cschafer@cmu.edu June 2018

2 PSU Summer School in Statistics Regression Slide 2 Outline Motivation Linear Models Nonparametric Regression The Curse of Dimensionality Additive Models

3 PSU Summer School in Statistics Regression Slide 3 Motivation The objective is to construct a model that relates the response variable Y to the predictor variables X_1, X_2, ..., X_p. Example 1: Predicting redshift from photometry, with Y = redshift and X_1, X_2, X_3, X_4, X_5 = magnitudes in the u, g, r, i, z bands. Example 2: Modelling the relationship between distance modulus and redshift for Type Ia Supernovae, with Y = distance modulus and X_1 = redshift.

4 PSU Summer School in Statistics Regression Slide 4 [Figure: distance modulus versus redshift, from Riess, et al. (2007), 182 Type Ia Supernovae.]

5 PSU Summer School in Statistics Regression Slide 5 The ΛCDM model proposes a simple model for the relationship between redshift (z) and distance modulus:

Distance Modulus = 5 log10[ (c(1+z)/H_0) ∫_0^z ( Ω_m(1+u)^3 + (1 - Ω_m) )^{-1/2} du ] + 25

This depends on two cosmological parameters, Ω_m and H_0. Hence learning this relationship can help constrain these parameters.
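
The formula above can be evaluated numerically. Below is a minimal sketch in R, assuming H_0 is given in km/s/Mpc and the speed of light c in km/s; the function and argument names (dist_modulus, Omega_m) are illustrative, not from the lecture code.

# Sketch of the ΛCDM distance modulus as a function of redshift.
dist_modulus <- function(z, H0, Omega_m) {
  cc <- 299792.458   # speed of light in km/s
  integrand <- function(u) (Omega_m * (1 + u)^3 + (1 - Omega_m))^(-1/2)
  I <- integrate(integrand, lower = 0, upper = z)$value
  5 * log10(cc * (1 + z) / H0 * I) + 25
}

# Example: distance modulus at z = 0.5, with illustrative values H0 = 70, Omega_m = 0.34
dist_modulus(0.5, H0 = 70, Omega_m = 0.34)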

6 PSU Summer School in Statistics Regression Slide 6 [Figure: distance modulus versus redshift for the case where H_0 = and Ω_m = 0.34.]

7 PSU Summer School in Statistics Regression Slide 7 The two above examples contrast two of the typical motivations for using regression: In Example 1 we want to make predictions for new/future response values (redshifts) given the ugriz magnitudes of an object. Very important problem for LSST and other photometric surveys. In Example 2 we want to learn something about the relationship between redshift and distance modulus for Type Ia supernovae, since different physical theories make different predictions for its shape. Which of the two scenarios we are in will dictate our level of concern for model fit and violations of other theoretical assumptions.

8 PSU Summer School in Statistics Regression Slide 8 Another Example: Modelling Emission Lines Here we seek to model the relationships between the log equivalent widths of different emission lines of galaxies. This was motivated by an effort to improve simulation models for galaxy spectra. The sample consists of 13,788 SDSS galaxies, and can be read in via the following R command:

> eldata = read.table("...", header=T)

The use of header=T specifies that the first line of the file includes the variable names. Those can now be seen via

> names(eldata)
[1] "Halpha" "Hbeta"  "SII"    "Hgamma" "OII"    "OIII"   "NII"

9 PSU Summer School in Statistics Regression Slide 9 Create a scatter plot to consider one of the relationships between the lines:

> plot(eldata$SII, eldata$OII, xlab="SII Log Equivalent Width",
+      ylab="OII Log Equivalent Width", pch=".", col="blue")

[Figure: scatter plot of OII log equivalent width versus SII log equivalent width.]

10 PSU Summer School in Statistics Regression Slide 10 The plot shows the least squares regression line. Our assessment of this line would depend, in part, on our motivation for fitting the regression model. If our objective was solely to make good predictions of the OII log equivalent width using the SII line, then it is not doing a bad job. On the other hand, we would be very reluctant to claim that the linear model is revealing some truth regarding the relationships between these emission lines. In what follows, we will consider a range of increasingly-complex models. The starting point will be linear regression.

11 PSU Summer School in Statistics Regression Slide 11 Linear Regression Linear regression models predict the response variable as a linear combination of the predictors, plus random scatter:

Y = β_0 + β_1 X_1 + β_2 X_2 + ... + β_p X_p + ε

If we want to stress that there is a sample of n observations being used, we can write the following:

Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ... + β_p X_{pi} + ε_i,   i = 1, 2, ..., n.

The available sample used for fitting the model is often referred to as the training sample.

12 PSU Summer School in Statistics Regression Slide 12 Despite their simplicity, linear models are utilized in a wide range of situations, and understanding them is crucial to exploring more advanced regression methods. Also, keep in mind that the predictors X_i can be transformed versions of the available variable(s). The response can also be transformed. These operations can create linear models out of nonlinear relationships. For example, this describes a cubic relationship between X and Y:

Y = β_0 + β_1 X + β_2 X^2 + β_3 X^3 + ε,

but it is a linear model since it is linear in the β parameters.

13 PSU Summer School in Statistics Regression Slide 13 The Irreducible Errors The ε_i are sometimes called the irreducible errors, to suggest that they're ever-present, and not to be ignored: We do not seek to fit models that eliminate random scatter, as this would just lead to overfitting. It is almost always assumed that E(ε_i) = 0. If Var(ε_i) = σ^2 for all i, then the errors are said to be homoskedastic. Although homoskedasticity is a common assumption, it can be relaxed, and heteroskedastic errors can be handled. The irreducible errors for different observations are often assumed to be uncorrelated. Assuming that the ε_i are iid Gaussian leads to some additional nice theoretical results, but it doesn't tend to be important in practical situations.

14 PSU Summer School in Statistics Regression Slide 14 Simple Linear Regression The simple linear regression model consists of a single predictor:

Y_i = β_0 + β_1 X_i + ε_i

with the assumptions that the ε_i are uncorrelated, mean zero random variables with variance σ^2. There are three parameters to be estimated: β_0, β_1, and σ^2. The standard (but not the only) approach to estimating β_0 and β_1 is to use least squares, i.e., to find the values of these parameters that minimize

∑_{i=1}^n (Y_i - (β_0 + β_1 X_i))^2

These minimizers are usually denoted β̂_0 and β̂_1.

15 PSU Summer School in Statistics Regression Slide 15 Recall our example of predicting galaxy emission line widths. [Figure: scatter plot of OII log equivalent width versus SII log equivalent width.]

16 PSU Summer School in Statistics Regression Slide 16 Consider one observation in the data set, Y_i: [Figure: the scatter plot with one observation Y_i highlighted.]

17 PSU Summer School in Statistics Regression Slide 17 The prediction from the regression model is the fitted value Ŷ_i: [Figure: the scatter plot showing both Y_i and the fitted value Ŷ_i on the regression line.]

18 PSU Summer School in Statistics Regression Slide 18 The difference is the residual, ε̂_i = Y_i - Ŷ_i: [Figure: the scatter plot showing the residual ε̂_i as the vertical distance between Y_i and Ŷ_i.] The least squares regression line minimizes ∑_i ε̂_i^2.

19 PSU Summer School in Statistics Regression Slide 19 With a little calculus, it is not difficult to show that

β̂_1 = ∑_{i=1}^n (X_i - X̄)(Y_i - Ȳ) / ∑_{i=1}^n (X_i - X̄)^2

and that β̂_0 = Ȳ - β̂_1 X̄.
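
As a quick check, the closed-form estimates above can be computed by hand in R and compared with lm(); this is a small sketch assuming eldata has been read in as on Slide 8.

# Closed-form least squares estimates versus lm()
x <- eldata$SII
y <- eldata$OII
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept
c(b0, b1)
coef(lm(OII ~ SII, data = eldata))   # should agree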

20 PSU Summer School in Statistics Regression Slide 20 The Correlation Coefficient The (Pearson) correlation coefficient provides a measure of the strength of linear association between two variables. The (sample) covariance between two variables is

Cov(X, Y) = ∑_{i=1}^n (X_i - X̄)(Y_i - Ȳ) / (n - 1)

and the correlation coefficient is

r_xy = Cov(X, Y) / (s_x s_y)

where s denotes the sample standard deviation. Note that -1 ≤ r_xy ≤ 1, and that r_xy = ±1 indicates a perfect linear relationship.

21 PSU Summer School in Statistics Regression Slide 21 The figure below gives examples of how the correlation coefficient expresses the strength of linear relationship in a scatter plot. Note that nonlinear relationships are not detected.

22 PSU Summer School in Statistics Regression Slide 22 It's not surprising that there is a close relationship between the correlation coefficient and simple linear regression. In fact,

β̂_1 = Cov(X, Y) / s_x^2 = r_xy (s_y / s_x).

It's worth noting that if X and Y have been rescaled to have mean of zero and standard deviation of one, then β̂_1 = r_xy, and β̂_0 = 0.
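
The identities above are easy to verify numerically; here is a brief sketch on the emission-line data, assuming eldata is already loaded.

# Check that the slope equals Cov(X,Y)/s_x^2 = r_xy * (s_y / s_x)
r  <- cor(eldata$SII, eldata$OII)
b1 <- cov(eldata$SII, eldata$OII) / var(eldata$SII)
c(b1, r * sd(eldata$OII) / sd(eldata$SII))   # the two should match

# With standardized variables the slope equals the correlation:
coef(lm(scale(OII) ~ scale(SII), data = eldata))[2]
r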

23 PSU Summer School in Statistics Regression Slide 23 Linear Regression in R Linear models are fit easily in R using the function lm(): > ellm = lm(oii ~ SII, data=eldata) Here, OII is the response, and SII is the predictor. Additional predictors are easy to include: > ellm2 = lm(oii ~ Halpha + Hbeta + SII, data=eldata) The output provided by summary() shows a lot of the important information. See the following slide.

24 PSU Summer School in Statistics Regression Slide 24

> summary(ellm2)

Call:
lm(formula = OII ~ Halpha + Hbeta + SII, data = eldata)

Residuals:
     Min       1Q   Median       3Q      Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
Halpha                                    <2e-16 ***
Hbeta                                     <2e-16 ***
SII                                       <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:        on        degrees of freedom
Multiple R-squared:       ,  Adjusted R-squared:
F-statistic: 1.365e+04 on 3 and       DF,  p-value: < 2.2e-16

25 PSU Summer School in Statistics Regression Slide 25 Some comments on the output of the preceding slide:

1. The estimates of the β parameters are in the Estimate column of the table. Their standard errors are in the second column.
2. The t value column of the table is simply the ratio of the estimate to its standard error.
3. The fourth column reports the p-value for the test of H_0: β_i = 0 versus H_1: β_i ≠ 0. These tests are overused.
4. The estimate of σ is reported as the Residual standard error, i.e., σ̂ =

26 PSU Summer School in Statistics Regression Slide 26 The Coefficient of Determination The R output on the previous slide also reports the value of Multiple R-squared. This is sometimes called the coefficient of determination, and is defined as

R^2 = 1 - ∑_i (Y_i - Ŷ_i)^2 / ∑_i (Y_i - Ȳ)^2

This (overused) metric can be interpreted as the proportion of the variance in the response explained by the model. Unfortunately, R^2 is often used as a proxy for the quality of a model fit, even though it is not suited to assess that.
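
The definition above can be applied directly to the fitted model; a small sketch, assuming ellm2 from Slide 23 exists.

# Computing R^2 from the residuals and comparing with summary()
y    <- eldata$OII
yhat <- fitted(ellm2)
r2   <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
r2
summary(ellm2)$r.squared   # should match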

27 PSU Summer School in Statistics Regression Slide 27 Consider the four scatter plots below, Anscombe's Quartet (1973). In all four cases, the values of β̂_0, β̂_1, r_xy, R^2, and so forth, are equal.

28 PSU Summer School in Statistics Regression Slide 28 Residuals as Diagnostic Tools Anscombe's Quartet makes it clear that we cannot rely upon purely numerical assessments (including R^2) to judge the quality of a model fit. Inspection of the residuals is an important diagnostic tool. The i-th residual is an approximation to the i-th irreducible error:

ε̂_i = Y_i - Ŷ_i = Y_i - (β̂_0 + β̂_1 X_{1i} + ... + β̂_p X_{pi}) ≈ Y_i - (β_0 + β_1 X_{1i} + ... + β_p X_{pi}) = ε_i

If the model is a good fit, then the residuals should be random scatter, and lacking in any relationships with predictors, etc.

29 PSU Summer School in Statistics Regression Slide 29 The plot of residuals versus fitted values is standard. The curved shape in the plot below shows a lack of fit in our linear regression example.

> plot(ellm2$fitted.values, ellm2$residuals, pch=".",
+      xlab="Fitted Values", ylab="Residuals")

[Figure: residuals versus fitted values for the linear model.]

30 PSU Summer School in Statistics Regression Slide 30 It is also common to look at the normal probability plot of the residuals. While normality of the ε_i is not an important assumption for most applications, this plot can also help to highlight the observations with the most extreme residuals.

> qqnorm(ellm2$residuals, pch=16, main="")
> qqline(ellm2$residuals)

[Figure: normal probability plot of the residuals (sample quantiles versus theoretical quantiles).]

31 PSU Summer School in Statistics Regression Slide 31 Model Selection The process of model selection involves deciding which of the available predictors should be utilized in the model. Including too many predictors leads to overfitting, a situation in which a model fits better to the training sample than it would to external observations not used in the fitting. This is a big problem. Important: Increasing the number of predictors will necessarily decrease the sum of the squared residuals, and necessarily increase the value of R^2 (ignoring some technical cases where the new predictor is a linear combination of the predictors already included). These metrics are not useful for model selection.

32 PSU Summer School in Statistics Regression Slide 32 Leave-one-out cross-validation is an important tool for model selection. For each observation i, imagine fitting a model which leaves out that row of the data set. The model is fit on this reduced training set. The fitted model is then used to predict the response for observation i. Denote this prediction Ŷ_{(-i)}. The difference between Y_i and Ŷ_{(-i)} is a useful view of the performance of the model at predicting the response value for new cases. We can accumulate these errors into a score:

LOOCV Score = (1/n) ∑_{i=1}^n (Y_i - Ŷ_{(-i)})^2

This score can be compared over models with different sets of predictors. Of course, models with a small LOOCV score are preferred.
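
A brute-force version of this score can be written directly from the definition; this is a sketch only (the function name loocv_score is illustrative, the response OII is hard-coded, and refitting the model n times is slow for 13,788 rows).

# Brute-force LOOCV score for a candidate model
loocv_score <- function(formula, data) {
  n <- nrow(data)
  errs <- numeric(n)
  for (i in 1:n) {
    fit <- lm(formula, data = data[-i, ])              # leave out row i
    errs[i] <- data$OII[i] - predict(fit, newdata = data[i, ])
  }
  mean(errs^2)
}
loocv_score(OII ~ SII, eldata)
loocv_score(OII ~ Halpha + Hbeta + SII, eldata)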

33 PSU Summer School in Statistics Regression Slide 33 Fortunately, there is an amazing shortcut to calculating the LOOCV score. Assume that the observed response values Y_i are placed into a vector Y, and the fitted values are placed in a vector Ŷ. If there is a matrix L such that

Ŷ = L Y

then

Y_i - Ŷ_{(-i)} = ε̂_i / (1 - L_ii)

where L_ii is the i-th diagonal entry of L. These are called the leverages. The generalized cross validation (GCV) score is found by replacing L_ii by the average diagonal entry of L:

Y_i - Ŷ_{(-i)} = ε̂_i / (1 - L_ii) ≈ ε̂_i / (1 - tr(L)/n)

34 PSU Summer School in Statistics Regression Slide 34 The R function hatvalues() returns the leverages:

> levs = hatvalues(ellm)
> loocv = mean((ellm$residuals/(1-levs))^2)
> gcv = mean((ellm$residuals/(1-mean(levs)))^2)
> print(c(loocv, gcv))
[1]

Compare this with the score of the model with three predictors:

> levs2 = hatvalues(ellm2)
> loocv2 = mean((ellm2$residuals/(1-levs2))^2)
> gcv2 = mean((ellm2$residuals/(1-mean(levs2)))^2)
> print(c(loocv2, gcv2))
[1]

The function bestglm() in package bestglm will automate the search for the smallest LOOCV score.

35 PSU Summer School in Statistics Regression Slide 35 Confidence and Prediction Intervals As described above, these models are ultimately used for inference, i.e., drawing conclusions and making predictions. Any inference should be accompanied by an uncertainty measure. We should contrast two inference objectives: 1. When making predictions of the response for new X values, then uncertainty is best described by a prediction interval. This interval gives a range of values that we can claim, with some confidence, the true value of Y falls into. 2. When characterizing the relationship between the response and the X values, a confidence interval on the regression function is often of interest.

36 PSU Summer School in Statistics Regression Slide 36 In R, use the function predict(). The argument interval controls the type of interval (prediction or confidence) that is created.

> predict(ellm, data.frame(SII=2), interval="prediction")
  fit  lwr  upr

> predict(ellm, data.frame(SII=2), interval="confidence")
  fit  lwr  upr

Note that for the same value of X, the 95% prediction interval is much wider than the confidence interval, since the prediction interval must take into account the irreducible error. Rule of Thumb: For large n, 95% prediction intervals have a half-width which is approximately 2σ̂.
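
The rule of thumb above is easy to check numerically; a small sketch, assuming ellm from Slide 23 exists.

# Compare the half-width of a 95% prediction interval with 2 * sigma-hat
pred  <- predict(ellm, data.frame(SII = 2), interval = "prediction")
halfw <- (pred[ , "upr"] - pred[ , "lwr"]) / 2
sigma_hat <- summary(ellm)$sigma
c(half_width = unname(halfw), two_sigma = 2 * sigma_hat)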

37 PSU Summer School in Statistics Regression Slide 37 Nonparametric Regression The linear models considered thus far are parametric models, meaning that there is a fixed form for the relationship between the response and predictor(s); the only unknowns are real-valued parameters. We saw at the start of the lecture that parametric models come in nonlinear forms as well, e.g.,

Distance Modulus = 5 log10[ (c(1+z)/H_0) ∫_0^z ( Ω_m(1+u)^3 + (1 - Ω_m) )^{-1/2} du ] + 25

38 PSU Summer School in Statistics Regression Slide 38 [Figure: distance modulus versus redshift for the case where H_0 = and Ω_m = 0.34.]

39 PSU Summer School in Statistics Regression Slide 39 [Diagram: Data and Assumptions both contribute to the Estimate.] In the parametric case, the contribution to the estimate from the assumptions is fixed.

40 PSU Summer School in Statistics Regression Slide 40 [Diagram: Data and Assumptions both contribute to the Estimate, with the influence of the assumptions governed by h.] In the nonparametric case, the influence of the assumptions is controlled by a smoothing parameter h. The value of h can be chosen smaller with larger sample sizes.

41 PSU Summer School in Statistics Regression Slide 41 [Figure: distance modulus versus redshift, with a nonparametric estimate of the relationship.]

42 PSU Summer School in Statistics Regression Slide 42 Local Linear Regression The figure on the previous page shows a nonparametric regression or nonparametric smooth of the relationship. There are many variants of nonparametric regression, but here we focus on local linear regression, a version of local polynomial regression. This approach has proven to be very powerful, and possesses many strong theoretical properties to justify its use. Briefly stated, this procedure works by fitting a sequence of linear models: Each is fit not to the entire data set, but to only data within a neighborhood of a target point. The size of this neighborhood is the smoothing parameter: Large neighborhoods yield a large degree of smoothing, while small neighborhoods result in minimal smoothing.

43 PSU Summer School in Statistics Regression Slide 43 Our model here is that we observe (x i, Y i ) for i = 1, 2,..., n and that Y i = f(x i ) + ɛ i where the ɛ i are iid with mean zero and variance σ 2. Assuming that the ɛ i are normal will lead to some nice properties, but this development does not require that assumption. In order to construct the local linear regression estimate of f( ), it is best to consider a sequence of steps for each fixed x 0 at which f( ) will be estimated.

44 PSU Summer School in Statistics Regression Slide 44 Step One: Fix the target point x_0. Our objective is to estimate the regression function at x_0. [Figure: scatter plot of Y versus X with the target point x_0 marked.]

45 PSU Summer School in Statistics Regression Slide 45 Step Two: Create the neighborhood around x_0. A common way to choose the neighborhood size is to make it large enough to capture proportion α of the data. This parameter α is often called the span. A typical choice is α ≈ 0.5. [Figure: the scatter plot with the neighborhood around x_0 indicated.]

46 PSU Summer School in Statistics Regression Slide 46 Step Three: Weight the data in the neighborhood. Values of x which are close to x_0 will receive a larger weight than those far from x_0. Denote by w_i the weight placed on observation i. The default choice is the tri-cube weight function:

w_i = (1 - (|x_i - x_0| / maxdist)^3)^3, if x_i is in the neighborhood of x_0
w_i = 0, if x_i is not in the neighborhood of x_0

where maxdist is the largest distance |x_i - x_0| within the neighborhood.

47 PSU Summer School in Statistics Regression Slide 47 [Figure: the tri-cube weights applied to the observations in the neighborhood of x_0.]

48 PSU Summer School in Statistics Regression Slide 48 Step Four: Fit the local regression line. This is done by finding β_0 and β_1 to minimize the weighted sum of squares

∑_{i=1}^n w_i (y_i - (β_0 + β_1 x_i))^2

[Figure: the locally fitted regression line within the neighborhood.]

49 PSU Summer School in Statistics Regression Slide 49 Step Five: Estimate f(x_0). This is done using the fitted regression line to estimate the regression function at x_0:

f̂(x_0) = β̂_0 + β̂_1 x_0

[Figure: the estimate f̂(x_0) read off the local regression line at x_0.]
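
Steps One through Five can be coded directly; below is a from-scratch sketch using a span-based neighborhood and tri-cube weights, with illustrative function names (local_linear). It is meant to convey the idea, not to reproduce loess() exactly.

# Local linear estimate of f(x0) following Steps One-Five
local_linear <- function(x0, x, y, span = 0.5) {
  n <- length(x)
  d <- abs(x - x0)
  k <- ceiling(span * n)                       # neighborhood size (Step Two)
  maxdist <- sort(d)[k]                        # radius capturing proportion 'span'
  w <- ifelse(d <= maxdist, (1 - (d / maxdist)^3)^3, 0)   # tri-cube weights (Step Three)
  fit <- lm(y ~ x, weights = w)                # weighted least squares (Step Four)
  predict(fit, data.frame(x = x0))             # estimate of f(x0) (Step Five)
}

# Evaluate the estimate on a grid and overlay it on the scatter plot
grid <- seq(min(eldata$SII), max(eldata$SII), length.out = 100)
fhat <- sapply(grid, local_linear, x = eldata$SII, y = eldata$OII, span = 0.5)
plot(eldata$SII, eldata$OII, pch = ".", col = "blue")
lines(grid, fhat, col = "red", lwd = 2)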

50 PSU Summer School in Statistics Regression Slide 50 This is implemented in R using loess():

> holdlo = loess(OII ~ SII, data=eldata, degree=1, span=0.27)

Setting degree = 1 is necessary in order to get local linear models. You can use the default degree = 2 to get local quadratic fits, but this is usually unnecessary.

> plot(eldata$SII, eldata$OII, xlab="SII Log Equivalent Width",
+      ylab="OII Log Equivalent Width", pch=".", col="blue")
>
> SIIseq = seq(min(eldata$SII), max(eldata$SII), 0.1)
> points(SIIseq, predict(holdlo, SIIseq), type="l",
+      lwd=2, col="red")

51 PSU Summer School in Statistics Regression Slide 51 [Figure: scatter plot of OII versus SII log equivalent widths with the local linear regression fit overlaid.]

52 PSU Summer School in Statistics Regression Slide 52 It still makes sense to look at the plot of residuals versus fitted values. The improvement in the fit is noticeable.

> plot(holdlo$fitted, holdlo$residuals, xlab="Fitted Values",
+      ylab="Residuals", pch=".")

[Figure: residuals versus fitted values for the loess fit.]

53 PSU Summer School in Statistics Regression Slide 53 Smoothing Parameter Selection The smoothing parameter in nonparametric regression controls how smooth the resulting estimate is: A smaller value leads to a rougher estimate, a larger value gives a smoother estimate. The argument span to loess() controls the span, as defined above. The default value is 0.75. The smoothing parameter cannot be chosen by minimizing the sum of squared residuals ("least squares"). Doing so would lead to overfitting, since for a small enough choice, the residuals could all be made close to zero.

54 PSU Summer School in Statistics Regression Slide 54 [Figure: using span = . Clearly not enough smoothing.]

55 PSU Summer School in Statistics Regression Slide 55 [Figure: using span = 0.9. Too much smoothing, missing important features.]

56 PSU Summer School in Statistics Regression Slide 56 As before, a common strategy for setting smoothing parameters is to use cross-validation. The LOOCV score can be calculated for different values of the smoothing parameter:

LOOCV(h) = (1/n) ∑_{i=1}^n (Y_i - Ŷ_{(-i)})^2

and the choice of h which minimizes the score is optimal. GCV is also often used. The function loess.as(), part of the package fANCOVA, includes the ability to search for the optimal choice of the span, as determined by GCV.

> library(fANCOVA)
> optlo = loess.as(eldata$SII, eldata$OII, criterion="gcv")
> print(optlo$pars$span)
[1]

57 PSU Summer School in Statistics Regression Slide 57 When the sample size is small, cross-validation can be unreliable. Reconsider our example from above, with span = 0.16 chosen using GCV. [Figure: the resulting fit.]

58 PSU Summer School in Statistics Regression Slide 58 Smoothing Splines Another nonparametric approach to regression is the penalized spline or smoothing spline. This approach starts with a natural optimization problem: Find the twice-differentiable function f(·) that minimizes

∑_{i=1}^n (y_i - f(x_i))^2 + λ ∫_a^b [f^{(2)}(x)]^2 dx

where f^{(2)}(x) indicates the second derivative of f evaluated at x and λ > 0 is the smoothing parameter. Typically, a = min{x_i} and b = max{x_i}.

59 PSU Summer School in Statistics Regression Slide 59 Note that the penalty term

∫_a^b [f^{(2)}(x)]^2 dx

will be large if the function is wiggly. Of course, if f(x) = a + bx, then this penalty equals zero. Large values of λ lead to smooth functions f(·). As before, λ can be chosen via cross-validation.

60 PSU Summer School in Statistics Regression Slide 60 This may appear to set up a difficult optimization problem, but, in fact, the search for the optimal f( ) can be transformed into constructing f( ) as the linear combination of a specially formed basis of functions. Comment: The number of basis functions utilized is equal to the number of knots plus four. For computational reasons, the number of knots is usually chosen much smaller than the number of data points.

61 PSU Summer School in Statistics Regression Slide 61 The figure below shows a simple example for the case where there are five knots (shown as the vertical lines). [Figure: the basis functions plotted against x, with the five knots marked by vertical lines.]

62 PSU Summer School in Statistics Regression Slide 62 In R, penalized splines are implemented via smooth.spline(). The default is for smooth.spline() to choose λ using GCV, but if you set cv = TRUE, then true LOOCV is used. See below for the result of the fit to our SDSS example.

> holdspline = smooth.spline(eldata$SII, eldata$OII)
> print(holdspline)
Call:
smooth.spline(x = eldata$SII, y = eldata$OII)

Smoothing Parameter  spar=         lambda=         (12 iterations)
Equivalent Degrees of Freedom (Df):
Penalized Criterion (RSS):
GCV:

63 PSU Summer School in Statistics Regression Slide 63 [Figure: scatter plot of OII versus SII log equivalent widths with the smoothing spline fit overlaid.]

64 PSU Summer School in Statistics Regression Slide 64 Comment: The Use of Nonparametrics in Astronomy A common objection raised to using nonparametric regression in astronomy is the failure to obtain a functional form for the relationship between the variables. Although there are situations where this is a drawback, there are many other situations in which a researcher wants a model for either (1) making predictions (e.g., photometric redshifts), or (2) summarizing data. For instance, a nonparametric estimate of a luminosity function can be a useful means of summarizing a massive data set. The use of a nonparametric approach would provide some reassurance that the summary is not affected by our model choices. Letting the data speak for themselves becomes more important as samples grow.

65 PSU Summer School in Statistics Regression Slide 65 The Curse of Dimensionality Despite the promise of nonparametric approaches, they suffer from a serious drawback: The amount of data required to fit nonparametric regression models well increases exponentially with the dimension of the data. This is called the Curse of Dimensionality. What is the dimension of data? Standard data come to us in the form of numbers, but a single observation is often best thought of as a vector. Examples of high-dimensional data in astronomy include images, spectra, photometry, light curves, etc. measured from a single object.

66 PSU Summer School in Statistics Regression Slide 66 If there are p predictors being used in a model for a response variable, we would say that we are using a p-dimensional predictor vector. In some applications, p can be very large. Nonparametric regression is possible, in theory. For instance, if p = 2, a neighborhood can be thought of as a square centered on the target value. Local linear regression involves fitting a plane within this square. But, as the dimension increases, the available observations become more spread out. Neighborhoods must be made larger in order to compensate and achieve estimates with acceptable variance. Choosing a large neighborhood takes away a key feature of using nonparametric approaches: We want the estimate to be flexible, and not be based on too much smoothing.

67 PSU Summer School in Statistics Regression Slide 67 Consider the plot on the following slide. Here, the sample size is 100. Each observation is two-dimensional, i.e., imagine these are the two predictors. These data are simulated from uniform distributions. The marks shown in each margin depict the marginal distribution for each predictor. Note that in the margins, the data appear to be quite closely spaced. If we were to fit a nonparametric regression using just one predictor, neighborhoods could be chosen small. But, in two dimensions, large gaps appear in the distribution of the observations. Neighborhoods will have to be chosen larger in order to capture enough data.

68 PSU Summer School in Statistics Regression Slide 68 [Figure: 100 observations of Predictor One versus Predictor Two, simulated from uniform distributions, with marks along each margin showing the marginal distributions.]

69 PSU Summer School in Statistics Regression Slide 69 Another way of thinking about it: Under the uniform setup, with a p-dimensional predictor, in order for a neighborhood to include proportion α of the data, the length of the sides of the neighborhood must be α^{1/p}. This plot shows the case where α = 0.4. [Figure: length of side α^{1/p} as a function of p.]
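
The calculation behind that plot is a one-liner; a small sketch reproducing it for α = 0.4.

# Side length of a hypercube neighborhood capturing proportion alpha of uniform data
alpha <- 0.4
p <- 1:10
round(alpha^(1/p), 2)
# e.g. in 10 dimensions each side must span roughly 91% of the predictor's range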

70 PSU Summer School in Statistics Regression Slide 70 One approach to dealing with the curse of dimensionality is to utilize techniques of dimension reduction to map high-dimensional data into low-dimensional representations. Principal components analysis (PCA) is a classic method of dimension reduction, but more sophisticated nonlinear approaches such as isomap, local linear embeddings, and diffusion map are available.

71 PSU Summer School in Statistics Regression Slide 71 Additive Models Another approach to dealing with the curse of dimensionality is to fit models which are nonparametric, but are additive in the predictors. Consider a case with p = 3 predictors. A fully nonparametric model would be of the form

Y = f(X_1, X_2, X_3) + ε,

i.e., the function f(·) could take any form on the three-dimensional predictor space. This model is very flexible, but difficult to fit well due to the curse of dimensionality. An additive nonparametric model would assume

Y = β_0 + f_1(X_1) + f_2(X_2) + f_3(X_3) + ε

where each of the f_i(·) is estimated nonparametrically.

72 PSU Summer School in Statistics Regression Slide 72 The general estimation strategy is called backfitting. In this process, each f_k is estimated nonparametrically, in rotation, and the process is repeated until there is convergence. When estimating f_k(·), the other f_j(·) are held fixed at their current best estimates, and we set up a one-dimensional nonparametric estimation problem, for which one could use local linear regression, smoothing splines, or another approach. So, when estimating f_k(·), the response is taken to be

Y_i - ∑_{j ≠ k} f̂_j(x_{ij})
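
Below is a minimal sketch of that backfitting loop, using smooth.spline() for each one-dimensional fit; the function name backfit and the fixed iteration count are illustrative choices, not the lecture's implementation.

# Backfitting an additive model: X is an n x p matrix of predictors, y the response
backfit <- function(X, y, n_iter = 20) {
  n <- nrow(X); p <- ncol(X)
  beta0 <- mean(y)
  fhat  <- matrix(0, n, p)                     # current estimates f_k(x_ik)
  for (it in 1:n_iter) {
    for (k in 1:p) {
      partial <- y - beta0 - rowSums(fhat[, -k, drop = FALSE])  # partial residuals
      sk <- smooth.spline(X[, k], partial)
      fhat[, k] <- predict(sk, X[, k])$y
      fhat[, k] <- fhat[, k] - mean(fhat[, k]) # center each component
    }
  }
  list(beta0 = beta0, fhat = fhat)
}

# e.g. fit <- backfit(as.matrix(eldata[, c("Halpha", "Hbeta", "SII")]), eldata$OII)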

73 PSU Summer School in Statistics Regression Slide 73 Additive Nonparametric Regression in R There are two packages which include a function gam() used for fitting these models. In both cases, the syntax is much like when using lm(). The key difference is that one must specify which of the predictors are to be fit nonparametrically. This is done by wrapping the predictor with a special function. For the function gam() which is part of the package gam, there are two options: A function s() for fitting a smoothing spline, and a function lo() for fitting a local polynomial. For the function gam() which is part of the package mgcv, there is only the option s().

74 PSU Summer School in Statistics Regression Slide 74 In either case, the syntax would appear simply as

> holdgam = gam(y ~ s(x1) + s(x2) + x3)

In this example, the model being fit is

Y = β_0 + f_1(x_1) + f_2(x_2) + β_3 x_3 + ε

with f_1 and f_2 being estimated from the data using a smoothing spline. The advantage to using gam() from mgcv is that it incorporates automatic selection of the smoothing parameters for the individual smoothing splines.

75 PSU Summer School in Statistics Regression Slide 75 Back to the SDSS Data We can now fit an additive model relating the OII line strength to the other six lines, as follows:

> library(mgcv)
> holdgam = gam(OII ~ s(Halpha) + s(Hbeta) + s(SII) + s(Hgamma) +
+      s(OIII) + s(NII), data = eldata)

The estimates of the functions f_i can be viewed:

> plot(holdgam, pages=1, scale=0, scheme=1)

76 PSU Summer School in Statistics Regression Slide 76 [Figure: the estimated smooth functions from the additive model, each plotted against its predictor: s(Halpha, 6.02), s(Hbeta, 9), s(SII, 8.13), s(Hgamma, 7.17), s(OIII, 8.86), s(NII, 6.49).]

77 PSU Summer School in Statistics Regression Slide 77 It again is important to look at the plot of residuals versus fitted values. There is no evidence of a lack of fit.

> plot(holdgam$fitted, holdgam$residuals, xlab="Fitted Values",
+      ylab="Residuals", pch=".")

[Figure: residuals versus fitted values for the additive model.]

78 PSU Summer School in Statistics Regression Slide 78 Projection Pursuit Regression Recall our additive nonparametric regression model:

Y = β_0 + ∑_{j=1}^p f_j(x_j) + ε

A generalization of this model is the projection pursuit regression model, which can be written as

Y = β_0 + ∑_{k=1}^M β_k f_k(α_k^T x) + ε

where each of the α_k is a vector of length p. These α_k are the projection direction vectors. The functions f_k, called the ridge functions, are estimated nonparametrically. These functions are scaled to have mean zero and variance one when applied to the observed sample.

79 PSU Summer School in Statistics Regression Slide 79 The projection pursuit model takes linear combinations of the predictors, and then fits an additive model in these linear combinations. One can think of the α_k^T x as being new predictors which are being utilized in an additive model. Projection pursuit is implemented in R using the function ppr(). Projection pursuit is related to basic neural network models.
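
As a brief illustration of ppr() on the emission-line data, here is a sketch; the choice nterms = 2 (the number of ridge functions M) is arbitrary, and the object name holdppr is illustrative.

# Projection pursuit regression with two ridge functions
holdppr <- ppr(OII ~ Halpha + Hbeta + SII, data = eldata, nterms = 2)
summary(holdppr)   # reports the projection directions alpha_k
plot(holdppr)      # plots the estimated ridge functions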

80 PSU Summer School in Statistics Regression Slide 80 Conclusion Motivation Linear Models Nonparametric Regression The Curse of Dimensionality Additive Models


More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Model Modifications. Bret Larget. Departments of Botany and of Statistics University of Wisconsin Madison. February 6, 2007

Model Modifications. Bret Larget. Departments of Botany and of Statistics University of Wisconsin Madison. February 6, 2007 Model Modifications Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison February 6, 2007 Statistics 572 (Spring 2007) Model Modifications February 6, 2007 1 / 20 The Big

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Tutorial 6: Linear Regression

Tutorial 6: Linear Regression Tutorial 6: Linear Regression Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction to Simple Linear Regression................ 1 2 Parameter Estimation and Model

More information

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response. Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response

More information

STAT 705 Nonlinear regression

STAT 705 Nonlinear regression STAT 705 Nonlinear regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 1 Chapter 13 Parametric nonlinear regression Throughout most

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

We d like to know the equation of the line shown (the so called best fit or regression line).

We d like to know the equation of the line shown (the so called best fit or regression line). Linear Regression in R. Example. Let s create a data frame. > exam1 = c(100,90,90,85,80,75,60) > exam2 = c(95,100,90,80,95,60,40) > students = c("asuka", "Rei", "Shinji", "Mari", "Hikari", "Toji", "Kensuke")

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

x 21 x 22 x 23 f X 1 X 2 X 3 ε

x 21 x 22 x 23 f X 1 X 2 X 3 ε Chapter 2 Estimation 2.1 Example Let s start with an example. Suppose that Y is the fuel consumption of a particular model of car in m.p.g. Suppose that the predictors are 1. X 1 the weight of the car

More information

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression Local regression I Patrick Breheny November 1 Patrick Breheny STA 621: Nonparametric Statistics 1/27 Simple local models Kernel weighted averages The Nadaraya-Watson estimator Expected loss and prediction

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information