Overview of Topic III

Overview

This topic will cover:
- thinking in terms of matrices (Chapter 5)
- regression on multiple predictor variables (Chapter 6)
- a case study: CS majors
- a text example (KNNL p. 236)

Chapter 5: Linear Regression in Matrix Form

The SLR model in scalar form:
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
Consider now writing an equation for each observation:
$$Y_1 = \beta_0 + \beta_1 X_1 + \epsilon_1$$
$$Y_2 = \beta_0 + \beta_1 X_2 + \epsilon_2$$
$$\vdots$$
$$Y_n = \beta_0 + \beta_1 X_n + \epsilon_n$$

The SLR Model in Matrix Form

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

(I will try to use bold symbols for matrices. At first, I will also indicate the dimensions as a subscript to the symbol.)

$X$ is called the design matrix. $\beta$ is the vector of parameters. $\epsilon$ is the error vector. $Y$ is the response vector.

The Design Matrix

$$X_{n \times 2} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}$$

Vector of Parameters

$$\beta_{2 \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$

Vector of Error Terms

$$\epsilon_{n \times 1} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

Vector of Responses

$$Y_{n \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

Thus,
$$Y = X\beta + \epsilon$$
$$Y_{n \times 1} = X_{n \times 2}\,\beta_{2 \times 1} + \epsilon_{n \times 1}$$

Variance-Covariance Matrix

In general, for any set of variables $U_1, U_2, \ldots, U_n$, their variance-covariance matrix is defined to be

$$\sigma^2\{U\} = \begin{bmatrix} \sigma^2\{U_1\} & \sigma\{U_1,U_2\} & \cdots & \sigma\{U_1,U_n\} \\ \sigma\{U_2,U_1\} & \sigma^2\{U_2\} & & \vdots \\ \vdots & & \ddots & \sigma\{U_{n-1},U_n\} \\ \sigma\{U_n,U_1\} & \cdots & \sigma\{U_n,U_{n-1}\} & \sigma^2\{U_n\} \end{bmatrix}$$

where $\sigma^2\{U_i\}$ is the variance of $U_i$, and $\sigma\{U_i,U_j\}$ is the covariance of $U_i$ and $U_j$.

When variables are uncorrelated, their covariance is 0. The variance-covariance matrix of uncorrelated variables is therefore a diagonal matrix, since all the covariances are 0.

Note: Variables that are independent are also uncorrelated. So when variables are correlated, they are automatically dependent. However, it is possible to have variables that are dependent but uncorrelated, since correlation only measures linear dependence. A nice thing about normally distributed RVs is that they are a convenient special case: if they are uncorrelated, they are also independent.

Covariance Matrix of $\epsilon$

$$\sigma^2\{\epsilon\}_{n \times n} = \mathrm{Cov}\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} = \sigma^2 I_{n \times n} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2 \end{bmatrix}$$

Covariance Matrix of $Y$

$$\sigma^2\{Y\}_{n \times n} = \mathrm{Cov}\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sigma^2 I_{n \times n}$$

Distributional Assumptions in Matrix Form

$$\epsilon \sim N(0, \sigma^2 I)$$

$I$ is an $n \times n$ identity matrix. Ones in the diagonal elements specify that the variance of each $\epsilon_i$ is $1 \times \sigma^2$. Zeros in the off-diagonal elements specify that the covariance between different $\epsilon_i$ is zero. This implies that the correlations are zero.

Parameter Estimation: Least Squares

Residuals are $e = Y - Xb$. We want to minimize the sum of squared residuals:

$$\sum e_i^2 = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = e'e$$

We want to minimize $e'e = (Y - Xb)'(Y - Xb)$, where the prime ($'$) denotes the transpose of the matrix (exchange the rows and columns). We take the derivative with respect to the vector $b$. This is like a quadratic function: think $(Y - Xb)^2$. The derivative works out to 2 times the derivative of $(Y - Xb)'$ with respect to $b$. That is,
$$\frac{d}{db}\left((Y - Xb)'(Y - Xb)\right) = -2X'(Y - Xb).$$
We set this equal to $0$ (a vector of zeros) and solve for $b$. So, $-2X'(Y - Xb) = 0$, or $X'Y = X'Xb$ (the normal equations).
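Spelling out the intermediate algebra (a standard step, filled in here for completeness):

$$e'e = (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb$$
$$\frac{d}{db}\,e'e = -2X'Y + 2X'Xb = -2X'(Y - Xb)$$

Setting this to zero reproduces the normal equations $X'Xb = X'Y$.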

Normal Equations

$$X'Y = (X'X)\,b$$

Solving this equation for $b$ gives the least squares solution for $b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$. Multiply on the left by the inverse of the matrix $X'X$. (Notice that $X'X$ is a $2 \times 2$ square matrix for SLR.)

$$b = (X'X)^{-1}X'Y$$

REMEMBER THIS.
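As a concrete illustration (not from the text), the formula can be checked numerically in SAS/IML on a tiny made-up data set; all numbers below are hypothetical:

proc iml;
  /* hypothetical toy data: n = 4 observations, one predictor */
  X = {1 1,
       1 2,
       1 3,
       1 4};               /* design matrix: column of ones, then the X values */
  Y = {2, 4, 5, 8};        /* response vector */
  b = inv(X`*X) * X`*Y;    /* b = (X'X)^{-1} X'Y */
  print b;                 /* b[1] is b0, b[2] is b1 */
quit;

The same two lines carry over unchanged to multiple regression; only the columns of X grow.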

Reality Break: This is just to convince you that we have done nothing new or magical: all we are doing is writing the same old formulas for $b_0$ and $b_1$ in matrix format. Do NOT worry if you cannot reproduce the following algebra, but you SHOULD try to follow it, so that you believe me that this is really not a new formula.

Recall that in Topic 1 we had

$$b_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2} = \frac{SP_{XY}}{SS_X}$$
$$b_0 = \bar{Y} - b_1\bar{X}$$

Now let's look at the pieces of the new formula:

$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$

$$(X'X)^{-1} = \frac{1}{n\sum X_i^2 - \left(\sum X_i\right)^2}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix} = \frac{1}{n\,SS_X}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix}$$

$$X'Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$

Plug these into the equation for $b$:

$$b = (X'X)^{-1}X'Y = \frac{1}{n\,SS_X}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix}\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$
$$= \frac{1}{n\,SS_X}\begin{bmatrix} \left(\sum X_i^2\right)\left(\sum Y_i\right) - \left(\sum X_i\right)\left(\sum X_i Y_i\right) \\ -\left(\sum X_i\right)\left(\sum Y_i\right) + n\sum X_i Y_i \end{bmatrix} = \frac{1}{SS_X}\begin{bmatrix} \bar{Y}\sum X_i^2 - \bar{X}\sum X_i Y_i \\ \sum X_i Y_i - n\bar{X}\bar{Y} \end{bmatrix}$$
$$= \frac{1}{SS_X}\begin{bmatrix} \bar{Y}\sum X_i^2 - \bar{Y}\left(n\bar{X}^2\right) + \bar{X}\left(n\bar{X}\bar{Y}\right) - \bar{X}\sum X_i Y_i \\ SP_{XY} \end{bmatrix} = \frac{1}{SS_X}\begin{bmatrix} \bar{Y}\,SS_X - \bar{X}\,SP_{XY} \\ SP_{XY} \end{bmatrix}$$
$$= \begin{bmatrix} \bar{Y} - \frac{SP_{XY}}{SS_X}\bar{X} \\ \frac{SP_{XY}}{SS_X} \end{bmatrix} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix},$$

where

$$SS_X = \sum X_i^2 - n\bar{X}^2 = \sum(X_i - \bar{X})^2$$
$$SP_{XY} = \sum X_i Y_i - n\bar{X}\bar{Y} = \sum(X_i - \bar{X})(Y_i - \bar{Y})$$

All we have done is to write the same old formulas for $b_0$ and $b_1$ in a fancy new format. See KNNL page 199 for details. Why have we bothered to do this? The cool part is that the same approach works for multiple regression. All we do is make $X$ and $b$ into bigger matrices, and use exactly the same formula.

Other Quantities in Matrix Form

Fitted Values

$$\hat{Y} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} = \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = Xb$$

Hat Matrix

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY$$

where $H = X(X'X)^{-1}X'$. We call this the hat matrix because it turns $Y$'s into $\hat{Y}$'s.
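Continuing the hypothetical SAS/IML sketch from earlier, the hat matrix and fitted values can be computed directly:

proc iml;
  /* same hypothetical toy data as before */
  X = {1 1, 1 2, 1 3, 1 4};
  Y = {2, 4, 5, 8};
  H = X * inv(X`*X) * X`;   /* hat matrix H = X(X'X)^{-1}X' */
  Yhat = H * Y;             /* H "turns Y into Y-hat" */
  print Yhat;
quit;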

Estimated Covariance Matrix of $b$

The vector $b$ is a linear combination of the elements of $Y$. These estimates are normal if $Y$ is normal. These estimates will be approximately normal in general.

A Useful Multivariate Theorem

Suppose $U \sim N(\mu, \Sigma)$, a multivariate normal vector, and $V = c + DU$, a linear transformation of $U$ where $c$ is a vector and $D$ is a matrix. Then $V \sim N(c + D\mu,\, D\Sigma D')$.

Recall: $b = (X'X)^{-1}X'Y = \left[(X'X)^{-1}X'\right]Y$ and $Y \sim N(X\beta, \sigma^2 I)$.

Now apply the theorem to $b$ using

$$U = Y, \quad \mu = X\beta, \quad \Sigma = \sigma^2 I$$
$$V = b, \quad c = 0, \quad D = (X'X)^{-1}X'$$

The theorem says $b$ is normally distributed with mean

$$(X'X)^{-1}(X'X)\beta = \beta$$

and covariance matrix

$$\sigma^2\left((X'X)^{-1}X'\right)I\left((X'X)^{-1}X'\right)' = \sigma^2(X'X)^{-1}(X'X)\left((X'X)^{-1}\right)' = \sigma^2(X'X)^{-1}$$

using the fact that both $X'X$ and its inverse are symmetric, so $\left((X'X)^{-1}\right)' = (X'X)^{-1}$.

Next we will use this framework to do multiple regression, where we have more than one explanatory variable (i.e., add another column to the design matrix and additional beta parameters).

Multiple Regression

Data for Multiple Regression

$Y_i$ is the response variable (as usual).

$X_{i,1}, X_{i,2}, \ldots, X_{i,p-1}$ are the $p-1$ explanatory variables for cases $i = 1$ to $n$.

Example: In Homework #1 you considered modeling GPA as a function of entrance exam score. But we could also consider intelligence test scores and high school GPA as potential predictors. This would be 3 variables, so $p = 4$.

Potential problem to remember! These predictor variables are likely to be themselves correlated. We always want to be careful about using variables that are themselves strongly correlated as predictors together in the same model.

The Multiple Regression Model

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \cdots + \beta_{p-1} X_{i,p-1} + \epsilon_i \quad \text{for } i = 1, 2, \ldots, n$$

where

- $Y_i$ is the value of the response variable for the $i$th case.
- $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ (exactly as before!)
- $\beta_0$ is the intercept (think multidimensionally).
- $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression coefficients for the explanatory variables.
- $X_{i,k}$ is the value of the $k$th explanatory variable for the $i$th case.

Parameters as usual include all of the $\beta$'s as well as $\sigma^2$. These need to be estimated from the data.

Interesting Special Cases

Polynomial model:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \ldots + \beta_{p-1} X_i^{p-1} + \epsilon_i$$

$X$'s can be indicator or dummy variables with $X = 0$ or $1$ (or any other two distinct numbers) as possible values (e.g., ANOVA model).

Interactions between explanatory variables are then expressed as a product of the $X$'s (see the sketch below):
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \beta_3 X_{i,1}X_{i,2} + \epsilon_i$$
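As a minimal SAS sketch of how such an interaction term is built in practice (the data set and variable names here are hypothetical, not from the case study):

data interact;
  set mydata;       /* hypothetical input data set containing x1, x2, and y */
  x1x2 = x1 * x2;   /* interaction term: the product of the two predictors */
run;

proc reg data=interact;
  model y = x1 x2 x1x2;
run;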

Model in Matrix Form

$$Y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \epsilon_{n \times 1}$$
$$\epsilon \sim N(0, \sigma^2 I_{n \times n})$$
$$Y \sim N(X\beta, \sigma^2 I)$$

Design Matrix $X$:

$$X = \begin{bmatrix} 1 & X_{1,1} & X_{1,2} & \cdots & X_{1,p-1} \\ 1 & X_{2,1} & X_{2,2} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n,1} & X_{n,2} & \cdots & X_{n,p-1} \end{bmatrix}$$

Coefficient vector $\beta$:

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$$

Parameter Estimation: Least Squares

Find $b$ to minimize $SSE = (Y - Xb)'(Y - Xb)$.

Obtain the normal equations as before: $X'Xb = X'Y$.

Least Squares Solution

$$b = (X'X)^{-1}X'Y$$

Fitted (predicted) values for the mean of $Y$ are
$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY,$$
where $H = X(X'X)^{-1}X'$.

Residuals

$$e = Y - \hat{Y} = Y - HY = (I - H)Y$$

Notice that the matrices $H$ and $(I - H)$ have two special properties. They are:

- Symmetric: $H' = H$ and $(I - H)' = (I - H)$.
- Idempotent: $H^2 = H$ and $(I - H)(I - H) = (I - H)$ (checked numerically in the sketch below).
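A quick numerical check of these properties on the hypothetical toy design matrix from the earlier sketches (each printed difference should be zero up to rounding):

proc iml;
  X = {1 1, 1 2, 1 3, 1 4};                     /* hypothetical design matrix */
  H = X * inv(X`*X) * X`;
  Id = I(4);                                    /* 4 x 4 identity matrix */
  checkSym  = max(abs(H` - H));                 /* ~0: H is symmetric */
  checkIdem = max(abs(H*H - H));                /* ~0: H is idempotent */
  checkIH   = max(abs((Id-H)*(Id-H) - (Id-H))); /* ~0: I-H is idempotent */
  print checkSym checkIdem checkIH;
quit;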

Covariance Matrix of Residuals

$$\mathrm{Cov}(e) = \sigma^2(I - H)(I - H)' = \sigma^2(I - H)$$
$$\mathrm{Var}(e_i) = \sigma^2(1 - h_{i,i}),$$
where $h_{i,i}$ is the $i$th diagonal element of $H$.

Note: $h_{i,i} = X_i'(X'X)^{-1}X_i$, where $X_i' = [1\; X_{i,1}\; \cdots\; X_{i,p-1}]$.

Residuals $e_i$ are usually somewhat correlated: $\mathrm{cov}(e_i, e_j) = -\sigma^2 h_{i,j}$ for $i \neq j$; this is not unexpected, since they sum to $0$.

Estimation of $\sigma^2$

Since we have estimated $p$ parameters, $SSE = e'e$ has $df_E = n - p$. The estimate for $\sigma^2$ is the usual estimate:

$$s^2 = \frac{e'e}{n - p} = \frac{(Y - Xb)'(Y - Xb)}{n - p} = \frac{SSE}{df_E} = MSE$$
$$s = \sqrt{s^2} = \text{Root MSE} = \sqrt{MSE}$$
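In the hypothetical SAS/IML toy example, this estimate takes one line once the residuals are in hand:

proc iml;
  /* hypothetical toy data again: n = 4, p = 2 */
  X = {1 1, 1 2, 1 3, 1 4};
  Y = {2, 4, 5, 8};
  n = nrow(X);  p = ncol(X);
  b = inv(X`*X) * X`*Y;
  e = Y - X*b;              /* residuals */
  s2 = (e`*e) / (n - p);    /* MSE = SSE / df_E */
  s = sqrt(s2);             /* Root MSE */
  print s2 s;
quit;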

Distribution of $b$

We know that $b = (X'X)^{-1}X'Y$. The only RV involved is $Y$, so the distribution of $b$ is based on the distribution of $Y$. Since $Y \sim N(X\beta, \sigma^2 I)$, and using the multivariate theorem from earlier (if you like, go through the details on your own), we have

$$E(b) = \left((X'X)^{-1}X'\right)X\beta = \beta$$
$$\sigma^2\{b\} = \mathrm{Cov}(b) = \sigma^2(X'X)^{-1}$$

Since $\sigma^2$ is estimated by the MSE $s^2$, $\sigma^2\{b\}$ is estimated by $s^2(X'X)^{-1}$.

ANOVA Table

Sources of variation are:

- Model (SAS) or Regression (KNNL)
- Error (Residual)
- Total

SS and df add as before:
$$SSM + SSE = SST$$
$$df_M + df_E = df_{Total}$$
but their values are different from SLR.

Sums of Squares

$$SSM = \sum(\hat{Y}_i - \bar{Y})^2, \quad SSE = \sum(Y_i - \hat{Y}_i)^2, \quad SSTO = \sum(Y_i - \bar{Y})^2$$

Degrees of Freedom

$$df_M = p - 1, \quad df_E = n - p, \quad df_{Total} = n - 1$$

The total degrees of freedom have not changed from SLR, but the model $df$ has increased from $1$ to $p - 1$, i.e., the number of $X$ variables. Correspondingly, the error $df$ has decreased from $n - 2$ to $n - p$.

Mean Squares

$$MSM = \frac{SSM}{df_M} = \frac{\sum(\hat{Y}_i - \bar{Y})^2}{p - 1}$$
$$MSE = \frac{SSE}{df_E} = \frac{\sum(Y_i - \hat{Y}_i)^2}{n - p}$$
$$MST = \frac{SSTO}{df_{Total}} = \frac{\sum(Y_i - \bar{Y})^2}{n - 1}$$

ANOVA Table

Source   df             SS     MS     F
Model    df_M = p - 1   SSM    MSM    MSM/MSE
Error    df_E = n - p   SSE    MSE
Total    df_T = n - 1   SST

$F$-test

$H_0$: $\beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0$ (all regression coefficients are zero)

$H_A$: $\beta_k \neq 0$ for at least one $k = 1, \ldots, p-1$; at least one of the $\beta$'s is non-zero (or, not all the $\beta$'s are zero).

$$F = MSM/MSE$$

Under $H_0$, $F \sim F_{p-1,\,n-p}$. Reject $H_0$ if $F$ is larger than the critical value; if using SAS, reject $H_0$ if the p-value $< \alpha = 0.05$.
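For reference, the p-value SAS reports can be reproduced from the F statistic with the PROBF function; all numbers below are hypothetical:

data ftest;
  F   = 4.2;    /* hypothetical F statistic */
  df1 = 3;      /* numerator df: p - 1 */
  df2 = 220;    /* denominator df: n - p */
  pvalue = 1 - probf(F, df1, df2);   /* upper-tail area of F(df1, df2) */
run;

proc print data=ftest;
run;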

What do we conclude?

If $H_0$ is rejected, we conclude that at least one of the regression coefficients is non-zero; hence at least one of the $X$ variables is useful in predicting $Y$. (It doesn't say which one(s), though.)

If $H_0$ is not rejected, then we cannot conclude that any of the $X$ variables is useful in predicting $Y$.

p-value of the $F$-test

The p-value for the $F$ significance test tells us one of the following:

- There is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model ($p \geq 0.05$).
- One or more of the explanatory variables in our model is potentially useful for predicting the response in a linear model ($p < 0.05$).

$R^2$

It gives the proportion of variation in the response variable explained by the explanatory variables. It is sometimes called the coefficient of multiple determination (KNNL, page 236).

$R^2 = SSM/SST$ (the proportion of variation explained by the model).

$R^2 = 1 - (SSE/SST) = 1 -$ (the proportion not explained by the model).

$F$ and $R^2$ are related:
$$F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}$$
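A quick numerical check of this identity with hypothetical values (compare against the F statistic on a SAS output):

data r2f;
  r2 = 0.20;   /* hypothetical R-square */
  p  = 6;      /* hypothetical number of parameters (intercept + 5 predictors) */
  n  = 224;    /* hypothetical sample size */
  F = (r2/(p-1)) / ((1-r2)/(n-p));
run;

proc print data=r2f;
run;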

Inference for Individual Regression Coefficients

Confidence Interval for $\beta_k$

We know that $b \sim N(\beta, \sigma^2(X'X)^{-1})$.

Define
$$s^2\{b\}_{p \times p} = MSE \times (X'X)^{-1}$$
$$s^2\{b_k\} = \left[s^2\{b\}\right]_{k,k}, \text{ the } k\text{th diagonal element}$$

CI for $\beta_k$: $b_k \pm t_c\,s\{b_k\}$, where $t_c = t_{n-p}(0.975)$.
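A minimal sketch of the interval computed by hand with the TINV function; the estimate, standard error, and df below are hypothetical:

data betaci;
  b  = 0.168;   /* hypothetical coefficient estimate b_k */
  se = 0.035;   /* hypothetical standard error s{b_k} */
  df = 220;     /* hypothetical error df: n - p */
  tc = tinv(0.975, df);   /* critical value t_{n-p}(0.975) */
  lower = b - tc*se;
  upper = b + tc*se;
run;

proc print data=betaci;
run;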

Significance Test for $\beta_k$

$H_0$: $\beta_k = 0$

Same test statistic $t = b_k / s\{b_k\}$. Still use $df_E$, which now is equal to $n - p$; the p-value is computed from the $t_{n-p}$ distribution. This tests the significance of a variable given that the other variables are already in the model (i.e., fitted last). Unlike in SLR, the $t$-tests for the $\beta$'s are different from the $F$-test.

Multiple Regression Case Study

Example: Study of CS Students

Problem: Computer science majors at Purdue have a large drop-out rate.

Potential Solution: Can we find predictors of success? Predictors must be available at time of entry into the program.

Data Available

Grade point average (GPA) after three semesters ($Y_i$, the response variable).

Five potential predictors ($p = 6$):

- $X_1$ = High school math grades (HSM)
- $X_2$ = High school science grades (HSS)
- $X_3$ = High school English grades (HSE)
- $X_4$ = SAT Math (SATM)
- $X_5$ = SAT Verbal (SATV)

Gender ($1$ = male, $2$ = female); we will ignore this one right now, since it is not a continuous variable.

We have $n = 224$ observations, so if all five variables are included, the design matrix $X$ is $224 \times 6$. The SAS program used to generate output for this is cs.sas.

Look at the individual variables

Our first goal should be to take a look at the variables to see...

- Is there anything that sticks out as unusual for any of the variables?
- How are these variables related to each other (pairwise)? If two predictor variables are strongly correlated, we wouldn't want to use them in the same model!

We do this by looking at statistics and plots.

data cs;
  infile 'H:\System\Desktop\csdata.dat';
  input id gpa hsm hss hse satm satv genderm1;
run;

Descriptive Statistics: proc means

proc means data=cs maxdec=2;
  var gpa hsm hss hse satm satv;
run;

The option maxdec=2 sets the number of decimal places in the output to 2 (just showing you how).

Output from proc means

The MEANS Procedure reports N, Mean, Std Dev, Minimum, and Maximum for gpa, hsm, hss, hse, satm, and satv (numeric values not reproduced here).

Descriptive Statistics

Note that proc univariate also provides lots of other information, not shown.

proc univariate data=cs noprint;
  var gpa hsm hss hse satm satv;
  histogram gpa hsm hss hse satm satv / normal;
run;

GPA and High School Math graphs (not shown).

High School Science and High School English graphs (not shown).

SAT Math and SAT Verbal graphs (not shown).

NOTE: If you want the plots (e.g., histogram, qqplot) and not the copious output from proc univariate, use the noprint option:

proc univariate data=cs noprint;
  histogram gpa / normal;
run;

Interactive Data Analysis

- Read in the data set as usual.
- From the menu bar, select Solutions -> Analysis -> Interactive Data Analysis.
- Obtain the SAS/Insight window.
- Open library WORK.
- Click on data set CS and click Open.

Getting a Scatter Plot Matrix

- CTRL+Click on gpa, satm, satv.
- Go to menu Analyze.
- Choose option Scatter Plot (Y X).
- You can, while in this window, use Edit -> Copy to copy this plot to another program such as Word.

This graph, once you get used to it, can be useful in getting an overall feel for the relationships among the variables. Try other variables and some other options from the Analyze menu to see what happens.

Correlations

SAS will give us the r (correlation) value between pairs of random variables in a data set using proc corr.

proc corr data=cs;
  var hsm hss hse;
run;

(Output: the pairwise correlations of hsm, hss, and hse; the coefficients are not reproduced here, but every pairwise p-value is < .0001.)

To get rid of those p-values, which make the output difficult to read, use the noprob option. Here are also a few different ways to call proc corr:

proc corr data=cs noprob;
  var satm satv;
run;

(Output: the correlation between satm and satv; value not reproduced here.)

proc corr data=cs noprob;
  var hsm hss hse;
  with satm satv;
run;

proc corr data=cs noprob;
  var hsm hss hse satm satv;
  with gpa;
run;

(Output: correlations of satm and satv with the high school grades, and of gpa with all five predictors; values not reproduced here.)

Notice that not only do the $X$'s correlate with $Y$ (this is good), but the $X$'s correlate with each other (this is bad). That means that some of the $X$'s may be redundant in predicting $Y$.

Use High School Grades to Predict GPA

proc reg data=cs;
  model gpa=hsm hss hse;
run;

Analysis of Variance output (numeric values not reproduced here): the table lists DF, Sum of Squares, and Mean Square for Model, Error, and Corrected Total; the overall Model test has Pr > F < .0001. Below it, Root MSE, Dependent Mean, Coeff Var, R-Square, and Adj R-Sq are reported.

Parameter Estimates output (values partly lost in transcription): the table lists the parameter estimate, standard error, t value, and p-value for Intercept, hsm, hss, and hse. For hsm, t = 4.75 with p < .0001.

Remove HSS

proc reg data=cs;
  model gpa=hsm hse;
run;

(Output values largely not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates and standard errors for Intercept, hsm, and hse; hsm has t = 5.72 with p < .0001.

Rerun with HSM only

proc reg data=cs;
  model gpa=hsm;
run;

(Output values largely not reproduced here.) R-Square and Adj R-Sq are reported, followed by the parameter estimates and standard errors for Intercept and hsm; hsm has t = 7.23 with p < .0001.

The last two models (HSM only, and HSM with HSE) appear to be pretty good. Notice that $R^2$ goes down a little with the deletion of HSE, so HSE does provide a little information. This is a judgment call: do you prefer a slightly better-fitting model, or a simpler one?

Now look at SAT scores

proc reg data=cs;
  model gpa=satm satv;
run;

(Output values not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates, standard errors, t values, and p-values for Intercept, satm, and satv.

Here we see that SATM is a significant explanatory variable, but the overall fit of the model is poor. SATV does not appear to be useful at all.

Now try our two most promising candidates: HSM and SATM.

proc reg data=cs;
  model gpa=hsm satm;
run;

(Output values largely not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates for Intercept, hsm, and satm; hsm has p < .0001.

In the presence of HSM, SATM does not appear to be at all useful. Note that adjusted $R^2$ is the same as with HSM only, and that the p-value for SATM is now no longer significant.

General Linear Test Approach: HS and SATs

proc reg data=cs;
  model gpa=satm satv hsm hss hse;
  * Do general linear tests;
  * test H0: beta1 = beta2 = 0;
  sat: test satm, satv;
  * test H0: beta3 = beta4 = beta5 = 0;
  hs: test hsm, hss, hse;
run;

(Output values not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates, standard errors, t values, and p-values for Intercept, satm, satv, hsm, hss, and hse.

The first test statement tests the full vs. reduced models ($H_0$: no satm or satv; $H_a$: full model).

Test sat Results for Dependent Variable gpa (Mean Square, F Value, and Pr > F not reproduced here; numerator df = 2, denominator df = n - p = 218).

We do not reject the $H_0$ that $\beta_1 = \beta_2 = 0$. It is probably okay to throw out the SAT scores.

The second test statement tests the full vs. reduced models ($H_0$: no hsm, hss, or hse; $H_a$: all three in the model).

Test hs Results for Dependent Variable gpa (Mean Square and F Value not reproduced here; numerator df = 3, denominator df = 218; Pr > F < .0001).

We reject the $H_0$ that $\beta_3 = \beta_4 = \beta_5 = 0$. We CANNOT throw out all high school grades.

We can use the test statement to test any set of coefficients equal to zero. This is related to extra sums of squares (later).

Best Model?

Likely the one with just HSM. One could argue that HSM and HSE is marginally better. We'll discuss comparison methods in Chapters 7 and 8.

Key ideas from case study

- First, look at graphical and numerical summaries for one variable at a time.
- Then, look at relationships between pairs of variables with graphical and numerical summaries. Use plots and correlations to understand relationships.
- The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model.
- A variable can be a significant ($p < 0.05$) predictor alone and not significant ($p > 0.05$) when other $X$'s are in the model.

- Regression coefficients, standard errors, and the results of significance tests depend on what other explanatory variables are in the model.
- Significance tests (p-values) do not tell the whole story.
- Squared multiple correlations ($R^2$) give the proportion of variation in the response variable explained by the explanatory variables, and can give a different view; see also adjusted $R^2$.
- You can fully understand the theory in terms of $Y = X\beta + \epsilon$.
- To effectively use this methodology in practice, you need to understand how the data were collected, the nature of the variables, and how they relate to each other.

Other things that should be considered:

- Diagnostics: do the models meet the assumptions? You might take one of our final two potential models and check these out on your own.
- Confidence intervals/bands and prediction intervals/bands.

Example II (KNNL page 236)

Dwaine Studios, Inc. operates portrait studios in $n = 21$ cities of medium size. The program used to generate output for confidence intervals for means and prediction intervals is nknw241.sas.

$Y_i$ is sales in city $i$.
$X_1$: population aged 16 and under.
$X_2$: per capita disposable income.

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i$$

Read in the data

data a1;
  infile 'H:\System\Desktop\CH06FI05.DAT';
  input young income sales;
run;

proc print data=a1;
run;

(Listing of Obs, young, income, and sales not reproduced here.)

proc reg data=a1;
  model sales=young income;
run;

(Output values largely not reproduced here.) The Analysis of Variance table lists DF, Sum of Squares, Mean Square, and F Value for Model, Error, and Corrected Total, with Pr > F < .0001 for the Model, followed by Root MSE, Dependent Mean, Coeff Var, R-Square, and Adj R-Sq. The Parameter Estimates table lists the estimate, standard error, t value, and p-value for Intercept, young (p < .0001), and income.

clb option: Confidence Intervals for the $\beta$'s

proc reg data=a1;
  model sales=young income/clb;
run;

(The 95% confidence limits for Intercept, young, and income are not reproduced here.)

Estimation of $E(Y_h)$

$X_h$ is now a vector of values. ($Y_h$ is still just a number.)

$$X_h' = (1, X_{h,1}, X_{h,2}, \ldots, X_{h,p-1})$$

This is like row $h$ of the design matrix. We want a point estimate and a confidence interval for the subpopulation mean corresponding to the set of explanatory variables $X_h$.

Theory for $E(Y_h)$

$$E(Y_h) = \mu_h = X_h'\beta$$
$$\hat{\mu}_h = X_h'b$$
$$s^2\{\hat{\mu}_h\} = X_h'\,s^2\{b\}\,X_h = s^2\,X_h'(X'X)^{-1}X_h$$
$$95\%\ \text{CI}:\ \hat{\mu}_h \pm s\{\hat{\mu}_h\}\,t_{n-p}(0.975)$$
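These formulas can be traced through in SAS/IML on the hypothetical toy data from the earlier sketches ($X_h$ below is likewise hypothetical); in practice, the clm option on the next slide does this for you:

proc iml;
  X = {1 1, 1 2, 1 3, 1 4};           /* hypothetical toy design matrix */
  Y = {2, 4, 5, 8};
  n = nrow(X);  p = ncol(X);
  b = inv(X`*X) * X`*Y;
  e = Y - X*b;
  s2 = (e`*e) / (n - p);              /* MSE */
  Xh = {1, 2.5};                      /* hypothetical X_h: intercept, new X value */
  muhat = Xh` * b;                    /* point estimate of E(Y_h) */
  s2mu = s2 * (Xh` * inv(X`*X) * Xh); /* s^2{muhat_h} */
  tc = tinv(0.975, n - p);
  lower = muhat - tc*sqrt(s2mu);
  upper = muhat + tc*sqrt(s2mu);
  print muhat lower upper;
quit;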

clm option: Confidence Intervals for the Mean

proc reg data=a1;
  model sales=young income/clm;
  id young income;
run;

(Output not reproduced here: for each observation, the Dep Var sales, Predicted Value, Std Error Mean Predict, and 95% CL Mean are listed.)

90 Prediction ofy h(new) Predict a new observationy h withx values equal to X h. We want a prediction ofy h based on a set of predictor values with an interval that expresses the uncertainty in our prediction. As in SLR, this interval is centered atŷh and is wider than the interval for the mean. Page 90

Theory for $Y_h$

$$Y_h = X_h'\beta + \epsilon$$
$$\hat{Y}_h = \hat{\mu}_h = X_h'b$$
$$\sigma^2\{pred\} = \mathrm{Var}(Y_h - \hat{Y}_h) = \mathrm{Var}(\hat{Y}_h) + \mathrm{Var}(\epsilon)$$
$$s^2\{pred\} = s^2\left(1 + X_h'(X'X)^{-1}X_h\right)$$
$$\text{CI for } Y_{h(new)}:\ \hat{Y}_h \pm s\{pred\}\,t_{n-p}(0.975)$$

cli option: Confidence Interval for an Individual Observation

proc reg data=a1;
  model sales=young income/cli;
  id young income;
run;

(Output not reproduced here: for each observation, the Dep Var sales, Predicted Value, Std Error Mean Predict, and 95% CL Predict are listed.)

Diagnostics

- Look at the distribution of each variable to gain insight.
- Look at the relationship between pairs of variables. proc corr is useful here. BUT note that relationships between variables can be more complicated than just pairwise: correlations are NOT the entire story.
- Plot the residuals vs. ...
  - the predicted/fitted values
  - each explanatory variable
  - time (if available)

- Are the relationships linear? Look at $Y$ vs. each $X_i$. We may have to transform some $X$'s.
- Are the residuals approximately normal? Look at a histogram and a normal quantile plot.
- Is the variance constant? Check all the residual plots.

Remedies

- Remedies are similar to those in simple regression (though more complicated to decide among).
- Additionally, we may eliminate some of the $X$'s (this is called variable selection).
- Transformations such as Box-Cox.
- Analyze with/without outliers.
- More detail in KNNL Chapters 10 and 11.
