Overview of Topic III

Overview

This topic will cover:
- thinking in terms of matrices (Chapter 5)
- regression on multiple predictor variables (Chapter 6)
- a case study: CS majors
- a text example (KNNL p. 236)

Chapter 5: Linear Regression in Matrix Form

The SLR model in scalar form:
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
Consider now writing an equation for each observation:
$$Y_1 = \beta_0 + \beta_1 X_1 + \epsilon_1$$
$$Y_2 = \beta_0 + \beta_1 X_2 + \epsilon_2$$
$$\vdots$$
$$Y_n = \beta_0 + \beta_1 X_n + \epsilon_n$$

The SLR Model in Matrix Form

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

(I will try to use bold symbols for matrices. At first, I will also indicate the dimensions as a subscript to the symbol.)

$X$ is called the design matrix. $\beta$ is the vector of parameters. $\epsilon$ is the error vector. $Y$ is the response vector.

The Design Matrix

$$X_{n \times 2} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}$$

Vector of Parameters

$$\beta_{2 \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$

Vector of Error Terms

$$\epsilon_{n \times 1} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

Vector of Responses

$$Y_{n \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

Thus,
$$Y = X\beta + \epsilon$$
$$Y_{n \times 1} = X_{n \times 2}\,\beta_{2 \times 1} + \epsilon_{n \times 1}$$

Variance-Covariance Matrix

In general, for any set of variables $U_1, U_2, \ldots, U_n$, their variance-covariance matrix is defined to be

$$\sigma^2\{U\} = \begin{bmatrix} \sigma^2\{U_1\} & \sigma\{U_1,U_2\} & \cdots & \sigma\{U_1,U_n\} \\ \sigma\{U_2,U_1\} & \sigma^2\{U_2\} & & \vdots \\ \vdots & & \ddots & \sigma\{U_{n-1},U_n\} \\ \sigma\{U_n,U_1\} & \cdots & \sigma\{U_n,U_{n-1}\} & \sigma^2\{U_n\} \end{bmatrix}$$

where $\sigma^2\{U_i\}$ is the variance of $U_i$, and $\sigma\{U_i,U_j\}$ is the covariance of $U_i$ and $U_j$.

When variables are uncorrelated, their covariance is 0. The variance-covariance matrix of uncorrelated variables is therefore a diagonal matrix, since all the covariances are 0.

Note: Variables that are independent are also uncorrelated. So when variables are correlated, they are automatically dependent. However, it is possible to have variables that are dependent but uncorrelated, since correlation only measures linear dependence. A nice thing about normally distributed RVs is that they are a convenient special case: if they are uncorrelated, they are also independent.

Covariance Matrix of $\epsilon$

$$\sigma^2\{\epsilon\}_{n \times n} = \mathrm{Cov}\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} = \sigma^2 I_{n \times n} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2 \end{bmatrix}$$

Covariance Matrix of $Y$

$$\sigma^2\{Y\}_{n \times n} = \mathrm{Cov}\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sigma^2 I_{n \times n}$$

Distributional Assumptions in Matrix Form

$$\epsilon \sim N(0, \sigma^2 I)$$

$I$ is an $n \times n$ identity matrix. Ones in the diagonal elements specify that the variance of each $\epsilon_i$ is $1 \times \sigma^2$. Zeros in the off-diagonal elements specify that the covariance between different $\epsilon_i$ is zero. This implies that the correlations are zero.

Parameter Estimation: Least Squares

Residuals are $e = Y - Xb$. We want to minimize the sum of squared residuals:

$$\sum e_i^2 = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = e'e$$

We want to minimize $e'e = (Y - Xb)'(Y - Xb)$, where the prime ($'$) denotes the transpose of the matrix (exchange the rows and columns). We take the derivative with respect to the vector $b$. This is like a quadratic function: think $(Y - Xb)^2$. The derivative works out to 2 times the derivative of $(Y - Xb)'$ with respect to $b$. That is,
$$\frac{d}{db}\left((Y - Xb)'(Y - Xb)\right) = -2X'(Y - Xb).$$
We set this equal to $0$ (a vector of zeros) and solve for $b$. So, $-2X'(Y - Xb) = 0$, or $X'Y = X'Xb$ (the normal equations).
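Spelling out the intermediate algebra (a standard step, filled in here for completeness):

$$e'e = (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb$$
$$\frac{d}{db}\,e'e = -2X'Y + 2X'Xb = -2X'(Y - Xb)$$

Setting this to zero reproduces the normal equations $X'Xb = X'Y$.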

Normal Equations

$$X'Y = (X'X)\,b$$

Solving this equation for $b$ gives the least squares solution for $b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$. Multiply on the left by the inverse of the matrix $X'X$. (Notice that $X'X$ is a $2 \times 2$ square matrix for SLR.)

$$b = (X'X)^{-1}X'Y$$

REMEMBER THIS.
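As a concrete illustration (not from the text), the formula can be checked numerically in SAS/IML on a tiny made-up data set; all numbers below are hypothetical:

proc iml;
  /* hypothetical toy data: n = 4 observations, one predictor */
  X = {1 1,
       1 2,
       1 3,
       1 4};               /* design matrix: column of ones, then the X values */
  Y = {2, 4, 5, 8};        /* response vector */
  b = inv(X`*X) * X`*Y;    /* b = (X'X)^{-1} X'Y */
  print b;                 /* b[1] is b0, b[2] is b1 */
quit;

The same two lines carry over unchanged to multiple regression; only the columns of X grow.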

Reality Break: This is just to convince you that we have done nothing new or magical: all we are doing is writing the same old formulas for $b_0$ and $b_1$ in matrix format. Do NOT worry if you cannot reproduce the following algebra, but you SHOULD try to follow it, so that you believe me that this is really not a new formula.

Recall that in Topic 1 we had

$$b_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2} = \frac{SP_{XY}}{SS_X}$$
$$b_0 = \bar{Y} - b_1\bar{X}$$

Now let's look at the pieces of the new formula:

$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$

$$(X'X)^{-1} = \frac{1}{n\sum X_i^2 - \left(\sum X_i\right)^2}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix} = \frac{1}{n\,SS_X}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix}$$

$$X'Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$

Plug these into the equation for $b$:

$$b = (X'X)^{-1}X'Y = \frac{1}{n\,SS_X}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix}\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$
$$= \frac{1}{n\,SS_X}\begin{bmatrix} \left(\sum X_i^2\right)\left(\sum Y_i\right) - \left(\sum X_i\right)\left(\sum X_i Y_i\right) \\ -\left(\sum X_i\right)\left(\sum Y_i\right) + n\sum X_i Y_i \end{bmatrix} = \frac{1}{SS_X}\begin{bmatrix} \bar{Y}\sum X_i^2 - \bar{X}\sum X_i Y_i \\ \sum X_i Y_i - n\bar{X}\bar{Y} \end{bmatrix}$$
$$= \frac{1}{SS_X}\begin{bmatrix} \bar{Y}\sum X_i^2 - \bar{Y}\left(n\bar{X}^2\right) + \bar{X}\left(n\bar{X}\bar{Y}\right) - \bar{X}\sum X_i Y_i \\ SP_{XY} \end{bmatrix} = \frac{1}{SS_X}\begin{bmatrix} \bar{Y}\,SS_X - \bar{X}\,SP_{XY} \\ SP_{XY} \end{bmatrix}$$
$$= \begin{bmatrix} \bar{Y} - \frac{SP_{XY}}{SS_X}\bar{X} \\ \frac{SP_{XY}}{SS_X} \end{bmatrix} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix},$$

where

$$SS_X = \sum X_i^2 - n\bar{X}^2 = \sum(X_i - \bar{X})^2$$
$$SP_{XY} = \sum X_i Y_i - n\bar{X}\bar{Y} = \sum(X_i - \bar{X})(Y_i - \bar{Y})$$

All we have done is to write the same old formulas for $b_0$ and $b_1$ in a fancy new format. See KNNL page 199 for details. Why have we bothered to do this? The cool part is that the same approach works for multiple regression. All we do is make $X$ and $b$ into bigger matrices, and use exactly the same formula.

Other Quantities in Matrix Form

Fitted Values

$$\hat{Y} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} = \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}\begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = Xb$$

Hat Matrix

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY$$

where $H = X(X'X)^{-1}X'$. We call this the hat matrix because it turns $Y$'s into $\hat{Y}$'s.
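Continuing the hypothetical SAS/IML sketch from earlier, the hat matrix and fitted values can be computed directly:

proc iml;
  /* same hypothetical toy data as before */
  X = {1 1, 1 2, 1 3, 1 4};
  Y = {2, 4, 5, 8};
  H = X * inv(X`*X) * X`;   /* hat matrix H = X(X'X)^{-1}X' */
  Yhat = H * Y;             /* H "turns Y into Y-hat" */
  print Yhat;
quit;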

Estimated Covariance Matrix of $b$

The vector $b$ is a linear combination of the elements of $Y$. These estimates are normal if $Y$ is normal. These estimates will be approximately normal in general.

A Useful Multivariate Theorem

Suppose $U \sim N(\mu, \Sigma)$, a multivariate normal vector, and $V = c + DU$, a linear transformation of $U$ where $c$ is a vector and $D$ is a matrix. Then $V \sim N(c + D\mu,\, D\Sigma D')$.

Recall: $b = (X'X)^{-1}X'Y = \left[(X'X)^{-1}X'\right]Y$ and $Y \sim N(X\beta, \sigma^2 I)$.

Now apply the theorem to $b$ using

$$U = Y, \quad \mu = X\beta, \quad \Sigma = \sigma^2 I$$
$$V = b, \quad c = 0, \quad D = (X'X)^{-1}X'$$

The theorem says $b$ is normally distributed with mean

$$(X'X)^{-1}(X'X)\beta = \beta$$

and covariance matrix

$$\sigma^2\left((X'X)^{-1}X'\right)I\left((X'X)^{-1}X'\right)' = \sigma^2(X'X)^{-1}(X'X)\left((X'X)^{-1}\right)' = \sigma^2(X'X)^{-1}$$

using the fact that both $X'X$ and its inverse are symmetric, so $\left((X'X)^{-1}\right)' = (X'X)^{-1}$.

Next we will use this framework to do multiple regression, where we have more than one explanatory variable (i.e., add another column to the design matrix and additional beta parameters).

Multiple Regression

Data for Multiple Regression

$Y_i$ is the response variable (as usual).

$X_{i,1}, X_{i,2}, \ldots, X_{i,p-1}$ are the $p-1$ explanatory variables for cases $i = 1$ to $n$.

Example: In Homework #1 you considered modeling GPA as a function of entrance exam score. But we could also consider intelligence test scores and high school GPA as potential predictors. This would be 3 variables, so $p = 4$.

Potential problem to remember! These predictor variables are likely to be themselves correlated. We always want to be careful about using variables that are themselves strongly correlated as predictors together in the same model.

The Multiple Regression Model

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \cdots + \beta_{p-1} X_{i,p-1} + \epsilon_i \quad \text{for } i = 1, 2, \ldots, n$$

where

- $Y_i$ is the value of the response variable for the $i$th case.
- $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ (exactly as before!)
- $\beta_0$ is the intercept (think multidimensionally).
- $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression coefficients for the explanatory variables.
- $X_{i,k}$ is the value of the $k$th explanatory variable for the $i$th case.

Parameters as usual include all of the $\beta$'s as well as $\sigma^2$. These need to be estimated from the data.

Interesting Special Cases

Polynomial model:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \ldots + \beta_{p-1} X_i^{p-1} + \epsilon_i$$

$X$'s can be indicator or dummy variables with $X = 0$ or $1$ (or any other two distinct numbers) as possible values (e.g., ANOVA model).

Interactions between explanatory variables are then expressed as a product of the $X$'s (see the sketch below):
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \beta_3 X_{i,1}X_{i,2} + \epsilon_i$$
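As a minimal SAS sketch of how such an interaction term is built in practice (the data set and variable names here are hypothetical, not from the case study):

data interact;
  set mydata;       /* hypothetical input data set containing x1, x2, and y */
  x1x2 = x1 * x2;   /* interaction term: the product of the two predictors */
run;

proc reg data=interact;
  model y = x1 x2 x1x2;
run;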

Model in Matrix Form

$$Y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \epsilon_{n \times 1}$$
$$\epsilon \sim N(0, \sigma^2 I_{n \times n})$$
$$Y \sim N(X\beta, \sigma^2 I)$$

Design Matrix $X$:

$$X = \begin{bmatrix} 1 & X_{1,1} & X_{1,2} & \cdots & X_{1,p-1} \\ 1 & X_{2,1} & X_{2,2} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n,1} & X_{n,2} & \cdots & X_{n,p-1} \end{bmatrix}$$

Coefficient vector $\beta$:

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$$

Parameter Estimation: Least Squares

Find $b$ to minimize $SSE = (Y - Xb)'(Y - Xb)$.

Obtain the normal equations as before: $X'Xb = X'Y$.

Least Squares Solution

$$b = (X'X)^{-1}X'Y$$

Fitted (predicted) values for the mean of $Y$ are
$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY,$$
where $H = X(X'X)^{-1}X'$.

Residuals

$$e = Y - \hat{Y} = Y - HY = (I - H)Y$$

Notice that the matrices $H$ and $(I - H)$ have two special properties. They are:

- Symmetric: $H' = H$ and $(I - H)' = (I - H)$.
- Idempotent: $H^2 = H$ and $(I - H)(I - H) = (I - H)$ (checked numerically in the sketch below).
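A quick numerical check of these properties on the hypothetical toy design matrix from the earlier sketches (each printed difference should be zero up to rounding):

proc iml;
  X = {1 1, 1 2, 1 3, 1 4};                     /* hypothetical design matrix */
  H = X * inv(X`*X) * X`;
  Id = I(4);                                    /* 4 x 4 identity matrix */
  checkSym  = max(abs(H` - H));                 /* ~0: H is symmetric */
  checkIdem = max(abs(H*H - H));                /* ~0: H is idempotent */
  checkIH   = max(abs((Id-H)*(Id-H) - (Id-H))); /* ~0: I-H is idempotent */
  print checkSym checkIdem checkIH;
quit;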

Covariance Matrix of Residuals

$$\mathrm{Cov}(e) = \sigma^2(I - H)(I - H)' = \sigma^2(I - H)$$
$$\mathrm{Var}(e_i) = \sigma^2(1 - h_{i,i}),$$
where $h_{i,i}$ is the $i$th diagonal element of $H$.

Note: $h_{i,i} = X_i'(X'X)^{-1}X_i$, where $X_i' = [1\; X_{i,1}\; \cdots\; X_{i,p-1}]$.

Residuals $e_i$ are usually somewhat correlated: $\mathrm{cov}(e_i, e_j) = -\sigma^2 h_{i,j}$ for $i \neq j$; this is not unexpected, since they sum to $0$.

Estimation of $\sigma^2$

Since we have estimated $p$ parameters, $SSE = e'e$ has $df_E = n - p$. The estimate for $\sigma^2$ is the usual estimate:

$$s^2 = \frac{e'e}{n - p} = \frac{(Y - Xb)'(Y - Xb)}{n - p} = \frac{SSE}{df_E} = MSE$$
$$s = \sqrt{s^2} = \text{Root MSE} = \sqrt{MSE}$$
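In the hypothetical SAS/IML toy example, this estimate takes one line once the residuals are in hand:

proc iml;
  /* hypothetical toy data again: n = 4, p = 2 */
  X = {1 1, 1 2, 1 3, 1 4};
  Y = {2, 4, 5, 8};
  n = nrow(X);  p = ncol(X);
  b = inv(X`*X) * X`*Y;
  e = Y - X*b;              /* residuals */
  s2 = (e`*e) / (n - p);    /* MSE = SSE / df_E */
  s = sqrt(s2);             /* Root MSE */
  print s2 s;
quit;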

Distribution of $b$

We know that $b = (X'X)^{-1}X'Y$. The only RV involved is $Y$, so the distribution of $b$ is based on the distribution of $Y$. Since $Y \sim N(X\beta, \sigma^2 I)$, and using the multivariate theorem from earlier (if you like, go through the details on your own), we have

$$E(b) = \left((X'X)^{-1}X'\right)X\beta = \beta$$
$$\sigma^2\{b\} = \mathrm{Cov}(b) = \sigma^2(X'X)^{-1}$$

Since $\sigma^2$ is estimated by the MSE $s^2$, $\sigma^2\{b\}$ is estimated by $s^2(X'X)^{-1}$.

ANOVA Table

Sources of variation are:

- Model (SAS) or Regression (KNNL)
- Error (Residual)
- Total

SS and df add as before:
$$SSM + SSE = SST$$
$$df_M + df_E = df_{Total}$$
but their values are different from SLR.

Sums of Squares

$$SSM = \sum(\hat{Y}_i - \bar{Y})^2, \quad SSE = \sum(Y_i - \hat{Y}_i)^2, \quad SSTO = \sum(Y_i - \bar{Y})^2$$

Degrees of Freedom

$$df_M = p - 1, \quad df_E = n - p, \quad df_{Total} = n - 1$$

The total degrees of freedom have not changed from SLR, but the model $df$ has increased from $1$ to $p - 1$, i.e., the number of $X$ variables. Correspondingly, the error $df$ has decreased from $n - 2$ to $n - p$.

Mean Squares

$$MSM = \frac{SSM}{df_M} = \frac{\sum(\hat{Y}_i - \bar{Y})^2}{p - 1}$$
$$MSE = \frac{SSE}{df_E} = \frac{\sum(Y_i - \hat{Y}_i)^2}{n - p}$$
$$MST = \frac{SSTO}{df_{Total}} = \frac{\sum(Y_i - \bar{Y})^2}{n - 1}$$

ANOVA Table

Source   df             SS     MS     F
Model    df_M = p - 1   SSM    MSM    MSM/MSE
Error    df_E = n - p   SSE    MSE
Total    df_T = n - 1   SST

$F$-test

$H_0$: $\beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0$ (all regression coefficients are zero)

$H_A$: $\beta_k \neq 0$ for at least one $k = 1, \ldots, p-1$; at least one of the $\beta$'s is non-zero (or, not all the $\beta$'s are zero).

$$F = MSM/MSE$$

Under $H_0$, $F \sim F_{p-1,\,n-p}$. Reject $H_0$ if $F$ is larger than the critical value; if using SAS, reject $H_0$ if the p-value $< \alpha = 0.05$.
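For reference, the p-value SAS reports can be reproduced from the F statistic with the PROBF function; all numbers below are hypothetical:

data ftest;
  F   = 4.2;    /* hypothetical F statistic */
  df1 = 3;      /* numerator df: p - 1 */
  df2 = 220;    /* denominator df: n - p */
  pvalue = 1 - probf(F, df1, df2);   /* upper-tail area of F(df1, df2) */
run;

proc print data=ftest;
run;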

What do we conclude?

If $H_0$ is rejected, we conclude that at least one of the regression coefficients is non-zero; hence at least one of the $X$ variables is useful in predicting $Y$. (It doesn't say which one(s), though.)

If $H_0$ is not rejected, then we cannot conclude that any of the $X$ variables is useful in predicting $Y$.

p-value of the $F$-test

The p-value for the $F$ significance test tells us one of the following:

- There is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model ($p \geq 0.05$).
- One or more of the explanatory variables in our model is potentially useful for predicting the response in a linear model ($p < 0.05$).

$R^2$

It gives the proportion of variation in the response variable explained by the explanatory variables. It is sometimes called the coefficient of multiple determination (KNNL, page 236).

$R^2 = SSM/SST$ (the proportion of variation explained by the model).

$R^2 = 1 - (SSE/SST) = 1 -$ (the proportion not explained by the model).

$F$ and $R^2$ are related:
$$F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}$$
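A quick numerical check of this identity with hypothetical values (compare against the F statistic on a SAS output):

data r2f;
  r2 = 0.20;   /* hypothetical R-square */
  p  = 6;      /* hypothetical number of parameters (intercept + 5 predictors) */
  n  = 224;    /* hypothetical sample size */
  F = (r2/(p-1)) / ((1-r2)/(n-p));
run;

proc print data=r2f;
run;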

Inference for Individual Regression Coefficients

Confidence Interval for $\beta_k$

We know that $b \sim N(\beta, \sigma^2(X'X)^{-1})$.

Define
$$s^2\{b\}_{p \times p} = MSE \times (X'X)^{-1}$$
$$s^2\{b_k\} = \left[s^2\{b\}\right]_{k,k}, \text{ the } k\text{th diagonal element}$$

CI for $\beta_k$: $b_k \pm t_c\,s\{b_k\}$, where $t_c = t_{n-p}(0.975)$.
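A minimal sketch of the interval computed by hand with the TINV function; the estimate, standard error, and df below are hypothetical:

data betaci;
  b  = 0.168;   /* hypothetical coefficient estimate b_k */
  se = 0.035;   /* hypothetical standard error s{b_k} */
  df = 220;     /* hypothetical error df: n - p */
  tc = tinv(0.975, df);   /* critical value t_{n-p}(0.975) */
  lower = b - tc*se;
  upper = b + tc*se;
run;

proc print data=betaci;
run;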

Significance Test for $\beta_k$

$H_0$: $\beta_k = 0$

Same test statistic $t = b_k / s\{b_k\}$. Still use $df_E$, which now is equal to $n - p$; the p-value is computed from the $t_{n-p}$ distribution. This tests the significance of a variable given that the other variables are already in the model (i.e., fitted last). Unlike in SLR, the $t$-tests for the $\beta$'s are different from the $F$-test.

Multiple Regression Case Study

Example: Study of CS Students

Problem: Computer science majors at Purdue have a large drop-out rate.

Potential Solution: Can we find predictors of success? Predictors must be available at time of entry into the program.

Data Available

Grade point average (GPA) after three semesters ($Y_i$, the response variable).

Five potential predictors ($p = 6$):

- $X_1$ = High school math grades (HSM)
- $X_2$ = High school science grades (HSS)
- $X_3$ = High school English grades (HSE)
- $X_4$ = SAT Math (SATM)
- $X_5$ = SAT Verbal (SATV)

Gender ($1$ = male, $2$ = female); we will ignore this one right now, since it is not a continuous variable.

We have $n = 224$ observations, so if all five variables are included, the design matrix $X$ is $224 \times 6$. The SAS program used to generate output for this is cs.sas.

Look at the individual variables

Our first goal should be to take a look at the variables to see...

- Is there anything that sticks out as unusual for any of the variables?
- How are these variables related to each other (pairwise)? If two predictor variables are strongly correlated, we wouldn't want to use them in the same model!

We do this by looking at statistics and plots.

data cs;
  infile 'H:\System\Desktop\csdata.dat';
  input id gpa hsm hss hse satm satv genderm1;
run;

Descriptive Statistics: proc means

proc means data=cs maxdec=2;
  var gpa hsm hss hse satm satv;
run;

The option maxdec=2 sets the number of decimal places in the output to 2 (just showing you how).

Output from proc means

The MEANS Procedure reports N, Mean, Std Dev, Minimum, and Maximum for gpa, hsm, hss, hse, satm, and satv (numeric values not reproduced here).

Descriptive Statistics

Note that proc univariate also provides lots of other information, not shown.

proc univariate data=cs noprint;
  var gpa hsm hss hse satm satv;
  histogram gpa hsm hss hse satm satv / normal;
run;

GPA and High School Math graphs (not shown).

High School Science and High School English graphs (not shown).

SAT Math and SAT Verbal graphs (not shown).

NOTE: If you want the plots (e.g., histogram, qqplot) and not the copious output from proc univariate, use the noprint option:

proc univariate data=cs noprint;
  histogram gpa / normal;
run;

Interactive Data Analysis

- Read in the data set as usual.
- From the menu bar, select Solutions -> Analysis -> Interactive Data Analysis.
- Obtain the SAS/Insight window.
- Open library WORK.
- Click on data set CS and click Open.

Getting a Scatter Plot Matrix

- CTRL+Click on gpa, satm, satv.
- Go to menu Analyze.
- Choose option Scatter Plot (Y X).
- You can, while in this window, use Edit -> Copy to copy this plot to another program such as Word.

This graph, once you get used to it, can be useful in getting an overall feel for the relationships among the variables. Try other variables and some other options from the Analyze menu to see what happens.

Correlations

SAS will give us the r (correlation) value between pairs of random variables in a data set using proc corr.

proc corr data=cs;
  var hsm hss hse;
run;

(Output: the pairwise correlations of hsm, hss, and hse; the coefficients are not reproduced here, but every pairwise p-value is < .0001.)

To get rid of those p-values, which make the output difficult to read, use the noprob option. Here are also a few different ways to call proc corr:

proc corr data=cs noprob;
  var satm satv;
run;

(Output: the correlation between satm and satv; value not reproduced here.)

proc corr data=cs noprob;
  var hsm hss hse;
  with satm satv;
run;

proc corr data=cs noprob;
  var hsm hss hse satm satv;
  with gpa;
run;

(Output: correlations of satm and satv with the high school grades, and of gpa with all five predictors; values not reproduced here.)

Notice that not only do the $X$'s correlate with $Y$ (this is good), but the $X$'s correlate with each other (this is bad). That means that some of the $X$'s may be redundant in predicting $Y$.

Use High School Grades to Predict GPA

proc reg data=cs;
  model gpa=hsm hss hse;
run;

Analysis of Variance output (numeric values not reproduced here): the table lists DF, Sum of Squares, and Mean Square for Model, Error, and Corrected Total; the overall Model test has Pr > F < .0001. Below it, Root MSE, Dependent Mean, Coeff Var, R-Square, and Adj R-Sq are reported.

Parameter Estimates output (values partly lost in transcription): the table lists the parameter estimate, standard error, t value, and p-value for Intercept, hsm, hss, and hse. For hsm, t = 4.75 with p < .0001.

Remove HSS

proc reg data=cs;
  model gpa=hsm hse;
run;

(Output values largely not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates and standard errors for Intercept, hsm, and hse; hsm has t = 5.72 with p < .0001.

Rerun with HSM only

proc reg data=cs;
  model gpa=hsm;
run;

(Output values largely not reproduced here.) R-Square and Adj R-Sq are reported, followed by the parameter estimates and standard errors for Intercept and hsm; hsm has t = 7.23 with p < .0001.

The last two models (HSM only, and HSM with HSE) appear to be pretty good. Notice that $R^2$ goes down a little with the deletion of HSE, so HSE does provide a little information. This is a judgment call: do you prefer a slightly better-fitting model, or a simpler one?

Now look at SAT scores

proc reg data=cs;
  model gpa=satm satv;
run;

(Output values not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates, standard errors, t values, and p-values for Intercept, satm, and satv.

Here we see that SATM is a significant explanatory variable, but the overall fit of the model is poor. SATV does not appear to be useful at all.

Now try our two most promising candidates: HSM and SATM.

proc reg data=cs;
  model gpa=hsm satm;
run;

(Output values largely not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates for Intercept, hsm, and satm; hsm has p < .0001.

In the presence of HSM, SATM does not appear to be at all useful. Note that adjusted $R^2$ is the same as with HSM only, and that the p-value for SATM is now no longer significant.

General Linear Test Approach: HS and SATs

proc reg data=cs;
  model gpa=satm satv hsm hss hse;
  * Do general linear tests;
  * test H0: beta1 = beta2 = 0;
  sat: test satm, satv;
  * test H0: beta3 = beta4 = beta5 = 0;
  hs: test hsm, hss, hse;
run;

(Output values not reproduced here.) R-Square and Adj R-Sq are reported, followed by parameter estimates, standard errors, t values, and p-values for Intercept, satm, satv, hsm, hss, and hse.

The first test statement tests the full vs. reduced models ($H_0$: no satm or satv; $H_a$: full model).

Test sat Results for Dependent Variable gpa (Mean Square, F Value, and Pr > F not reproduced here; numerator df = 2, denominator df = n - p = 218).

We do not reject the $H_0$ that $\beta_1 = \beta_2 = 0$. It is probably okay to throw out the SAT scores.

The second test statement tests the full vs. reduced models ($H_0$: no hsm, hss, or hse; $H_a$: all three in the model).

Test hs Results for Dependent Variable gpa (Mean Square and F Value not reproduced here; numerator df = 3, denominator df = 218; Pr > F < .0001).

We reject the $H_0$ that $\beta_3 = \beta_4 = \beta_5 = 0$. We CANNOT throw out all high school grades.

We can use the test statement to test any set of coefficients equal to zero. This is related to extra sums of squares (later).

Best Model?

Likely the one with just HSM. One could argue that HSM and HSE is marginally better. We'll discuss comparison methods in Chapters 7 and 8.

Key ideas from case study

- First, look at graphical and numerical summaries for one variable at a time.
- Then, look at relationships between pairs of variables with graphical and numerical summaries. Use plots and correlations to understand relationships.
- The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model.
- A variable can be a significant ($p < 0.05$) predictor alone and not significant ($p > 0.05$) when other $X$'s are in the model.

- Regression coefficients, standard errors, and the results of significance tests depend on what other explanatory variables are in the model.
- Significance tests (p-values) do not tell the whole story.
- Squared multiple correlations ($R^2$) give the proportion of variation in the response variable explained by the explanatory variables, and can give a different view; see also adjusted $R^2$.
- You can fully understand the theory in terms of $Y = X\beta + \epsilon$.
- To effectively use this methodology in practice, you need to understand how the data were collected, the nature of the variables, and how they relate to each other.

Other things that should be considered:

- Diagnostics: do the models meet the assumptions? You might take one of our final two potential models and check these out on your own.
- Confidence intervals/bands and prediction intervals/bands.

Example II (KNNL page 236)

Dwaine Studios, Inc. operates portrait studios in $n = 21$ cities of medium size. The program used to generate output for confidence intervals for means and prediction intervals is nknw241.sas.

$Y_i$ is sales in city $i$.
$X_1$: population aged 16 and under.
$X_2$: per capita disposable income.

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \epsilon_i$$

Read in the data

data a1;
  infile 'H:\System\Desktop\CH06FI05.DAT';
  input young income sales;
run;

proc print data=a1;
run;

(Listing of Obs, young, income, and sales not reproduced here.)

proc reg data=a1;
  model sales=young income;
run;

(Output values largely not reproduced here.) The Analysis of Variance table lists DF, Sum of Squares, Mean Square, and F Value for Model, Error, and Corrected Total, with Pr > F < .0001 for the Model, followed by Root MSE, Dependent Mean, Coeff Var, R-Square, and Adj R-Sq. The Parameter Estimates table lists the estimate, standard error, t value, and p-value for Intercept, young (p < .0001), and income.

clb option: Confidence Intervals for the $\beta$'s

proc reg data=a1;
  model sales=young income/clb;
run;

(The 95% confidence limits for Intercept, young, and income are not reproduced here.)

Estimation of $E(Y_h)$

$X_h$ is now a vector of values. ($Y_h$ is still just a number.)

$$X_h' = (1, X_{h,1}, X_{h,2}, \ldots, X_{h,p-1})$$

This is like row $h$ of the design matrix. We want a point estimate and a confidence interval for the subpopulation mean corresponding to the set of explanatory variables $X_h$.

Theory for $E(Y_h)$

$$E(Y_h) = \mu_h = X_h'\beta$$
$$\hat{\mu}_h = X_h'b$$
$$s^2\{\hat{\mu}_h\} = X_h'\,s^2\{b\}\,X_h = s^2\,X_h'(X'X)^{-1}X_h$$
$$95\%\ \text{CI}:\ \hat{\mu}_h \pm s\{\hat{\mu}_h\}\,t_{n-p}(0.975)$$
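These formulas can be traced through in SAS/IML on the hypothetical toy data from the earlier sketches ($X_h$ below is likewise hypothetical); in practice, the clm option on the next slide does this for you:

proc iml;
  X = {1 1, 1 2, 1 3, 1 4};           /* hypothetical toy design matrix */
  Y = {2, 4, 5, 8};
  n = nrow(X);  p = ncol(X);
  b = inv(X`*X) * X`*Y;
  e = Y - X*b;
  s2 = (e`*e) / (n - p);              /* MSE */
  Xh = {1, 2.5};                      /* hypothetical X_h: intercept, new X value */
  muhat = Xh` * b;                    /* point estimate of E(Y_h) */
  s2mu = s2 * (Xh` * inv(X`*X) * Xh); /* s^2{muhat_h} */
  tc = tinv(0.975, n - p);
  lower = muhat - tc*sqrt(s2mu);
  upper = muhat + tc*sqrt(s2mu);
  print muhat lower upper;
quit;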

clm option: Confidence Intervals for the Mean

proc reg data=a1;
  model sales=young income/clm;
  id young income;
run;

(Output not reproduced here: for each observation, the Dep Var sales, Predicted Value, Std Error Mean Predict, and 95% CL Mean are listed.)

90 Prediction ofy h(new) Predict a new observationy h withx values equal to X h. We want a prediction ofy h based on a set of predictor values with an interval that expresses the uncertainty in our prediction. As in SLR, this interval is centered atŷh and is wider than the interval for the mean. Page 90

Theory for $Y_h$

$$Y_h = X_h'\beta + \epsilon$$
$$\hat{Y}_h = \hat{\mu}_h = X_h'b$$
$$\sigma^2\{pred\} = \mathrm{Var}(Y_h - \hat{Y}_h) = \mathrm{Var}(\hat{Y}_h) + \mathrm{Var}(\epsilon)$$
$$s^2\{pred\} = s^2\left(1 + X_h'(X'X)^{-1}X_h\right)$$
$$\text{CI for } Y_{h(new)}:\ \hat{Y}_h \pm s\{pred\}\,t_{n-p}(0.975)$$

cli option: Confidence Interval for an Individual Observation

proc reg data=a1;
  model sales=young income/cli;
  id young income;
run;

(Output not reproduced here: for each observation, the Dep Var sales, Predicted Value, Std Error Mean Predict, and 95% CL Predict are listed.)

Diagnostics

- Look at the distribution of each variable to gain insight.
- Look at the relationship between pairs of variables. proc corr is useful here. BUT note that relationships between variables can be more complicated than just pairwise: correlations are NOT the entire story.
- Plot the residuals vs. ...
  - the predicted/fitted values
  - each explanatory variable
  - time (if available)

- Are the relationships linear? Look at $Y$ vs. each $X_i$. We may have to transform some $X$'s.
- Are the residuals approximately normal? Look at a histogram and a normal quantile plot.
- Is the variance constant? Check all the residual plots.

Remedies

- Remedies are similar to those in simple regression (though more complicated to decide among).
- Additionally, we may eliminate some of the $X$'s (this is called variable selection).
- Transformations such as Box-Cox.
- Analyze with/without outliers.
- More detail in KNNL Chapters 10 and 11.
