Introduction to linear models


Valeska Andreozzi, 2012

Contents

References
Correlation: definition; Pearson correlation coefficient; Spearman correlation coefficient; hypothesis test
Simple linear regression: motivation; the model; model assumptions; fitting the model; exercise
Multiple linear regression: the model; hypothesis test; variable selection; model check; interaction

DEIO/CEAUL, Valeska Andreozzi

References

Rosner, B. (2010). Fundamentals of Biostatistics. 7th edition. Duxbury Resource Center.
Krzanowski, W. (1998). An Introduction to Statistical Modelling. Arnold Texts in Statistics.
Harrell, F. (2001). Regression Modeling Strategies. Springer-Verlag.
Weisberg, S. (2005). Applied Linear Regression. 3rd edition. Wiley Series in Probability and Statistics. Wiley.
Dalgaard, P. (2008). Introductory Statistics with R. 2nd edition. Statistics and Computing. Springer.

Correlation

Definition

The correlation coefficient is a dimensionless quantity, independent of the units of the random variables X and Y, that ranges between -1 and 1.

For random variables that are approximately linearly related, a correlation coefficient of 0 implies independence.

A correlation coefficient close to 1 implies nearly perfect positive dependence, with large values of X corresponding to large values of Y and small values of X corresponding to small values of Y.

A correlation coefficient close to -1 implies nearly perfect negative dependence, with large values of X corresponding to small values of Y and vice versa.

Examples

An example of a strong positive correlation is between forced expiratory volume (FEV) and height. A weaker positive correlation exists between serum cholesterol and dietary cholesterol intake. A strong negative correlation can be found between resting pulse and age in children under the age of 10 years.

Pearson correlation coefficient

Assuming that Y and X are two random variables with a linear relationship, we can measure the correlation in a sample by calculating the Pearson correlation coefficient, given by:

r = Σ_i (x_i − x̄)(y_i − ȳ) / √( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² )

In R:

library(ISwR)
data(thuesen)
attach(thuesen)
View(thuesen)
plot(thuesen)
cor(blood.glucose, short.velocity)
cor(blood.glucose, short.velocity, use = "complete.obs")

(The first cor() call returns NA because short.velocity contains a missing value; use = "complete.obs" restricts the computation to complete cases.)

Correlation in R Commander: Statistics > Summaries > Correlation matrix...
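As a language-neutral check of the formula above, here is a minimal sketch in plain Python. It is not part of the original slides, and the data values are invented for illustration:

```python
# Illustrative sketch: the Pearson correlation coefficient computed
# directly from its definition. The data are made up.
def pearson_r(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]
print(round(pearson_r(x, y), 4))  # 0.9992: strong positive linear association
```

Because the coefficient is built from deviations about the means, it is unchanged by shifting or rescaling either variable, which is the dimensionless property stated above.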

Spearman correlation coefficient

Adequate when one or both variables are ordinal, or have a distribution that is far from normal. The Spearman correlation coefficient is a nonparametric measure which has the advantage of being invariant to monotone transformations of the coordinates. Its main disadvantage is that its interpretation is not as direct.

The Spearman (rank) correlation coefficient r_s is obtained by replacing the observations of X and Y by their ranks and computing the correlation of the ranks (the Pearson coefficient). If there were a perfect monotone association between the two variables, the ranks for each person on each variable would be the same and r_s = 1. The less perfect the association, the closer to zero r_s would be.

Correlation in R:

x <- rank(thuesen$blood.glucose[-16])
y <- rank(thuesen$short.velocity[-16])
cor(x, y, method = "pearson")
cor(thuesen$blood.glucose, thuesen$short.velocity,
    use = "complete.obs", method = "spearman")

Correlation in R Commander: Statistics > Summaries > Correlation matrix...

Hypothesis test

It is possible to test the significance of the correlation by transforming it into a t-distributed variable; this test is identical to the one obtained from testing the significance of the slope in the regression of y on x, or vice versa (see later).

In R: cor.test(blood.glucose, short.velocity)
In R Commander: Statistics > Summaries > Correlation test...
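The t transformation mentioned above has the standard form t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom under the null hypothesis of zero correlation. A minimal sketch (not from the slides; made-up data):

```python
# Illustrative sketch: turning a sample correlation r into a t statistic
# via t = r * sqrt((n - 2) / (1 - r^2)). The data values are invented.
import math

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cor_t_statistic(x, y):
    n = len(x)
    r = pearson_r(x, y)
    return r * math.sqrt((n - 2) / (1 - r * r))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]
t = cor_t_statistic(x, y)
print(round(t, 2))  # a large t: strong evidence against zero correlation
```

Comparing t against a t distribution with n − 2 degrees of freedom gives the same p-value that cor.test() reports in R.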

Exercises

Match the following items with the graphics:

r = 0
0 < r < 1
r = 1
r = -1
-1 < r < 0

[Scatterplot panels for the matching exercise omitted.]

Simple linear regression

Motivation

What is the relationship between systolic blood pressure (SBP) and age among healthy adults?

SBP increases with age.
There are fluctuations around a linear trend.
The variability of SBP is not completely explained by age: there is a random component.

Why would we like to fit a model? To describe the relationship between SBP and age, and for prediction.

What can we say about SBP versus age? [Scatterplot of SBP (pa) against age (id).]

General concepts

y_i = β0 + β1 x_i + ε_i

There is a positive linear correlation, but the relationship is not perfect. We fit a line that describes the linear relationship between SBP and age:

ŷ_i = β̂0 + β̂1 x_i

Model interpretation

ŜBP_i = β̂0 + 0.97 · age_i

β̂0 = estimated value of SBP when age is zero
β̂1 = 0.97: SBP increases 0.97 mmHg for an increment of one year of age

Model illustration

Illustration of the components of a simple linear regression model.

Systematic component: β0 + β1 x_i
Statistical/probabilistic model: Y_i = β0 + β1 x_i + ε_i, or E(Y_i) = β0 + β1 x_i

Simple linear regression representation: the means of the probability distributions of Y_i show the systematic relation with X.

Model assumptions

Independence: the Y_i are all independent.
Linearity: the expected value of Y_i is a linear function of x_i.
Homogeneity of variance: the variance of the probability distribution of Y_i is constant over X and equal to σ².
Normality: for every x_i, Y_i follows a normal distribution. This assumption is necessary to build hypothesis tests and confidence intervals for the model parameters β.

Model estimation: least squares method

E(Y_i | X) = β0 + β1 x_i

The least squares method gives estimates of β0 and β1 that minimize the sum of squared errors (SSE):

SSE = Σ_{i=1}^{n} ε_i² = Σ_{i=1}^{n} (y_i − ŷ_i)² = Σ_{i=1}^{n} (y_i − β0 − β1 x_i)²

β coefficients: differentiating SSE and setting the partial derivatives to zero, we have:

∂SSE/∂β0 = −2 Σ_{i=1}^{n} (y_i − β0 − β1 x_i) = 0
∂SSE/∂β1 = −2 Σ_{i=1}^{n} x_i (y_i − β0 − β1 x_i) = 0

Solving this system gives the estimates of the model parameters:

β̂0 = ȳ − β̂1 x̄
β̂1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²

Variance of Y (σ²): under the assumption that the errors are independent random variables with zero mean and constant variance σ², an unbiased estimator of σ² is the ratio between SSE = Σ_{i=1}^{n} ε̂_i² and the degrees of freedom (the number of observations minus the number of model coefficients). The variance σ² of Y is then obtained.
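To make the closed-form solution concrete, here is a minimal sketch in Python (not the pasis.dat data used on the slides; the values are invented):

```python
# Illustrative sketch: closed-form least-squares estimates for simple
# linear regression, plus the unbiased estimate of sigma^2.
# The x/y values below are made up (think "age" and "SBP").
def fit_simple_ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)  # df = n observations minus 2 coefficients
    return b0, b1, sigma2_hat

x = [20, 30, 40, 50, 60]          # e.g. age
y = [118, 125, 142, 151, 162]     # e.g. SBP, invented values
b0, b1, s2 = fit_simple_ols(x, y)
print(round(b0, 2), round(b1, 2))  # 94.0 1.14
```

The slope matches the ratio formula above directly; the intercept then follows from β̂0 = ȳ − β̂1 x̄, so the fitted line always passes through (x̄, ȳ).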

Simple linear regression in R

dados <- read.table("pasis.dat", header = TRUE)
names(dados)
head(dados)
plot(dados)
modelo <- lm(pa ~ id, data = dados)
summary(modelo)
plot(dados)
abline(modelo, col = 2)

Simple linear regression in R Commander

Data > Import data > from text file, clipboard, or URL...
Graphics > Scatterplot...
Statistics > Fit Models > Linear regression...

Exercise in R

1. With the rmr dataset (ISwR package), plot the metabolic rate versus body weight. Fit a linear regression model to the relation. According to the fitted model, what is the predicted metabolic rate for a body weight of 70 kg?
2. In the juul dataset (ISwR package), fit a linear regression model to the square root of the IGF-I concentration versus age, for the group of subjects over 25 years old.

Tools > Load package(s)...
Data > Data in packages > Read data set from an attached package...
Graphics > Scatterplot...
Statistics > Fit Models > Linear regression...
Data > Manage variables in active data set > Compute new variable...

Multiple linear regression

y_i = β0 + β1 x_1i + β2 x_2i + ε_i

Multiple linear regression:
describes the relationship between the response (dependent, outcome) variable Y and two or more independent variables (covariates, predictors, explanatory variables) X1, X2, X3, ..., Xk;
estimates the direction of the association between the response variable and the covariates. The covariates can be transformed variables (example: log(CD4)), polynomial terms (example: age²), interaction terms (example: age × sex) and dummy variables;
determines which covariates are important to predict the response variable;
describes the relationship of X1, X2, X3, ..., Xk and Y adjusted for the effect of other covariates, Z1 and Z2 for example.

The model assumes that the response is a random variable which varies from individual to individual i. The continuous nature of the response variable suggests that the normal distribution is adequate as the population model for Y_i. So Y_i follows a normal distribution with mean µ_i and unknown variance σ²: Y_i ~ N(µ_i, σ²). Equivalently, each observation is y_i = µ_i + ε_i with ε_i ~ N(0, σ²).

The model parameters are again estimated by the least squares method.

Example

Describe the relationship between blood pressure (y_i) and age (x_1i), body mass index (x_2i) and smoking habits (x_3i). File: multi.dat

E(Y_i) = β0 + β1 x_1i + β2 x_2i + β3 x_3i

Data > Import data > from a text file, clipboard or URL...
Statistics > Summaries > Active data set

summary(dados)
     pessoa             pa                id
 Min.   : 1.00    Min.   :120.0    Min.   :
 1st Qu.:         1st Qu.:         1st Qu.:48.00
 Median :16.50    Median :143.0    Median :53.50
 Mean   :16.50    Mean   :144.5    Mean   :
 3rd Qu.:         3rd Qu.:         3rd Qu.:58.25
 Max.   :32.00    Max.   :180.0    Max.   :65.00
     imc              hf
 Min.   :2368     não:15
 1st Qu.:3022     sim:17
 Median :3380
 Mean   :3441
 3rd Qu.:3776
 Max.   :4637

In R Commander: Statistics > Fit models > Linear models...

Call:
lm(formula = pa ~ id + imc + hf, data = multi)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                     ***
id                                              ***
imc
hf[T.sim]                                       ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

Residual standard error:  on 28 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 3 and 28 DF,  p-value: 7.602e-09

Model interpretation

β̂0: the intercept has no real interpretation in this example, because it would be the average BP of a person with age zero and body mass index equal to zero who does not smoke.
β̂1 = 1.21: BP increases, on average, 1.21 mmHg for an increase of 1 year of age, adjusted for body mass index and smoking habits.
β̂2: BP changes, on average, by β̂2 mmHg for an increase of 1 unit of body mass index, holding everything else constant.
β̂3 = 9.94: BP is, on average, 9.94 mmHg higher for those who smoke compared with those who do not smoke, adjusted for all the other variables in the model.

[Effect plots for id, imc and hf.]

Hypothesis test: analysis of variance

ANOVA partitions the total variability in the sample of y_i into:

Σ_i (y_i − ȳ)² = Σ_i (ŷ_i − ȳ)² + Σ_i (y_i − ŷ_i)²

total variability = variability explained by the regression line + variability not explained (residual variation about the fitted line)

[Illustration of the variability partition.]
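The partition above can be verified numerically. This sketch (made-up one-predictor data, not the multi.dat example) fits a least-squares line and checks that SStotal = SSreg + SSE:

```python
# Illustrative sketch: for a least-squares fit, the total sum of squares
# splits exactly into regression and residual parts. Data are invented.
def fit_simple_ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
b0, b1 = fit_simple_ols(x, y)
yhat = [b0 + b1 * xi for xi in x]
ybar = sum(y) / len(y)

ss_total = sum((yi - ybar) ** 2 for yi in y)
ss_reg = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

# The identity holds exactly (up to floating-point rounding):
print(round(ss_total, 6), round(ss_reg + sse, 6))
```

The identity is special to least squares: the cross-product term between (ŷ_i − ȳ) and (y_i − ŷ_i) vanishes because the residuals are orthogonal to the fitted values.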

ANOVA

Σ_i (y_i − ȳ)²  =  Σ_i (ŷ_i − ȳ)²  +  Σ_i (y_i − ŷ_i)²
     Total            Regression           Residual

Source       Sum of squares (SS)        degrees of freedom (df)   Mean sum of squares (MS)
Regression   SSreg = Σ (ŷ_i − ȳ)²      m                         MSregression = SSreg / m
Residual     SSE = Σ (y_i − ŷ_i)²      n − m − 1                 MSresidual = SSE / (n − m − 1)
Total        SStotal = Σ (y_i − ȳ)²    n − 1                     MStotal = SStotal / (n − 1)

Hypothesis test: ANOVA F-test

F = (SSreg / m) / (SSE / (n − m − 1)) = MSregression / MSresidual ~ F_{m, n−m−1}

with n = number of observations and m = number of variables.

If there is no linear relationship, the regression sum of squares just represents random variation, so the regression mean square is another, independent, estimate of σ². The F-test indicates whether there is evidence of a linear relationship between Y and X: it is the ratio between the variability explained by the regression and the residual variation. This ratio will be close to one if there is no effective relationship, and larger otherwise. In simple linear regression, this is equivalent to testing H0: β1 = 0 versus H1: β1 ≠ 0.
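The F ratio can be computed directly from the sums of squares. A sketch with invented one-predictor data (so m = 1 and n = 5):

```python
# Illustrative sketch: the ANOVA F statistic,
#   F = (SSreg / m) / (SSE / (n - m - 1)),
# for a one-predictor least-squares fit on made-up data.
def fit_simple_ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
b0, b1 = fit_simple_ols(x, y)
yhat = [b0 + b1 * xi for xi in x]
ybar = sum(y) / len(y)

m, n = 1, len(y)
ss_reg = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
f_stat = (ss_reg / m) / (sse / (n - m - 1))
print(round(f_stat, 1))  # a large F suggests a real linear relationship
```

Comparing f_stat against the F distribution with (m, n − m − 1) degrees of freedom gives the overall p-value reported in the last line of R's summary(lm(...)).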

ANOVA for the multiple regression model in R Commander

Statistics > Fit models > Linear models...

The summary of lm(pa ~ id + imc + hf, data = multi) shown earlier reports the overall F statistic in its last line: an F on 3 and 28 DF with p-value 7.602e-09.

Hypothesis test: Wald test

To test H0: βk = 0 versus H1: βk ≠ 0, we use the T statistic

T = β̂k / SE(β̂k)

Under H0, T follows a Student's t distribution with n − p degrees of freedom (p = number of model coefficients, n = number of observations), or approximately a standard normal distribution (zero mean and unit variance).

Wald test in R Commander

Statistics > Fit models > Linear models...

The t value and Pr(>|t|) columns of the coefficient table shown earlier give the Wald statistic and its p-value for each coefficient.
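For the one-predictor case the Wald statistic has a simple closed form, since SE(β̂1) = √(σ̂² / Σ(x_i − x̄)²). A sketch on made-up data (for simple regression, T² equals the ANOVA F):

```python
# Illustrative sketch: Wald t statistic for the slope of a simple
# linear regression, T = b1 / SE(b1). The data values are invented.
import math

def slope_wald_t(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)           # p = 2 coefficients here
    se_b1 = math.sqrt(sigma2_hat / sxx)  # standard error of the slope
    return b1 / se_b1

x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
t = slope_wald_t(x, y)
print(round(t, 2))  # compare against t quantiles with n - 2 df
```

This is the value reported in the "t value" column of R's coefficient table for the slope.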

Hypothesis test: partial F-test

To compare two nested models, we can compute a partial F-test. Suppose two models M_p and M_q with, respectively, p and q parameters (p < q). M_p and M_q are nested models (M_p ⊂ M_q) if all parameters present in M_p are also present in M_q.

Testing H0: the subset of variables present in M_q but not in M_p are all not significant, against H1: at least one of the variables in this subset is significant to model Y, corresponds to testing simultaneously that the q − p extra parameters are all equal to zero, using the partial F-test:

F = [(SSreg_q − SSreg_p) / (q − p)] / [SSE_q / (n − q)] ~ F_{q−p, n−q}

Partial F-test in R Commander

Models > Hypothesis test > Compare two models...

> anova(LinearModel.3, LinearModel.1)
Analysis of Variance Table

Model 1: pa ~ id
Model 2: pa ~ id + imc + hf
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
1
2                                   ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

Confidence interval

A 100(1 − α)% confidence interval for βk is given by:

[ β̂k − t_{n−p, α/2} SE(β̂k) ;  β̂k + t_{n−p, α/2} SE(β̂k) ]

In R: confint(modelo)
In R Commander: Models > Confidence intervals...

> Confint(LinearModel.6, level = .95)
            Estimate  2.5 %  97.5 %
(Intercept)
id
imc
hf[T.sim]
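A sketch of the partial F statistic on made-up data, comparing the intercept-only model M_p (p = 1, SSreg_p = 0) with the one-predictor model M_q (q = 2), so q − p = 1 and the denominator df is n − q:

```python
# Illustrative sketch: partial F-test between two nested models,
#   M_p: y ~ 1        (p = 1 parameter, fitted values all ybar)
#   M_q: y ~ 1 + x    (q = 2 parameters, least-squares line)
# Data values are invented.
def partial_f(x, y):
    n = len(x)
    p, q = 1, 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssreg_p = 0.0  # the null model explains no variability
    ssreg_q = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)
    sse_q = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return ((ssreg_q - ssreg_p) / (q - p)) / (sse_q / (n - q))

x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
print(round(partial_f(x, y), 1))  # equals the overall ANOVA F here
```

With only one added variable, the partial F reduces to the overall F of the larger model; in general it isolates the joint contribution of the q − p extra variables, which is what anova(model1, model2) computes in R.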

Coefficient of determination

The (multiple) coefficient of determination is a summary measure given by the ratio of the regression sum of squares to the total sum of squares:

R² = SSreg / SStotal

R² represents the proportion of the total sum of squares explained by the regression. R² equals the square of the correlation between the y_i and the predicted values ŷ_i from the model, and lies between 0 and 1.

Important note: do not confuse the F-test from the ANOVA with R². The F-test from the ANOVA indicates whether there is evidence of a linear relationship between Y and X, in other words, whether the regression model is significant. R² measures the quality of the model for prediction of Y.

R² should not be used as a measure of quality of the model fit, because its value always increases when a variable is added to the model. For this purpose one can use the adjusted coefficient of determination R²_a:

R²_a = 1 − MSresidual / MStotal

Variable selection

There are several methods to select variables for a model. The most popular are the sequential ones: forward selection, backward deletion and stepwise selection.

Forward selection: models are systematically built up by adding variables one by one to the null model, which comprises just β0 (the intercept).
Backward deletion: models are systematically reduced by deleting variables one by one from the full model.
Stepwise selection: a combination of the two processes mentioned above.

For any of these methods, the most crucial decision is the choice of the stopping rule. Some choices are the Akaike Information Criterion, for which there is no associated statistical distribution to carry out a formal test, or the partial F-test, for which the significance level to add or delete a variable has to be chosen. Let's learn by an example.
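Both coefficients of determination can be sketched on made-up one-predictor data (m = 1, n = 5):

```python
# Illustrative sketch: R^2 = SSreg / SStotal and the adjusted version
# R^2_a = 1 - MSresidual / MStotal, on invented data.
def r_squared(x, y):
    n = len(x)
    m = 1  # one predictor
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    ss_reg = sum((yh - ybar) ** 2 for yh in yhat)
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    sse = ss_total - ss_reg
    r2 = ss_reg / ss_total
    r2_adj = 1 - (sse / (n - m - 1)) / (ss_total / (n - 1))
    return r2, r2_adj

x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
r2, r2_adj = r_squared(x, y)
print(round(r2, 4), round(r2_adj, 4))  # adjusted R^2 is never above R^2
```

Because the adjusted version divides each sum of squares by its degrees of freedom, it penalizes adding a variable that does not reduce the residual mean square, unlike plain R².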

Example

Choose the package MASS and the data set birthwt.

Data > Data in packages > Read data set from an attached package...
Help > Help on active data set (if available)...

Recode the variables: change bwt to kg; transform race and smoke to factors.

Data > Manage variables in active data set > Convert numerical variables to factors...
Data > Manage variables in active data set > Compute new variable...

Fit the model: bwt ~ age + ftv + ht + lwt + ptl + race + smoke + ui

Statistics > Fit models > Linear model...

Select variables by using a sequential method:

Models > Stepwise model selection...

Select variables by using forward selection with the partial F-test. At each step, add the variable that has the minimum p-value, provided it is below 0.20:

nullmodel <- lm(bwt ~ 1, data = birthwt)
add1(nullmodel, scope = ~ age + ftv + ht + lwt + ptl + race + smoke + ui, test = "F")
model1 <- lm(bwt ~ ui, data = birthwt)
add1(model1, scope = ~ age + ftv + ht + lwt + ptl + race + smoke + ui, test = "F")
model1 <- lm(bwt ~ ui + race, data = birthwt)
add1(model1, scope = ~ age + ftv + ht + lwt + ptl + race + smoke + ui, test = "F")
model1 <- lm(bwt ~ ui + race + smoke, data = birthwt)
add1(model1, scope = ~ age + ftv + ht + lwt + ptl + race + smoke + ui, test = "F")
model1 <- lm(bwt ~ ui + race + smoke + ht, data = birthwt)
add1(model1, scope = ~ age + ftv + ht + lwt + ptl + race + smoke + ui, test = "F")
model1 <- lm(bwt ~ ui + race + smoke + ht + lwt, data = birthwt)
add1(model1, scope = ~ age + ftv + ht + lwt + ptl + race + smoke + ui, test = "F")
addmodel <- lm(bwt ~ ui + race + smoke + ht + lwt, data = birthwt)
summary(addmodel)

Example

Select variables by using backward deletion with the partial F-test. At each step, delete the variable that has the maximum p-value, provided it is above 0.25:

fullmodel <- lm(bwt ~ age + ftv + ht + lwt + ptl + race + smoke + ui, data = birthwt)
drop1(fullmodel, test = "F")
model2 <- lm(bwt ~ age + ht + lwt + ptl + race + smoke + ui, data = birthwt)
drop1(model2, test = "F")
model2 <- lm(bwt ~ ht + lwt + ptl + race + smoke + ui, data = birthwt)
drop1(model2, test = "F")
model2 <- lm(bwt ~ ht + lwt + race + smoke + ui, data = birthwt)
drop1(model2, test = "F")
dropmodel <- lm(bwt ~ ht + lwt + race + smoke + ui, data = birthwt)
summary(dropmodel)

Model check

Regression diagnostics are used after fitting to check whether the fitted mean function and the assumptions are consistent with the observed data. The basic statistics here are the residuals, or possibly rescaled residuals. If the fitted model does not give a set of residuals that appear reasonable, then some aspect of the model, either the assumed mean function or the assumptions concerning the variance function, may be called into doubt.

Residuals

Using matrix notation, we begin by deriving the properties of the residuals. The basic multiple linear regression model is given by

Y = Xβ + ε,   Var(ε) = σ² I

X is a known matrix with n rows and p columns, including a column of 1s for the intercept.
β is the unknown parameter vector, p × 1.
ε consists of unobservable errors that we assume are equally variable and uncorrelated.

Residuals

We estimate β by β̂ = (XᵀX)⁻¹XᵀY, and the fitted values Ŷ are

Ŷ = Xβ̂                  (1)
  = X(XᵀX)⁻¹XᵀY          (2)
  = HY                   (3)

where H = X(XᵀX)⁻¹Xᵀ is an n × n matrix called the hat matrix, because it transforms the vector of observed responses Y into the vector of fitted responses Ŷ.

The vector of residuals ε̂ is defined by

ε̂ = Y − Ŷ               (4)
  = Y − Xβ̂              (5)
  = Y − X(XᵀX)⁻¹XᵀY      (6)
  = (I − H)Y             (7)

The errors ε are unobservable random variables, assumed to have zero mean and uncorrelated elements, each with common variance σ². The residuals ε̂ are computed quantities that can be graphed or otherwise studied. Their mean and variance, using equation (7), are:

E(ε̂) = 0
Var(ε̂) = σ² (I − H)

Like the errors, each of the residuals has zero mean, but each residual may have a different variance. Unlike the errors, the residuals are correlated. The residuals are linear combinations of the errors: if the errors are normally distributed, so are the residuals.

In scalar form, the variance of the ith residual is

Var(ε̂_i) = σ² (1 − h_ii)          (8)

where h_ii is the ith diagonal element of H. Diagnostic procedures are based on the computed residuals, which we would like to assume behave as the unobservable errors would.
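For the one-predictor case the hat matrix has a well-known closed form, h_ij = 1/n + (x_i − x̄)(x_j − x̄)/Σ(x_k − x̄)², which lets us check equations (3) and (7) without any matrix inversion. A sketch on made-up data:

```python
# Illustrative sketch: the simple-regression hat matrix in closed form.
# Checks that H Y reproduces the least-squares fitted values and that
# trace(H) equals the number of coefficients (2). Data are invented.
x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# H[i][j] = 1/n + (x_i - xbar)(x_j - xbar)/Sxx
H = [[1 / n + (x[i] - xbar) * (x[j] - xbar) / sxx for j in range(n)]
     for i in range(n)]

# Fitted values via H Y ...
yhat_H = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]

# ... agree with the least-squares line b0 + b1 * x:
ybar = sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat_ls = [b0 + b1 * xi for xi in x]

print(round(sum(H[i][i] for i in range(n)), 6))  # 2.0, the trace of H
```

The diagonal entries h_ii are the leverages that appear in equation (8): observations far from x̄ have larger h_ii, hence smaller residual variance.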

Residuals

The point of the derivation above is that model validation should be done using standardized residuals. Here are some examples.

[Three slides of example residual plots.]

Residuals

Residual plots: (a) null plot; (b) right-opening megaphone; (c) left-opening megaphone; (d) double outward box; (e)-(f) nonlinearity; (g)-(h) combinations of nonlinearity and nonconstant variance function.

Types of residuals:

Working residual:               r = y_i − µ̂_i
Pearson residual:               r_p = (y_i − µ̂_i) / √(σ̂²)
Standardized Pearson residual:  r_p = (y_i − µ̂_i) / √(σ̂² (1 − h_ii))

In R: rstandard(model, type = "pearson")
In R Commander: Models > Add observation statistics to data...

R Commander calculates the Studentized residuals, which re-normalize the residuals to have unit variance using a leave-one-out estimate of the error variance.
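The standardized Pearson residual formula can be sketched directly, using the simple-regression leverages h_ii = 1/n + (x_i − x̄)²/Σ(x_k − x̄)² (made-up data, not from the birthwt example):

```python
# Illustrative sketch: standardized residuals
#   (y_i - fitted_i) / sqrt(sigma2_hat * (1 - h_ii))
# for a simple linear regression on invented data.
import math

x = [20, 30, 40, 50, 60]
y = [118, 125, 142, 151, 162]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(r * r for r in resid) / (n - 2)
lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_ii

std_resid = [r / math.sqrt(sigma2_hat * (1 - h)) for r, h in zip(resid, lev)]
print([round(r, 2) for r in std_resid])
```

Dividing by √(1 − h_ii) puts every residual on the same scale, so a common rule of thumb (flagging values beyond about ±2) can be applied to all observations alike.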

Plots of residuals

Constant variance: plot the standardized residuals against their corresponding fitted values (ŷ_i). The points should appear randomly and evenly scattered about zero if the assumption is respected.

Graphs > Scatterplot... [Plot of rstudent.LinearModel against fitted.LinearModel.]

Normality: plot the ranked standardized residuals against inverse normal cumulative distribution values. Departures from normality are indicated by deviations of the plot from a straight line.

Graphs > Quantile-comparison plot... [Normal quantile plot of birthwt$rstudent.LinearModel.]

Plots of residuals

Independence: plot the standardized residuals against the serial order in which the observations were taken. Again, a random scatter of points indicates that the assumption is valid.

Graphs > Scatterplot... [Plots of rstudent.LinearModel against observation number, by low = 0/1.]

Be careful with the organization of the data set... The truth is:

Data > Manage variables in active data set > Compute new variable...
birthwt$index <- with(birthwt, sample(1:189))

[Plot of rstudent.LinearModel against the random index.]

Plots of residuals

Linearity: plot the standardized residuals against individual explanatory variables. Linearity is indicated if all plots exhibit a random scatter of equal width about zero. Nonlinearity when residuals are plotted against explanatory variables in the model suggests that higher-order terms involving those variables should be added to the model. Systematic patterns exhibited when residuals are plotted against variables that are not included in the model suggest that those variables should be added to the model.

Graphs > Scatterplot... [Plots of rstudent.LinearModel against lwt and age.]

Interaction

When interaction is present, the association between the risk factor and the outcome variable differs, or depends in some way, on the level of a covariate. That is, the covariate modifies the effect of the risk factor. Epidemiologists use the term effect modifier to describe a variable that interacts with a risk factor. Interaction can be included in a regression model by adding the product term covariate × risk factor.

[Interaction representation: without interaction, the lines for group A and group B are parallel; with interaction, their slopes differ.]
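A sketch of what the product term does to the mean function, using hypothetical coefficient values (not fitted from any data): with y = b0 + b1·x + b2·g + b3·(x·g) and g a 0/1 group indicator, each group gets its own slope.

```python
# Illustrative sketch with hypothetical coefficients: the interaction
# term x * g lets the slope of x differ between the two groups.
b0, b1, b2, b3 = 2.0, 0.5, 1.0, -0.3  # invented values

def mean_response(x, g):
    # g = 0 for group A, g = 1 for group B; x * g is the product term
    return b0 + b1 * x + b2 * g + b3 * (x * g)

slope_a = mean_response(1.0, 0) - mean_response(0.0, 0)  # b1
slope_b = mean_response(1.0, 1) - mean_response(0.0, 1)  # b1 + b3
print(slope_a, slope_b)
```

When b3 = 0 the two lines are parallel (b2 only shifts group B vertically); a nonzero b3 is exactly the "effect modification" described above, which is why testing the interaction reduces to a Wald test on b3.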

In R

The cystfibr data frame (package ISwR) contains lung function data for cystic fibrosis patients (7-23 years old).

Call:
lm(formula = pemaxlog ~ bmp * sex, data = cystfibr)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                     e-08 ***
bmp
sex[T.fem]
bmp:sex[T.fem]
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

Residual standard error:  on 21 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 3 and 21 DF,  p-value:


More information

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website.

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website. SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Correlation & Regression. Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria

Correlation & Regression. Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria بسم الرحمن الرحيم Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria Correlation Finding the relationship between

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Chapter 8: Correlation & Regression

Chapter 8: Correlation & Regression Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Introductory Statistics with R: Linear models for continuous response (Chapters 6, 7, and 11)

Introductory Statistics with R: Linear models for continuous response (Chapters 6, 7, and 11) Introductory Statistics with R: Linear models for continuous response (Chapters 6, 7, and 11) Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Statistics - Lecture Three. Linear Models. Charlotte Wickham   1. Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

Linear regression and correlation

Linear regression and correlation Faculty of Health Sciences Linear regression and correlation Statistics for experimental medical researchers 2018 Julie Forman, Christian Pipper & Claus Ekstrøm Department of Biostatistics, University

More information

Foundations of Correlation and Regression

Foundations of Correlation and Regression BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Chapter 3: Multiple Regression. August 14, 2018

Chapter 3: Multiple Regression. August 14, 2018 Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich Biostatistics Correlation and linear regression Burkhardt Seifert & Alois Tschopp Biostatistics Unit University of Zurich Master of Science in Medical Biology 1 Correlation and linear regression Analysis

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison. Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

The General Linear Model. April 22, 2008

The General Linear Model. April 22, 2008 The General Linear Model. April 22, 2008 Multiple regression Data: The Faroese Mercury Study Simple linear regression Confounding The multiple linear regression model Interpretation of parameters Model

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

F-tests and Nested Models

F-tests and Nested Models F-tests and Nested Models Nested Models: A core concept in statistics is comparing nested s. Consider the Y = β 0 + β 1 x 1 + β 2 x 2 + ǫ. (1) The following reduced s are special cases (nested within)

More information

The General Linear Model. November 20, 2007

The General Linear Model. November 20, 2007 The General Linear Model. November 20, 2007 Multiple regression Data: The Faroese Mercury Study Simple linear regression Confounding The multiple linear regression model Interpretation of parameters Model

More information

Handout 4: Simple Linear Regression

Handout 4: Simple Linear Regression Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Remedial Measures, Brown-Forsythe test, F test

Remedial Measures, Brown-Forsythe test, F test Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS SHOOL OF MATHEMATIS AND STATISTIS Linear Models Autumn Semester 2015 16 2 hours Marks will be awarded for your best three answers. RESTRITED OPEN BOOK EXAMINATION andidates may bring to the examination

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information