Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model


Linear Regression

Linear Regression

In this lecture we will study a particular type of regression model: the linear regression model.
We will first consider the case of the model with one predictor variable: simple linear regression.
We will then consider the case of the model with more than one predictor variable: multiple linear regression.

Linear Regression Models

Recall that a regression model is defined by:
1. a random response variable
2. a list of predictor variables
3. a regression equation
4. a distribution for the value of the random response variable

Simple Linear Regression

The regression equation for simple linear regression is:

  EY_i = µ_i = α + β x_i

Note that the link function g is the identity function for linear regression.
The assumption here is that the relationship between x and EY_i is a straight line.
The slope of the line is β.

Example: Full Blood Count

A clinical full blood count takes a standard volume of blood and measures:
the number of blood cells (platelets, white, red)
the haemoglobin concentration
Empirically, there is a log-linear relationship between the number of red cells and the haemoglobin concentration.

Example: plot of log(haemoglobin) on log(rbc) (figure)

Interpretation of α

To interpret α, put x_i = 0 into the regression equation EY_i = α + β x_i; then:

  EY_i = α

α is the average value of the response variable amongst study subjects for which the predictor variable is zero.

Interpretation of β

To interpret β, put x_i = z and x_i′ = z + 1 for study subjects i and i′ into the regression equation to obtain:

  EY_i  = α + β z          (1)
  EY_i′ = α + β (z + 1)    (2)

then subtract equation (1) from equation (2):

  EY_i′ − EY_i = β         (3)

Interpretation: Blood Count Example

  EY_i = α + β x_i, here: E log(HGB) = α + β log(#RBC)

HGB = haemoglobin concentration
#RBC = red cell count
For the blood example, when log(#RBC) = 0 the average log(HGB) = 1.50.
Why is this interpretation silly?
If log(#RBC) increases by 1, log(HGB) increases on average by 0.74.

Linear Regression Response

For linear regression the response distribution is assumed to be normal (sometimes called Gaussian):

  Y_i ~ N(µ_i, σ²)

or equivalently

  Y_i ~ N(α + β x_i, σ²)

Linear Regression Errors

The quantity

  ϵ_i = Y_i − µ_i = Y_i − (α + β x_i)

is the error corresponding to study subject i.
The distributional assumption of linear regression is equivalent to the assumption that the errors are normally distributed with mean zero:

  ϵ_i ~ N(0, σ²)

The error variance σ² is the same for each study subject.
σ² can be estimated from the data using maximum likelihood.

Linear Regression Error Assumption

We can put the regression equation and the distributional assumption into a single statement:

  Y_i = α + β x_i + ϵ_i,  ϵ_i ~ N(0, σ²)

Linear Regression Residuals

We define the residual for study subject i by:

  r_i = Y_i − (α̂ + β̂ x_i)

Recall that the error for study subject i is defined by:

  ϵ_i = Y_i − (α + β x_i)

Note that residuals and errors are not the same:
Errors are unknown, because we don't know α and β.
Residuals can be computed from the data.
Residuals can be thought of as estimates of the errors.
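The distinction above can be checked directly in R. This is a minimal sketch with simulated data (the variable names and parameter values are illustrative, not the lecture's dataset): residuals computed by hand from the fitted coefficients agree with those stored in the model object, and their mean is numerically zero.

```r
# Simulate data from a known linear model (alpha = 2, beta = 0.5)
set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.3)

fit.obj <- lm(y ~ x)

# Residuals by hand: r_i = Y_i - (alpha.hat + beta.hat * x_i)
a.hat <- unname(coef(fit.obj))[1]
b.hat <- unname(coef(fit.obj))[2]
r.by.hand <- y - (a.hat + b.hat * x)

# They match lm's residuals, and their mean is (numerically) zero
all.equal(r.by.hand, unname(residuals(fit.obj)))
mean(r.by.hand)
```

The errors ϵ_i, by contrast, would require the true α and β, which are unknown in a real analysis.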

Linear Regression Residuals (figure)

Properties of Linear Regression Residuals

Although residuals and errors are not the same, residuals have similar properties to errors:
1. The mean (and sum) of the residuals for a study sample is equal to zero.
2. The residuals are normally distributed.
3. The variance of the residuals does not depend on the value of the predictors.
The first property holds regardless of the validity of the modelling assumptions.
The second and third properties hold only if the model assumptions are valid, specifically only if:
1. the relationship between x and Y is linear
2. the errors are normally distributed
3. the variance of the errors is constant

Checking Modelling Assumptions

Before we rely on an inference made from a linear regression model, we should always verify that the modelling assumptions hold.
Specifically we should check that:
1. EY is a linear function of x
2. the properties of the residuals are consistent with the assumption about the distribution of Y

Check Linearity

Suppose the R variable y is a vector containing data from a response variable Y, and the R variable x is a vector containing data from a predictor variable x.
We can generate a plot of y against x with the command:

> plot(x, y)

We will deal with non-linearity in a subsequent lecture.

Fitting a Linear Model in R

We can fit a linear regression in R using the lm function:

> fit.obj = lm(y ~ x)

This fits the regression equation EY = α + β x.
The result of the model fit is stored in the R object fit.obj.

Extracting the Residuals

The residuals can be extracted from the linear regression object using fit.obj$residuals.
For example, to draw a histogram of the residuals you can type:

> hist(fit.obj$residuals)

Alternatively you can do the fitting and plotting in one statement, without storing a model object:

> hist(lm(y ~ x)$residuals)

Histogram of the Residuals

By examining a histogram of the residuals we can check that the normality assumption holds.

QQ-plot of the Residuals

A Q-Q plot (short for quantile-quantile plot) is a graphical method for comparing two probability distributions.
We can use it to compare the observed distribution of the residuals with the distribution of a N(0, 1) random variable.
If the residuals are normally distributed, the plot should follow an approximately straight line.

QQ-plot of the Residuals

Suppose we have n data points, and so n residuals.
1. Draw n lines on a normal density to divide it into n + 1 regions of equal probability.
2. Plot the x-axis values of these lines against the ordered residuals.
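The two steps above can be sketched in base R. This is an illustrative sketch with simulated data: the probabilities (1:n)/(n+1) cut the normal density into n + 1 equal-probability regions, and qnorm converts them to x-axis values. (R's built-in qqnorm produces essentially the same picture, using slightly different plotting positions.)

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
res <- residuals(lm(y ~ x))
n <- length(res)

# Step 1: normal quantiles dividing the density into n + 1
# equal-probability regions
p <- (1:n) / (n + 1)
theo <- qnorm(p)

# Step 2: plot these quantiles against the ordered residuals
plot(theo, sort(res),
     xlab = "N(0,1) quantiles", ylab = "Ordered residuals")
```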

QQ-plot of the Residuals (figure)

R commands to generate this plot are given in the next session.

Plot the Residuals vs. the Predictor

Plot the residuals against the predictor variable to verify that the distribution of the residuals is independent of x:

> plot(x, lm(y ~ x)$residuals)

Maximum Likelihood Estimation

For linear regression there are closed-form formulae for the maximum likelihood estimates of the regression coefficients:

  β̂ = Σ_i (x_i − x̄)(Y_i − Ȳ) / Σ_i (x_i − x̄)²
  α̂ = Ȳ − β̂ x̄

However, we do not need to worry about these too much, as R will do the calculations for us.
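Although R does the work, the formulae can be verified directly. A sketch with simulated data (illustrative values, not the lecture's dataset): the hand-computed estimates agree with coef(lm(...)).

```r
set.seed(42)
x <- rnorm(50)
y <- 1 + 3 * x + rnorm(50)

# Closed-form MLEs for simple linear regression
beta.hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha.hat <- mean(y) - beta.hat * mean(x)

# These agree with lm()'s estimates
coef(lm(y ~ x))
c(alpha.hat, beta.hat)
```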

Viewing Model Fit Information in R

The simplest way to view model fit information in R is to type the name of a fitted model object and hit return:

> fit.obj = lm(y ~ x)
> fit.obj

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

This prints the MLEs of the coefficients.

Printing Confidence Intervals Using R

95% confidence limits can be computed with the confint function:

> confint(fit.obj)
                2.5 %   97.5 %
(Intercept)
x

A different confidence level can be specified if desired, e.g. 99%:

> confint(fit.obj, level=0.99)
                0.5 %   99.5 %
(Intercept)
x

The display Command

The display function (from the arm package) prints a compact model summary:

> display(fit.obj)
lm(formula = y ~ x)
            coef.est coef.se
(Intercept)
x
n = 500, k = 2
residual sd = 0.51, R-Squared = 0.94

This prints: the MLEs of the coefficients, their standard errors, the residual standard deviation and R².

Standard Errors of the MLEs

Back to the idea of imaginary repeated experiments. Suppose, in an imaginary world, we:
1. repeat our experiment very many times
2. generate a new dataset on each occasion
3. estimate a new MLE β̂ using each dataset
The MLE is a random variable under this replication process.
The standard error of β̂, denoted SE(β̂), is defined as the standard deviation of the MLE under this process.
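The imaginary replication can be mimicked by simulation. This is an illustrative sketch (made-up parameter values; x is held fixed across replicates): the standard deviation of β̂ over many simulated datasets is close to the standard error that summary() reports for a single fit.

```r
set.seed(1)
x <- rnorm(100)

# One "imaginary experiment": new errors, new dataset, new beta.hat
one.beta.hat <- function() {
  y <- 2 + 0.5 * x + rnorm(100)
  unname(coef(lm(y ~ x))[2])
}
beta.hats <- replicate(2000, one.beta.hat())

# SD of beta.hat across replicates ...
sd(beta.hats)

# ... is close to the SE reported for a single dataset
y <- 2 + 0.5 * x + rnorm(100)
summary(lm(y ~ x))$coefficients["x", "Std. Error"]
```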

Proportion of Variance Explained

R² is the proportion of the variance in the response which is explained by the predictor.
R² is a number between 0 and 1.
In simple linear regression, R² is the square of the correlation between x and y.
When R² = 1, x is perfectly correlated with y and the residuals are all equal to 0.
When R² = 0, x contains no information about y.
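The link between R² and correlation can be checked numerically. A sketch with simulated data (illustrative values): in a simple linear regression, summary()'s R² equals cor(x, y)².

```r
set.seed(1)
x <- rnorm(200)
y <- 1 + x + rnorm(200)
fit <- lm(y ~ x)

# R^2 from the fit equals the squared correlation of x and y
summary(fit)$r.squared
cor(x, y)^2
```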

Residual Standard Deviation

The residual standard deviation is what it says on the tin:

  sd(ϵ̂) = √( (1/n) Σ_i (ϵ̂_i − mean(ϵ̂))² )

The R summary Command

> summary(fit.obj)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              <2e-16 ***
x                                        <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 498 degrees of freedom
Multiple R-squared:      Adjusted R-squared:
F-statistic: 7462 on 1 and 498 DF,  p-value: < 2.2e-16

p-values

The p-value in the Pr(>|t|) column of the summary output is a measure of the weight of evidence against the null hypothesis that the regression coefficient in that row is equal to zero.
The null hypothesis is so called because it refers to the assumed position that there is no association between the predictor and the response.
Usually the evidence must be strong before a null hypothesis is rejected.
A p-value is a number between 0 and 1: the smaller the number, the greater the evidence against the null hypothesis.
Typically a p-value at least as small as 0.05 is required to reject a null hypothesis.

Interpretation of p-values

The interpretation of p-values is based on the idea of imaginary repeated experiments. Suppose, in an imaginary world, we:
1. repeat our experiment very many times
2. generate a new dataset on each occasion
3. calculate a new p-value using each dataset
Then, assuming the null hypothesis is true, α × 100% of the calculated p-values should be less than α.
Small p-values are rare when the null hypothesis is true.
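This property can also be seen by simulation. A sketch with made-up data: when x and y are generated independently (so the null hypothesis is true), about 5% of the p-values fall below 0.05.

```r
set.seed(1)

# One "imaginary experiment" under the null: x and y are independent
one.p <- function() {
  x <- rnorm(50)
  y <- rnorm(50)   # no association with x
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}
p.vals <- replicate(2000, one.p())

# Roughly alpha * 100% of p-values are below alpha = 0.05
mean(p.vals < 0.05)
```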

Computing Confidence Intervals Manually

Although R provides the confint function, confidence intervals can also be computed manually from standard errors.
Not all statistical software provides functions to compute confidence intervals, so this is a useful skill.
Standard errors are listed in the second column of the summary output. (They are also printed by the display command.)
Manual calculation of confidence intervals is based on the assumption that the MLE of the regression coefficient follows a normal distribution.

Computing Confidence Intervals Manually

We can compute a 95% confidence interval for a regression coefficient using a normal approximation:

  β̂ − 1.96 SE(β̂) < β < β̂ + 1.96 SE(β̂)
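The manual calculation can be compared with confint. A sketch with simulated data (illustrative values): the two intervals nearly agree; confint uses t rather than normal quantiles, so it is very slightly wider.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
fit.obj <- lm(y ~ x)

# Manual 95% CI: estimate +/- 1.96 standard errors
est <- summary(fit.obj)$coefficients["x", "Estimate"]
se  <- summary(fit.obj)$coefficients["x", "Std. Error"]
c(est - 1.96 * se, est + 1.96 * se)

# confint() uses t quantiles, so its interval is slightly wider
confint(fit.obj)["x", ]
```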

Multiple Linear Regression

Multiple linear regression is very similar to simple linear regression.
More than one predictor is now allowed on the right-hand side of the regression equation:

  EY_i = µ_i = α + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip

The assumptions about the distribution of Y_i (normal, homogeneous variance) are the same as those for simple linear regression.

Fitting a Multiple Linear Regression

A multiple linear regression can be fitted with the lm command:

> fit.obj = lm(y ~ x1 + x2)

Information can be extracted from the model object using the functions already seen: confint, display and summary.

When to Use Multiple Linear Regression

Multiple linear regression is useful when more than one predictor is thought to associate with the response simultaneously.
By fitting the predictors together in the same model we can get more precise estimates of the regression coefficients.

Fitting a Multiple Linear Regression

> summary(fit.obj)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                          e-07 ***
x2                                               *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 47 degrees of freedom
Multiple R-squared:      Adjusted R-squared:
F-statistic: on 2 and 47 DF,  p-value: 6.864e-07

Multiple Linear Regression: Interpretation of α

To interpret α, put x_ij = 0 into the regression equation for each predictor:

  EY_i = µ_i = α + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip

then

  EY_i = α

α is the average value of the response variable amongst study subjects for which every predictor variable is zero.

Interpretation of β_j

To interpret β_j, the regression coefficient for the jth predictor variable, put x_ij = z for study subject i and x_i′j = z + 1 for study subject i′, with all other predictors held equal, into the regression equation to obtain:

  EY_i  = α + β_1 x_i1 + ... + β_j z + ... + β_p x_ip        (4)
  EY_i′ = α + β_1 x_i1 + ... + β_j (z + 1) + ... + β_p x_ip  (5)

then subtract equation (4) from equation (5):

  EY_i′ − EY_i = β_j

β_j is the difference in the average value of the response variable between groups of study subjects for which the jth predictor variable differs by one unit (and whose other predictors take the same values).

Multiple Linear Regression with Interactions

Multiple linear regression assumes that the effect of a unit change in a predictor on the mean of the response is independent of the values of the other predictors.
e.g. increasing the predictor value x_ij by one unit increases EY_i by the amount β_j, whatever the values of the other predictor variables.
Interaction models allow us to relax this assumption.

Interactions in Linear Regression

An interaction model is one where the interpretation of the effect of one predictor depends on the value of another, and vice versa.
The simplest interaction model includes a predictor variable formed by multiplying two ordinary predictors:

  EY_i = α + β_1 x_i1 + β_2 x_i2 + β_3 x_i1 x_i2

The term β_3 x_i1 x_i2 is the interaction term.
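In R, this model can be fitted either by constructing the product term explicitly or with lm's * shorthand. A sketch with simulated data (illustrative values): the two formulations give identical coefficient estimates.

```r
set.seed(1)
x1 <- rnorm(60)
x2 <- rnorm(60)
y  <- 1 + x1 + 2 * x2 + 0.5 * x1 * x2 + rnorm(60)

# y ~ x1 * x2 is shorthand for main effects plus the product term
fit1 <- lm(y ~ x1 * x2)
fit2 <- lm(y ~ x1 + x2 + I(x1 * x2))

coef(fit1)
coef(fit2)   # same estimates, different term labels
```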

Interaction Between 2 Variables

Consider a linear model where the main predictors of Y (blood pressure in mmHg) are age (in years) and weight (in kg):

  EY = α + β_1 weight + β_2 age + β_3 (weight × age)

Interpreting β_1 and β_2

  EY = α + β_1 weight + β_2 age + β_3 (weight × age)

We would like to interpret β_1 as if the interaction term were not there, since in that case we would just have an ordinary multivariate linear model.
This happens when the age of a study subject is equal to 0; then:

  EY = α + β_1 weight + β_2 × 0 + β_3 (weight × 0) = α + β_1 weight

Interpreting β_1 and β_2

Amongst study subjects aged 0 years:

  EY = α + β_1 weight

We know how to interpret β_1 in this case, as it's a simple linear model:
β_1 is the difference in the expected BP between individuals whose weight differs by 1 kg and who are aged 0 years.
This interpretation is factually correct, but practically not very useful.
The data aren't likely to contain many 0 year olds. Is the model valid in this range?

Interpreting β_1 and β_2

  EY = α + β_1 weight + β_2 age + β_3 (weight × age)

To interpret β_2 we need to get rid of the interaction term without getting rid of the β_2 age term.
Using the same argument as before, but now setting weight = 0:

  EY = α + β_1 × 0 + β_2 age + β_3 (0 × age) = α + β_2 age

β_2 is the difference in the expected BP between individuals whose age differs by 1 year and who weigh 0 kg.

Interpreting β_3

  EY = α + β_1 weight + β_2 age + β_3 (weight × age)

To interpret β_3, rewrite the regression equation:

  EY = α + [β_1 + β_3 age] weight + β_2 age

This looks like a multivariate regression model with weight and age as predictors where:
β_1 + β_3 age is the regression coefficient for weight
β_2 is the regression coefficient for age
β_3 is the difference between the regression coefficients for weight for study subjects whose age differs by 1 year.

Interpreting β_3

  EY = α + β_1 weight + β_2 age + β_3 (weight × age)

We could just as well have rewritten the equation this way:

  EY = α + β_1 weight + [β_2 + β_3 weight] age

β_3 is the difference between the regression coefficients for age for study subjects whose weight differs by 1 kg.
So we have two ways of thinking about β_3:
1. as modification of the effect of weight by age
2. as modification of the effect of age by weight

Fitting an Interaction Term in R

An interaction model can be fitted with the lm command, using * between the predictors:

> summary(lm(y ~ x1 * x2))

Call:
lm(formula = y ~ x1 * x2)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1
x2
x1:x2                                            **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Red Blood Cell Count in UK Biobank (figure)

The distribution of the number of red blood cells in a unit of blood in middle-aged males and females in the UK.

Categorical Predictors in Regression

So far we have implicitly assumed that the predictor variable x is numerical, and that the data contain a range of values for x.
The previous example shows how we might wish to use a categorical variable such as sex as a predictor in a regression model.
How do we put a categorical variable into a regression equation?

  EY_i = µ_i = α + β "Female"

does not make sense: only numbers can be put into equations.

Dummy Variables

The solution to this problem is to use dummy variables.
A dummy variable is a 0/1 variable which acts as a proxy for the value of a categorical variable.
For example, if x is a categorical variable with possible categories "Male"/"Female", we can substitute a dummy variable δ with:

  δ_i = 0 if and only if x_i = "Male"
  δ_i = 1 if and only if x_i = "Female"

The regression equation is now:

  EY_i = µ_i = α + β δ_i
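The dummy coding can be done by hand in R. A sketch with simulated data (made-up group means): regressing on a hand-coded 0/1 variable recovers the group means exactly, with the intercept equal to the mean in the 0-coded group.

```r
set.seed(1)
sex <- factor(rep(c("Male", "Female"), each = 25))
y   <- ifelse(sex == "Female", 5, 4) + rnorm(50)

# Hand-coded dummy: 0 = "Male", 1 = "Female"
delta <- as.numeric(sex == "Female")
fit   <- lm(y ~ delta)

# Intercept = male mean; intercept + slope = female mean
coef(fit)
mean(y[sex == "Male"])
mean(y[sex == "Female"])
```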

Dummy Variables

The regression equation:

  EY_i = µ_i = α + β δ_i

  δ_i = 0 if and only if x_i = "Male"
  δ_i = 1 if and only if x_i = "Female"

The mean value of Y_i in females is α + β.
The mean value of Y_i in males is α.
The interpretation of the regression coefficients depends on the coding chosen: we could instead have coded females as 0 and males as 1.

Dummy Variables

When x has more than two possible categories we need more than one dummy variable to code the categories numerically.
For example, suppose x has possible categories "Male"/"Pre-Menopausal Female"/"Post-Menopausal Female".
We need to choose a baseline category, which corresponds to all the dummy variables being equal to zero.
This choice changes the interpretation of the coefficients but has no effect on statistical inferences.

Dummy Variables

  δ_i1 = 1 if and only if x_i = "Pre-Menopausal Female", and δ_i1 = 0 otherwise
  δ_i2 = 1 if and only if x_i = "Post-Menopausal Female", and δ_i2 = 0 otherwise

The regression equation becomes:

  EY_i = µ_i = α + β_1 δ_i1 + β_2 δ_i2

The mean value of Y_i in pre-menopausal females is α + β_1.
The mean value of Y_i in post-menopausal females is α + β_2.
The mean value of Y_i in males (the baseline category) is α.
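R's model.matrix function shows exactly this coding. A sketch (tiny made-up factor with the three categories above): the design matrix has an intercept column plus one 0/1 dummy column per non-baseline category.

```r
# Three categories need two dummy variables; the first level listed
# ("Male") is the baseline, coded as all dummies equal to zero
x <- factor(c("Male", "Pre-Menopausal Female", "Post-Menopausal Female"),
            levels = c("Male", "Pre-Menopausal Female",
                       "Post-Menopausal Female"))
model.matrix(~ x)
```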

Fitting Categorical Predictors in R

R calls categorical variables "factor" variables.
R automatically converts factor variables into dummy variables when they are put into a regression equation:

> lm(rbc ~ sex)

Call:
lm(formula = rbc ~ sex)

Coefficients:
(Intercept)      sexmale

Summary

This lecture has been about linear regression.
Linear regression is used to model the association between the mean of a random variable and one or more predictors.
We've covered:
univariate and multiple regression
interpretation of regression coefficients
interaction terms
dummy variables


More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

ACOVA and Interactions

ACOVA and Interactions Chapter 15 ACOVA and Interactions Analysis of covariance (ACOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Lecture 6: Linear Regression

Lecture 6: Linear Regression Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i

More information

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website.

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website. SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Topics on Statistics 2

Topics on Statistics 2 Topics on Statistics 2 Pejman Mahboubi March 7, 2018 1 Regression vs Anova In Anova groups are the predictors. When plotting, we can put the groups on the x axis in any order we wish, say in increasing

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 12: Frequentist properties of estimators (v4) Ramesh Johari ramesh.johari@stanford.edu 1 / 39 Frequentist inference 2 / 39 Thinking like a frequentist Suppose that for some

More information

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

De-mystifying random effects models

De-mystifying random effects models De-mystifying random effects models Peter J Diggle Lecture 4, Leahurst, October 2012 Linear regression input variable x factor, covariate, explanatory variable,... output variable y response, end-point,

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

Review of the General Linear Model

Review of the General Linear Model Review of the General Linear Model EPSY 905: Multivariate Analysis Online Lecture #2 Learning Objectives Types of distributions: Ø Conditional distributions The General Linear Model Ø Regression Ø Analysis

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

STA442/2101: Assignment 5

STA442/2101: Assignment 5 STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

Introduction to the Analysis of Hierarchical and Longitudinal Data

Introduction to the Analysis of Hierarchical and Longitudinal Data Introduction to the Analysis of Hierarchical and Longitudinal Data Georges Monette, York University with Ye Sun SPIDA June 7, 2004 1 Graphical overview of selected concepts Nature of hierarchical models

More information

How to mathematically model a linear relationship and make predictions.

How to mathematically model a linear relationship and make predictions. Introductory Statistics Lectures Linear regression How to mathematically model a linear relationship and make predictions. Department of Mathematics Pima Community College (Compile date: Mon Apr 28 20:50:28

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

How to mathematically model a linear relationship and make predictions.

How to mathematically model a linear relationship and make predictions. Introductory Statistics Lectures Linear regression How to mathematically model a linear relationship and make predictions. Department of Mathematics Pima Community College Redistribution of this material

More information

Extensions of One-Way ANOVA.

Extensions of One-Way ANOVA. Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Nonstationary time series models

Nonstationary time series models 13 November, 2009 Goals Trends in economic data. Alternative models of time series trends: deterministic trend, and stochastic trend. Comparison of deterministic and stochastic trend models The statistical

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Heteroskedasticity. Part VII. Heteroskedasticity

Heteroskedasticity. Part VII. Heteroskedasticity Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression ith a Independent Variable ERSH 8320 Slide 1 of 34 Today s Lecture Regression with a single categorical independent variable. Today s Lecture Coding procedures for analysis. Dummy coding. Relationship

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression With a Independent Variable Lecture 10 November 5, 2008 ERSH 8320 Lecture #10-11/5/2008 Slide 1 of 54 Today s Lecture Today s Lecture Chapter 11: Regression with a single categorical independent

More information