22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 1: Simple Linear Regression Introduction and Estimation

Size: px
Start display at page:

Download "22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 1: Simple Linear Regression Introduction and Estimation"

Transcription

1 22s:152 Applied Linear Regression Chapter 5: Ordinary Least Squares Regression Part 1: Simple Linear Regression Introduction and Estimation Methods for studying the relationship of two or more quantitative variables Examples: predict salary from years of experience find effect of lead exposure on school performance predict force at which a metal alloy rod bends based on iron content 1

2 Simple Linear Regression Linear regression model The basic model Y i = β 0 + β 1 x i + ɛ i Y i is the response of dependent variable x i is the observed predictor, explanatory variable, independent variable, covariate x i is treated as a fixed quantity (or if random it is conditioned upon) ɛ i is the error term ɛ i are iid N(0, σ 2 ) So, E[Y i ] = β 0 + β 1 x i + 0 = β 0 + β 1 x i 2

3 Simple Linear Regression Linear regression model Key assumptions (will check these later) linear relationship (between Y and x) *we say the relationship between Y and x is linear if the means of the conditional distributions of Y x lie on a straight line independent errors (independent observations in SLR) constant variance of errors normally distributed errors 3

4 Simple Linear Regression Interpreting the model Model can also be written as: Y i X i = x i N(β 0 + β 1 x i, σ 2 ) mean of Y given X = x is β 0 + β 1 x (known as conditional mean) β 0 + β 1 x is the mean value of all the Y s for the given value of x β 0 is conditional mean when x=0 β 1 is slope, change in mean of Y per 1 unit change in x σ 2 is the variation of responses at x (i.e. dispersion around conditional mean) 4

5 Simple Linear Regression Estimation of β 0 andβ 1 We wish to use the sample data to estimate the population parameters: the slope β 1 and the intercept β 0 Least squares estimation choose ˆβ 0 = b 0 and ˆβ 1 = b 1 such that we minimize the sum of the squared residuals, i.e. minimize n i=1 (Y i Ŷi) 2 minimize g(b 0, b 1 ) = n i=1 (Y i (b 0 + b 1 x i )) 2 Take derivative of g(b 0, b 1 ) with respect to b 0 and b 1, set equal to zero, and solve 5

6 Results: b 0 = Ȳ b 1 x b 1 = ni=1 (x i x)(y i Ȳ ) ni=1 (x i x) 2 the point ( x, Ȳ ) will always be on the least squares line b 0 and b 1 are best linear unbiased estimators (best meaning smallest variance estimator) Notation for fitted line: or Ŷ i = ˆβ 0 + ˆβ 1 x i Ŷ i = b 0 + b 1 x i or in the text Ŷ i = A + Bx i 6

7 predicted (fitted) value: Ŷ i = b 0 + b 1 x i residual: e i = Y i Ŷi Y X The least squares regression line minimizes the residual sums of squares (RSS)= n i=1 (Y i Ŷi) 2 7

8 Example: Cigarette data Measurements of weight and tar, nicotine, and carbon monoxide content are given for 25 brands of domestic cigarettes. VARIABLE DESCRIPTIONS: Brand name Tar content (mg) Nicotine content (mg) Weight (g) Carbon monoxide content (mg) Mendenhall, William, and Sincich, Terry (1992), Statistics for Engineering and the Sciences (3rd ed.), New York: Dellen Publishing 8

9 Do a scatterplot, fit the best fitting line according to least squares estimation. > cig.data=as.data.frame(read.delim("cig.txt",sep=" ", header=false)) > dim(cig.data) [1] 25 5 ## This data set had no header, so I will assign ## the column names here: > dimnames(cig.data)[[2]]=c("brand","tar","nic", "Weight","CO") > head(cig.data) Brand Tar Nic Weight CO 1 Alpine Benson-Hedges BullDurham CamelLights Carlton Chesterfield

10 > plot(cig.data$tar,cig.data$nic) cig.data$nic cig.data$tar ## Fit a simple linear regression of Nicotine on Tar. > lm.out=lm(nic~tar,data=cig.data) ## Get the estimated slope and intercept: > lm.out$coefficients (Intercept) Tar You can do this manually too... 10

11 b 1 = ni=1 (x i x)(y i Ȳ ) ni=1 (x i x) 2 R easily works with vectors and matrices. > numerator=sum((cig.data$tar-mean(cig.data$tar))* (cig.data$nic-mean(cig.data$nic))) > denominator=sum((cig.data$tar-mean(cig.data$tar))^2) > b1=numerator/denominator > b1 [1] b 0 = Ȳ b 1 x > b0=mean(cig.data$nic)-mean(cig.data$tar)*b1 > b0 [1] The fitted line for this data: Ŷ i = x i 11

12 ## Add the fitted line to the original plot: > plot(cig.data$tar,cig.data$nic) > abline(lm.out) cig.data$nic cig.data$tar 12

13 13

14 Simple Linear Regression Estimating σ 2 One of the assumptions of linear regression is that the variance for each of the conditional distributions of Y x is the same at all x-values. Y X In this case, it makes sense to pool all the error information to come up with a common estimate for σ 2 14

15 Recall the model: Y i = β 0 + β 1 x i + ɛ i with ɛ i iid N(0, σ 2 ) We use the sum of the squares of the residuals to estimate σ 2 Acronyms: RSS Residual sum of squares SSE Sum of squared errors RSS SSE ˆσ 2 = RSS n 2 = ni=1 (Y i Ŷi) 2 n 2 RSS = n i=1 (Y i Ŷi) 2 E[ RSS n 2 ] = σ2 ˆσ = S E = SE 2 is called the standard error for the regression (a phrase used by this author) 15

16 2 is subtracted from n in the denominator because we ve used 2 degrees of freedom for estimating the slope and intercept (i.e. there were 2 parameters estimated in the mean structure). When we estimate σ 2 in a 1-sample population, we divide n i=1 (Y i Ŷi) 2 by (n 1) because we only estimate 1 parameter in the mean structure, namely µ. 16

17 Simple Linear Regression Total sums of squares (TSS) Total sums of squares (TSS) quantifies the overall squared distance of the Y -values from the overall mean of the responses Ȳ x y Y bar= TSS= n i=1 (Y i Ȳ )2 17

18 For regression, we can decompose this distance and write: Y i Ȳ = (Y i Ŷi) + (Ŷi Ȳ ) }{{}}{{} distance from observation to fitted line distance from fitted line to overall mean Which leads to the equation 1 : n (Y i Ȳ )2 = i=1 n (Y i Ŷi) 2 + i=1 n (Ŷi Ȳ )2 i=1 or T SS = RSS + RegSS where RegSS is the regression sum of squares 1 (a + b)2 a 2 + b 2. You must square both sides, then include the summation terms and then the cross terms will cancel out due to properties of the fitted line. 18

19 Total variability has been decomposed into explained and unexplained variability In general, when the proportion of total variability that is explained is high, we have a good fitting model The R 2 value (coefficient of determination): the proportion of variation in the response that is explained by the model R 2 = RegSS T SS R 2 = 1 RSS T SS also stated as r 2 in simple linear regression the square of the correlation coefficient r 0 R 2 1 R 2 near 1 suggests a good fit to the data if R 2 = 1, ALL points fall exactly on the line different disciplines have different views on what is a high R 2 = 1, in other words what is a good model 19

20 social scientists may get excited about an R 2 near 0.30 a researcher with a designed experiment may want to see an R 2 near

21 Simple Linear Regression Analysis of Variance (ANOVA) The decomposition of total variance into parts is part of ANOVA. As was stated before: T SS = RSS + RegSS Example: cigarette data cig.data$nic cig.data$tar 21

22 Look at the ANOVA table: You can get these sums of squares manually too... > sum((lm.out$fitted.values-mean(cig.data$nic))^2) [1] > sum(lm.out$residuals^2) [1] > sum((cig.data$nic-mean(cig.data$nic))^2) [1] Get the R2 value (2 ways shown): > summary(lm.out) look for... Multiple R-Squared: > summary(lm.out)$r.squared [1]

23 Example: Lifespan and Thorax of fruitflies LONGEVITY Lifespan, in days THORAX Length of thorax, in mm n= data$thorax data$longevity Sexual Activity and the Lifespan of Male Fruitflies by Linda Partridge and Marion Farquhar. Nature, 294, ,

24 The data and the variables: > ff.data=as.data.frame(read.delim("/fruitfly.txt", sep="\t",header=false)) > dimnames(ff.data)[[2]]=c("id","partners","type", "Longevity","Thorax","Sleep") > head(ff.data) ID Partners Type Longevity Thorax Sleep See how many different Partner values there are: > unique(ff.data$partners) [1]

25 Fit the simple linear regression model: > lm.fruitflies=lm(ff.data$longevity~ff.data$thorax) > summary(lm.fruitflies)... Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e-06 *** ff.data$thorax e-15 *** --- Signif. codes: 0 *** ** 0.01 * Residual standard error: 13.6 on 123 degrees of freedom Multiple R-Squared: ,Adjusted R-squared: F-statistic: on 1 and 123 DF, p-value: 1.497e-15 Slope interpretation: For every 1 mm increase in the thorax length of a fruitfly, the average lifespan increases by days. 25

26 Intercept interpretation: When the thorax length is 0 mm, the average lifespan is days (*#??*). This doesn t make sense for this data set for 2 reasons... a fruitfly wouldn t have a 0 mm thorax, and x=0 is far outside of the range of observed x-values. > plot(ff.data$thorax,ff.data$longevity) > abline(lm.fruitflies,lwd=2) ff.data$thorax ff.data$longevity 26

27 > anova(lm.fruitflies) Analysis of Variance Table Response: ff.data$longevity Df Sum Sq Mean Sq F value Pr(>F) ff.data$thorax e-15 *** Residuals Signif. codes: 0 *** ** 0.01 * Regression sum of squares RegSS Residual sum of squares RSS Total sum of squares TSS R 2 = RegSS T SS = =

28 > summary(lm.fruitflies)$r.squared [1] R 2 interpretation: 40.5% of the total variability in lifespan for fruitflies is explained by the length of the thorax. 28

29 Simple Linear Regression Correlation coefficient The correlation coefficient r measures the strength of a linear relationship r = ni=1 (X i X)(Y i Ȳ ) ni=1 (X i X) 2 n i=1 (Y i Ȳ )2 ni=1 (X i X) 2 = n 1 ni=1 (Y i Ȳ )2 n 1 b 1 = S X SY b 1 it is the standardized slope, a unitless measure can be thought of as the value we would get for the slope if the standard deviations of X and Y were equal (similar spreads) would be the slope if X and Y had been standardized before fitting the regression 29

30 1 r 1 r near -1 or +1 shows a strong linear relationship a negative (positive) r is associated with an estimated negative (positive) slope the sample correlation coefficient r estimates the population correlation coefficient ρ r is NOT used to measure strength of a curved line 30

31 Common mistake People often think that as the estimated slope of the regression line, ˆβ 1, gets larger (steeper), so does r. But r really measures how close all the data points are to our estimated regression line. You could have a steep fitted line with a small r (noisy relationship), or a fairly flat fitted line with large r (less noisy relationship). This can be confusing because when the estimated slope is actually 0, then r is 0 no matter how close the points are to the regression line (see formulas for r on the previous pages). 31

32 Example: Cigarette data > cor(cig.data$tar,cig.data$nic) [1] cig.data$nic cig.data$tar Standardize the dependent and independent variables and fit the simple linear regression model to the standardized variables... 32

33 Standardizing the variables: > std.y=(cig.data$nic-mean(cig.data$nic))/sqrt(var(cig.data$nic)) > std.x=(cig.data$tar-mean(cig.data$tar))/sqrt(var(cig.data$tar)) Fitting the model to the standardized variables: > (lm(std.y~std.x))$coefficients (Intercept) std.x e e-01 The slope in the standardized regression is , which is the correlation between the original two variables (as we saw on the previous slide). Standardized Y vs. Standardize X std.y std.x 33

34 A very strong curved relationship can have an r value near 0. > plot(x,y) x y > abline(lm(y~x)) > cor(x,y) [1] The correlation coefficient measures the strength in a linear relationship. 34

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 2: Multiple Linear Regression Introduction

22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 2: Multiple Linear Regression Introduction 22s:152 Applied Linear Regression Chapter 5: Ordinary Least Squares Regression Part 2: Multiple Linear Regression Introduction Basic idea: we have more than one covariate or predictor for modeling a dependent

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Statistical Techniques II EXST7015 Simple Linear Regression

Statistical Techniques II EXST7015 Simple Linear Regression Statistical Techniques II EXST7015 Simple Linear Regression 03a_SLR 1 Y - the dependent variable 35 30 25 The objective Given points plotted on two coordinates, Y and X, find the best line to fit the data.

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014 ECO 312 Fall 2013 Chris Sims Regression January 12, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License What

More information

Multiple Regression: Example

Multiple Regression: Example Multiple Regression: Example Cobb-Douglas Production Function The Cobb-Douglas production function for observed economic data i = 1,..., n may be expressed as where O i is output l i is labour input c

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,

More information

Regression Analysis: Basic Concepts

Regression Analysis: Basic Concepts The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

Chapter 3: Multiple Regression. August 14, 2018

Chapter 3: Multiple Regression. August 14, 2018 Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

2 Regression Analysis

2 Regression Analysis FORK 1002 Preparatory Course in Statistics: 2 Regression Analysis Genaro Sucarrat (BI) http://www.sucarrat.net/ Contents: 1 Bivariate Correlation Analysis 2 Simple Regression 3 Estimation and Fit 4 T -Test:

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

22s:152 Applied Linear Regression. 1-way ANOVA visual:

22s:152 Applied Linear Regression. 1-way ANOVA visual: 22s:152 Applied Linear Regression 1-way ANOVA visual: Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Y We now consider an analysis

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

The Classical Linear Regression Model

The Classical Linear Regression Model The Classical Linear Regression Model ME104: Linear Regression Analysis Kenneth Benoit August 14, 2012 CLRM: Basic Assumptions 1. Specification: Relationship between X and Y in the population is linear:

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Applied Regression Analysis. Section 2: Multiple Linear Regression

Applied Regression Analysis. Section 2: Multiple Linear Regression Applied Regression Analysis Section 2: Multiple Linear Regression 1 The Multiple Regression Model Many problems involve more than one independent variable or factor which affects the dependent or response

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Christopher Ting Christopher Ting : christophert@smu.edu.sg : 688 0364 : LKCSB 5036 January 7, 017 Web Site: http://www.mysmu.edu/faculty/christophert/ Christopher Ting QF 30 Week

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

Least-Squares Regression

Least-Squares Regression MATH 203 Least-Squares Regression Dr. Neal, Spring 2009 As well as finding the correlation of paired data {{ x 1, y 1 }, { x 2, y 2 },..., { x n, y n }}, we also can plot the data with a scatterplot and

More information

Section 4: Multiple Linear Regression

Section 4: Multiple Linear Regression Section 4: Multiple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 The Multiple Regression

More information

Lecture 19 Multiple (Linear) Regression

Lecture 19 Multiple (Linear) Regression Lecture 19 Multiple (Linear) Regression Thais Paiva STA 111 - Summer 2013 Term II August 1, 2013 1 / 30 Thais Paiva STA 111 - Summer 2013 Term II Lecture 19, 08/01/2013 Lecture Plan 1 Multiple regression

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Two-Variable Regression Model: The Problem of Estimation

Two-Variable Regression Model: The Problem of Estimation Two-Variable Regression Model: The Problem of Estimation Introducing the Ordinary Least Squares Estimator Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Two-Variable

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

22s:152 Applied Linear Regression. Returning to a continuous response variable Y...

22s:152 Applied Linear Regression. Returning to a continuous response variable Y... 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y... Ordinary Least Squares Estimation The classical models we have fit so far with a continuous

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable, Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/2 01 Examination Date Time Pages Final December 2002 3 hours 6 Instructors Course Examiner Marks Y.P.

More information

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ)

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ) 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y Ordinary Least Squares Estimation The classical models we have fit so far with a continuous response

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Lecture 2 January 27, 2005 Lecture #2-1/27/2005 Slide 1 of 46 Today s Lecture Simple linear regression. Partitioning the sum of squares. Tests of significance.. Regression diagnostics

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

3. Diagnostics and Remedial Measures

3. Diagnostics and Remedial Measures 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed where ɛ i iid N(0, σ 2 ), Y i = β 0 + β 1 X i + ɛ i i = 1, 2,..., n, β 0, β 1 and σ 2 are unknown parameters, X i s

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares ST 430/514 Recall the linear regression equation E(Y ) = β 0 + β 1 x 1 + β 2 x 2 + + β k x k We have estimated the parameters β 0, β 1, β 2,..., β k by minimizing the sum of squared

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Correlation 1. December 4, HMS, 2017, v1.1

Correlation 1. December 4, HMS, 2017, v1.1 Correlation 1 December 4, 2017 1 HMS, 2017, v1.1 Chapter References Diez: Chapter 7 Navidi, Chapter 7 I don t expect you to learn the proofs what will follow. Chapter References 2 Correlation The sample

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

22s:152 Applied Linear Regression. Chapter 6: Statistical Inference for Regression

22s:152 Applied Linear Regression. Chapter 6: Statistical Inference for Regression 22s:152 Applied Linear Regression Chapter 6: Statistical Inference for Regression Simple Linear Regression Assumptions for inference Key assumptions: linear relationship (between Y and x) *we say the relationship

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information