Handout 4: Simple Linear Regression

Size: px
Start display at page:

Download "Handout 4: Simple Linear Regression"

Transcription

1 Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code: msm = read.csv(" 1 Background and Maximum Likelihood Estimates The European Food Safety Authority recently issued a scientific opinion on the public health risks related to mechanically separated meat (MSM). The analysis suggested that calcium could be used to distinguish between MSM and non-msm products. A random sample of MSM poultry was obtained and the deboner head pressure (in psi) and the amount of calcium (in ppm) was measured for each. The data are given in the following table. Based on this data set we want to build a model that can predict how much calcium (Y i ) a sample will contain depending on the pressure (x i ) the deboner used on the poultry. Our model will be: Y i = β 0 + β 1 x i + ɛ i where ɛ i iid N(0, σ 2 ), i = 1, 2,..., 18 A consequence of the definition above is that Y i indep. N (β 0 + β 1 x i, σ 2 ) for all i = 1, 2,..., n. In order to fit the model we first must find the maximum likelihood estimates for β 0, β 1, and σ 2. (Note that n = 18, but we re going to pretend for the time being we don t know that fact). The likelihood and log-likelihood equations are: L(β 0, β 1, σ 2 ) = n [ ( 1 exp (y )] i β 0 β 1 x i ) 2 2πσ 2 2σ 2 l(β 0, β 1, σ 2 ) = n 2 log(2πσ2 ) n (y i β 0 β 1 x i ) 2 2σ 2 1

2 Pressure (in psi) Calcium (in ppm) Now we must take the derivative of the likelihood equation with respect to β 0, β 1 and σ 2. l β 0 = l β 1 = n n (y i β 0 β 1 x i ) σ 2 l σ = n n 2 2σ + 2 x i (y i β 0 β 1 x i ) σ 2 (y i β 0 β 1 x i ) 2 2(σ 2 ) 2 Set the three equations above equal to zero and simultaneously solve for β 0, β 1 and σ 2. The results will be the maximum likelihood estimates. Prove to yourself the following are the maximum 2

3 likelihood estimates: ˆβ 0 = ȳ ˆβ 1 x n ˆβ 1 = (x i x)(y i ȳ) n (x i x) 2 ˆσ 2 = n (y i ˆβ 0 ˆβ 1 x i ) 2 To find the maximum likelihood estimates for β 0 and β 1 in R, we can use the following code: > n = dim(msm)[1] > x = msm$pressure > y = msm$calcium > beta1 = sum( ( x - mean(x) ) * ( y - mean(y) ) )/sum( ( x - mean(x) )^2 ) > beta0 = mean(y) - beta1 * mean(x) > beta0 [1] > beta1 [1] To interpret ˆβ 0 s value of , we would say that the expected calcium, given the machine is set to a pressure of 0 psi, is 505 ppm. Note that often times, interpretations of ˆβ 0 might be non-sensical such as in this case; in this example when the machine is set to 0 psi it can t separate the meat. Often what is of scientific interest is the interpretation of ˆβ 1. In this example, one way to interpret ˆβ 1 is to say that for a 1 psi increase in pressure the calcium concentration is expected to increase by 1.01 ppm. Typically, we don t use the maximum likelihood estimator for σ 2 because it is a biased estimator (prove this fact to yourself). Instead, we use the unbiased estimate we sometimes refer to as MSE, n (y i ŷ i ) 2 n (y i MSE = = ˆβ 0 ˆβ 1 x i ) 2 n 2 n 2 To find the value for MSE using R, we can use the following code: > yhat = beta0 + beta1*x > MSE = sum( (y - yhat)^2 )/(n-2) > MSE [1] n

4 2 Hypothesis Testing H 0 : β 1 = 0 vs. H a : β 1 0 According to the assumptions we made, ˆβ 1 N ( β 1, ) σ 2 n (x i x) 2 If we wanted to test the null hypothesis of H 0 : β 1 = 0 vs. H a : β 1 0 then our test statistic might be: ˆβ 1 0 test statistic = σ 2 n (x i x) 2 However, there is a problem with the test statistic above, we don t know the value of σ 2, so we have to substitute in for σ 2 the unbiased estimate we previously found. test statistic = ˆβ 1 0 MSE n (x i x) 2 Then when the null hypothesis is true, test statistic = ˆβ 1 0 MSE n (x i x) 2 t (n 2) To carry out the equivalent test in R, we could use the following code: > test.stat = beta1/sqrt( MSE / sum( ( x - mean(x) )^2 ) ) > test.stat [1] > alpha = 0.05 > # Rejection region approach > cutoff = qt( c(alpha/2, 1-alpha/2), df = n - 2 ) > cutoff [1] > (test.stat <= cutoff[1]) (test.stat >= cutoff[2]) [1] TRUE > > # p-value approach > p.value = 2*pt( test.stat, df = n - 2, lower.tail = F) 4

5 > p.value [1] > p.value <= alpha [1] TRUE From our hypothesis tests above we can now make a conclusion. If we choose to use the rejection region approach then we reject H 0 if the test statistic falls in to either the (, 2.12] or the [2.12, ) interval. Since our test statistic is 3.57 then our conclusion becomes we reject H 0 and conclude significance, of course, no conclusion is complete without referencing the context of the problem, so here we would conclude that calcium concentration in MSM poultry is linearly associated with pressure of the separation machine. If we choose to use the p-value approach to hypothesis testing then we compare our p-value against the pre-selected significance level of α = Since the p-value is which is less than 0.05 then we reject the null hypothesis and conclude their is a significant relationship. Like before, to complete the conclusion we need to explain which relationship is significant so we need to say that the linear relationship between calcium concentration in MSM poultry and the pressure of the separation machine is significant. 3 Confidence Interval for Estimated Mean and a New Observation at a given point There are a few other things that we might be interested in examining with our model. Suppose we wanted to create a confidence interval for the estimated mean response at a given point of x = x h. From class we know that such a confidence interval has the following formula: Ŷ ± t (n 2);1 α/2 MSE ( 1 n + (x ) h x) 2 n (x i x) 2 Suppose we are interested in generating a 95% confidence interval for the mean calcium at 100 psi. To do this in R we could use the following code: > y_100 = beta0 + beta1*100 > y_100 [1] > y_100 + c(-1,1)*qt(1-0.05/2, df = n-2)*sqrt(mse*(1/n + + (100-mean(x))^2/sum( (x-mean(x))^2 ) ) ) [1]

6 Some students find the concept of using vectors in R challenging, so an alternative way is to produce the lower and upper endpoints of the interval separately, like so: > lower = y_100 - qt(1-0.05/2, df = n-2)*sqrt(mse*(1/n + + (100-mean(x))^2/sum( (x-mean(x))^2 ) ) ) > upper = y_100 + qt(1-0.05/2, df = n-2)*sqrt(mse*(1/n + + (100-mean(x))^2/sum( (x-mean(x))^2 ) ) ) > lower [1] > upper [1] Notice that both ways produce equivalent results. The 95% confidence interval for the mean calcium when pressure is 100 psi is (589.1, 624.1). We interpret this confidence interval by saying We are 95% confident that the mean amount of calcium at 100 psi is between and In R, there is a builtin function that will achieve the same results: > predict(mod, newdata = list(pressure = 100), level = 0.95, interval = "confidence") fit lwr upr Now suppose there was a new observation at 100 psi, to calculate a confidence interval for that new observation we use the formula: Ŷ new ± t (n 2);1 α/2 MSE ( 1 n (x ) h x) 2 n (x i x) 2 To calculate a 95% confidence interval for the calcium of a new observation at 100 psi we could use the following R code: > y_100 + c(-1,1)*qt(1-0.05/2, df = n-2)*sqrt(mse*(1/n (100-mean(x))^2/sum( (x-mean(x))^2 ) [1] or, > lower = y_100 - qt(1-0.05/2, df = n-2)*sqrt(mse*(1/n (100-mean(x))^2/sum( (x-mean(x))^2 ) ) ) > upper = y_100 + qt(1-0.05/2, df = n-2)*sqrt(mse*(1/n (100-mean(x))^2/sum( (x-mean(x))^2 ) ) ) 6

7 > lower [1] > upper [1] So the 95% confidence interval we just solved for would have the following interpretation, We are 95% confident that a new observation with a pressure of 100 psi will be between and ppm. In R the same builtin function can be used to solve for the prediction interval: > predict(mod, newdata = list(pressure = 100), level = 0.95, interval = "prediction") fit lwr upr S ums of Squares Finally, we can find the Sum of Squares due to Regression, the Sum of Squares due to Error, and the Sum of Squares of Total. Recall the formulas: SSE (sometimes called RSS) = SSR = SST O = n (y i ŷ i ) 2 n (ŷ i ȳ) 2 n (y i ȳ) 2 In R, we can find these values easily using the following code: > yhat = beta0 + beta1*x > SSE = sum( (y - yhat)^2 ) # called RSS, residual sum of squares > SSE [1] > SSReg = sum( (yhat - mean(y))^2 ) # SS Regression > SSReg [1] > SSTO = sum( (y - mean(y))^2 ) > SSTO [1]

8 > Rsquared = SSReg/SSTO > Rsquared [1] Of course there is an easy way to do all of these tasks in R without having to calculate all this: > mod = lm(calcium ~ Pressure, data = msm) > summary(mod) Call: lm(formula = Calcium ~ Pressure, data = msm) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e-12 *** Pressure ** --- Signif. codes: 0 *** ** 0.01 * Residual standard error: on 16 degrees of freedom Multiple R-squared: ,Adjusted R-squared: F-statistic: on 1 and 16 DF, p-value: Checking Model Assumptions Regardless, we need to check the assumptions made for linear regression are correct. One of the assumptions we made was the variance is constant for all observations. To check the assumption we can plot the residuals vs. fitted values. > plot(x = yhat, y = stdresid, xlab = "Predicted Calcium (in ppm)", ylab = + "Standardized Residuals", main = "Standardized Residuals\nvs. Fitted") > abline(h = 0, lwd = 1, lty = 2, col = "grey") The plot that results from the code above is figure 1. 8

9 Standardized Residuals Standardized Residuals vs. Fitted Predicted Calcium (in ppm) Figure 1: Residuals vs. Fitted Values. The cloud of points should be centered around zero and remain fairly constant. The next assumption we can check the assumption that the data are normally distributed. To check this assumption we can create a QQ plot of the residuals. To check this assumption in R we can use the following code: > qqnorm( scale(y-yhat) ) > qqline( scale(y-yhat), lty = 2, lwd = 1, col = "grey" ) The plot the code generates is in Figure 2. Finally, one of the plots often included in is a scatter plot with the regression line added (see Figure 3). The following code generates that: > plot(x = x, y = y, xlab = "Pressure (in psi)", ylab = "Calcium (in ppm)", + main = "Scatterplot of data with\nregression line added") > curve(beta0 + beta1*x, from = min(x), to = max(x), add = TRUE, lwd = 2, lty = 1) 9

10 Sample Quantiles Normal Q Q Plot Theoretical Quantiles Figure 2: QQ plot. The points should follow the line y = x if the data is normally distributed. This looks pretty close. 10

11 Calcium (in ppm) Scatterplot of data with regression line added Pressure (in psi) Figure 3: Scatter plot with regression line. 11

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Chapter 16: Understanding Relationships Numerical Data

Chapter 16: Understanding Relationships Numerical Data Chapter 16: Understanding Relationships Numerical Data These notes reflect material from our text, Statistics, Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015. Linear

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

Chapter 8: Simple Linear Regression

Chapter 8: Simple Linear Regression Chapter 8: Simple Linear Regression Shiwen Shen University of South Carolina 2017 Summer 1 / 70 Introduction A problem that arises in engineering, economics, medicine, and other areas is that of investigating

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Analytics 512: Homework # 2 Tim Ahn February 9, 2016

Analytics 512: Homework # 2 Tim Ahn February 9, 2016 Analytics 512: Homework # 2 Tim Ahn February 9, 2016 Chapter 3 Problem 1 (# 3) Suppose we have a data set with five predictors, X 1 = GP A, X 2 = IQ, X 3 = Gender (1 for Female and 0 for Male), X 4 = Interaction

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

More information

Multiple Linear Regression (solutions to exercises)

Multiple Linear Regression (solutions to exercises) Chapter 6 1 Chapter 6 Multiple Linear Regression (solutions to exercises) Chapter 6 CONTENTS 2 Contents 6 Multiple Linear Regression (solutions to exercises) 1 6.1 Nitrate concentration..........................

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

where x and ȳ are the sample means of x 1,, x n

where x and ȳ are the sample means of x 1,, x n y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Estimated Simple Regression Equation

Estimated Simple Regression Equation Simple Linear Regression A simple linear regression model that describes the relationship between two variables x and y can be expressed by the following equation. The numbers α and β are called parameters,

More information

Regression Analysis lab 3. 1 Multiple linear regression. 1.1 Import data. 1.2 Scatterplot matrix

Regression Analysis lab 3. 1 Multiple linear regression. 1.1 Import data. 1.2 Scatterplot matrix Regression Analysis lab 3 1 Multiple linear regression 1.1 Import data delivery

More information

Homework 9 Sample Solution

Homework 9 Sample Solution Homework 9 Sample Solution # 1 (Ex 9.12, Ex 9.23) Ex 9.12 (a) Let p vitamin denote the probability of having cold when a person had taken vitamin C, and p placebo denote the probability of having cold

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

ANOVA (Analysis of Variance) output RLS 11/20/2016

ANOVA (Analysis of Variance) output RLS 11/20/2016 ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Solution to Series 3

Solution to Series 3 Prof. Nicolai Meinshausen Regression FS 2016 Solution to Series 3 1. a) The general least-squares regression estimator is given as Using the model equation, we get in this case ( ) X T x X (1)T x (1) x

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

1 Use of indicator random variables. (Chapter 8)

1 Use of indicator random variables. (Chapter 8) 1 Use of indicator random variables. (Chapter 8) let I(A) = 1 if the event A occurs, and I(A) = 0 otherwise. I(A) is referred to as the indicator of the event A. The notation I A is often used. 1 2 Fitting

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Foundations of Correlation and Regression

Foundations of Correlation and Regression BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

How to mathematically model a linear relationship and make predictions.

How to mathematically model a linear relationship and make predictions. Introductory Statistics Lectures Linear regression How to mathematically model a linear relationship and make predictions. Department of Mathematics Pima Community College (Compile date: Mon Apr 28 20:50:28

More information

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses. 1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately

More information

Chapter 14. Linear least squares

Chapter 14. Linear least squares Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given

More information

How to mathematically model a linear relationship and make predictions.

How to mathematically model a linear relationship and make predictions. Introductory Statistics Lectures Linear regression How to mathematically model a linear relationship and make predictions. Department of Mathematics Pima Community College Redistribution of this material

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

Chapter 11: Linear Regression and Correla4on. Correla4on

Chapter 11: Linear Regression and Correla4on. Correla4on Chapter 11: Linear Regression and Correla4on Regression analysis is a sta3s3cal tool that u3lizes the rela3on between two or more quan3ta3ve variables so that one variable can be predicted from the other,

More information

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is. Linear regression We have that the estimated mean in linear regression is The standard error of ˆµ Y X=x is where x = 1 n s.e.(ˆµ Y X=x ) = σ ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. 1 n + (x x)2 i (x i x) 2 i x i. The

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of

More information

Chapter 12: Multiple Linear Regression

Chapter 12: Multiple Linear Regression Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,

More information

Generating OLS Results Manually via R

Generating OLS Results Manually via R Generating OLS Results Manually via R Sujan Bandyopadhyay Statistical softwares and packages have made it extremely easy for people to run regression analyses. Packages like lm in R or the reg command

More information

Chapter 8 Conclusion

Chapter 8 Conclusion 1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Nonstationary time series models

Nonstationary time series models 13 November, 2009 Goals Trends in economic data. Alternative models of time series trends: deterministic trend, and stochastic trend. Comparison of deterministic and stochastic trend models The statistical

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

STAT2012 Statistical Tests 23 Regression analysis: method of least squares

STAT2012 Statistical Tests 23 Regression analysis: method of least squares 23 Regression analysis: method of least squares L23 Regression analysis The main purpose of regression is to explore the dependence of one variable (Y ) on another variable (X). 23.1 Introduction (P.532-555)

More information

Replication of Examples in Chapter 6

Replication of Examples in Chapter 6 Replication of Examples in Chapter 6 Zheng Tian 1 Introduction This document is to show how to perform hypothesis testing for a single coefficient in a simple linear regression model. I replicate examples

More information