STAT 4385 Topic 03: Simple Linear Regression


1 STAT 4385 Topic 03: Simple Linear Regression
Xiaogang Su, Ph.D.
Department of Mathematical Science, University of Texas at El Paso
Spring, 2017

2 Outline
- The Set-Up
- Exploratory Data Analysis (EDA)
  - Scatterplot
  - Coefficient of Correlation
- Model Specification
- Model Estimation
  - LSE of Betas
  - Estimation of Error Variance
  - Statistical Inference

3 The Set-Up
- The data $\{(x_i, y_i) : i = 1, \ldots, n\}$ consist of n IID copies of $(X, Y)$, where both the response Y and the predictor X are continuous.
- We want to study the association (relationship) between Y and X.
- Examples:
  - Revenue vs. advertising expenditure
  - Population over years
  - Daily rainfall vs. barometric pressure in a place
  - College GPA vs. high school GPA
- Data layout:

  ID    Y      X
  1     y_1    x_1
  2     y_2    x_2
  ...   ...    ...
  n     y_n    x_n

4 A Real Example
A study is conducted to investigate the relationship between cigarette smoking during pregnancy and the weights of newborn infants. A sample of 15 women smokers kept accurate records of the number of cigarettes smoked per day (X) during their pregnancies, and the weights of their children (Y) were recorded at birth. The data are given in a table with columns ID, Cigarettes Per Day (X), and Birth Weight (Y). [Table values missing.]

5 Exploratory Data Analysis (EDA)
- Exploratory Data Analysis (EDA) summarizes and describes data and helps us see what the data can tell us before and beyond the formal modeling or hypothesis testing task.
- Two approaches: graphical or numerical.
- For data in SLR, EDA aims to explore the bivariate association between X and Y:
  - Graphical displays: scatterplot
  - Numerical measures: correlation coefficient

6 Scatterplot
- A scatterplot (also called scatter chart, scattergram, or scatter diagram) displays values of, typically, two variables for a set of data.
- It can be extended to 3D; color-coded points allow an additional categorical variable to be displayed.
- Inspect a scatterplot for patterns and outliers:
  - Is the pattern linear or nonlinear?
  - Is the (monotonic) association between X and Y positive or negative?
  - Are there any potential outliers?

7 Scatterplot
[Figure: example scatterplots.]

8 Birthweight Example: Scatterplot
[Figure: scatterplot of birthweight (vertical axis) against number of daily cigarettes (horizontal axis).]

9 Pearson Correlation Coefficient
- The Pearson product-moment coefficient of correlation measures the direction and strength of the linear association between two variables.
- The population version:
  $$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}$$
- Point estimation with data (the sample version):
  $$r(X, Y) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left\{\sum_{i=1}^n x_i^2 - n\bar{x}^2\right\}\left\{\sum_{i=1}^n y_i^2 - n\bar{y}^2\right\}}}$$

10 Facts on ρ and r
- r is scaleless with $-1 \le r \le 1$.
- Direction of linear association: $r > 0$ indicates a positive linear association (Y tends to increase as X increases), while $r < 0$ indicates a negative linear association.
- The absolute values of ρ and r measure the strength of the linear association. The rule of thumb:
  - $|r| = 1$: perfect linear association
  - $|r| = 0.80$: strong linear association
  - $|r| = 0.50$: moderate linear association
  - $|r| = 0.20$: weak linear association
  - $r = 0$: no linear association

11 Linear or Nonlinear Association
- Pearson correlation does not provide information on nonlinear association.
- Moreover, association does NOT imply causation.

12 Calculation of r
- Preliminary calculation of six quantities: $\left\{n, \sum_i x_i, \sum_i y_i, \sum_i x_i^2, \sum_i y_i^2, \sum_i x_i y_i\right\}$.
- Obtain $\bar{x} = \sum_i x_i / n$ and $\bar{y} = \sum_i y_i / n$.
- Next compute
  $$SS_{xx} = \sum_i x_i^2 - n\bar{x}^2, \qquad SS_{yy} = \sum_i y_i^2 - n\bar{y}^2, \qquad SS_{xy} = \sum_i x_i y_i - n\bar{x}\bar{y}.$$
- Finally compute $r = SS_{xy} / \sqrt{SS_{xx}\, SS_{yy}}$.
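The hand-calculation steps above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r_from_sums` and the toy data are hypothetical and are not the birth weight data from the slides.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from the six preliminary quantities on the slide."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    ss_xx = sum_x2 - n * x_bar ** 2
    ss_yy = sum_y2 - n * y_bar ** 2
    ss_xy = sum_xy - n * x_bar * y_bar
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical toy data, used only to exercise the formula.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r_from_sums(
    len(x), sum(x), sum(y),
    sum(v * v for v in x),
    sum(v * v for v in y),
    sum(a * b for a, b in zip(x, y)),
)
print(round(r, 4))  # SS_xy = 6, SS_xx = 10, SS_yy = 6, so r = 6 / sqrt(60)
```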

13 Worksheet: Computing r
Worksheet columns: ID, Cigarettes Per Day ($x_i$), Birth Weight ($y_i$), $x_i^2$, $y_i^2$, $x_i y_i$, with the column sums in the last row. [Table values missing.]

14 Example: Computing r
We have found that $n = 15$, $\sum x_i = 380$, $\sum y_i = 115.5$, $\sum x_i^2 = 10842$, $\sum y_i^2 =$ [value missing], and $\sum x_i y_i =$ [value missing]. Hence
- $\bar{x} = 380/15 \approx 25.33$ and $\bar{y} = 115.5/15 = 7.7$
- $SS_{xx} = \sum x_i^2 - n\bar{x}^2 = 10842 - 15\,(380/15)^2 \approx 1215.33$
- $SS_{yy} = \sum y_i^2 - n\bar{y}^2 =$ [value missing]
- $SS_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = -50.7$
Thus $r = SS_{xy} / \sqrt{SS_{xx}\, SS_{yy}} =$ [value missing], which shows a somewhat moderate negative linear association.

15 Inference on ρ, Case I
- Test for zero correlation, i.e., $H_0: \rho = 0$ vs. $H_a: \rho \ne 0$.
- It is preferable to use the equivalent test of zero slope in a simple linear regression model.
- Assuming $(X, Y)$ follow a bivariate normal distribution with $\rho = 0$, the fact
  $$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2)$$
  leads to a t test.
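As a small illustration of the t statistic above, the following Python sketch evaluates it for hypothetical values of r and n (they are not taken from the slides). The critical value 2.060 is the standard table value of $t_{0.975}(25)$, entered by hand rather than computed.

```python
import math

def t_stat_zero_corr(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical inputs: r = 0.6 observed on n = 27 pairs.
t_obs = t_stat_zero_corr(0.6, 27)

# Two-sided test at alpha = 0.05; 2.060 is the table value of t_{0.975}(25).
reject_h0 = abs(t_obs) >= 2.060
print(t_obs, reject_h0)
```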

16 Inference on ρ, Case II
- Test on a non-zero correlation, $H_0: \rho = \rho_0$.
- Assuming $(X, Y)$ follow a bivariate normal distribution, Fisher's (monotonic) z-transformation converts r into an approximately normally distributed variable with constant variance $1/(n-3)$:
  $$r^* = \operatorname{arctanh}(r) = \frac{1}{2}\ln\frac{1+r}{1-r} \;\sim\; N\!\left(\frac{1}{2}\ln\frac{1+\rho}{1-\rho},\; \frac{1}{n-3}\right),$$
  where $SE(r^*) = 1/\sqrt{n-3}$. Here arctanh is the inverse hyperbolic tangent function.
- The R function cor.test() uses this transformation to construct its confidence interval for ρ.
- The above result can also be used to compare two correlations, $H_0: \rho_1 = \rho_2$, based on two independent data sets.
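A minimal Python sketch of the z test built on this transformation is shown here. The function name `fisher_z_test` and the input values are hypothetical; the normal approximation is the one stated on the slide.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho0):
    """Approximate z test of H0: rho = rho0 via Fisher's z transform."""
    # (arctanh(r) - arctanh(rho0)) divided by SE = 1 / sqrt(n - 3)
    z_obs = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return z_obs, p_value

# Hypothetical inputs: r = 0.3 from n = 103 pairs, testing rho0 = 0.5.
z_obs, p_value = fisher_z_test(r=0.3, n=103, rho0=0.5)
print(z_obs, p_value)
```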

17 Fisher's z Transform: Simulation Study
- Fisher's z transform helps symmetrize the distribution of r.
- Each data set of size $n = 30$ was generated from a bivariate normal distribution with true $\rho = 0.7$; 100,000 simulation runs were used.
[Figure: (a) histogram of r; (b) histogram of transformed r, arctanh(r). Vertical axes show density.]
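A scaled-down version of this simulation can be sketched in Python. This is not the code behind the slide's figure: it assumes only 2,000 runs (rather than 100,000) to keep it quick, uses a hypothetical seed, and compares sample skewness instead of drawing histograms.

```python
import math
import random

def sample_r(n, rho, rng):
    """Pearson r from one bivariate normal sample with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def skewness(values):
    """Moment-based sample skewness."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

rng = random.Random(4385)  # hypothetical seed, for reproducibility only
rs = [sample_r(30, 0.7, rng) for _ in range(2000)]
zs = [math.atanh(r) for r in rs]

skew_r, skew_z = skewness(rs), skewness(zs)
print(skew_r, skew_z)  # r is left-skewed; arctanh(r) is much closer to symmetric
```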

18 Example: Hypothesis Testing on ρ
- Consider the smoking vs. infant birth weight example. We want to test $H_0: \rho = -0.5$ vs. $H_a: \rho \ne -0.5$.
- This is equivalent to testing
  $$H_0: \frac{1}{2}\ln\frac{1+\rho}{1-\rho} = \frac{1}{2}\ln\frac{1+(-0.5)}{1-(-0.5)} \approx -0.5493.$$
- The test statistic is
  $$z_{obs} = \frac{\frac{1}{2}\ln\frac{1+r}{1-r} - \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}}{1/\sqrt{n-3}},$$
  with $n = 15$, so that $1/\sqrt{n-3} = 1/\sqrt{15-3}$. [Numerical value of $z_{obs}$ missing.]
- Rejection region: reject $H_0$ if $|z_{obs}| \ge z_{1-\alpha/2} = 1.96$ at significance level $\alpha = 0.05$.
- Conclusion: we cannot reject $H_0$ since $|z_{obs}| < 1.96$.

19 Confidence Interval for ρ
Based on Fisher's z transform, a $(1-\alpha) \times 100\%$ confidence interval for ρ can be constructed in two steps:
- First construct a $(1-\alpha) \times 100\%$ confidence interval for $\rho^* = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$. Denote it as $(L^*, U^*)$, i.e.,
  $$(L^*, U^*) := r^* \pm z_{1-\alpha/2} / \sqrt{n-3}.$$
- Transform back to a $(1-\alpha) \times 100\%$ confidence interval for ρ:
  $$(L, U) := \left(\frac{\exp(2L^*) - 1}{\exp(2L^*) + 1},\; \frac{\exp(2U^*) - 1}{\exp(2U^*) + 1}\right),$$
  where the hyperbolic tangent function
  $$r = \frac{\exp(2r^*) - 1}{\exp(2r^*) + 1} = \tanh(r^*)$$
  is the inverse function of Fisher's z transform.
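The two-step construction can be sketched in Python as follows. The function name `fisher_ci` and the inputs are hypothetical, and the sketch is meant only to mirror the formulas above, not to reproduce any particular software's output.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, conf_level=0.95):
    """Approximate confidence interval for rho via Fisher's z transform."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z_crit / math.sqrt(n - 3)
    r_star = math.atanh(r)
    # Step 1: interval on the transformed scale; step 2: map back with tanh.
    return math.tanh(r_star - half_width), math.tanh(r_star + half_width)

# Hypothetical inputs: r = 0.5 observed on n = 28 pairs.
lo, hi = fisher_ci(0.5, 28)
print(lo, hi)
```

Because tanh is monotone, the back-transformed interval is generally not symmetric around r, which is expected for a bounded parameter.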

20 Example: CI for ρ
- First, a 95% CI for $\rho^*$ in the infant birthweight vs. smoking example is
  $$\frac{1}{2}\ln\frac{1+r}{1-r} \pm 1.96/\sqrt{15-3} = (L^*, U^*).$$
  [Numerical values of $L^*$ and $U^*$ missing.]
- Transform to a 95% CI for ρ:
  $$\left[\frac{\exp(2L^*) - 1}{\exp(2L^*) + 1},\; \frac{\exp(2U^*) - 1}{\exp(2U^*) + 1}\right] = (-0.733,\; 0.194).$$
- With 95% confidence, we conclude that the true correlation between the number of cigarettes smoked and the infant birth weight is between $-0.733$ and $0.194$.
- R code:
  cor.test(x, y, alternative = "two.sided", method = "pearson", conf.level = .95)

21 SLR: Model Specification
- Mathematical modeling of relationships among variables: deterministic vs. probabilistic.
- Simple linear model (first-order):
  $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{with } \varepsilon_i \overset{\text{IID}}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, n,$$
  where
  - $E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$ is the deterministic component;
  - ε is the random error component;
  - $\{\beta_0, \beta_1\}$ are the regression coefficients;
  - $\sigma^2$ is the error variance.

22 Model Assumptions
Four assumptions are involved in the SLR model:
- (Linearity) The (conditional) mean response is a linear function of the predictor, i.e., $\mu_i \equiv E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$;
- (Independence) the $\varepsilon_i$'s are independent of each other;
- (Homoscedasticity) the $\varepsilon_i$'s have equal variance $\sigma^2$;
- (Normality) the $\varepsilon_i$'s are normally distributed.
In short, $\varepsilon_i \overset{\text{IID}}{\sim} N(0, \sigma^2)$. It follows that $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.

23 Illustration of Statistical Assumptions
[Figure: illustration of the SLR model assumptions.]

24 Model Interpretation
- $\beta_0$ is the y-intercept, which is the mean response $E(Y \mid X = 0)$ at $X = 0$.
  - When $\beta_0 = 0$, the regression line passes through the origin $(0, 0)$.
- $\beta_1$ is the slope of the regression line, which corresponds to the amount of change in the mean response $E(Y \mid X)$ with every one-unit increase in X.
  - If $\beta_1 > 0$, positive association;
  - If $\beta_1 < 0$, negative association.
- What is the change in mean response (or expected change in Y) with an a-unit increase in X? (Answer: $a \cdot \beta_1$.)

25 Model Estimation
- There are infinitely many choices of $\{\beta_0, \beta_1\}$, each uniquely defining a line. We want to identify the best one.
- One criterion is the overall distance between the observed $y_i$'s and their predicted values $\hat{y}_i = \beta_0 + \beta_1 x_i$, as measured by the squared differences:
  $$Q(\beta_0, \beta_1) = \sum_{i=1}^n \{y_i - (\beta_0 + \beta_1 x_i)\}^2.$$
- The least squares line is given by $(\hat\beta_0, \hat\beta_1)$ such that
  $$Q(\hat\beta_0, \hat\beta_1) = \min_{\beta_0, \beta_1} Q(\beta_0, \beta_1).$$

26 Least Squares Estimator
- $\{\hat\beta_0, \hat\beta_1\}$ are called the least squares estimator (LSE) of $\{\beta_0, \beta_1\}$.
- The LSE can be uniquely and explicitly determined by solving the first-order necessary conditions $\partial Q / \partial \beta_0 = 0$ and $\partial Q / \partial \beta_1 = 0$:
  $$\hat\beta_1 = \frac{\sum (y_i - \bar{y})(x_i - \bar{x})}{\sum (x_i - \bar{x})^2} = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.$$
- The resultant least squares line is given by $y = \hat\beta_0 + \hat\beta_1 x$.
- Accordingly, the fitted values can be computed as $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ for $i = 1, \ldots, n$.
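The closed-form LSE and the criterion Q can be sketched in Python as below. The function names and the toy data are hypothetical (not the birth weight data); the final lines simply check that perturbing the estimates does not decrease Q.

```python
def least_squares(x, y):
    """Closed-form LSE (b0, b1) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def q(b0, b1, x, y):
    """Least squares criterion Q(b0, b1): sum of squared differences."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data, used only to exercise the formulas.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
print(b0, b1)                # SS_xy = 6, SS_xx = 10, so b1 = 0.6 and b0 = 4 - 0.6 * 3
print(q(b0, b1, x, y))       # value of Q at the LSE
print(q(b0 + 0.1, b1, x, y)) # any perturbation gives a value at least as large
```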

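27 BWT Example: LSE
For the BWT example, we need $SS_{xy} = -50.7$, $SS_{xx} \approx 1215.33$, $\bar{y} = 7.7$, and $\bar{x} \approx 25.33$. The LSE can be computed accordingly:
$$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-50.7}{1215.33} \approx -0.0417, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 7.7 - (-0.0417)(25.33) \approx 8.757.$$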

28 Example: Scatterplot with LS Fitting
[Figure: scatterplot of birthweight (vertical axis) against number of daily cigarettes (horizontal axis), with the least squares fitted line.]

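29 Properties of LSE
- Both $\hat\beta_0$ and $\hat\beta_1$ are linear combinations of the $y_i$'s. To see this, first rewrite $\hat\beta_1$:
  $$\hat\beta_1 = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} = \sum \left(\frac{x_i - \bar{x}}{SS_{xx}}\right) y_i = \sum w_i y_i,$$
  with $w_i = (x_i - \bar{x}) / SS_{xx}$.
- Using the fact $\bar{y} = (1/n) \sum y_i$, $\hat\beta_0$ is also a linear combination of the $y_i$'s. Why?
- It follows (why?) that
  $$\hat\beta_1 \sim N\!\left[\beta_1,\; \frac{\sigma^2}{SS_{xx}}\right], \qquad \hat\beta_0 \sim N\!\left[\beta_0,\; \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)\right].$$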
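One way to fill in the "why?" steps is sketched below. It uses only the model assumptions stated earlier (independent $y_i$ with $E(y_i) = \beta_0 + \beta_1 x_i$ and common variance $\sigma^2$) and the weights $w_i = (x_i - \bar{x})/SS_{xx}$; normality of both estimators then follows because each is a linear combination of independent normal variables.

```latex
\begin{aligned}
\sum_i w_i &= 0, \qquad \sum_i w_i x_i = 1, \qquad \sum_i w_i^2 = \frac{1}{SS_{xx}}, \\
E(\hat\beta_1) &= \sum_i w_i (\beta_0 + \beta_1 x_i) = \beta_1, \qquad
\mathrm{Var}(\hat\beta_1) = \sigma^2 \sum_i w_i^2 = \frac{\sigma^2}{SS_{xx}}, \\
\hat\beta_0 &= \bar{y} - \hat\beta_1 \bar{x} = \sum_i \left(\frac{1}{n} - \bar{x}\, w_i\right) y_i, \\
E(\hat\beta_0) &= \sum_i \left(\frac{1}{n} - \bar{x}\, w_i\right)(\beta_0 + \beta_1 x_i)
  = \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0, \\
\mathrm{Var}(\hat\beta_0) &= \sigma^2 \sum_i \left(\frac{1}{n} - \bar{x}\, w_i\right)^2
  = \sigma^2 \left(\frac{1}{n} - \frac{2\bar{x}}{n}\sum_i w_i + \bar{x}^2 \sum_i w_i^2\right)
  = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right).
\end{aligned}
```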

30 Estimation of $\sigma^2$
- Let the sum of squared errors (SSE) denote the minimized LS criterion:
  $$SSE = \sum \{y_i - (\hat\beta_0 + \hat\beta_1 x_i)\}^2 = SS_{yy} - \hat\beta_1 SS_{xy} \quad \text{(for hand computation).}$$
- It can be shown that $SSE / \sigma^2 \sim \chi^2(n-2)$. Details can be found in a course on linear model theory.
- It follows that $E(SSE / \sigma^2) = n - 2$. Hence an unbiased estimator of $\sigma^2$ is given by
  $$\hat\sigma^2 = \frac{SSE}{n-2} = \frac{\sum (y_i - \hat{y}_i)^2}{n-2} \equiv MSE,$$
  where $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ is the fitted value for $x_i$, and MSE is short for Mean Square Error.
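The two routes to SSE (directly from the residuals, and through the hand-computation shortcut) can be compared with a short Python sketch. The function name and the toy data are hypothetical and unrelated to the birth weight data.

```python
def error_variance(x, y):
    """Return (SSE from residuals, SSE from the shortcut, MSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Direct definition: sum of squared residuals
    sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Hand-computation shortcut: SS_yy - b1 * SS_xy
    sse_shortcut = ss_yy - b1 * ss_xy
    mse = sse_direct / (n - 2)  # unbiased estimator of sigma^2
    return sse_direct, sse_shortcut, mse

# Hypothetical toy data, used only to exercise the formulas.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sse_direct, sse_shortcut, mse = error_variance(x, y)
print(sse_direct, sse_shortcut, mse)
```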

31 Example: Estimation of $\sigma^2$
- First find $SSE = SS_{yy} - \hat\beta_1 SS_{xy}$, with $SS_{xy} = -50.7$. [Numerical values of $SS_{yy}$ and SSE missing.]
- An estimate of the constant error variance $\sigma^2$ is given by $\hat\sigma^2 = SSE / (n-2) = SSE / (15-2)$. [Numerical value missing.]

32 Inference on $\beta_1$
- Now we know $\hat\beta_1 \sim N\!\left[\beta_1, \sigma^2 / SS_{xx}\right]$. However, this distribution involves the unknown parameter $\sigma^2$ besides $\beta_1$. How can we get rid of $\sigma^2$?
- This can be solved by forming a t random variable using the following facts:
  - $\dfrac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / SS_{xx}}} \sim N(0, 1)$;
  - $\dfrac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2(n-2)$;
  - the LSE $\{\hat\beta_0, \hat\beta_1\}$ is independent of SSE (let's assume this).
- Therefore (why?),
  $$t = \frac{\hat\beta_1 - \beta_1}{\sqrt{\hat\sigma^2 / SS_{xx}}} \sim t(n-2).$$
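The "why?" can be answered with the standard construction of a t variable as a standard normal divided by the square root of an independent chi-square over its degrees of freedom. A sketch, using the three facts listed above (the symbols Z and V are introduced here only for the derivation):

```latex
\begin{aligned}
Z &= \frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / SS_{xx}}} \sim N(0, 1), \qquad
V = \frac{(n-2)\,\hat\sigma^2}{\sigma^2} \sim \chi^2(n-2), \qquad Z \text{ independent of } V, \\
t &= \frac{Z}{\sqrt{V / (n-2)}}
   = \frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / SS_{xx}}} \cdot \frac{\sigma}{\hat\sigma}
   = \frac{\hat\beta_1 - \beta_1}{\sqrt{\hat\sigma^2 / SS_{xx}}} \sim t(n-2).
\end{aligned}
```

The unknown $\sigma$ cancels between numerator and denominator, which is exactly what makes the statistic usable.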

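33 Confidence Intervals
- It follows (why?) that a $(1-\alpha) \times 100\%$ confidence interval (CI) for $\beta_1$ is
  $$\hat\beta_1 \pm t^{(n-2)}_{1-\alpha/2} \sqrt{\frac{\hat\sigma^2}{SS_{xx}}},$$
  where $SE(\hat\beta_1) = \sqrt{\hat\sigma^2 / SS_{xx}}$ is the standard error of $\hat\beta_1$.
- Following similar arguments, we can obtain a $(1-\alpha) \times 100\%$ confidence interval (CI) for $\beta_0$:
  $$\hat\beta_0 \pm t^{(n-2)}_{1-\alpha/2} \sqrt{\hat\sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)},$$
  where $SE(\hat\beta_0) = \sqrt{\hat\sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)}$ is the standard error of $\hat\beta_0$.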
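Both intervals can be sketched in Python as follows. To stay within the standard library, the t critical value is passed in by hand as an assumption; 3.182 is the table value of $t_{0.975}(3)$ matching the hypothetical toy data with $n = 5$. The function name and data are illustrative only.

```python
import math

def slr_confidence_intervals(x, y, t_crit):
    """Return (CI for beta0, CI for beta1) given a t critical value."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / ss_xx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_xx))
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return ci_b0, ci_b1

# Hypothetical toy data; t_crit = 3.182 is the table value of t_{0.975}(n - 2 = 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ci_b0, ci_b1 = slr_confidence_intervals(x, y, t_crit=3.182)
print(ci_b0, ci_b1)
```

A CI for $a\beta_1$ is obtained by multiplying both endpoints of the $\beta_1$ interval by a (for $a > 0$).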

34 Example on CI: The BWT Example
- A 95% CI for $\beta_1$ is given by
  $$\hat\beta_1 \pm t^{(n-2)}_{1-\alpha/2} \sqrt{\frac{\hat\sigma^2}{SS_{xx}}} = (-0.1078,\; 0.0244).$$
- Interpretation: with 95% confidence, we estimate that the mean infant birth weight changes by somewhere between $-0.1078$ and $0.0244$ pounds for each additional cigarette smoked per day by a pregnant woman.
- Provide a 95% CI for the change in the mean infant birth weight associated with a 10-cigarette increase in daily smoking by a pregnant woman.
  Hint: this asks for a 95% CI for $10\beta_1$, which can be obtained as $10 \times (-0.1078,\; 0.0244) = (-1.078,\; 0.244)$.

35 Example on CI: The BWT Example
- A 95% CI for $\beta_0$ is given by
  $$\hat\beta_0 \pm t^{(n-2)}_{1-\alpha/2} \sqrt{\hat\sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)} = (6.9789,\; U),$$
  where the upper limit U is missing. [Value missing.]
- Interpretation: with 95% confidence, we conclude that the mean infant birth weight for a non-smoking woman ranges from 6.9789 to U pounds.
- A word of caution: since we don't have data at $X = 0$, we are not certain whether a linear model is appropriate when extending the scope of the model to $X = 0$.

36 Two Standard Computer Outputs
Table of Parameter Estimates (test of $H_0: \beta_j = 0$) [Table values missing.]

         Estimate    SE    t Test    Two-Sided P-Value
  β_0
  β_1

Analysis of Variance (ANOVA) Table [Table values missing.]

  Source    df    SS    MS    F    P-Value
  Model
  Error
  Total

37 Residual: Worksheet
Worksheet columns: ID, Cigarettes Per Day ($x_i$), Birth Weight ($y_i$), fitted value $\hat{y}_i$, residual $r_i$. The residuals sum to 0. [Table values missing.]

38 Residual Plots
[Figure: histogram of the residuals (frequency) and normal Q-Q plot of the residuals (sample quantiles vs. theoretical quantiles).]

39 Residual Plots
[Figure: (a) residual r vs. fitted value $\hat{y}$; (b) residual r vs. x.]

40 Naive Confidence/Prediction Bands
[Figure: linear fit with naive confidence/prediction bands, showing the LS fitted line, confidence bands, and prediction bands; birthweight (vertical axis) against number of cigarettes (horizontal axis).]
Note: the critical value used here is $t^{(n-2)}_{1-\alpha/2}$. This approach suffers from multiplicity.

41 Working-Hotelling Confidence Bands
[Figure: LS fitted line with naive confidence bands and Working-Hotelling confidence bands; birthweight (vertical axis) against number of cigarettes (horizontal axis).]
Note: the critical value used in the Working-Hotelling confidence band is $W = \sqrt{2 F^{(2,\, n-2)}_{1-\alpha}}$.
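To see how the two critical values compare, here is a hedged Python sketch. The half-width formula for the mean response at $x_0$, $\text{crit} \times \sqrt{MSE\,(1/n + (x_0 - \bar{x})^2 / SS_{xx})}$, is the standard one and is not shown on the slides. The quantiles 2.160 for $t_{0.975}(13)$ and 3.806 for $F_{0.95}(2, 13)$ are table values entered by hand; $\bar{x}$ and $SS_{xx}$ follow from the sums given in the BWT example, while `mse = 1.0` is a purely hypothetical placeholder.

```python
import math

def mean_response_half_width(x0, n, x_bar, ss_xx, mse, crit):
    """Half-width of a band for the mean response at x0."""
    return crit * math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / ss_xx))

def working_hotelling_crit(f_quantile):
    """W = sqrt(2 * F_{1-alpha}(2, n-2)), given the F quantile."""
    return math.sqrt(2 * f_quantile)

n = 15
t_crit = 2.160                          # table value of t_{0.975}(13)
w_crit = working_hotelling_crit(3.806)  # table value of F_{0.95}(2, 13)

# x_bar and ss_xx follow from n = 15, sum x = 380, sum x^2 = 10842;
# mse = 1.0 is a hypothetical placeholder, not the estimate from the slides.
naive = mean_response_half_width(30, n, 380 / 15, 10842 - 380 ** 2 / 15, 1.0, t_crit)
wh = mean_response_half_width(30, n, 380 / 15, 10842 - 380 ** 2 / 15, 1.0, w_crit)
print(w_crit, naive, wh)
```

Since W exceeds the t critical value, the Working-Hotelling band is wider at every $x_0$, which is the price paid for simultaneous coverage over the whole line.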

42 Discussion
Thanks! Questions?

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Overview Scatter Plot Example

Overview Scatter Plot Example Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Topic 10 - Linear Regression

Topic 10 - Linear Regression Topic 10 - Linear Regression Least squares principle Hypothesis tests/confidence intervals/prediction intervals for regression 1 Linear Regression How much should you pay for a house? Would you consider

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511 STAT 511 Lecture : Simple linear regression Devore: Section 12.1-12.4 Prof. Michael Levine December 3, 2018 A simple linear regression investigates the relationship between the two variables that is not

More information

STAT 4385 Topic 06: Model Diagnostics

STAT 4385 Topic 06: Model Diagnostics STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 25 Outline 1 Multiple Linear Regression 2 / 25 Basic Idea An extra sum of squares: the marginal reduction in the error sum of squares when one or several

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Chapter 2 Inferences in Simple Linear Regression

Chapter 2 Inferences in Simple Linear Regression STAT 525 SPRING 2018 Chapter 2 Inferences in Simple Linear Regression Professor Min Zhang Testing for Linear Relationship Term β 1 X i defines linear relationship Will then test H 0 : β 1 = 0 Test requires

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or.

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or. Chapter Simple Linear Regression : comparing means across groups : presenting relationships among numeric variables. Probabilistic Model : The model hypothesizes an relationship between the variables.

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

STAT2012 Statistical Tests 23 Regression analysis: method of least squares

STAT2012 Statistical Tests 23 Regression analysis: method of least squares 23 Regression analysis: method of least squares L23 Regression analysis The main purpose of regression is to explore the dependence of one variable (Y ) on another variable (X). 23.1 Introduction (P.532-555)

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

1. Simple Linear Regression

1. Simple Linear Regression 1. Simple Linear Regression Suppose that we are interested in the average height of male undergrads at UF. We put each male student s name (population) in a hat and randomly select 100 (sample). Then their

More information

Multiple linear regression

Course MF 930: Introduction to statistics, June 0. Tron Anders Moger, Department of biostatistics, IMB, University of Oslo. Aims for this lecture: Continue where we left off. Repeat

Simple linear regression

Biometry 755, Spring 2008. Overview of regression analysis: Evaluate relationship between one or more independent variables ($X_1, \ldots, X_k$) and a single

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Understanding regression output from software. In 1966 Cyril Burt published a paper called The genetic determination of differences

Lecture notes on Regression & SAS example demonstration

Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also

STAT420 Midterm Exam. University of Illinois Urbana-Champaign, October 19 (Friday), 2018, 3:00-4:15p. SOLUTIONS (Yellow)

Question 1 (15 points), 2 (10 points), 3 (50 points), extra (2 points), Total (77 points). Points

Single and multiple linear regression analysis

Marike Cockeran, 2017. Introduction. Outline of the session: Simple linear regression analysis; SPSS example of simple linear regression analysis; Additional topics

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Basic components of regression setup. Target of inference: linear dependency

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Two major aims: Determine an appropriate model for the relationship between the variables; Predict the

Estadística II Chapter 4: Simple linear regression

Contents: Objectives of the analysis. Model specification. Least Square Estimators (LSE): construction and properties

Inference for Regression Inference about the Regression Model and Using the Regression Line

PBS Chapter 10.1 and 10.2. 2009 W.H. Freeman and Company. Objectives (PBS Chapter 10.1 and 10.2): Inference about

Inference in Regression Analysis

Dr. Frank Wood, fwood@stat.columbia.edu. Linear Regression Models, Lecture 4. Today: Normal Error Regression Model $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, $Y_i$ value

Simple Linear Regression for the MPG Data

[Scatterplot: MPG vs. Wgt] What do we do with the data? $y_i$ = MPG of the $i$th car, $x_i$ = weight of the $i$th car, $i = 1, \ldots, n$, $n$ = sample size. Exploratory

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Biometry 755, Spring 2009. Correlation

Chapter 1 Linear Regression with One Predictor

STAT 525, Fall 2018. Professor Min Zhang. Goals of Regression Analysis: Serve three purposes. Describes an association between X and Y. In some applications, the

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Bivariate Data: Graphical Display. Example: Some investors think that the performance of the stock market in January

Simple Linear Regression

September 24, 2008. Reading: HH 8, Gill 4. Problem. Data: Observe pairs $(Y_i, X_i)$, $i = 1, \ldots, n$. Response or dependent variable Y. Predictor or independent

Inference for Regression

Section 9.4. Cathy Poliak, Ph.D., cathy@math.uh.edu, Office in Fleming 11c, Department of Mathematics, University of Houston. Lecture 13b - 3339

2.4.3 Estimating σ, Coefficient of Determination (2.4. ASSESSING THE MODEL)

2.4.3 Estimating $\sigma^2$. Note that the sums of squares are functions of the conditional random variables $Y_i = (Y \mid X = x_i)$. Hence, the sums of squares are random variables as well.

Multiple Linear Regression

Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

Important note: Transcripts are not substitutes for textbook assignments. 1

In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specifically, we will review how to organize the data, the importance

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Correlation & Regression, I. 9.07, 4/1/004. Involve bivariate, paired data, X & Y: Height & weight measured for the same individual; IQ & exam scores for each individual; Height of

9 Correlation and Regression

SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

Correlation & Simple Regression

Chapter 11. The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

Inference for the Regression Coefficient

Recall that $b_0$ and $b_1$ are the estimates of the intercept $\beta_0$ and slope $\beta_1$ of the population regression line. We can show that $b_0$ and $b_1$ are the unbiased estimates

Formal Statement of Simple Linear Regression Model

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, where $Y_i$ is the value of the response variable in the $i$th trial, $\beta_0$ and $\beta_1$ are parameters, and $X_i$ is a known constant, the value of the predictor

Lecture 5: ANOVA and Correlation

Ani Manichaikul, amanicha@jhsph.edu, 23 April 2007. Comparing Multiple Groups. Continuous data: comparing means (analysis of variance). Binary data: comparing proportions

Simple Linear Regression

ST 370. Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc

Review Notes: [OL] = Ott & Longnecker, Statistical Methods and Data Analysis, 5th edition. [Handouts based on notes prepared

Statistics 112 Simple Linear Regression Fuel Consumption Example March 1, 2004 E. Bura

Fuel Consumption Case: reducing natural gas transmission fines. In 1993, the natural gas industry was deregulated.

Business Statistics. Lecture 10: Correlation and Linear Regression

Scatterplot: A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

Unit 6 - Introduction to linear regression

Suggested reading: OpenIntro Statistics, Chapter 7. Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

Correlation and regression

Yongjua Laosiritaworn, Introductory on Field Epidemiology, 6 July 2015, Thailand. Data. Illustrative data (Doll, 1955). Scatter plot. Doll, 1955. Correlation coefficient,

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Functional Connectivity: Correlation and Regression (10/23/2011). Topics: variance (VAR), standard deviation (SD), unbiased SD, standard error (SE), confidence interval (CI), t value for

Lecture 15. Hypothesis testing in the linear model

15. Hypothesis testing in the linear model. 15.1. Preliminary lemma. Lemma

REVIEW 8/2/2017 Chen Fang (陈芳), Department of English, East China Normal University

Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution; if the result is too extreme to belong to the null distribution (p

Data Analysis and Statistical Methods Statistics 651

http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32, Suhasini Subba Rao. Previous lecture: We are interested in whether a dependent

Mathematics for Economics MA course

Simple Linear Regression. Dr. Seetha Bandara. Simple Regression: Simple linear regression is a statistical method that allows us to summarize and study relationships between

Week 3: Simple Linear Regression

Marcelo Coca Perraillon, University of Colorado Anschutz Medical Campus. Health Services Research Methods I, HSMP 7607, 2017. © 2017 PERRAILLON ALL RIGHTS RESERVED. Outline

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

FINAL EXAM ** Two different ways to submit your answer sheet: (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

Simple Linear Regression

EdPsych 580, C.J. Anderson, Fall 2005. Outline: 1. What it is and why it's useful 2. How 3. Statistical Inference 4. Examining assumptions (diagnostics)

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

Thilo Klein, University of Cambridge Judge Business School. Session 4: Linear regression,

Linear Regression. 1 Introduction. 2 Least Squares

1 Introduction. It is often interesting to study the effect of a variable on a response. In ANOVA, the response is a continuous variable and the variables are discrete / categorical. What

A discussion on multiple regression models

In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Three characteristics: Direction of the relationship; Form of the relationship; Strength/Consistency. Direction

where $\bar{x}$ and $\bar{y}$ are the sample means of $x_1, \ldots, x_n$

Animal Studies of Side Effects. Simple Linear Regression: Basic Ideas. In simple linear regression there is an approximately linear relation between two variables, say y = pressure in the pancreas, x =

Unit 9 Regression and Correlation Homework #14 (Unit 9 Regression and Correlation) SOLUTIONS. X = cigarette consumption (per capita in 1930)

BIOSTATS 540, Fall 2015, Introductory Biostatistics. Consider the following study of the relationship

Dr. Junchao Xia, Center of Biophysics and Computational Biology. Fall 2016, 11/1/2016

BIO5312 Biostatistics, Lecture 10: Regression and Correlation Methods. Outline: In this lecture, we will discuss topics