Simple linear regression


1 Simple linear regression Business Statistics Fall

2 Topics
1. Conditional distributions, squared error, means and variances
2. Linear prediction
3. Signal + noise and R² goodness of fit
4. Homoscedasticity, normality of errors, and prediction intervals
5. Hypothesis tests and confidence intervals for regression parameters
6. The lm() command in R
7. Uses of regression
Reading: OpenIntro ch. 7, Naked Statistics excerpt.

3 Predicting birth weight, given mother's age. Consider trying to predict the birth weight of a newborn baby. Does knowing the mother's age help us at all? In the language of random variables, if Y = birth weight and X = mother's age, is Y independent of X? To answer this question, we simply look at the conditional distribution of Y | X = x for different ages, x = {18, 19, 20, ..., 45}. If the distributions differ by age, then Y has a statistical dependence on X. In terms of our very large NBER data set, this means making density plots, disaggregated by mother's age.

4 Predicting birth weight, given mother's age. [Figure: density plots of birth weight in grams for 18-year-old and 35-year-old mothers.] So, age definitely matters. But how do we turn these density plots into point predictions?

5 Loss criteria: squared error. We have seen that making decisions in random settings means trying to do well on average, and that quantifying this requires picking a utility function. In prediction settings, it is common to refer instead to a loss function, which gauges how far off our prediction was. We seek the prediction that gives the minimum average loss. A commonly studied loss function is squared error loss: (ŷ − y)², where y is a realization of our random variable Y and ŷ (pronounced "y-hat") is our prediction.

6 The mean is the best prediction under squared error. The reason squared error (ŷ − y)² is so widely used is that the optimal prediction takes a particularly convenient form. To minimize our average squared error loss, we just set ŷ = E(Y). In other words, if we think this loss function is reasonable, we only need to worry about finding the mean of Y!
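The claim above is easy to check numerically. Below is a minimal sketch (in Python rather than the deck's R, with made-up birth weights): among all constant predictions ŷ, the sample mean gives the smallest average squared error.

```python
import numpy as np

# Illustrative sketch (synthetic data, not the NBER file): the sample mean
# minimizes average squared error among all constant predictions.
rng = np.random.default_rng(0)
y = rng.normal(3367, 570, size=1000)   # hypothetical birth weights in grams

def avg_sq_error(yhat, y):
    """Mean of (yhat - y)^2 for a constant prediction yhat."""
    return np.mean((yhat - y) ** 2)

candidates = np.linspace(y.min(), y.max(), 501)
losses = [avg_sq_error(c, y) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best, y.mean())  # the minimizer sits at (essentially) the sample mean
```

Sweeping other candidate values only increases the average loss, which is the point of the slide.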

7 The variance then measures predictability. If we go back to the definition of variance and squint hard, we see that it can be interpreted as the average squared error when using the mean as our prediction: V(Y) = Σ_y p(y) (E(Y) − y)². This means that if I had to predict infant birth weight, I would predict E(Y) = 3367 grams. Furthermore, this prediction gives me average, or mean, squared error equal to V(Y). That number seems too large, because it is in grams squared. For this reason we often talk about root mean squared error, which is √V(Y), measured in grams.

8 Conditional prediction. So what happened to X? Didn't mother's age matter? Indeed it did. If we did not know the mother's age, we would go with E(Y) as our birth-weight prediction. If we do know the mother's age, then we should predict with the conditional mean E(Y | X = x). How well we are able to predict at each age is then measured by the conditional variance V(Y | X = x). We can figure out our overall prediction error by taking a weighted average of the variance at each age.
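The recipe above can be sketched directly: group the data by x, average within each group for E(Y | X = x), and weight the conditional variances by group frequency. This uses synthetic data (the ages, slope, and noise level are made up), not the NBER file.

```python
import numpy as np

# Sketch (synthetic data): estimating E(Y | X = x) by averaging the observed
# y's at each distinct x, then combining the conditional variances.
rng = np.random.default_rng(1)
age = rng.integers(18, 46, size=5000)                 # mother's age, X
weight = 3000 + 14 * age + rng.normal(0, 500, 5000)   # birth weight, Y

cond_mean = {x: weight[age == x].mean() for x in np.unique(age)}
cond_var = {x: weight[age == x].var() for x in np.unique(age)}
# overall prediction error = weighted average of the conditional variances
freq = {x: np.mean(age == x) for x in np.unique(age)}
overall = sum(freq[x] * cond_var[x] for x in cond_mean)
print(round(overall, 1))  # close to the simulated noise variance, 500^2
```

The conditional means rise with age here because the simulation builds that trend in; with real data, the dictionary of group means is exactly the E(Y | X = x) curve plotted on the next slides.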

9 Birthweight vs age: E(Y | X = x). [Figure: conditional mean birth weight in grams plotted against mother's age.] What's going on at the higher age range?

10 Birthweight vs age: E(Y | X = x) ± 2σ_x. [Figure: conditional means with ±2σ_x bands; weight in grams against mother's age.] Why does σ_x have an x subscript?

11 NBA height and weight: E(Y | X = x). [Figure: weight in lbs against height in inches.] A few heights have only one observation. Is that problematic?

12 Square feet and house price. [Figure: price in dollars against square feet.] Many data sets have no shared values of predictor variables. What then?

13 Square feet and house price: E(Y | X = x). [Figure: binned conditional means of price in dollars against square feet.] We can always bin the observations and compute the expected value within each bin. We call this discretizing a continuous covariate.
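Discretizing a continuous covariate amounts to choosing bin edges and averaging within bins. A small sketch, with made-up house sizes and prices (in thousands), assuming six equal-width bins:

```python
import numpy as np

# Sketch (synthetic data): discretize a continuous covariate by binning,
# then estimate E(Y | bin) as the mean response within each bin.
rng = np.random.default_rng(2)
sqft = rng.uniform(1.0, 4.0, 500)                     # thousands of sq ft
price = 35 + 45 * sqft + rng.normal(0, 20, 500)       # thousands of dollars

edges = np.linspace(1.0, 4.0, 7)          # six equal-width bins
bins = np.digitize(sqft, edges[1:-1])     # bin index (0..5) for each house
bin_means = [price[bins == b].mean() for b in range(6)]
print([round(m) for m in bin_means])      # rises with square footage
```

Finer bins track the conditional mean more closely but leave fewer observations per bin, which is exactly the tension the linear prediction rule on the next slide sidesteps.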

14 Linear prediction. For every unique value of our predictor variable X = x, we can try to determine E(Y | X = x), and this is the best prediction we can make according to mean squared error. Sometimes it is more convenient to have a simple rule that takes a value of x and produces a prediction ŷ_x directly, without worrying whether that guess is the best possible. A particularly simple and popular type of rule is a linear prediction rule: ŷ_x = a + bx, for two numbers a and b.

15 Lines. Remember that a is called the intercept of the line and b is called the slope. Slope is often thought of as "rise over run": it is the number of units the line moves up for every unit of x we move to the right. [Figure: a line in the (x, y) plane.] How do we decide what slope and intercept to use for prediction?

16 Linear prediction. It turns out that the optimal linear prediction rule ŷ_x = a + bx according to mean squared error can be found using the following formulas: b = cor(X, Y) · (σ_Y / σ_X) and a = E(Y) − b·E(X).
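These two formulas are all you need to fit the line. A quick sketch (synthetic data; the intercept, slope, and noise level are invented) computes a and b from the correlation and standard deviations and checks them against numpy's built-in least-squares fit:

```python
import numpy as np

# Sketch: b = cor(X, Y) * (sd_Y / sd_X), a = E(Y) - b * E(X),
# checked against numpy's least-squares line.
rng = np.random.default_rng(3)
x = rng.normal(28, 6, 2000)                    # e.g. mother's age
y = 3177 + 15 * x + rng.normal(0, 500, 2000)   # e.g. birth weight in grams

b = np.corrcoef(x, y)[0, 1] * (y.std() / x.std())
a = y.mean() - b * x.mean()

b_ls, a_ls = np.polyfit(x, y, deg=1)   # slope, intercept from least squares
print(round(a, 1), round(b, 2))        # agrees with polyfit's line
```

The agreement is not a coincidence: cor(X, Y)·σ_Y/σ_X simplifies to cov(X, Y)/V(X), which is exactly the least-squares slope.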

17 Least squares. Applying this rule to a data set gives us the least squares line. "Least squares" refers to the fact that we are using mean squared error as our criterion. Applied to a data set (an empirical distribution), mean squared prediction error looks like this: E([Ŷ − Y]²) = (1/n) Σ_{i=1}^n (ŷ_i − y_i)². For a linear prediction rule with intercept a and slope b, this can be written (1/n) Σ_{i=1}^n (a + b·x_i − y_i)². Finding the best linear predictor based on observed data is called linear regression.

18 Mother's age versus birthweight. The least-squares line of best fit for this subset of the birthweight data has intercept a = 3177 and a positive slope b. [Figure: weight in grams against mother's age, with the fitted line.] What are the units of a? How about the units of b?

19 NBA height versus weight. The least-squares line of best fit for the NBA data has slope b = 6.51, with the intercept a read from the same fit. [Figure: weight in lbs against height in inches, with the fitted line.] What would you predict for a player who is 6 feet tall?

20 Square feet versus sale price. The least-squares line of best fit for the housing data is shown on the plot. [Figure: price in dollars against square feet, with the fitted line.] What would a house with zero square feet cost according to this prediction rule?

21 Signal + noise. Prediction rules can help us think about individual observations in terms of a trend component and an error component, a signal and some noise. The trend component is just the prediction part, ŷ_i. The error component is whatever is left over: e_i = y_i − ŷ_i. In the case of a least-squares line with intercept a and slope b, our observed errors, or residuals, are just e_i = y_i − (a + b·x_i). In practical settings, what might a large residual suggest about a particular observation? For the birthweight data? For the NBA data? For the housing data?
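The trend/noise split is one line of arithmetic once the fit is in hand. A sketch with synthetic housing-style data (sizes and prices in thousands, all invented):

```python
import numpy as np

# Sketch: splitting each observation into trend + noise. The residuals
# e_i = y_i - (a + b*x_i) are the observed errors of the fitted line.
rng = np.random.default_rng(4)
x = rng.uniform(1.0, 4.0, 200)
y = 35 + 45 * x + rng.normal(0, 20, 200)

b, a = np.polyfit(x, y, 1)
fitted = a + b * x          # trend component, y-hat
resid = y - fitted          # error component
biggest = int(np.argmax(np.abs(resid)))   # candidate outlier to investigate
print(round(resid.mean(), 6))  # least-squares residuals average to ~0
```

The observation with the largest absolute residual is a natural first candidate when "investigating big residuals," as the next slides do.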

22 Investigating big residuals. Residuals allow us to define outliers, conditional on an observed predictor value. [Figure: price in dollars against square feet, with one flagged point.] Here we find a house that is approximately $55K above the price trend for a house of its size. It turns out to be a four-bedroom brick house in a nice neighborhood.

23 Investigating big residuals. [Figure: weight in lbs against height in inches, with one flagged point.] Turns out Shaq was a big fella, even for someone his height.

24 Sum of squared residuals. Using the notion of residuals, we can write our (in-sample) mean squared error as simply (1/n) Σ_i e_i². It would be nice if the residuals were all small, but what counts as small depends on the units of X and Y. To cook up a unit-less measure of small, consider this fact: Σ_i (y_i − ȳ)² = Σ_i (a + b·x_i − ȳ)² + Σ_i e_i². The left-hand side is (n times) the sample variance of Y. The first term on the right is the portion of the variation due to the trend line; the second term is the portion due to the residuals.

25 Decomposing the variance. Σ_i (y_i − ȳ)² = Σ_i (a + b·x_i − ȳ)² + Σ_i e_i². [Figure: price in dollars against square feet.] The variation about the mean breaks down into the part due to the trend line (ORANGE) and the part due to the residuals (BLUE).

26 R² measure of goodness-of-fit. Taking this idea further, we can read the equation Σ_i (y_i − ȳ)² = Σ_i (a + b·x_i − ȳ)² + Σ_i e_i² as sum-of-squares-total = sum-of-squares-regression + sum-of-squares-error, or SST = SSR + SSE. Accordingly, a unit-less measure of the residuals being small is the ratio R² = variation explained / total variation = SSR/SST.
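The decomposition and the resulting R² can be verified numerically. A sketch with synthetic data (invented sizes and prices in thousands):

```python
import numpy as np

# Sketch: checking SST = SSR + SSE and computing R^2 for a fitted line.
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 4.0, 300)
y = 35 + 45 * x + rng.normal(0, 20, 300)

b, a = np.polyfit(x, y, 1)
fitted = a + b * x
sst = np.sum((y - y.mean()) ** 2)        # total variation
ssr = np.sum((fitted - y.mean()) ** 2)   # variation explained by the trend
sse = np.sum((y - fitted) ** 2)          # variation left in the residuals
r2 = ssr / sst
print(round(sst - (ssr + sse), 6), round(r2, 3))
```

The first printed number is zero up to rounding, confirming SST = SSR + SSE; the second is R², which (next slide) equals the squared sample correlation.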

27 R² is the squared correlation. Let's do a little algebra:
R² = Σ_i (a + b·x_i − ȳ)² / Σ_i (y_i − ȳ)²
= Σ_i (a + b·x_i − a − b·x̄)² / Σ_i (y_i − ȳ)² (since ȳ = a + b·x̄)
= b² Σ_i (x_i − x̄)² / Σ_i (y_i − ȳ)²
= b² σ_x² / σ_y²
= ρ².
If you define the best linear prediction line first, this gives a really nice way to think about correlation: the squared correlation ρ² is the fraction of the variation in one variable that is attributable to its trend in the other.

28 Homoscedastic, normal prediction model. There's nothing that says our residuals have to be homoscedastic, meaning the noise is about the same size at every predictor value. Likewise, the conditional distribution P(Y | X = x) need not be a normal distribution. It is nonetheless common to make these assumptions. To the extent that they are reasonable, we can construct intervals for our predictions using Y | X = x ~ N(a + bx, s²), where s² = (1/n) Σ_i e_i², the sample average of the squared errors. Notice that s² does not depend on x.

29 Normal linear prediction density. For 18-year-old mothers, the normal linear prediction model seems reasonable: [Figure: observed density and normal-linear density of birth weight in grams.]

30 Normal linear prediction density. For 45-year-old mothers, the assumptions appear to break down: [Figure: observed density and normal-linear density of birth weight in grams.] Note: the red curve is based on only 75 data points; however, that is enough to tell that the mean is probably off.

31 Prediction intervals. Based on the prediction model Y | X = x ~ N(a + bx, s²), we can construct prediction intervals by taking plus/minus 1, 2, 3 (etc.) standard deviations. For example, a 95% prediction interval for the price of a house of 2,150 square feet is given by a + b(2.15) ± 2(22.48), where our units are thousands of dollars and thousands of square feet.
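The plus/minus construction is simple enough to write as a two-line helper. In the call below, the intercept a = −10 and slope b = 70.2 are illustrative values consistent with the housing fit quoted later in the deck (s = 22.48 is the residual standard error from this slide); treat them as an assumption, not as the exact fitted numbers.

```python
import numpy as np  # not strictly needed here; kept for consistency with the other sketches

# Sketch: a +/- z*s prediction interval under the homoscedastic normal
# model Y | X = x ~ N(a + b*x, s^2).
def prediction_interval(a, b, s, x, z=2.0):
    """Return (low, high) for the prediction a + b*x plus/minus z*s."""
    center = a + b * x
    return center - z * s, center + z * s

# Illustrative coefficients (assumed), in thousands of dollars and
# thousands of square feet; s = 22.48 as quoted on the slide.
low, high = prediction_interval(a=-10.0, b=70.2, s=22.48, x=2.15, z=2.0)
print(round(low, 2), round(high, 2))
```

Because s² does not depend on x under this model, the interval has the same width at every house size; only the center a + bx moves.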

32 Prediction intervals. A 99.7% prediction interval for the weight of a 6'8" NBA player is given by a + b(80) ± 3(14.76), where our units are pounds and inches. A 99.7% interval for a five-foot player would be a + b(60) ± 3(14.76). Why might we not trust this one as much?

33 In-sample versus out-of-sample prediction. When we create a least-squares linear predictor from data, we're finding the best linear predictor for observations sampled at random from our database. When we compute the sample correlation, sample standard deviations, and sample means, we are building a linear predictor that is optimal for the empirical distribution. Usually, though, we are not playing this stylized game of predicting random points from our already-observed data. Rather, we want to predict well out-of-sample.

34 Sampling variability What we want is the best linear-predictor for the population. We can think of the data-based least squares linear predictor as an estimate of the desired linear predictor. As usual, the hope is that our sample data looks reasonably close to what other such samples would look like. As we get more and more data, our estimate should get closer and closer to the truth (the estimate should get close to the estimand). As usual, the question is, how can we gauge how close we are? 34

35 Sampling variability An intuitive mechanical way to simulate sampling variability is bootstrapping. This means that we sample our observed data (with replacement), to get a pseudo-sample. We then fit the least-squares line to this pseudo-sample and we store the resulting estimate. We then repeat thousands of times. 35
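The bootstrap described above takes only a few lines. A sketch with synthetic NBA-style heights and weights (the coefficients and noise level are invented):

```python
import numpy as np

# Sketch: bootstrapping the least-squares slope. Resample rows with
# replacement, refit the line, and look at the spread of refitted slopes.
rng = np.random.default_rng(6)
x = rng.normal(79, 3.5, 300)                    # e.g. heights in inches
y = -280 + 6.5 * x + rng.normal(0, 15, 300)     # e.g. weights in pounds

slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), len(x))       # pseudo-sample of row indices
    b_boot, _ = np.polyfit(x[idx], y[idx], 1)
    slopes.append(b_boot)

se_b = np.std(slopes)   # bootstrap standard error of the slope
print(round(se_b, 3))
```

Plotting the 2,000 refitted lines gives exactly the fan of lines shown on the next slide; the standard deviation of the stored slopes is a bootstrap estimate of the slope's standard error.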

36 Bootstrapped linear predictor. [Figure: bootstrap replicate fitted lines; weight in pounds against height in inches.]

37 Sampling distribution for linear predictor coefficients. It turns out that the coefficients (the intercept and the slope) of the least-squares linear predictor are approximately normally distributed. As with other normal-distribution-based hypothesis tests and confidence intervals, all we need are our standard errors, the standard deviations of the sampling distributions. Fortunately, most software provides these standard errors for us (and much more). So before we look at the details of doing statistical tests on the parameters of a linear prediction model, let us take a look at what the R output looks like.
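Before reading them off R's output, it may help to see the classical standard-error formulas computed by hand. A sketch on synthetic data (all numbers invented), using the textbook formulas se_b = s/√Σ(x_i − x̄)² and se_a = s·√(1/n + x̄²/Σ(x_i − x̄)²):

```python
import numpy as np

# Sketch: the classical standard errors behind R's summary() table,
# computed directly from the residuals.
rng = np.random.default_rng(8)
x = rng.uniform(1.0, 4.0, 128)
y = -10 + 70 * x + rng.normal(0, 22, 128)

n = len(x)
b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))    # residual standard error
sxx = np.sum((x - x.mean()) ** 2)
se_b = s / np.sqrt(sxx)                      # standard error of the slope
se_a = s * np.sqrt(1 / n + x.mean() ** 2 / sxx)
print(round(b / se_b, 2))                    # the slope's t value
```

These are the "Std. Error" and "t value" columns that summary() prints; the division by n − 2 (rather than n) is the small-sample degrees-of-freedom adjustment.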

38 Regression in R. To load in the birthweight data:
bw <- read.csv(url( ), header = TRUE)
To fit the linear regression, use the lm() command:
bwfit <- lm(BirthWt ~ Age, data = bw)
Here BirthWt is the response variable, or outcome variable, and Age is the predictor variable, or covariate, or regressor, or feature.

39 Regression in R. To see the regression object, simply type its name at the command line:
> bwfit
Call:
lm(formula = BirthWt ~ Age, data = bw)
Coefficients:
(Intercept)          Age
Why are these fitted values different from the ones on slide 18?

40 Regression in R. We can extract more information using the summary() command:
> summary(bwfit)
Call:
lm(formula = BirthWt ~ Age, data = bw)
Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
Age                                       <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error:  on  degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1882 on 1 and  DF, p-value: < 2.2e-16

41 Regression in R. We can also call the anova() command, which stands for analysis of variance:
> anova(bwfit)
Analysis of Variance Table
Response: BirthWt
          Df Sum Sq Mean Sq F value    Pr(>F)
Age                                  < 2.2e-16 ***
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let's check that the sum-of-squares regression divided by the sum-of-squares total equals R² from the previous slide.

42 Regression in R. To load in the NBA height/weight data:
nba <- read.table(url( ), header = TRUE)
To fit the linear regression, use the lm() command:
nbafit <- lm(weight ~ height, data = nba)
Here weight is the response variable, or outcome variable, and height is the predictor variable, or covariate, or regressor, or feature.

43 Regression in R.
> summary(nbafit)
Call:
lm(formula = weight ~ height, data = nba)
Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
height          6.51                      <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.76 on 322 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic:  on 1 and 322 DF, p-value: < 2.2e-16

44 Prediction interval. Let's make a prediction interval from this regression output. We said before that a 99.7% prediction interval for the weight of a 6'8" NBA player is given by a + b(80) ± 3(14.76), but where did these numbers come from? Referring back to the previous slide: the intercept a and the slope b = 6.51 come from the Coefficients table, and s = 14.76 is the residual standard error.

45 Regression in R. To load in the housing data:
house <- read.table(url( ), header = TRUE)
To fit the linear regression, use the lm() command:
housefit <- lm(Price ~ SqFt, data = house)
Here Price is the response variable, or outcome variable, and SqFt is the predictor variable, or covariate, or regressor, or feature.

46 Regression in R.
> summary(housefit)
Call:
lm(formula = Price ~ SqFt, data = house)
Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)               18.97
SqFt                       9.43           1.302e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 22.48 on 126 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 55.5 on 1 and 126 DF, p-value: 1.302e-11

47 Confidence interval. Let's make a confidence interval for the true slope coefficient β from this regression output. For a 95% confidence interval we take the point estimate and go plus/minus 1.96 standard errors: b ± 1.96(9.43) = (51.74 to 88.71). Referring back to the previous slide, b is the estimated slope coefficient and se_b = 9.43 is the associated standard error.

48 Confidence interval. What about a confidence interval for the true intercept parameter α? For a 95% confidence interval we take the point estimate of the intercept and go plus/minus 1.96 standard errors: −10 ± 1.96(18.97) = (−47.18 to 27.18). Referring back to the previous slide, a = −10 is the estimated intercept and se_a = 18.97 is the associated standard error. Can we reject the null hypothesis that α = 0?

49 p-values. R is set up to automatically consider two-sided hypothesis tests that the parameters of the linear predictor are exactly zero. To this end, it reports the associated p-values, computed from z-scores. For the intercept α, the test statistic is z = (a − 0)/se_a. For the housing data, this is z = (a − 0)/se_a = −10/18.97 = −0.527. To find the p-value we use 2 * pnorm(−0.527) ≈ 0.598. This differs only slightly from the output of R (due mainly to rounding).

50 p-values. Now let's do the slope parameter: z = (b − 0)/se_b, which comes out to roughly seven and a half. R calls these test statistics t statistics, because the small-sample adjustment it makes to the normal yields a t distribution. In any case, seven and a half standard deviations out gives a very tiny p-value. Why does the null hypothesis of β = 0 make sense for the slope parameter; in what sense is zero special?
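The two-sided p-value computation above can be reproduced without R. A sketch using the standard normal CDF written via the complementary error function (the identity pnorm(x) = erfc(−x/√2)/2 is standard):

```python
import math

# Sketch: the two-sided p-value for a z-score, i.e. 2 * pnorm(-|z|) in R,
# written here with math.erfc.
def two_sided_p(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p(-10 / 18.97), 3))   # the intercept test from the slide
```

For the slope's z-score of about 7.5, the same function returns a number smaller than 1e-12, which is why R just prints "< 2.2e-16"-style values for the slope.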

51 Applications of regression. It is no exaggeration to say that linear regression is used everywhere:
- credit ratings
- law enforcement (racial profiling!)
- recommender systems (Amazon and Netflix)
- professional sports analytics (Moneyball)
- hedge fund performance evaluation
- medical diagnostics
and on and on...

52 Super Crunchers

53 Netflix

54 Moneyball

55 3 Rules

56 Warren Buffett versus John Maynard Keynes. A popular financial model is called the market model. Essentially, you treat the returns of a particular investor, fund, or stock as the response variable in a linear regression with the S&P 500 (or another large market index) as the predictor variable. One may then interpret the linear regression coefficients, the intercept α and the slope β, as excess return and a risk factor. What do you suppose is meant by these terms? In the next homework, you will look at the investing records of two titans of finance and see which one seems to have had the better performance.
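The market model is just the simple regression of this lecture applied to returns. A sketch with simulated monthly returns (the index mean, volatilities, alpha, and beta are all invented):

```python
import numpy as np

# Sketch (synthetic returns): the market model regresses an asset's returns
# on a market index. The intercept plays the role of alpha (excess return)
# and the slope the role of beta (the risk factor).
rng = np.random.default_rng(7)
market = rng.normal(0.01, 0.04, 120)                    # monthly index returns
fund = 0.002 + 1.3 * market + rng.normal(0, 0.02, 120)  # alpha 0.2%, beta 1.3

beta, alpha = np.polyfit(market, fund, 1)
print(round(alpha, 4), round(beta, 2))
```

A beta above 1 means the fund amplifies market swings; a reliably positive alpha is return not explained by market exposure, which is what the homework comparison is after.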

57 Correlation does not imply causation So what does that mean for us? Think about this: straight teeth are probably associated with the price of the car a person drives. For prediction, this is all well and good. What would you think about a parent who buys her kid a new Lambo because she wants to fix his fugly grill? 57


SIMPLE REGRESSION ANALYSIS. Business Statistics SIMPLE REGRESSION ANALYSIS Business Statistics CONTENTS Ordinary least squares (recap for some) Statistical formulation of the regression model Assessing the regression model Testing the regression coefficients

More information

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Chapter 13 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics

Chapter 13 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics Chapter 13 Student Lecture Notes 13-1 Department of Quantitative Methods & Information Sstems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analsis QMIS 0 Dr. Mohammad

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module 2 Lecture 05 Linear Regression Good morning, welcome

More information

Linear Regression Measurement & Evaluation of HCC Systems

Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Today s goal: Evaluate the effect of multiple variables on an outcome variable (regression) Outline: - Basic theory - Simple

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

bivariate correlation bivariate regression multiple regression

bivariate correlation bivariate regression multiple regression bivariate correlation bivariate regression multiple regression Today Bivariate Correlation Pearson product-moment correlation (r) assesses nature and strength of the linear relationship between two continuous

More information

1 Correlation and Inference from Regression

1 Correlation and Inference from Regression 1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

Chapter 16: Understanding Relationships Numerical Data

Chapter 16: Understanding Relationships Numerical Data Chapter 16: Understanding Relationships Numerical Data These notes reflect material from our text, Statistics, Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015. Linear

More information

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections 3.1 3.3.2 by Iain Pardoe 3.1 Probability model for (X 1, X 2,...) and Y 2 Multiple linear regression................................................

More information

STA 101 Final Review

STA 101 Final Review STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2 3pm today (Old Chem

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data?

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data? Univariate analysis Example - linear regression equation: y = ax + c Least squares criteria ( yobs ycalc ) = yobs ( ax + c) = minimum Simple and + = xa xc xy xa + nc = y Solve for a and c Univariate analysis

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

Chapter 12: Linear regression II

Chapter 12: Linear regression II Chapter 12: Linear regression II Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 14 12.4 The regression model

More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

BNAD 276 Lecture 10 Simple Linear Regression Model

BNAD 276 Lecture 10 Simple Linear Regression Model 1 / 27 BNAD 276 Lecture 10 Simple Linear Regression Model Phuong Ho May 30, 2017 2 / 27 Outline 1 Introduction 2 3 / 27 Outline 1 Introduction 2 4 / 27 Simple Linear Regression Model Managerial decisions

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Topic 1. Definitions

Topic 1. Definitions S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Regression Analysis. BUS 735: Business Decision Making and Research

Regression Analysis. BUS 735: Business Decision Making and Research Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000 Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Multiple linear regression

Multiple linear regression Multiple linear regression Course MF 930: Introduction to statistics June 0 Tron Anders Moger Department of biostatistics, IMB University of Oslo Aims for this lecture: Continue where we left off. Repeat

More information

Math 2311 Written Homework 6 (Sections )

Math 2311 Written Homework 6 (Sections ) Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

Chapter 8. Linear Regression /71

Chapter 8. Linear Regression /71 Chapter 8 Linear Regression 1 /71 Homework p192 1, 2, 3, 5, 7, 13, 15, 21, 27, 28, 29, 32, 35, 37 2 /71 3 /71 Objectives Determine Least Squares Regression Line (LSRL) describing the association of two

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

More information

Final Exam Bus 320 Spring 2000 Russell

Final Exam Bus 320 Spring 2000 Russell Name Final Exam Bus 320 Spring 2000 Russell Do not turn over this page until you are told to do so. You will have 3 hours minutes to complete this exam. The exam has a total of 100 points and is divided

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing

More information