r2, the coefficient of determination; the bivariate normal assumption; diagnostic plots: residuals and Cook's Distance; R output (moved to week 3)

1 Today's Agenda
- r2, the coefficient of determination
- The bivariate normal assumption
- Diagnostic plots: residuals and Cook's Distance
- R output (moved to week 3)
Syllabus note: We are ahead of schedule in regression, so we're taking the time to add more examples and details, like Cook's distance and residuals. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 1

2 r2, the coefficient of determination r2 is simply the Pearson correlation coefficient r, but squared. So why all the fuss about it? When x and y are correlated, we say that some of the variation in y is explained by x. The proportion explained is r2. It is called the coefficient of determination because it represents how well a value of y can be determined by x. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 2
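As a quick check of this in R (a minimal sketch using simulated data, since the course data are not loaded here; the numbers 70, 10, 22, 0.08, and 2 are made up for illustration), squaring the Pearson correlation gives exactly the R-squared that lm() reports:

# Simulated data for illustration only; any roughly linear x and y would do
set.seed(302)
x <- rnorm(50, mean = 70, sd = 10)         # pretend x is resting heart rate
y <- 22 + 0.08 * x + rnorm(50, sd = 2)     # pretend y is BMI, with noise

r <- cor(x, y)                 # Pearson correlation coefficient r
r^2                            # coefficient of determination, computed by hand

fit <- lm(y ~ x)               # simple linear regression of y on x
summary(fit)$r.squared         # the same value, as reported by R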

3 Abstract case 1: If there were a perfect correlation between x and y (r = -1 or +1), then the relationship between them could be described perfectly by a line. Once you have the regression equation, knowing x allows you to determine what y is, without any error. In these cases, r2 is 1, meaning that 100% of the variance in y is explained by x. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 3

4 Abstract case 2: If there was NO correlation between x and y, such that r=0, then there is no linear relationship between x and y. Knowing x and using the regression equation of that (lack of) relationship would tell you literally nothing about y. In these cases, r2 is 0, so none of the variance in y is explained by x. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 4

5 Medical example: On page 4 of 8 of this paper (Pak J Physiol 2010;6(1)), there are several scatterplots describing the correlation between resting heart rate (RHR) and several other possibly related variables. Consider the first scatterplot, Figure 1A. In this figure, a regression of body-mass index (BMI, y) as a function of resting heart rate (RHR, x) is shown. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 5

6 Scatterplot of Heart Rate (x) and Body-Mass Index (y) Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 6

7 Here, the sample correlation is r = 0.305, and there is strong evidence that the population correlation is positive because the p-value is very small. r2 = 0.305^2 = 0.093, so 9.3% of the variation in BMI can be explained by RHR. Also, 9.3% of the variation in RHR can be explained by BMI. Why? Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 7

8 Correlation works in both directions. In Figure 1B, the sample shows that some of the variation in Waist-to-Hip Ratio (WHR) is explained by (and explains) RHR: r2 = 0.053, or 5.3% of the variation. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 8
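This symmetry is easy to verify in R (a sketch, reusing the simulated data from the earlier example): the correlation, and therefore r2, is the same whichever variable plays the role of y.

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)

cor(x, y)                        # identical to cor(y, x)
summary(lm(y ~ x))$r.squared     # r-squared from regressing y on x
summary(lm(x ~ y))$r.squared     # the same r-squared from regressing x on y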

9 If Body-Mass Index explains 9.3% of the variation of RHR, and Waist-to-Hip Ratio explains 5.3% of the variation, could they together explain 9.3% + 5.3% = 14.6%? Sadly, no. Since BMI and WHR are measuring very similar things, there is going to be a lot of overlap in the variation that they explain. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 9

10 But what is this 'variation'? Let's dig deeper! Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 10

11 Recall that the regression equation without the error term, α + βx, is called the least squares line. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 11

12 The 'squares' being referred to are the squared errors. Mathematically, the least squares line is the line through the data that produces the smallest sum of squared errors, SSE = Σ ε² = Σ (y − (α + βx))², where epsilon ε is the error term that we ignored earlier. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 12
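As a sketch in R (same simulated data as before): the SSE of the fitted line is smaller than the SSE of any other candidate line, for example one whose slope is nudged away from the least squares estimate.

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

sse_ls <- sum(resid(fit)^2)              # SSE of the least squares line
sse_ls

# Any other line, e.g. one with a slightly different slope, gives a larger SSE
a <- coef(fit)[1]; b <- coef(fit)[2]
sse_other <- sum((y - (a + (b + 0.05) * x))^2)
sse_other > sse_ls                       # TRUE: least squares really is 'least'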

13 The sum of squared errors, SSE, is the amount of variation that is left unexplained by the model. We use squared errors because...
- Otherwise, negative and positive errors would cancel.
- This way, the regression equation will favour creating many small errors instead of one big one.*
- In calculus, the derivative of x2 is easy to find.
* Also why the Pearson correlation is sensitive to extreme values. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 13

14 The error term is in any model we use, even the null model, which is a fancy term for not regressing at all: y = α + ε. In the null model, every value of y is predicted to be the average of all observed y values, so α is the sample mean of y, y-bar. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 14

15 The total squared difference from the mean of y is called the sum of squares total, or SST: SST = Σ (y − y-bar)². SST is the total squared length of all the vertical red lines. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 15
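A short R sketch of the last two slides (simulated data as before): lm(y ~ 1) fits the null model, its predictions are all y-bar, and its SSE is exactly the SST.

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)

null_fit <- lm(y ~ 1)                  # the null model: no predictor at all
unique(round(fitted(null_fit), 6))     # every prediction is the same number...
mean(y)                                # ...namely y-bar

sst <- sum((y - mean(y))^2)            # sum of squares total
sst
sum(resid(null_fit)^2)                 # the null model's SSE equals SST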

16 If we fit a regression line, (most of the) errors become smaller. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 16

17 Most importantly, the squared errors get smaller. The coefficient of determination, r2, is measuring how much smaller the squared errors get. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 17

18 Here, the correlation is very strong (r is large), and there are barely any errors at all. So SSError would be much smaller than SSTotal, and r2 is also large. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 18

19 The relationship between r2, SSE, and SST is: r2 = 1 − SSE/SST. SST is the total amount of variation in Y. SSE is the amount of variation in Y left unexplained by X. When r2 is zero, SSE is the same as SST. When r2 is one, SSE disappears completely. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 19
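We can confirm this relationship numerically in R (simulated data again):

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

sse <- sum(resid(fit)^2)           # variation in y left unexplained by x
sst <- sum((y - mean(y))^2)        # total variation in y

1 - sse / sst                      # r-squared, built from the sums of squares
summary(fit)$r.squared             # matches the value R reports
cor(x, y)^2                        # and matches the squared correlation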

20 So we now have two different interpretations of r-squared.
1. The square of the correlation coefficient.
2. The proportion of the Sum of Squares Total (SST) that is removed from the error term.
Interpretation #1 is specific to correlation. Interpretation #2 works for simple regression, but also for ANOVA, multiple regression, and general linear models! Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 20

21 R-squared is truly the go-anywhere animal. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 21

22 Bivariate Normality (and some diagnostics): Regression produces a line that minimizes the sum of squared errors, so a small number of extreme values (outliers) can have a strong effect on a model. Consider this Pearson r: Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 22

23 More specifically, regression is sensitive to violations of the assumption of bivariate normality. The regression model assumes: 1. The distributions of the x and y variables are normal. If you were to take a histogram of all the x values, that histogram should resemble a normal curve. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 23

24 The regression model also assumes: 2. The distribution of y, conditional on x, is normal. If you were to take a histogram of all the error terms, that histogram should ALSO resemble a normal curve. Any observations that produce errors that are too large to be in the curve are potentially influential outliers. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 24
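A rough way to eyeball both assumptions in R is to draw the two histograms the slides describe (a sketch, simulated data as before):

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

hist(x, main = "Histogram of x")                 # assumption 1: x roughly normal
hist(resid(fit), main = "Histogram of errors")   # assumption 2: errors roughly normal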

25 In this diagram, the red line is the regression on all 54 points. The blue line is the regression without the 4 red points. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 25

26 These points are near the lower end of x, and have very large error terms associated with them, so they 'pull' the left end of the regression line down. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 26

27 Another word for these errors is residuals, literally the residue, or portion left over from the model. Here is a scatterplot of the residuals over x, a.k.a. a residual plot. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 27
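In R, the residual plot is one line (a sketch, simulated data as before):

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

plot(x, resid(fit), ylab = "Residual", main = "Residual plot")
abline(h = 0, lty = 2)             # reference line at zero error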

28 The outliers are clearly visible from the residual plot, and from the histogram below. Their residuals are twice as large as those of any other observation. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 28

29 One way to measure how much an outlier is affecting the model is to remove that one point and see how much the model changes. We can see a big difference between the blue and red lines above, but that is a comparison by removing 4 points manually. Another, more systematic (and therefore quick, easy, and often more reliable) method is to remove one observation at a time and see how much the model changes. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 29
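Here is what that one-observation-at-a-time idea looks like as a sketch in R (simulated data as before): refit the model with each single observation held out and record how far the slope moves.

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

full_slope <- coef(fit)["x"]
slope_change <- numeric(length(x))
for (i in seq_along(x)) {
  fit_minus_i <- lm(y[-i] ~ x[-i])                # refit without observation i
  slope_change[i] <- coef(fit_minus_i)[2] - full_slope
}
which.max(abs(slope_change))                      # the single most influential point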

30 Cook's distance is a regression deletion diagnostic. It works by comparing a model fit with every observation to one with only the observation in question removed/deleted. The higher Cook's distance is for a value, the more that particular value is influencing the model. If there are one or two values that are having undue leverage on the model, Cook's distance will find them. This is true even if the residual plot fails to find them (which it can, if the observation is 'pulling' hard enough). Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 30
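R computes this for every observation with cooks.distance(); a minimal sketch (simulated data as before; the 4/n cutoff below is a common rough rule of thumb, not something from the slides):

# Same simulated data as the earlier sketch
set.seed(302); x <- rnorm(50, 70, 10); y <- 22 + 0.08 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

d <- cooks.distance(fit)                 # one Cook's distance per observation
plot(d, type = "h", ylab = "Cook's distance")
which(d > 4 / length(x))                 # flag points by a rough 4/n rule of thumb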

31 This is Cook's distance for all 54 data points. Note that although all 4 problem points have high Cook's distance compared to the rest, two of them are not obvious problems. Cook's distance has a hard time identifying influential observations when there are several. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 31

32 Dealing with outliers is like selecting an acceptable Type I error: there are conventions and guidelines in place, but it is a case-by-case judgement call. One question to ask, when considering things other than your model, is: does this observation belong in my data set? Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 32

33 If the outlier is the result of a typo, it's not the same as the rest of your sample and it should go. If other information about that observation is nonsense, such as joke answers in a survey, then that's also justification to remove that outlier observation. If it just happens to be an extreme value, but otherwise everything seems fine with it, then it is best to keep it. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 33

34 Don't rush to finish your model. Look for outliers first. Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 34

35 Next Tuesday:
- Diagnostics and regression in R.
- Correlation vs. causality.
Read: Rubin on Causality, Sections 1-3 only, for next Tuesday.
Sources:
- xkcd.com/605, "My Hobby: Extrapolating."
- Sand crab photo by Regiane Cardillo, Brasil.
- Pak. J. Physiol. (2010) 6:1.
- Mandarin duck and parrot on tortoise: unknown.
Stat 302, Winter 2016 SFU, Week 2, Hour 3, Page 35
