Inferences for Regression

Similar documents
appstats27.notebook April 06, 2017

Chapter 27 Summary Inferences for Regression

Warm-up Using the given data Create a scatterplot Find the regression line

Correlation Analysis

Statistics for Managers using Microsoft Excel 6 th Edition

Basic Business Statistics 6 th Edition

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

The Multiple Regression Model

Inference for Regression Simple Linear Regression

Chapter 4: Regression Models

Chapter 16. Simple Linear Regression and dcorrelation

Inference for Regression

STAT 350 Final (new Material) Review Problems Key Spring 2016

Chapter 4. Regression Models. Learning Objectives

Chapter 16. Simple Linear Regression and Correlation

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Inference for Regression Inference about the Regression Model and Using the Regression Line

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Basic Business Statistics, 10/e

Measuring the fit of the model - SSR

Lectures on Simple Linear Regression Stat 431, Summer 2012

Ordinary Least Squares Regression Explained: Vartanian

Chapter 14 Simple Linear Regression (A)

INFERENCE FOR REGRESSION

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Ch 2: Simple Linear Regression

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

Mathematics for Economics MA course

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

9. Linear Regression and Correlation

CHAPTER EIGHT Linear Regression

Formal Statement of Simple Linear Regression Model

Ordinary Least Squares Regression Explained: Vartanian

Intro to Linear Regression

SIMPLE REGRESSION ANALYSIS. Business Statistics

Simple Linear Regression Using Ordinary Least Squares

Intro to Linear Regression

Simple Linear Regression

Correlation & Simple Regression

AMS 7 Correlation and Regression Lecture 8

Lecture 10 Multiple Linear Regression

Section 3: Simple Linear Regression

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

28. SIMPLE LINEAR REGRESSION III

Unit 6 - Introduction to linear regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Inference for the Regression Coefficient

Chapter 7 Student Lecture Notes 7-1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Can you tell the relationship between students SAT scores and their college grades?

A discussion on multiple regression models

Single and multiple linear regression analysis

Sociology 6Z03 Review II

Unit 6 - Simple linear regression

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

PubH 7405: REGRESSION ANALYSIS. MLR: INFERENCES, Part I

STA121: Applied Regression Analysis

Chapter 14 Student Lecture Notes 14-1

Inference with Simple Regression

Simple Linear Regression

ECO220Y Simple Regression: Testing the Slope

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Simple Linear Regression

Lecture 3: Inference in SLR

Simple linear regression

Business Statistics. Lecture 9: Simple Regression

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Stats Review Chapter 14. Mary Stangler Center for Academic Success Revised 8/16

1 Multiple Regression

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

Econometrics. 4) Statistical inference

Unit 27 One-Way Analysis of Variance

Chapter 3 Multiple Regression Complete Example

Business Statistics. Lecture 10: Correlation and Linear Regression

Regression Analysis II

Chapter 12 - Lecture 2 Inferences about regression coefficient

Multiple Linear Regression

STATISTICS 110/201 PRACTICE FINAL EXAM

STAT Chapter 11: Regression

Scatter plot of data from the study. Linear Regression

Midterm 2 - Solutions

MATH 644: Regression Analysis Methods

Unit 10: Simple Linear Regression and Correlation

What is a Hypothesis?

The simple linear regression model discussed in Chapter 13 was written as

Six Sigma Black Belt Study Guides

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Scatter plot of data from the study. Linear Regression

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

df=degrees of freedom = n - 1

Estadística II Chapter 4: Simple linear regression

Transcription:

Inferences for Regression

An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set:

Remembering Regression In regression, we want to model the relationship between two quantitative variables, one the predictor and the other the response. To do that, we imagine an idealized regression line, which assumes that the means of the distributions of the response variable fall along the line even though individual values are scattered around it. Now we d like to know what the regression model can tell us beyond the individuals in the study. We want to make confidence intervals and test hypotheses about the slope of the regression line.

The Population and the Sample When we found a confidence interval for a mean, we could imagine a single, true underlying value for the mean. When we tested whether two means were equal, we imagined a true underlying difference. What does it mean to do inference for regression?

The Population and the Sample We know better than to think that even if we knew every population value, the data would line up perfectly on a straight line. In our sample, there s a whole distribution of %body fat for men with 38-inch waists:

The Population and the Sample This is true at each waist size. We could depict the distribution of %body fat at different waist sizes like this:

The Population and the Sample The model assumes that the means of the distributions of %body fat for each waist size fall along the line even though the individuals are scattered around it. The model is not a perfect description of how the variables are associated, but it may be useful. If we had all the values in the population, we could find the slope and intercept of the idealized regression line explicitly by using least squares.

Simple Linear Regression Model Dependent Variable Population Y intercept Population Slope Coefficient Independent Variable Random Error term Y i β 0 β X 1 i ε i Linear component Random Error component

Simple Linear Regression Model Y Y i β 0 β X 1 i ε i Observed Value of Y for X i Predicted Value of Y for X i ε i Random Error for this X i value Slope = β 1 Intercept = β 0 X i X

Simple Linear Regression Equation (Prediction Line) The simple linear regression equation provides an estimate of the population regression line Estimated (or predicted) Y value for observation i Estimate of the regression intercept Estimate of the regression slope Ŷ i b 0 b 1 X i Value of X for observation i

Assumptions and Conditions 1. Linearity Assumption: Straight Enough Condition: Check the scatterplot the shape must be linear or we can t use regression at all.

Assumptions and Conditions 1. Linearity Assumption: If the scatterplot is straight enough, we can go on to some assumptions about the errors. If not, stop here, or consider re-expressing the data to make the scatterplot more nearly linear. Check the Quantitative Data Condition. The data must be quantitative for this to make sense.

Assumptions and Conditions 2. Independence Assumption: Randomization Condition: the individuals are a representative sample from the population. Check the residual plot the residuals should appear to be randomly scattered.

Assumptions and Conditions 3. Equal Variance Assumption: Does The Plot Thicken? Condition: Residual plot to check the assumption about error terms have common variance Homoscedasticity: same variance; Heteroscedasticity: different variance.

Assumptions and Conditions 4. Normal Population Assumption: Nearly Normal Condition Normal probability plot to check the assumption about error terms have normal distribution Outlier Condition: Check for outliers.

Assumptions and Conditions If all four assumptions are true, the idealized regression model would look like this: At each value of x there is a distribution of y-values that follows a Normal model, and each of these Normal models is centered on the line and has the same standard deviation.

Intuition About Regression Inference We expect any sample to produce a b 1 whose expected value is the true slope, β 1. What about its standard deviation? What aspects of the data affect how much the slope varies from sample to sample?

Intuition About Regression Inference Spread around the line: Less scatter around the line means the slope will be more consistent from sample to sample. The spread around the line is measured with the residual standard deviation s e. You can always find s e in the regression output, often just labeled s.

Intuition About Regression Inference Spread around the line: Less scatter around the line means the slope will be more consistent from sample to sample.

Intuition About Regression Inference Spread of the x s: A large standard deviation of x provides a more stable regression.

Intuition About Regression Inference Sample size: Having a larger sample size, n, gives more consistent estimates.

Standard Error for the Slope Three aspects of the scatterplot affect the standard error of the regression slope: spread around the line, s e spread of x values, s x sample size, n. The formula for the standard error (which you will probably never have to calculate by hand) is: SE b 1 s e n1 s x

Sampling Distribution for Regression Slopes When the conditions are met, the standardized estimated regression slope t 1 1 follows a Student s t-model with n 2 degrees of freedom. b SE b 1

Sampling Distribution for Regression Slopes We estimate the standard error with se SE b1 n1 sx where: s e y yˆ 2 n 2 n is the number of data values s x is the ordinary standard deviation of the x-values.

Regression Inference A null hypothesis of a zero slope questions the entire claim of a linear relationship between the two variables often just what we want to know. To test H 0 : β 1 = 0, we find t n2 b 0 1 SEb 1 and continue as we would with any other t-test. The formula for a confidence interval for β 1 is 1 n2 1 b t SE b

Textbook Body Fat Example

Textbook Body Fat Example: Using Excel Data Analysis Function 1. Choose Data 2. Choose Data Analysis 3. Choose Regression

Body Fat(%) Textbook Body Fat Example 40 Scatter Plot of Weight and Body Fat (%) 35 30 25 20 15 10 5 0 0 50 100 150 200 250 300 Weight

Textbook Body Fat Example: Using Excel Data Analysis Function Enter Y s and X s and desired options

Textbook Body Fat Example: Excel Output The regression equation is: Body Fat(%) -27.376 0.2499 (weight)

Textbook Body Fat Example: Interpretation of b o Body Fat(%) -27.376 0.2499 (weight) b 0 (-27.376) is the estimated average value of body fat(%) when the value of weight(lb) is zero (if weight = 0 is in the range of observed X values) Because we can t have a weight of 0, b 0 has no practical application

Textbook Body Fat Example: Interpreting b 1 Body Fat(%) -27.376 0.2499 (weight) b 1 (0.2499) estimates the change in the average value of body fat(%) as a result of a one-unit increase in weight(lb) Here, b 1 = 0.2499 tells us that the mean value of body fat(%) increases by 0.2499, on average, for each additional one pound increase in weight

Inferences About the Slope: t Test Example From Excel output: H 0 : β 1 = 0 H 1 : β 1 0 Coefficients Standard Error t Stat P-value Intercept -27.37626233 11.54742832-2.37077 0.029119 X Variable 1 0.249874137 0.060653997 4.119665 0.000643 b 1 S b1 t STAT b1 β S b 1 1 0.249874 0 0.060654 4.1197

Inferences About the Slope: t Test Example H 0 : β 1 = 0 H 1 : β 1 0 From Excel output: Coefficients Standard Error t Stat P-value Intercept -27.37626233 11.54742832-2.37077 0.029119 X Variable 1 0.249874137 0.060653997 4.119665 0.000643 Decision: Reject H 0, since p-value < α=0.05 p-value There is sufficient evidence that weight is related to % Body Fat.

Study: Amygdala volume and social network size in humans

Study: Amygdala volume and social network size in humans

F Test for Overall Significance F Test statistic: F STAT MSR MSE MSR MSE SSR k SSE n k 1 s e y yˆ 2 n 2 where F STAT follows an F distribution with k numerator and (n k - 1) denominator degrees of freedom (k = the number of independent variables in the regression model)

F-Test for Overall Significance s e y yˆ 2 n 2 Regression Statistics Multiple R 0.69663276 R Square 0.485297203 Adjusted R Square 0.456702603 Standard Error 7.049132279 Observations 20 ANOVA F STAT MSR MSE 843.3252 / 49.69027 16.97164 df SS MS F Significance F Regression 1 843.325214 843.3252 16.97164 0.000643448 Residual 18 894.424786 49.69027 Total 19 1737.75 With 1 and 18 degrees of freedom p-value for the F-Test