Warm-up: Using the given data, create a scatterplot and find the regression line


Warm-up: Using the given data, create a scatterplot and find the regression line.

Time at the lunch table (min)    Caloric intake
21.4                             472
30.8                             498
37.7                             335
32.8                             423
39.5                             437
22.8                             508
34.1                             431
33.9                             479
43.8                             454
42.4                             450
43.1                             410
29.2                             504
31.3                             437
28.6                             489
32.9                             436
30.6                             480
35.1                             439
33.0                             444
43.7                             408

Unit Overview: Talk about tests; linear regression inference. Slide 27-1
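The deck works the warm-up by hand; for reference, here is a minimal Python sketch (added to this transcript, not part of the original slides) that computes the least-squares line from the data above:

```python
# Added sketch: least-squares regression line for the lunch-table data.
import numpy as np

time = np.array([21.4, 30.8, 37.7, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8, 42.4,
                 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7])
calories = np.array([472, 498, 335, 423, 437, 508, 431, 479, 454, 450,
                     410, 504, 437, 489, 436, 480, 439, 444, 408])

# b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1 * xbar
b1 = (np.sum((time - time.mean()) * (calories - calories.mean()))
      / np.sum((time - time.mean()) ** 2))
b0 = calories.mean() - b1 * time.mean()
print(f"Regression line: calories-hat = {b0:.1f} + ({b1:.2f}) * time")
```

The scatterplot itself can then be drawn with matplotlib's plt.scatter(time, calories).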

Objectives
Content Objective: I will use linear regression analysis to construct confidence intervals and perform hypothesis tests.
Social Objective: I will participate in class activities.
Language Objective: I will clearly write down formulas, conditions and assumptions, and other notes.
Slide 27-2

Warm-up (repeated): Create a scatterplot and find the regression line, using the same lunch-table data shown on Slide 27-1. Slide 27-3


Tests Slide 27-5

How confident are you about the relationship? Confidence intervals about the slope. Hypothesis tests about the slope. Slide 27-6

But first: Assumptions and Conditions. In Chapter 8, when we fit lines to data, we needed to check only the Straight Enough Condition. Now, when we want to make inferences about the coefficients of the line, we'll have to make more assumptions (and thus check more conditions). We need to be careful about the order in which we check conditions. If an initial assumption is not true, it makes no sense to check the later ones. Slide 27-7

Straight Enough Condition: check that the scatterplot looks reasonably straight. The data must be quantitative for this to make sense. Slide 27-8


Randomization Condition: check the residual plot (part 1); the residuals should appear to be randomly scattered. Slide 27-10

Equal Variance Condition: check the residual plot again (part 2); the spread of the residuals should be uniform. Slide 27-11

Nearly Normal Condition: check a histogram of the residuals. The distribution of the residuals should be unimodal and symmetric. Slide 27-12


We need to finish our lunch: Confidence Interval. Slide 27-14

We need to finish our lunch: Hypothesis Test. Slide 27-15
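The slides work this example on the board; here is a hedged Python sketch (an addition, not from the deck) that produces both the hypothesis test and the 95% confidence interval for the slope of the lunch-table regression:

```python
# Added sketch: inference for the slope of the lunch-table regression.
import numpy as np
from scipy import stats

time = np.array([21.4, 30.8, 37.7, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8, 42.4,
                 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7])
calories = np.array([472, 498, 335, 423, 437, 508, 431, 479, 454, 450,
                     410, 504, 437, 489, 436, 480, 439, 444, 408])

res = stats.linregress(time, calories)   # slope, intercept, P-value, SE(b1)
n = len(time)
tstar = stats.t.ppf(0.975, df=n - 2)     # critical value for a 95% interval

lo, hi = res.slope - tstar * res.stderr, res.slope + tstar * res.stderr
print(f"b1 = {res.slope:.3f}, SE(b1) = {res.stderr:.3f}")
print(f"t = {res.slope / res.stderr:.2f}, two-sided P = {res.pvalue:.4f}")
print(f"95% CI for beta1: ({lo:.2f}, {hi:.2f})")
```

scipy's linregress reports the two-sided P-value for H0: beta1 = 0 along with the standard error needed for the interval.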

MINITAB output. The dataset "Healthy Breakfast" contains, among other variables, the Consumer Reports ratings of 77 cereals and the number of grams of sugar contained in each serving. (Data source: free publication available in many grocery stores. Dataset available through the Statlib Data and Story Library (DASL).) Under the equation for the regression line, the output provides the least-squares estimate for the constant $b_0$ and the slope $b_1$. Since $b_1$ is the coefficient of the explanatory variable, it is listed under that variable's name. The calculated standard deviations for the intercept and slope are provided in the second column. Here we are comparing sugars and calories in each cereal. Slide 27-16

MINITAB output

Predictor   Coef      StDev     T       P
Constant    80.81     56.04     1.44    0.187
Calories    2.4715    0.4072    6.07    0.000

S = 4.15116    R-Sq = 6.8%    R-Sq(adj) = 0.0%

Slide 27-17
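For comparison, output in this style can be produced in Python with the statsmodels package. The sketch below is an addition to the transcript, and the x and y arrays are placeholders (the cereal data themselves are not reproduced on the slide):

```python
# Added sketch: a MINITAB-style coefficient table via statsmodels (if installed).
import numpy as np
import statsmodels.api as sm

x = np.array([100., 120., 110., 130., 90., 115., 105.])  # placeholder predictor
y = np.array([6., 10., 8., 12., 5., 9., 7.])              # placeholder response

X = sm.add_constant(x)          # adds the "Constant" (intercept) column
fit = sm.OLS(y, X).fit()
print(fit.summary())            # coefficients, SEs, t, P, R-Sq, R-Sq(adj)
```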

Homework: Read Chapter 27; take notes on formulas. Slide 27-18

An Example: Body Fat and Waist Size Our chapter example revolves around the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Slide 27-19

The Population and the Sample When we found a confidence interval for a mean, we could imagine a single, true underlying value for the mean. When we tested whether two means or two proportions were equal, we imagined a true underlying difference. What does it mean to do inference for regression? Slide 27-20

The Population and the Sample (cont.) We know better than to think that even if we knew every population value, the data would line up perfectly on a straight line. In our sample, there's a whole distribution of %body fat for men with 38-inch waists: Slide 27-21

The Population and the Sample (cont.) This is true at each waist size. We could depict the distribution of %body fat at different waist sizes like this: Slide 27-22

The Population and the Sample (cont.) The model assumes that the means of the distributions of %body fat for each waist size fall along the line even though the individuals are scattered around it. The model is not a perfect description of how the variables are associated, but it may be useful. If we had all the values in the population, we could find the slope and intercept of the idealized regression line explicitly by using least squares. Slide 27-23

The Population and the Sample (cont.) We write the idealized line with Greek letters and consider the coefficients to be parameters: $\beta_0$ is the intercept and $\beta_1$ is the slope. Corresponding to our fitted line $\hat{y} = b_0 + b_1 x$, we write the idealized regression line as $\mu_y = \beta_0 + \beta_1 x$. Now, not all the individual y's are at these means; some lie above the line and some below. Like all models, there are errors. Slide 27-24

The Population and the Sample (cont.) Denote the errors by $\varepsilon$. These errors are random, of course, and can be positive or negative. When we add error to the model, we can talk about individual y's instead of means: $$y = \beta_0 + \beta_1 x + \varepsilon$$ This equation is now true for each data point (since there is an $\varepsilon$ to soak up the deviation) and gives a value of y for each x. Slide 27-25

Assumptions and Conditions (cont.) If all four assumptions are true, the idealized regression model would look like this: At each value of x there is a distribution of y-values that follows a Normal model, and each of these Normal models is centered on the line and has the same standard deviation. Slide 27-26

Which Come First: the Conditions or the Residuals? There's a catch in regression: the best way to check many of the conditions is with the residuals, but we get the residuals only after we compute the regression model. To compute the regression model, however, we should check the conditions. So we work in this order: 1. Make a scatterplot of the data to check the Straight Enough Condition. (If the relationship isn't straight, try re-expressing the data. Or stop.) Slide 27-27

Which Come First: the Conditions or the Residuals? (cont.) 2. If the data are straight enough, fit a regression model and find the residuals, e, and predicted values, $\hat{y}$. 3. Make a scatterplot of the residuals against x or the predicted values. This plot should have no pattern. Check in particular for any bend, any thickening (or thinning), or any outliers. 4. If the data are measured over time, plot the residuals against time to check for evidence of patterns that might suggest they are not independent. Slide 27-28

Which Come First: the Conditions or the Residuals? (cont.) 5. If the scatterplots look OK, then make a histogram and Normal probability plot of the residuals to check the Nearly Normal Condition. 6. If all the conditions seem to be satisfied, go ahead with inference. Slide 27-29
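A compact Python sketch of this checking workflow (an addition to the transcript; the x and y arrays are simulated placeholders, so substitute real data):

```python
# Added sketch: the conditions-checking workflow with residual plots.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(20, 45, size=40)                   # placeholder predictor
y = 550 - 2.5 * x + rng.normal(0, 30, size=40)     # placeholder response

res = stats.linregress(x, y)
yhat = res.intercept + res.slope * x
resid = y - yhat

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].scatter(x, y)
axes[0].set_title("1. Straight Enough?")
axes[1].scatter(yhat, resid)
axes[1].axhline(0, color="gray")
axes[1].set_title("3. Residuals: no pattern, even spread?")
axes[2].hist(resid, bins=10)
axes[2].set_title("5. Nearly Normal?")
# stats.probplot(resid, plot=plt) would add a Normal probability plot.
plt.tight_layout()
plt.show()
```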

Intuition About Regression Inference. We expect any sample to produce a $b_1$ whose expected value is the true slope, $\beta_1$. What about its standard deviation? What aspects of the data affect how much the slope and intercept vary from sample to sample? Slide 27-30

Intuition About Regression Inference (cont.) Spread around the line: less scatter around the line means the slope will be more consistent from sample to sample. The spread around the line is measured with the residual standard deviation $s_e$. You can always find $s_e$ in the regression output, often just labeled s. Slide 27-31

Intuition About Regression Inference (cont.) Spread around the line: Less scatter around the line means the slope will be more consistent from sample to sample. Slide 27-32

Intuition About Regression Inference (cont.) Spread of the x's: a large standard deviation of x provides a more stable regression. Slide 27-33

Intuition About Regression Inference (cont.) Sample size: Having a larger sample size, n, gives more consistent estimates. Slide 27-34
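These three intuitions can be explored by simulation. The following sketch is an addition to the transcript, with made-up parameter values; it repeatedly samples from an idealized line and records the fitted slope:

```python
# Added sketch: sample-to-sample variability of the fitted slope b1.
import numpy as np

rng = np.random.default_rng(27)
beta0, beta1, sigma, n = 600.0, -2.0, 30.0, 19   # made-up "true" values

slopes = []
for _ in range(5000):
    x = rng.uniform(20, 45, size=n)                       # spread of the x's
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)  # spread around the line
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b1)

# Increasing n, decreasing sigma, or widening the x-range all shrink this SD:
print(f"SD of simulated slopes: {np.std(slopes):.3f}")
```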

Standard Error for the Slope. Three aspects of the scatterplot affect the standard error of the regression slope: the spread around the line, $s_e$; the spread of the x-values, $s_x$; and the sample size, n. The formula for the standard error (which you will probably never have to calculate by hand) is: $$SE(b_1) = \frac{s_e}{\sqrt{n-1}\, s_x}$$ Slide 27-35

Sampling Distribution for Regression Slopes. When the conditions are met, the standardized estimated regression slope, $$t = \frac{b_1 - \beta_1}{SE(b_1)},$$ follows a Student's t-model with $n - 2$ degrees of freedom. Slide 27-36

Sampling Distribution for Regression Slopes (cont.) We estimate the standard error with $$SE(b_1) = \frac{s_e}{\sqrt{n-1}\, s_x}, \quad \text{where } s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}},$$ n is the number of data values, and $s_x$ is the ordinary standard deviation of the x-values. Slide 27-37
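As a sanity check (an addition to the transcript), computing $SE(b_1)$ from this formula for the lunch-table data should reproduce the standard error that scipy reports:

```python
# Added sketch: check the SE(b1) formula against scipy on the lunch-table data.
import numpy as np
from scipy import stats

time = np.array([21.4, 30.8, 37.7, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8, 42.4,
                 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7])
calories = np.array([472, 498, 335, 423, 437, 508, 431, 479, 454, 450,
                     410, 504, 437, 489, 436, 480, 439, 444, 408])

n = len(time)
res = stats.linregress(time, calories)
yhat = res.intercept + res.slope * time

s_e = np.sqrt(np.sum((calories - yhat) ** 2) / (n - 2))  # residual SD
s_x = np.std(time, ddof=1)                               # ordinary SD of the x's
se_b1 = s_e / (np.sqrt(n - 1) * s_x)

print(se_b1, res.stderr)   # the two numbers should agree
```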

What About the Intercept? The same reasoning applies for the intercept. We can write $$\frac{b_0 - \beta_0}{SE(b_0)} \sim t_{n-2},$$ but we rarely use this fact for anything. The intercept usually isn't interesting. Most hypothesis tests and confidence intervals for regression are about the slope. Slide 27-38

Regression Inference. A null hypothesis of a zero slope questions the entire claim of a linear relationship between the two variables, often just what we want to know. To test $H_0: \beta_1 = 0$, we find $$t_{n-2} = \frac{b_1 - 0}{SE(b_1)}$$ and continue as we would with any other t-test. The formula for a confidence interval for $\beta_1$ is $$b_1 \pm t^*_{n-2} \times SE(b_1).$$ Slide 27-39
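Packaged as a small helper (an added sketch; the function name is my own), the test statistic and interval look like this, checked against the MINITAB cereal output above:

```python
# Added sketch: slope t-test and confidence interval from summary numbers.
from scipy import stats

def slope_inference(b1, se_b1, n, conf=0.95):
    """Return (t, two-sided P-value, CI) for the slope, using df = n - 2."""
    df = n - 2
    t = (b1 - 0) / se_b1                      # test statistic for H0: beta1 = 0
    p = 2 * stats.t.sf(abs(t), df)            # two-sided P-value
    tstar = stats.t.ppf((1 + conf) / 2, df)   # critical value t*
    return t, p, (b1 - tstar * se_b1, b1 + tstar * se_b1)

# The Slide 27-17 output: b1 = 2.4715, SE(b1) = 0.4072, and n = 77 cereals.
t, p, ci = slope_inference(2.4715, 0.4072, 77)
print(f"t = {t:.2f}, P = {p:.4f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")  # t = 6.07
```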

*Standard Errors for Predicted Values. Once we have a useful regression, how can we indulge our natural desire to predict, without being irresponsible? Now we have standard errors; we can use those to construct a confidence interval for the predictions, smudging the results in the right way to report our uncertainty honestly. Slide 27-40

*Standard Errors for Predicted Values (cont.) For our %body fat and waist size example, there are two questions we could ask: Do we want to know the mean %body fat for all men with a waist size of, say, 38 inches? Do we want to estimate the %body fat for a particular man with a 38-inch waist? The predicted %body fat is the same in both questions, but we can predict the mean %body fat for all men whose waist size is 38 inches with a lot more precision than we can predict the %body fat of a particular individual whose waist size happens to be 38 inches. Slide 27-41

*Standard Errors for Predicted Values (cont.) We start with the same prediction in both cases. We are predicting for a new individual, one that was not in the original data set. Call his x-value $x_\nu$ (38 inches). The regression predicts %body fat as $$\hat{y}_\nu = b_0 + b_1 x_\nu.$$ Slide 27-42

*Standard Errors for Predicted Values (cont.) Both intervals take the form $$\hat{y}_\nu \pm t^*_{n-2} \times SE.$$ The SEs will be different for the two questions we have posed. Slide 27-43

*Standard Errors for Predicted Values (cont.) The standard error of the mean predicted value is: $$SE(\hat{\mu}_\nu) = \sqrt{SE^2(b_1)\cdot(x_\nu - \bar{x})^2 + \frac{s_e^2}{n}}$$ Individuals vary more than means, so the standard error for a single predicted value is larger than the standard error for the mean: $$SE(\hat{y}_\nu) = \sqrt{SE^2(b_1)\cdot(x_\nu - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2}$$ Slide 27-44
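As a direct transcription of these two formulas into code (an added sketch; the summary numbers in the example call are made up):

```python
# Added sketch: the two prediction standard errors from Slide 27-44.
import numpy as np

def prediction_ses(se_b1, s_e, n, x_nu, x_bar):
    """Return (SE for the mean prediction, SE for an individual) at x_nu."""
    base = se_b1**2 * (x_nu - x_bar)**2 + s_e**2 / n
    return np.sqrt(base), np.sqrt(base + s_e**2)   # individual adds s_e^2

# Made-up summary numbers, just to show the call:
se_mean, se_indiv = prediction_ses(se_b1=0.4, s_e=30.0, n=19, x_nu=38, x_bar=34)
print(se_mean, se_indiv)   # the individual SE is always the larger one
```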

*Standard Errors for Predicted Values (cont.) Keep in mind the distinction between the two kinds of intervals. The narrower interval is a confidence interval for the predicted mean value at $x_\nu$. The wider interval is a prediction interval for an individual with that x-value. Slide 27-45

*Confidence Intervals for Predicted Values. Here's a look at the difference between predicting for a mean and predicting for an individual. The solid green lines near the regression line show the 95% confidence interval for the mean predicted value, and the dashed red lines show the prediction intervals for individuals. Slide 27-46

What Can Go Wrong? Don't fit a linear regression to data that aren't straight. Watch out for the plot thickening. If the spread in y changes with x, our predictions will be very good for some x-values and very bad for others. Make sure the errors are Normal. Check the histogram and Normal probability plot of the residuals to see if this assumption looks reasonable. Slide 27-47

What Can Go Wrong? (cont.) Watch out for extrapolation. It's always dangerous to predict for x-values that lie far from the center of the data. Watch out for high-influence points and outliers. Watch out for one-tailed tests. Tests of hypotheses about regression coefficients are usually two-tailed, so software packages report two-tailed P-values. If you are using software to conduct a one-tailed test about slope, you'll need to divide the reported P-value in half. Slide 27-48
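For that last point, a tiny illustration (added to the transcript; the P-value is a placeholder):

```python
# Added illustration: halving software's two-sided P-value for a one-tailed test.
# Valid only when the sample slope falls on the side the alternative predicts.
p_two_sided = 0.04              # placeholder value reported by software
p_one_sided = p_two_sided / 2   # one-tailed P-value
print(p_one_sided)              # 0.02
```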

What have we learned? We have now applied inference to regression models. We've learned: Under certain assumptions, the sampling distribution for the slope of a regression line can be modeled by a Student's t-model with n - 2 degrees of freedom. To check four conditions, in order, to verify the assumptions. Most checks can be made by graphing the data and residuals. Slide 27-49

What have we learned? To use the appropriate t-model to test a hypothesis about the slope. If the slope of the regression line is significantly different from 0, we have strong evidence that there is an association between the two variables. To create and interpret a confidence interval for the true slope. We have been reminded yet again never to mistake the presence of an association for proof of causation. Slide 27-50