Applied Regression Analysis


Applied Regression Analysis, Lecture 2 (January 27, 2005). Slide 1 of 46.

Today's Lecture: Simple linear regression. Partitioning the sum of squares. Tests of significance. Regression diagnostics (introduction). When X is random. How to do today's topics in the software. Wrapping up.

Previously on EPSY 581... Recall from our first meeting the hand-calculated sample problem with the following scatterplot: r_XY = 0.845. Correlation shows the degree of association between X and Y, but does so without scale. What if we wanted to predict Y given a value of X?

The Basics. Assume (for now) that X is fixed at pre-determined levels in an experiment: the independent variable. For example, we have an experiment where subjects are given X cups of coffee. Jon the Mad Scientist decides that subjects should be randomly assigned to a group drinking either 1, 2, 3, 4, or 5 cups of coffee. We then want to estimate the linear effect of the independent variable X on the dependent variable Y. For our example, we want to see how coffee drinking affects blood pressure: blood pressure = Y = the dependent variable.

The Basics. The linear regression model (for observation i = 1, ..., N): Y_i = α + βX_i + ε_i. Don't be confused by the Greek alphabet; this is simply the equation for a line (y = a + bx). α is the mean of the population when X is zero, i.e., the Y-intercept. β is the slope of the line: the amount of increase in Y brought about by a unit increase (X to X + 1) in X. ε_i is the random error, specific to each observation.
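As a hypothetical illustration (not from the slides), the model can be simulated directly. The parameter values below are made up for the coffee example; only the 1-to-5-cup fixed design comes from the slide:

```python
import random

random.seed(1)

# Assumed (illustrative) population parameters for the coffee example.
alpha, beta, sigma = 110.0, 5.0, 4.0

# Fixed design: four subjects at each of 1, 2, 3, 4, 5 cups of coffee.
X = [cups for cups in (1, 2, 3, 4, 5) for _ in range(4)]

# Y_i = alpha + beta * X_i + epsilon_i, with normal errors.
Y = [alpha + beta * x + random.gauss(0, sigma) for x in X]

print(len(X), len(Y))  # 20 20
```

Generating data this way makes the role of each symbol concrete: α and β are fixed unknowns, while ε_i varies from subject to subject.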

Parameter Estimates. Chapter 2 parameterizes the linear regression model as: Y = a + bX + e. To find estimates for a and b, consider several possible criteria: choose a and b so that Σ e_i is minimized; so that Σ e_i² is minimized; or take them from some guy in the hallway.

And the Winner Is... Finding the a and b that minimize Σ e_i². Using calculus, these happen to be: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = r_xy (s_y / s_x) = Σxy / Σx², and a = Ȳ − b X̄. The LS estimates are considered BLUE: Best Linear Unbiased Estimators.

An Example. Table 2.1 in Pedhazur lists data from an experiment where X is the number of hours given for study and Y is the score on a test. (Scatterplot: Test Score against Hours of Study.)

Example (continued). We can tell that: Σ(X − X̄)² = Σx² = 40; Σ(X − X̄)(Y − Ȳ) = Σxy = 30; X̄ = 3.0; Ȳ = 7.3. So: b = Σxy / Σx² = 30/40 = 0.75, and a = Ȳ − b X̄ = 7.3 − (0.75)(3.0) = 5.05. Given these estimates, the linear regression line is: Ŷ = 5.05 + 0.75X.
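A minimal sketch (my own, not from the slides) that reproduces this hand calculation from the summary statistics; the variable names are assumptions for illustration:

```python
# Summary statistics from the Pedhazur Table 2.1 example (given above).
Sxx = 40.0               # Σ(X - X̄)² = Σx²
Sxy = 30.0               # Σ(X - X̄)(Y - Ȳ) = Σxy
x_bar, y_bar = 3.0, 7.3  # X̄ and Ȳ

b = Sxy / Sxx            # slope: 30/40
a = y_bar - b * x_bar    # intercept: 7.3 - (0.75)(3.0)

print(b, a)  # 0.75 5.05
```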

Example (continued). (Scatterplot with fitted line: Test Score = 5.05 + 0.75 × Hours of Study; R² = 0.17.)

Properties of the LS Estimates. Notice: Ŷ = a + bX = (Ȳ − b X̄) + bX = Ȳ + b(X − X̄). This means that when X = X̄, Ŷ = Ȳ. Furthermore, this implies that for every simple linear regression, the point (X̄, Ȳ) will always fall on the regression line. And if b = 0 (no relation between X and Y), the best guess (in a LS sense) for a value of Y is Ȳ.
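An illustrative numeric check (my own) that the fitted line passes through (X̄, Ȳ), using the estimates from the study-time example:

```python
# Estimates and means from the study-time example above.
a, b = 5.05, 0.75
x_bar, y_bar = 3.0, 7.3

y_hat_at_mean = a + b * x_bar  # prediction at X = X̄
print(y_hat_at_mean)           # 7.3, i.e. Ŷ = Ȳ when X = X̄
```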

How Good Is Our Line? You've fit the model... you have your estimates... now what? We need to determine how well Y is predicted by X. Specifically, how much variability in Y can be accounted for by the linear regression with X as a predictor? A math adventure: Y = Ȳ + (Ŷ − Ȳ) + (Y − Ŷ), so Y − Ȳ = (Ŷ − Ȳ) + (Y − Ŷ), and Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)² = ss_reg + ss_res.

Sums of Squares. The sum of squares for our dependent variable Y is denoted in Pedhazur by Σy². As shown before, in a regression it can be partitioned into two components: the part due to the regression line, ss_reg, and the part due to error, ss_res. Dividing both components by Σy² gives the proportion of the sum of squares accounted for by the regression and the proportion accounted for by error.

Example Computation. In our studying vs. test score example from before, we know that: Σy² = 130.2, b = 0.75, and Σxy = 30. To compute ss_reg, any of several formulas can be used: ss_reg = (Σxy)² / Σx² = b Σxy = b² Σx² = 0.75 × 30 = 22.5. Then ss_res = Σy² − ss_reg = 130.2 − 22.5 = 107.7.
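A quick check (my own, not from the slides) that the three ss_reg formulas agree, using the example's summary statistics:

```python
# Summary statistics from the study-time example above.
Sxx, Sxy, Syy = 40.0, 30.0, 130.2
b = Sxy / Sxx                 # 0.75

ss_reg_a = Sxy**2 / Sxx       # (Σxy)² / Σx²
ss_reg_b = b * Sxy            # b · Σxy
ss_reg_c = b**2 * Sxx         # b² · Σx²
ss_res = Syy - ss_reg_a       # residual sum of squares

print(ss_reg_a, ss_reg_b, ss_reg_c, round(ss_res, 1))  # 22.5 22.5 22.5 107.7
```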

Example (continued). Dividing both by Σy² gives: proportion explained by the regression: 0.173; proportion explained by error: 0.827.

Variance Accounted For. Instead of approaching fit via sums of squares, one can view it as a spinoff of the Pearson correlation: r²_xy = (Σxy)² / (Σx² Σy²), so that ss_reg = r²_xy Σy².

Variance Accounted For: Example. From our studying vs. test score example, we know that r_xy = 0.416, so r²_xy = 0.173, which is the proportion of variance accounted for by the regression. Therefore 1 − r²_xy = 0.827 is the proportion of variance due to error, or unexplained variance.

Statistical Significance. Statistical significance concerns the probability, under the null hypothesis, of obtaining a result at least as extreme as the one observed: H₀: b = 0 versus H₁: b ≠ 0. Statistical significance is not related to the meaningfulness of the result. Take b = 0.0001 with se(b) = 0.0000001: b is statistically significant, but in reality there is a negligible relationship between X and Y.

Hypothesis Tests in Simple Linear Regression. There are several types of hypothesis tests that can be used in simple linear regression. Most of these tests are related: each tests for a non-zero relationship between two variables, X and Y. Today we have covered three different measures of association: b, ss_reg, and r²_xy.

Hypothesis Tests in Simple Linear Regression. Each of the following tests follows statistically from placing distributional assumptions on the disturbance term in Y = a + bX + e. Typically, we say that the errors follow a normal distribution: e ~ N(0, σ²_e). By placing distributional assumptions, sampling distributions are defined and hypothesis tests are enabled. These assumptions must be recognized and checked for validity.

Testing the Entire Regression. A test of all meaningful regression parameters is as follows: H₀: b = 0 versus H₁: b ≠ 0. Because we only have one predictor variable, this is very straightforward. The test statistic is F-distributed and is computed by: F = (ss_reg / df_reg) / (ss_res / df_res) = (ss_reg / k) / (ss_res / (N − k − 1)). An F-table gives significance for a given Type I error rate α, or use the computer.

Testing the Entire Regression: Example. From our example, the test of the entire regression is H₀: b = 0 versus H₁: b ≠ 0: F = (ss_reg / k) / (ss_res / (N − k − 1)) = (22.5 / 1) / (107.7 / 18) = 3.76. Typing =FDIST(3.76, 1, 18) into MS Excel gives a p-value of 0.0683. We could infer that there is no statistical evidence to suggest that b differs from 0; that is, no statistical relationship between hours of study and the score on a test.
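A minimal sketch (my own, not from the slides) of the F computation. N = 20 is inferred from the slide's df_res = 18 with k = 1:

```python
# Sums of squares from the study-time example above.
ss_reg, ss_res = 22.5, 107.7
k, N = 1, 20                         # one predictor; N inferred from df_res = 18

df_reg, df_res = k, N - k - 1
F = (ss_reg / df_reg) / (ss_res / df_res)

print(round(F, 2))  # 3.76 (Excel's =FDIST(3.76, 1, 18) then gives p = 0.0683)
```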

Testing the VAC. Because of the relationship of r²_xy with ss_reg, the previous test can be phrased as: F = (r²_xy / k) / ((1 − r²_xy) / (N − k − 1)) = (0.1728 / 1) / ((1 − 0.1728) / 18) = 3.76. Notice this is identical to the F we obtained before, so our conclusions are the same.
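An illustrative check (my own) that the ss-based and r²-based F statistics really are the same number when r²_xy is carried at full precision:

```python
# Sums of squares from the study-time example.
ss_reg, ss_res = 22.5, 107.7
Syy = ss_reg + ss_res                # Σy² = 130.2
k, N = 1, 20

r2 = ss_reg / Syy                    # r²_xy, unrounded

F_ss = (ss_reg / k) / (ss_res / (N - k - 1))
F_r2 = (r2 / k) / ((1 - r2) / (N - k - 1))

print(round(F_ss, 2), round(F_r2, 2))  # 3.76 3.76
```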

Testing the Regression Coefficient. Finally, the regression coefficient between X and Y can be tested for a difference from zero: H₀: b = 0 versus H₁: b ≠ 0. The variance of Y conditional on X is given by: s²_y.x = Σ(Y − Ŷ)² / (N − k − 1) = ss_res / (N − k − 1). The standard error of the regression coefficient, s_b, is based on this conditional variance of Y given X: s_b = √(s²_y.x / Σx²).

Testing the Regression Coefficient. The test statistic for the hypothesis test H₀: b = 0 versus H₁: b ≠ 0 is: t = b / s_b. This statistic is t-distributed with N − k − 1 degrees of freedom (the denominator of s²_y.x).

Testing the Regression Coefficient: Example. From our example, the test of the regression coefficient is H₀: b = 0 versus H₁: b ≠ 0: s²_y.x = ss_res / (N − k − 1) = 107.7 / 18 = 5.983, with Σx² = 40. Then t = b / √(s²_y.x / Σx²) = 0.75 / √(5.983 / 40) = 1.94. Typing =TDIST(1.94, 18, 2) into Excel (2 for a two-tailed test) gives a p-value of 0.0683, the same as before. Recall that F = t².
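A sketch (my own, not from the slides) of the same t computation, which also confirms F = t² numerically:

```python
import math

# Estimates and summaries from the study-time example above.
b, ss_res, Sxx = 0.75, 107.7, 40.0
N, k = 20, 1

s2_yx = ss_res / (N - k - 1)    # conditional variance of Y given X
s_b = math.sqrt(s2_yx / Sxx)    # standard error of b
t = b / s_b

print(round(t, 2), round(t**2, 2))  # 1.94 3.76 (t² equals the earlier F)
```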

Testing the Regression Coefficient. In our example we tested against zero. In reality, any value B can be tested against: H₀: b = B versus H₁: b ≠ B, with t = (b − B) / √(s²_y.x / Σx²).

Assumptions: The Key Behind the Tests. Basic assumptions: X is fixed and measured without error; the mean of Y given X has a linear relationship with X. Error assumptions: the mean of the errors is zero; all errors are independent; the variance of the errors does not change as a function of X (homoscedasticity); all errors are uncorrelated with X. Statistically speaking, the errors are IID: e ~ N(0, σ²_e).

Assumptions: Ways of Checking. An easy way to check many error-based assumptions is a visualization: a plot of the standardized residuals versus the predicted values. The next class will focus on other diagnostic measures.

Curvilinear Example. What if we were to fit a linear regression to these data? (Scatterplot: Z against Hours of Study.)

Curvilinear Example. Fitting the simple linear regression model gives: (Scatterplot with fitted line: Z = 7.75 + 0.55 × X; R² = 0.04.) Not good.

Curvilinear Example. Let's take a look at a plot of the standardized residuals versus the predicted values: (Residual plot: standardized residual against unstandardized predicted value.)

When X is Random. Commonly, we have no control over our predictor variable: observational studies. Statistically speaking, having X random adds: the potential for measurement error in X (reliability), and limitations to the causal inferences that can be drawn from a regression analysis. All test statistics and measures are used equivalently. We can predict Y from X or X from Y. Correlations are more commonplace: common variance rather than causal influence.

Wrap-Up Example. Example taken from Weisberg (1985, p. 230): Perhaps the most famous geyser is Old Faithful, in Yellowstone National Park, Wyoming. The intervals between eruptions of Old Faithful range from about 30 to 90 minutes. Water shoots to heights generally over 35 meters, with eruptions lasting from 1 to 5.5 minutes. Data on Old Faithful have been collected for many years by ranger/naturalists in the park, using a stopwatch. The duration measurements have been rounded to the nearest 0.1 minute (6 seconds), while the intervals reported are to the nearest minute. The National Park Service uses x (the duration of an eruption) to predict y (the interval to the next eruption).

Old Faithful Postcard. (Photo.)

Old Faithful Scatterplot. (Scatterplot: Interval to Next Eruption against Duration of Eruption.)

Descriptive Statistics.

Variable        N    Mean    SD
x (Duration)   107   3.4607  1.0363
y (Interval)   107  71.000  12.967

r_xy = 0.858; r²_xy = 0.736; Σx² = 113.84; Σy² = 17,822.00; Σxy = 1,222.70.

Regression Line. b = Σxy / Σx² = 1,222.70 / 113.84 = 10.74. a = Ȳ − b X̄ = 71.0 − (10.74)(3.4607) = 33.83. Ŷ = 33.83 + 10.74X.

Regression Line. (Scatterplot with fitted line: Interval to Next Eruption = 33.83 + 10.74 × Duration of Eruption; R² = 0.74.)

Hypothesis Tests. ss_reg = b Σxy = 10.74 × 1,222.70 = 13,132.99. ss_res = Σy² − ss_reg = 17,822.00 − 13,132.99 = 4,689.01. F = (ss_reg / k) / (ss_res / (N − k − 1)) = (13,132.99 / 1) / (4,689.01 / 105) = 294.084.
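As a hedged recomputation (my own, not from the slides), carrying b at full precision from the Descriptives summaries gives nearly the same answer; the small gap from the slide's 13,132.99 and 294.084 comes from rounding of intermediate values:

```python
# Summary statistics from the Old Faithful Descriptives slide.
Sxx, Syy, Sxy = 113.84, 17822.00, 1222.70
x_bar, y_bar = 3.4607, 71.000
N, k = 107, 1

b = Sxy / Sxx                 # slope, unrounded
a = y_bar - b * x_bar         # intercept
ss_reg = b * Sxy
ss_res = Syy - ss_reg
F = (ss_reg / k) / (ss_res / (N - k - 1))

print(round(b, 2), round(a, 2), round(F, 1))  # 10.74 33.83 294.0
```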

Residual Plot. (Standardized residuals against unstandardized predicted values.)

Data. (Data table, not captured in the transcription.)

More Data. (Data table continued, not captured in the transcription.)

Plots from Today. Scatterplots were made with the interactive graphs: Graphs... Interactive... Scatterplot.

Analyze... Regression... Linear.

Saving Residuals. Analyze... Regression... Linear, then the Save button.

Final Thought. Simple linear regression can be simple and straightforward. We need to understand the difference between statistical significance and practical relevance. The tests of significance and model estimates are only valid if all assumptions are met. Regression is fairly robust to violations of assumptions, but usually when assumptions are violated, something else is wrong with the model. Listen to your data; they are trying to tell you something.

Next Time. Regression diagnostics (Chapter 3). More practice with fitting models. QUESTION: Do you have any data you are willing to share with the class?