Simple Linear Regression

Simple Linear Regression. EdPsych 580, C.J. Anderson, Fall 2005.

Outline
1. What it is and why it's useful
2. How
3. Statistical inference
4. Examining assumptions (diagnostics)
5. Comments on interpretation

What is Regression?
The study of the (linear) relationship between two measurable (observable) variables. Examples:
- Gasoline consumption and driving speed.
- Height and weight.
- Salary and merit.
- Cost of wine as a function of annual rainfall.
- College GPA and high school GPA.
- Boiling point of water and atmospheric pressure.
- Speed to proofread a paragraph and amount of caffeine consumed.

Form of the Relationship
Sometimes the nature of the relationship between two numeric variables is known because of:
- A functional relationship, e.g., physical laws such as Weight = f(mass, gravity), though we still have measurement errors.
- Behavioral theories, e.g., judgments of heaviness.
Equations are often a reasonable approximation.

Example of a Physical Law (from Weisberg)
James Forbes (1840s-1850s) was a Scottish physicist who wanted to estimate altitude above sea level from measurements of the boiling point of water. He knew that altitude could be determined from atmospheric pressure, measured with a barometer (in 1840, barometers were very fragile).

Forbes's Example (continued)
Forbes's data: the boiling point of water was measured in degrees Fahrenheit, and atmospheric pressure was measured in inches of mercury (with air-temperature adjustments). These data were collected at 17 locations in the Alps and Scotland.
Forbes's theory: over the range of observed values, the plot of boiling point versus the logarithm of pressure should be a straight line.

Forbes's Data [figure]

Forbes's Data (continued) [figure]

Examples of Behavioral Theory
- Motor performance and age.
- Verbal skills and years of education.
- Time to solve a problem and size of the group working on the problem.
- Quantitative GRE scores and the number of hours of college math courses taken.
- Others...
Before getting into the nitty-gritty, let's take a quick look...

Example
Dependent variable, or Y = quantitative GRE score.
Independent variable, or X = number of hours of college math courses taken.
The strength of the relationship: r = .80.
The form of the relationship: ... look at the scatter plot...

The Data [figure]

The Data and Conditional Means
What minimizes $\sum_i (Y_i - \text{guess}_i)^2 = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i (\text{error}_i)^2$?

Just the Means [figure]

The Best-Fit Line [figure]

Uses of Regression
- Describe the relationship between variables succinctly (i.e., summarize).
- Measure the strength of the relationship.
- Predict one variable when values of other variable(s) are known.
- Adjust responses for the known effects of other variable(s) (statistical control, as opposed to experimental control).
Most often we try to summarize the relationship between two variables by a straight line.

Asymmetric Treatment of Variables
The two variables play different roles:
Y = response, dependent, outcome, predicted, or criterion variable.
X = explanatory, independent, or predictor variable.
The big question: how does Y vary with X?

The Regression of Y on X
The regression of Y on X corresponds to the curve (line) that gives the mean value of Y corresponding to each possible value of X: the conditional distribution (mean) of Y given X.
Notation: $\mu_{Y|X}$ or $\mu_{Y \cdot X}$ or $E(Y \mid X)$.
Very general definition of regression: $\mu_{Y|X}$ = mean of all Y values in the population with the same X value. (It doesn't have to be linear, and Y doesn't have to be continuous.)

How
Suppose that we have measurements on n individuals of two variables: $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$; e.g., X = number of hours of college math courses, Y = quantitative GRE score.
Always plot the data. Look at the form of the relationship. Any outliers?

Simple Linear Regression Model
$Y_i = \mu_{y|x_i} + \epsilon_i$
$Y_i = \alpha + \beta X_i + \epsilon_i$
$Y_i = \mu + \beta(X_i - \bar{X}) + \epsilon_i$
where the $X_i$'s are known (and considered fixed), $\alpha$ is the Y intercept, $\beta$ is the population slope of the regression line, the $\epsilon_i$'s are errors or residuals, and $\mu$ is the population mean of Y at $X_i = \bar{X}$.

Assumptions
- Linear relationship between Y and X.
- For statistical inference, $\epsilon_i \sim N(0, \sigma^2)$ and independent; therefore $Y_i \sim N(\mu_{Y|x_i}, \sigma^2)$.
- $E(Y_i) = E(\alpha + \beta X_i + \epsilon_i) = \alpha + \beta X_i = \mu_{Y|x_i}$.
- $\sigma^2$ does not depend on X.
MATLAB demo.
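
The slide's MATLAB demo is not part of this transcript. As a stand-in, here is a minimal Python simulation sketch of the model and its assumptions; the parameter values, X grid, and seed are all made up for illustration.

```python
# A minimal simulation of the SLR model: Y_i = alpha + beta*X_i + eps_i,
# with iid N(0, sigma^2) errors (hypothetical parameter values).
import numpy as np

rng = np.random.default_rng(seed=1)
alpha, beta, sigma = 10.0, 2.0, 3.0
x = np.repeat(np.arange(1, 11), 200)            # 200 replicate Y's per X value
y = alpha + beta * x + rng.normal(0.0, sigma, size=x.size)

# At each X, the Y's should center on alpha + beta*X with the same spread.
for xv in (2, 5, 8):
    ys = y[x == xv]
    print(f"X={xv}: mean(Y)={ys.mean():6.2f} (model {alpha + beta * xv:5.1f}), "
          f"sd(Y)={ys.std(ddof=1):4.2f} (model {sigma})")
```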

Estimation: Method of Least Squares
The loss function is
$\sum_{i=1}^n (Y_i - \text{prediction}_i)^2 = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n (Y_i - (a + bX_i))^2 = \sum_{i=1}^n \hat{\epsilon}_i^2$
For the best-fit line, find those values of a and b for which the sum of squared errors is as small as possible.
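
A quick numerical check (not from the lecture) that the closed-form estimates on the next slide really minimize this loss: the sketch below minimizes the sum of squared errors directly and compares the answer with the closed-form slope. The simulated data loosely mimic the GRE example but are otherwise arbitrary.

```python
# Minimize the least-squares loss numerically and compare with the
# closed-form slope b = cov(X, Y) / var(X).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=2)
x = rng.uniform(0, 20, size=50)
y = 334 + 18 * x + rng.normal(0, 70, size=50)   # made-up data

def loss(params):
    a, b = params
    return np.sum((y - (a + b * x)) ** 2)       # sum of squared errors

res = minimize(loss, x0=[0.0, 0.0])
print("numeric (a, b):", res.x)
print("closed-form b: ", np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
```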

Least Squares Estimates
The line goes through the point $(\bar{X}, \bar{Y})$.
The slope equals the change in Y for a 1-unit change in X:
$b = \frac{\text{cov}(X, Y)}{\text{var}(X)} = r_{xy} \frac{s_y}{s_x}$
The Y intercept equals
$a = \bar{Y} - b\bar{X} = \bar{Y} - r_{xy} \frac{s_y}{s_x} \bar{X}$
The estimated residuals equal $\hat{\epsilon}_i = (Y_i - \hat{Y}_i) = (Y_i - a - bX_i)$.

Least Squares Estimates (continued)
The estimated residuals equal $\hat{\epsilon}_i = (Y_i - \hat{Y}_i) = (Y_i - a - bX_i)$.
There are two (linear) restrictions on the $\hat{\epsilon}_i$:
$\sum_{i=1}^n \hat{\epsilon}_i = 0$ and $\sum_{i=1}^n \hat{\epsilon}_i (X_i - \bar{X}) = 0$
Show why this is so...
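
A minimal Python check of the two restrictions, using arbitrary simulated data:

```python
# Verify: residuals sum to zero, and are orthogonal to (X_i - X-bar).
import numpy as np

rng = np.random.default_rng(seed=3)
x = rng.normal(10, 3, size=40)
y = 5 + 0.9 * x + rng.normal(0, 2, size=40)     # made-up data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

print(np.sum(resid))                   # ~0 up to floating-point error
print(np.sum(resid * (x - x.mean())))  # ~0 up to floating-point error
```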

Example: Least Squares Regression Line [figure: graph of quantitative GRE vs. hours of math courses, with various features identified]

Example 2: IQ
From Skodak and Skeels: the data are the IQs of mothers and of their adopted-away children when the children were 14 years old.

Example 2: IQ
Summary statistics: Mother's IQ: $\bar{X} = 84.11$, $s_x = 16.53$. Child's IQ: $\bar{Y} = 98.99$, $s_y = 19.37$. Also $n = 9$ and $r_{xy} = .78$.
Estimating $\beta$: $b = r_{xy} \frac{s_y}{s_x} = (.78)\frac{19.37}{16.53} = .911$
Estimating $\alpha$: $a = \bar{Y} - b\bar{X} = 98.99 - (.911)(84.11)$
Estimated regression line: $\hat{Y}_i = 22.269 + .911 X_i$

Estimated Regression Line [figure]
b < 1: nature of the phenomenon, or an artifact of regression?

Plot of Residuals [figure]

Decomposition of $Y_i$ [figure: $Y_i$ split into $\hat{Y}_i = a + bX_i$ and $\hat{\epsilon}_i$]

Analysis of Variance
Observations are broken down into two independent parts:
$Y_i = (\text{model value})_i + (\text{error})_i = \hat{Y}_i + \hat{\epsilon}_i = (a + bX_i) + \hat{\epsilon}_i$
Fact: the variance of a sum of independent variables equals the sum of the variances.
Implication: (total variance of Y) = (variance of Y due to model) + (variance of Y due to lack of fit):
$s_y^2 = s_{\hat{y}}^2 + s_{\hat{\epsilon}}^2$

Analysis of Variance [figure: variance of Y, $s_y^2$; residual variance, $s_{\hat{\epsilon}}^2$; variance accounted for by the model, $s_{\hat{y}}^2$]

Analysis of Variance (continued)
Partition of variance: $s_y^2 = s_{\hat{y}}^2 + s_{\hat{\epsilon}}^2$
Divide both sides by $s_y^2$ to get proportions:
1 = (proportion of var(Y) accounted for by the model) + (proportion of var(Y) not accounted for by the model)
$1 = \frac{s_{\hat{y}}^2}{s_y^2} + \frac{s_{\hat{\epsilon}}^2}{s_y^2}$
What do $s_y^2$, $s_{\hat{y}}^2$, and $s_{\hat{\epsilon}}^2$ equal?

Variances: $s_{\hat{y}}^2$ and $s_{\hat{\epsilon}}^2$
$s_{\hat{y}}^2 = \frac{1}{n-1} \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 = \frac{1}{n-1}(\text{Sum of Squares}_{\text{model}}) = \frac{1}{n-1}(SS_{\text{model}})$
$s_{\hat{\epsilon}}^2 = \frac{1}{n-1} \sum_{i=1}^n (\hat{\epsilon}_i - 0)^2 = \frac{1}{n-1}(SS_{\text{res}}) = \frac{1}{n-1}(SS_{\hat{\epsilon}})$

ANOVA (continued)
Note $s_y^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2 = \frac{1}{n-1} SS_{\text{total}}$
Partitioning variance and sums of squares:
$s_y^2 = s_{\hat{y}}^2 + s_{\hat{\epsilon}}^2$
$\frac{1}{n-1}(SS_{\text{total}}) = \frac{1}{n-1}(SS_{\text{model}}) + \frac{1}{n-1}(SS_{\text{res}})$
$SS_{\text{total}} = SS_{\text{model}} + SS_{\text{res}}$
The sums of squares partition.
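
A short Python sketch, on made-up data, confirming the partition numerically (it also previews the coming identity $SS_{\text{model}}/SS_{\text{total}} = r_{xy}^2$):

```python
# Check SS_total = SS_model + SS_res, and SS_model/SS_total = r_xy^2.
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.uniform(0, 20, size=91)
y = 334 + 18 * x + rng.normal(0, 70, size=91)   # made-up data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
yhat = a + b * x

ss_total = np.sum((y - y.mean()) ** 2)
ss_model = np.sum((yhat - y.mean()) ** 2)
ss_res = np.sum((y - yhat) ** 2)

print(ss_total, ss_model + ss_res)                         # equal
print(ss_model / ss_total, np.corrcoef(x, y)[0, 1] ** 2)   # both are R^2
```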

A Closer Look at $s_{\hat{y}}^2 / s_y^2 = SS_{\text{model}}/SS_{\text{total}}$
$s_{\hat{y}}^2 = \frac{1}{n-1} \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 = \frac{1}{n-1} \sum_{i=1}^n \left( \left[\bar{Y} + r_{xy}\tfrac{s_y}{s_x}(X_i - \bar{X})\right] - \bar{Y} \right)^2 = \frac{1}{n-1} \sum_{i=1}^n \left( r_{xy}\tfrac{s_y}{s_x}(X_i - \bar{X}) \right)^2 = r_{xy}^2 \frac{s_y^2}{s_x^2} \left( \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1} \right) = r_{xy}^2 s_y^2$

Coefficient of Determination: Multiple $R^2$
$\frac{s_{\hat{y}}^2}{s_y^2} = \frac{r_{xy}^2 s_y^2}{s_y^2} = r_{xy}^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} = R^2$
= proportion of the variance of Y accounted for by the model:
$R^2 = \frac{\text{Predicted variance}}{\text{Total variance}}$
Also, $\frac{s_{\hat{\epsilon}}^2}{s_y^2} = 1 - r_{xy}^2$ = proportion of the total variance unexplained by the model.

Coefficient of Determination (continued)
If we use the overall mean $\bar{Y}$ as the predictor, then $SS_{\text{total}}$ = sum of squared errors of prediction.
If we use the conditional mean $\hat{Y}_i$ (from the regression of $Y_i$ onto $X_i$), then $SS_{\text{res}}$ = sum of squared errors of prediction.
The proportional reduction in the sum of squared errors of prediction equals
$\frac{SS_{\text{total}} - SS_{\text{res}}}{SS_{\text{total}}} = \frac{SS_{\text{model}}}{SS_{\text{total}}} = R^2$

Comparison: $\bar{Y}$ vs. $\hat{Y}_i$ as Predictor
Proportional reduction in error:
$R^2 = (1{,}234{,}453 - 431{,}639)/1{,}234{,}453 = 802{,}814/1{,}234{,}453 = .6503$

Summary from our Examples (most values rounded to the nearest integer)

  Statistic    QuantGRE    Adopted
  n                  91          9
  SS_model      802,814      1,814
  SS_res        431,639      1,186
  SS_total    1,234,453      3,000
  s^2_y          13,716        375
  s^2_eps         4,850        169
  s^2_yhat        8,866        206
  R^2               .65        .60

Statistical Inference in Regression
- Slope parameter $\beta$.
- Digression/connection: t-test for correlation.
- Foreshadowing: analysis of variance, F-test.
- Fitted values $\mu_{y|x}$.
- Predictions of Y for a new observation.

Statistical Inference: $\beta$
$H_o: \beta = 0$ vs. $H_a: \beta \neq 0$. Why?
- Does X provide any predictive information?
- Does X provide any explanatory power regarding the variability of Y?
- Is $\bar{Y}$ equal to $\hat{Y}$ (i.e., is the regression line flat)?
- Are X and Y correlated?

Statistical Inference: $\beta$ (continued)
IF $Y_i = \alpha + \beta X_i + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$ and the $\epsilon_i$ independent (i.e., the $Y_i$ are independent),
THEN
$b \sim N\left(\beta, \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right)$

Statistical Inference: $\beta$ (continued)
The standard error of b:
$SE(b) = \sqrt{\frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}} = \sigma \sqrt{\frac{1}{\sum_{i=1}^n (X_i - \bar{X})^2}}$
Estimate of $\sigma$:
$\hat{\sigma} = \hat{\sigma}_{y|x} = s_{\hat{\epsilon}} = \sqrt{\frac{SS_{\text{res}}}{n-2}} = \sqrt{\frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{n-2}}$
How do we use this information?

Hypothesis Test: $\beta$ (continued)
Hypothesis test: $H_o: \beta = 0$ vs. $H_a: \beta \neq 0$.
Assumptions: $Y_i = \alpha + \beta X_i + \epsilon_i$; $\epsilon_i \sim N(0, \sigma^2)$; the $\epsilon_i$'s are independent.
Test statistic: $t = \frac{b - 0}{SE(b)}$
The sampling distribution of the test statistic (if $H_o$ and the assumptions are true) is Student's t with $\nu = n - 2$ degrees of freedom.
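
A compact Python sketch of this test on simulated data (the numbers will not match the GRE example; scipy is used only for the p-value):

```python
# t-test of H0: beta = 0 in simple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
n = 91
x = rng.uniform(0, 20, size=n)
y = 334 + 18 * x + rng.normal(0, 70, size=n)    # made-up data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

s_eps = np.sqrt(np.sum(resid ** 2) / (n - 2))        # estimate of sigma
se_b = s_eps / np.sqrt(np.sum((x - x.mean()) ** 2))  # SE(b)
t = (b - 0) / se_b
p = 2 * stats.t.sf(abs(t), df=n - 2)                 # two-sided p-value
print(f"b={b:.3f}, SE(b)={se_b:.3f}, t={t:.2f}, p={p:.2g}")
```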

Hypothesis Test and CI for $\beta$ (continued)
A $(1-\alpha)100\%$ confidence interval for $\beta$:
$b \pm t_{n-2,\,1-\alpha/2} \, SE(b)$
Example: quantitative GRE versus hours of math courses. Recall that $\hat{Y}_i = 333.987 + 17.998 x_i$.

Hypothesis Test for $\beta$ (example)
Estimate of $\sigma$:
$\hat{\sigma} = s_{\hat{\epsilon}} = \sqrt{\frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{n-2}} = \sqrt{\frac{SS_{\text{res}}}{df_{\text{res}}}} = \sqrt{MS_{\text{res}}} = \sqrt{\frac{431639}{91-2}} = \sqrt{4849.87514} = 69.64$
Estimate of SE(b):
$SE(b) = s_{\hat{\epsilon}} \sqrt{\frac{1}{(n-1)s_x^2}} = 69.64\sqrt{1/(90(27.5385))} = 69.64(.02) = 1.40$
Test statistic: $t = 17.998/1.40 = 12.87$. Using the t-distribution with df = 89, $p < .001$.

Confidence Interval for $\beta$ (example)
A 95% confidence interval for $\beta$:
$b \pm t_{n-2,\,1-\alpha/2} \, SE(b) = 17.998 \pm 1.987(1.40) = (15.22, 20.78)$
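
The interval arithmetic is easy to reproduce; in the sketch below, scipy supplies the t critical value instead of a table (the estimates are the slide's):

```python
# 95% CI for beta using the slide's b and SE(b).
from scipy import stats

b, se_b, n = 17.998, 1.40, 91
t_crit = stats.t.ppf(0.975, df=n - 2)         # ~1.987 for df = 89
print(b - t_crit * se_b, b + t_crit * se_b)   # ~(15.22, 20.78)
```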

Regression Using SAS
Program code that will produce regression diagnostics (the original statement order is corrected here, the title is quoted, and the model option is read as CLB, confidence limits for the parameter estimates):

  ods html;
  ods graphics on;
  proc reg data=prep2;
    model quantgre = hoursmath / clb;
    plot quantgre*hoursmath;
    title 'QuantGRE vs HoursMath';
  run;
  ods graphics off;
  ods html close;

Output: regression results, diagnostics, etc.

Digression/Connection
Recall the hypothesis $H_o: \rho_{xy} = 0$ and test statistic
$t = \frac{r_{xy}}{\sqrt{(1 - r_{xy}^2)/(n-2)}}$
This is equivalent to the t-test for $H_o: \beta = 0$, $t = \frac{b}{SE(b)}$.
Implication: you do not have to assume bivariate normality for X and Y to test $H_o: \rho_{xy} = 0$; you only have to assume linearity, $\epsilon_i \sim N(0, \sigma^2)$, and independence.

Test: Variance Accounted for by the Model
Mean square error. Definition: $MS_{\text{res}} = SS_{\text{res}}/(n-2) = s_{\hat{\epsilon}}^2$. Expected mean square error: $E(MS_{\text{res}}) = \sigma^2$.
Mean square model. Definition: $MS_{\text{model}} = SS_{\text{model}}/1$. Expected mean square: $E(MS_{\text{model}}) = \sigma^2 + \beta^2 \sum_{i=1}^n (X_i - \bar{X})^2$.
F-test of $H_o: \beta = 0$:
$F = \frac{MS_{\text{model}}}{MS_{\text{res}}} = \frac{SS_{\text{model}}/\nu_{\text{model}}}{SS_{\text{res}}/\nu_{\text{res}}} = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$
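
A Python sketch of the F-test on simulated data; it also shows the standard fact (implied by the t-test equivalence above, though not stated on this slide) that with a single predictor $F = t^2$:

```python
# ANOVA F-test of H0: beta = 0, and the identity F = t^2 in SLR.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
n = 50
x = rng.uniform(0, 10, size=n)
y = 3 + 1.5 * x + rng.normal(0, 4, size=n)      # made-up data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
yhat = a + b * x

ms_model = np.sum((yhat - y.mean()) ** 2) / 1   # df_model = 1
ms_res = np.sum((y - yhat) ** 2) / (n - 2)      # df_res = n - 2
F = ms_model / ms_res
t = b / (np.sqrt(ms_res) / np.sqrt(np.sum((x - x.mean()) ** 2)))

print(F, t ** 2)                  # identical up to floating-point error
print(stats.f.sf(F, 1, n - 2))    # p-value
```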

Statistical Inference (CI) for $\mu_{y|x}$
Our estimated model:
$\hat{Y}_i = \hat{\mu}_{y|x} = a + bX_i = \bar{Y} + b(X_i - \bar{X})$
Note that $\hat{\mu}_y = \bar{Y}$, which has standard error $\sigma/\sqrt{n}$; that $b = r_{xy} s_y/s_x$, with SE as previously given; and that $\bar{Y}$ and b are statistically independent.
The variance of a sum of independent variables equals the sum of the variances...

Statistical Inference (CI) for $\mu_{y|x}$
Variance of $\hat{\mu}_{y|x}$:
$\text{var}(\hat{\mu}_{y|x}) = \text{var}(\bar{Y}) + \text{var}(b(x - \bar{X})) = \text{var}(\bar{Y}) + \text{var}(b)(x - \bar{X})^2 = \frac{\sigma^2}{n} + \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}(x - \bar{X})^2 = \sigma^2\left(\frac{1}{n} + \frac{(x - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right)$
Estimator of the standard error of $\hat{\mu}_{y|x}$:
$SE(\hat{\mu}_{y|x}) = s_{\hat{\epsilon}} \sqrt{\frac{1}{n} + \frac{(x - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2}}$

Statistical Inference (CI) for $\mu_{y|x}$ (continued)
Estimator of the standard error of $\hat{\mu}_{y|x}$:
$SE(\hat{\mu}_{y|x}) = \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2}} = \sqrt{\frac{SS_{\text{res}}}{df_{\text{res}}}} \sqrt{\frac{1}{n} + \frac{(x - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2}}$
This depends on x! The closer x is to $\bar{X}$, the smaller the standard error of $\hat{\mu}_{y|x}$.

The Standard Error of $\hat{\mu}_{y|x}$
Quant GRE example:
$s_{\hat{\mu}_{y|x}} = s_{\hat{\epsilon}} \sqrt{\frac{1}{n} + \frac{(x - \bar{X})^2}{(n-1)s_x^2}} = 69\sqrt{\frac{1}{91} + \frac{(x - 8.5385)^2}{90(27.5384615)}}$

   x   SE(mu-hat)     x   SE(mu-hat)
   0      13.87       9      7.26
   4       9.59      10      7.51
   6       8.04      12      8.68
   7       7.54      16     12.62
   8       7.27      20     17.45
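
The table above can be reproduced directly from the slide's summary numbers. A sketch (assuming, as the formula above does, $s_{\hat{\epsilon}} \approx 69$, n = 91, $\bar{X} = 8.5385$, $s_x^2 = 27.5384615$):

```python
# SE of the fitted mean at several x values; smallest near x = X-bar.
import numpy as np

s_eps, n, xbar, s2x = 69.0, 91, 8.5385, 27.5384615
for xv in (0, 4, 6, 7, 8, 9, 10, 12, 16, 20):
    se = s_eps * np.sqrt(1 / n + (xv - xbar) ** 2 / ((n - 1) * s2x))
    print(f"x={xv:2d}  SE={se:5.2f}")
```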

Confidence Interval for $\mu_{y|x}$
$(1-\alpha)100\%$ confidence interval for $\mu_{y|x}$:
$(a + bx) \pm t_{n-2,\,\alpha/2} \, SE(\hat{\mu}_{y|x})$, where $a + bx = \bar{Y} + b(x - \bar{X})$.
Example: quantitative GRE regressed onto hours of math courses.

Confidence Interval for $\mu_{y|x}$: 95% confidence bands

   x    s(mu-hat)   Y-hat    Lower    Upper
   2    11.59       369.98   347.02   392.94
   4     9.58       405.97   386.99   424.95
   6     8.04       441.97   426.04   457.89
   8     7.27       477.96   463.57   492.36
  10     7.51       513.96   499.09   528.83
  12     8.67       549.95   532.77   567.14
  14    10.46       585.95   565.22   606.68
  16    12.62       621.95   596.96   646.93
  18    14.97       657.94   628.29   687.59

95% Confidence Bands for MEANS [figure]

CI for a New Observation
Prediction of Y for a new observation. Our best guess is $\hat{Y} = a + bx = \hat{\mu}_{y|x}$, but the true value for the person is
$Y = a + bx + \epsilon = \bar{Y} + b(x - \bar{X}) + \epsilon$
Each of the three components of Y is independent of the others.

CI for a New Observation (continued)
Variance of $\hat{Y}_{\text{pred}}$:
$\text{var}(\hat{Y}_{\text{pred},\,X=x_k}) = \text{var}(\bar{Y}) + \text{var}(b(x_k - \bar{x})) + \text{var}(\epsilon) = \frac{\sigma^2}{n} + \frac{\sigma^2 (x_k - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} + \sigma^2 = \sigma^2\left(\frac{1}{n} + \frac{(x_k - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} + 1\right)$
The standard error of $\hat{Y}_{\text{pred}}$ is the square root of the above:
$SE(\hat{Y}_{\text{pred},\,X=x_k}) = \sigma \sqrt{\frac{1}{n} + \frac{(x_k - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} + 1}$

CI for a New Observation (continued)
So, a $(1-\alpha)100\%$ confidence interval for the prediction of a single new observation when X = x is
$\hat{Y}_{\text{pred}} = \hat{\mu}_{y|x} \pm t_{n-2,\,\alpha/2} \, SE(\hat{Y}_{\text{pred}})$
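
A sketch contrasting the two intervals at x = 8, using the slide's fitted line and summary numbers (the deck's own tables used slightly different rounding, so the output is close rather than identical):

```python
# CI for the mean response vs. prediction interval for a new observation.
import numpy as np
from scipy import stats

s_eps, n, xbar, s2x = 69.64, 91, 8.5385, 27.5384615
a, b, x = 333.987, 17.998, 8.0

leverage = 1 / n + (x - xbar) ** 2 / ((n - 1) * s2x)
se_mean = s_eps * np.sqrt(leverage)        # SE of mu-hat_{y|x}
se_pred = s_eps * np.sqrt(leverage + 1)    # SE for a single new observation

t_crit = stats.t.ppf(0.975, df=n - 2)
yhat = a + b * x
print("mean CI:", yhat - t_crit * se_mean, yhat + t_crit * se_mean)
print("pred CI:", yhat - t_crit * se_pred, yhat + t_crit * se_pred)
```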

95% CI for Predicted QuantGRE

   x    Y-hat     s(mu-hat)   s(Y-hat)   Lower    Upper
   2    369.982   11.59       69.96      231.44   508.5
   4    405.978    9.58       69.66      268.04   543.9
   6    441.973    8.04       69.46      304.42   579.5
   8    477.968    7.27       69.38      340.59   615.3
  10    513.964    7.51       69.40      376.53   651.3
  12    549.959    8.67       69.54      412.26   687.6
  14    585.954   10.46       69.78      447.77   724.1
  16    621.950   12.62       70.14      483.06   760.8
  18    657.945   14.97       70.60      518.14   797.7

95% Confidence Bands for PREDICTION [figure]

Comparison of 95% Confidence Bands [figure]

Miscellaneous Topics in Regression
- Regression to the mean.
- Effect of outliers on results.
- Extrapolation beyond the data (restricted range).
- Examine the assumptions! Linearity; equal variance at every x (i.e., $\text{var}(Y|x) = \sigma^2$).

Regression to the Mean [figure]

Regression to Mediocrity
Note that
$\hat{Y}_i = a + bX_i$
$\hat{Y}_i = \bar{Y} + r_{xy}\frac{s_y}{s_x}(X_i - \bar{X})$
$\frac{\hat{Y}_i - \bar{Y}}{s_y} = r_{xy}\frac{X_i - \bar{X}}{s_x}$
$\hat{Z}_{y_i} = r_{xy} Z_{x_i}$
Since $|r_{xy}| \le 1$, a one-unit change in $Z_x$ leads to a smaller change in $\hat{Z}_y$... unless $|r_{xy}| = 1$.
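
A tiny illustration: with the IQ example's $r_{xy} = .78$, predicted standardized scores are pulled toward 0:

```python
# Regression to the mean in z-scores: Z-hat_y = r_xy * Z_x.
r_xy = 0.78
for z_x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"Z_x = {z_x:+.1f}  ->  predicted Z_y = {r_xy * z_x:+.2f}")
```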

Regression to the Mean: illustration of the phenomenon [figure]

Extrapolation Beyond the Data [figure]

Extrapolation beyond the data is a dangerous thing to do. [figure]

Extrapolation Beyond the Data
Model 1 (fit to x < 0): $\hat{\sigma}^2 = 26.38$, $R^2 = .93$, $\hat{Y}_i = 22.46 + 3.94x_i$, and $H_o: \beta = 0$ is rejected.
Model 2 (fit to $-4 \le x \le 4$): $\hat{\sigma}^2 = 27.65$, $R^2 = .003$, $\hat{Y}_i = 14.09 - .12x_i$, and $H_o: \beta = 0$ is retained.
Model 3 (fit to $-4 \le x \le 4$), the correct one: $\hat{\sigma}^2 = 0.64$, $R^2 = .98$, $\hat{Y}_i = 19.92 - 1.01x_i^2$, and $H_o: \beta = 0$ is rejected.

Model 3 [figure]

Effect of Outliers: Example 1 [figure]

Effect of Outliers: Example 2 [figure]

Transform to Linearity [figure]

Another Example: NELS88 Data
The National Education Longitudinal Study, conducted by the National Center for Education Statistics of the U.S. Department of Education. The data are the first in a series of longitudinal measurements of students starting in 8th grade; they were collected in Spring 1988.
Response variable = math score. Explanatory variable = time spent doing homework. n = 89.
Data from www.stat.ucla.edu/~deleeuw/sagebook.

Plot of Data [figure]. What's wrong?

The Problem with NELS

   Homework    n    Y-bar    s^2
   0           5    47.60    204.80
   1          25    49.96    136.87
   2          12    56.33    102.24
   3           8    63.37     17.41
   4          15    63.00     35.71
   5          21    64.28     31.61
   6           2    68.50      4.50

Why do we get this pattern?
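
The diagnostic behind this table is easy to script: group Y by the value of X and compare within-group means and variances; a systematic trend in the variances contradicts the constant-$\sigma^2$ assumption. A sketch with hypothetical stand-in data (not the NELS records):

```python
# Within-group mean and variance of Y at each level of X.
import numpy as np

hours = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6])
math  = np.array([40, 55, 45, 50, 60, 52, 58, 62, 65,
                  61, 64, 63, 66, 64, 68, 69])

for h in np.unique(hours):
    y = math[hours == h]
    print(f"hours={h}: n={y.size}, mean={y.mean():5.1f}, "
          f"var={y.var(ddof=1):6.1f}")
```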

What's Really Going On... [figure] How can we handle this situation?

Summary from our Examples (most values rounded to the nearest integer; the last two columns are NELS88 schools)

  Statistic    QuantGRE    Adopted   Sch 24725   Sch 62821
  n                  91          9          22          66
  SS_model      802,814      1,814       1,473         234
  SS_res        431,639      1,186         631       1,887
  SS_total    1,234,453      3,000       2,104       2,121
  s^2_y          13,716        375         100          33
  s^2_eps         4,850        169          32          29
  s^2_yhat        8,866        206          68           4
  R^2               .65        .60         .70         .11

Next Steps
Adding more explanatory variables to the model.