Simple Linear Regression


Simple Linear Regression
September 24, 2008
Reading: HH 8, Gill 4
Simple Linear Regression p.1/20

Problem
Data: observe pairs $(Y_i, x_i)$, $i = 1, \ldots, n$
  Response or dependent variable $Y$
  Predictor or independent variable $X$
Goals:
  Explore $p(y \mid x)$ as a function of $x$
  Understand the mean (and variability) of $Y$ as a function of $x$
Special cases:
  linear regression (normal $Y$)
  logistic regression (binomial $Y$; success probability depends on $x$)

Models
Model with additive error: for $i = 1, \ldots, n$,
  $Y_i = f(x_i) + \varepsilon_i$
Regression function: $E(Y \mid x) = f(x)$
The Taylor series expansion of $f$ about $x_0$,
  $f(x_i) = f(x_0) + f'(x_0)(x_i - x_0) + \text{Remainder}$,
leads to the locally linear approximation
  $Y_i = \alpha + \beta x_i + \varepsilon_i$
$\varepsilon_i$: independent errors (sampling, measurement, lack of fit)

BIG PICTURE: $Y_i = \alpha + \beta x_i + \varepsilon_i$, with $\varepsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)$
Estimate the parameters $(\alpha, \beta, \sigma^2)$
Interpret the parameters $\beta$, $\alpha$
Assess model fit: adequate? good? if inadequate, how?
Predict new ("future") responses at new $x_{n+1}, \ldots$
How much variability does $x$ explain?
  Normal models: variance measures variability, hence analysis of variance (ANOVA)

Example: Body Fat data
For a group of subjects, various body measurements were obtained.
An accurate measurement of the percentage of body fat is recorded for each subject.
The goal is to use the other body measurements as a proxy for predicting body fat.
Focus on a simple linear regression model for predicting body fat as a function of abdomen circumference.

Data
[Figure: scatter plot of percent body fat (vertical axis) against circumference of abdomen in cm (horizontal axis).]

Sample Summary Statistics
Sample means: $\bar{x}$, $\bar{y}$
Sample variances: $s_y^2 = S_{yy}/(n-1)$, $s_x^2 = S_{xx}/(n-1)$
Sample covariance: $s_{xy} = S_{xy}/(n-1)$
where the sums of squares are
  $S_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2$ (total variation in the response)
  $S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2$
  $S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$

Correlation
Sample correlation is covariance on a standardized scale: a unit-less measure of dependence
  $r = \dfrac{s_{xy}}{s_x s_y}$, $-1 \le r \le 1$
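
The summary statistics and the correlation defined above can be checked numerically. The following is a minimal sketch in base R; the vectors `x` and `y` are hypothetical toy values, not the body fat data from the lecture.

```r
# Hypothetical toy data (NOT the lecture's body fat data)
x <- c(80, 85, 90, 95, 100, 110)   # e.g. abdomen circumference (cm)
y <- c(12, 15, 19, 22, 24, 31)     # e.g. percent body fat
n <- length(x)

# Sums of squares about the sample means
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Sample variances, covariance and correlation
s2x <- Sxx / (n - 1)
s2y <- Syy / (n - 1)
sxy <- Sxy / (n - 1)
r   <- sxy / sqrt(s2x * s2y)

# The hand-computed values agree with R's built-in var(), cov() and cor()
stopifnot(isTRUE(all.equal(s2x, var(x))),
          isTRUE(all.equal(s2y, var(y))),
          isTRUE(all.equal(sxy, cov(x, y))),
          isTRUE(all.equal(r, cor(x, y))))
```

Because the factor $(n-1)$ cancels, $r$ can equally be written as $S_{xy}/\sqrt{S_{xx} S_{yy}}$.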

Ordinary Least Squares (OLS)
For any chosen $\alpha, \beta$,
  $Q(\alpha, \beta) = \sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$
measures the fit of the chosen line $\alpha + \beta x$ to the response data.
OLS estimator: choose $\hat\alpha, \hat\beta$ to minimize $Q(\alpha, \beta)$
Ad hoc principle of least squares estimation
Under the normal error assumption, OLS is equivalent to MLE
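
One way to see where the minimizers of $Q$ come from is the standard calculus argument, sketched here for completeness (the slides state only the criterion). It uses the sums of squares $S_{xx}$ and $S_{xy}$ defined earlier.

```latex
\begin{align*}
\frac{\partial Q}{\partial \alpha}
  &= -2 \sum_{i=1}^n (y_i - \alpha - \beta x_i) = 0
  &&\Longrightarrow\quad \hat\alpha = \bar{y} - \hat\beta \bar{x}, \\
\frac{\partial Q}{\partial \beta}
  &= -2 \sum_{i=1}^n x_i (y_i - \alpha - \beta x_i) = 0 .
\end{align*}
% Substitute \hat\alpha into the second equation and use
% \sum_i (y_i - \bar{y}) = \sum_i (x_i - \bar{x}) = 0
% to replace x_i by (x_i - \bar{x}):
\begin{align*}
\sum_{i=1}^n (x_i - \bar{x})\bigl[(y_i - \bar{y}) - \hat\beta (x_i - \bar{x})\bigr] = 0
  \quad\Longrightarrow\quad
  \hat\beta = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
            = \frac{S_{xy}}{S_{xx}} .
\end{align*}
```

Since $Q$ is a convex quadratic in $(\alpha, \beta)$, this stationary point is the minimum whenever the $x_i$ are not all equal.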

Least Squares Estimates
FACTS:
  $\hat\beta = \dfrac{s_{xy}}{s_x^2}$, $\hat\alpha = \bar{y} - \hat\beta \bar{x}$
or
  $\hat\beta = r \left( \dfrac{s_y}{s_x} \right)$
$\hat\beta$ is the correlation coefficient $r$, corrected for the relative scales of $y : x$, so that the fitted values $\hat\alpha + \hat\beta x$ are on the scale of $Y$.
For use in theoretical derivations: $\hat\beta = \dfrac{S_{xy}}{S_{xx}}$
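
The three expressions for $\hat\beta$ can be compared against `lm()` directly. This is a small sketch with hypothetical toy vectors `x` and `y` (not the lecture's data); `beta_hat`, `beta_r` and `beta_ss` are illustrative names.

```r
# Hypothetical toy data (NOT the lecture's body fat data)
x <- c(80, 85, 90, 95, 100, 110)
y <- c(12, 15, 19, 22, 24, 31)

# Closed-form least squares estimates
beta_hat  <- cov(x, y) / var(x)               # s_xy / s_x^2
alpha_hat <- mean(y) - beta_hat * mean(x)     # ybar - beta_hat * xbar

# Equivalent forms of the slope
beta_r  <- cor(x, y) * sd(y) / sd(x)          # r * (s_y / s_x)
beta_ss <- sum((x - mean(x)) * (y - mean(y))) /
           sum((x - mean(x))^2)               # S_xy / S_xx

# lm() returns the same intercept and slope
fit <- lm(y ~ x)
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(alpha_hat, beta_hat))),
          isTRUE(all.equal(beta_hat, beta_r)),
          isTRUE(all.equal(beta_hat, beta_ss)))
```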

$R^2$ measure of model fit
Simplest model: $\beta = \hat\beta = 0$, so the $Y_i$ are a normal random sample with mean $\alpha$
  $\hat\alpha = \bar{y}$, $Q(\bar{y}, 0) = S_{yy}$ = Total Sum of Squares = TSS
Any other model fit: SSE = Sum of Squares Error = $Q(\hat\alpha, \hat\beta)$
  $R^2 = 1 - \dfrac{Q(\hat\alpha, \hat\beta)}{Q(\bar{y}, 0)} = 1 - \dfrac{\text{SSE}}{\text{TSS}}$
TSS = SS due to regression on $X$ + SSE

Facts
$R^2 = r^2$
Higher % variation explained is better: higher correlation
Measures linear correlation only
  not general dependence
  not causation
Can be used to compare other simple linear regression models with transformations of $X$
Cannot be used to compare models with transformed $Y$
Does NOT provide a measure of model adequacy
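
The decomposition TSS = SSR + SSE and the identity $R^2 = r^2$ can be verified on a small example. The sketch below again uses hypothetical toy vectors rather than the body fat data.

```r
# Hypothetical toy data (NOT the lecture's body fat data)
x <- c(80, 85, 90, 95, 100, 110)
y <- c(12, 15, 19, 22, 24, 31)

fit <- lm(y ~ x)

TSS <- sum((y - mean(y))^2)            # Q(ybar, 0) = S_yy
SSE <- sum(resid(fit)^2)               # Q(alpha_hat, beta_hat)
SSR <- sum((fitted(fit) - mean(y))^2)  # SS due to regression on x

R2 <- 1 - SSE / TSS

stopifnot(isTRUE(all.equal(TSS, SSR + SSE)),              # ANOVA decomposition
          isTRUE(all.equal(R2, summary(fit)$r.squared)),  # matches lm's R-squared
          isTRUE(all.equal(R2, cor(x, y)^2)))             # R^2 = r^2
```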

Summarizing Model Fit
Fitted values: $\hat{Y}_i = \hat\alpha + \hat\beta x_i$
Residuals: $\hat\varepsilon_i = Y_i - \hat{Y}_i$, estimates of $\varepsilon_i$
Residual sum of squares: SSE $= Q(\hat\alpha, \hat\beta) = \sum_{i=1}^n \hat\varepsilon_i^2$ measures the remaining (residual) variation in the response data
$s^2_{Y \mid X}$ is a point estimate of $\sigma^2$ from the fitted model:
  $s^2_{Y \mid X} = \text{MSE} = \dfrac{\text{SSE}}{n-2} = \dfrac{\sum_{i=1}^n \hat\varepsilon_i^2}{n-2}$
Note: $n - 2$ degrees of freedom, not $n - 1$; 2 degrees of freedom are lost in estimating $\alpha, \beta$
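
As a quick check of the $n - 2$ divisor, the MSE can be computed by hand and compared with the "Residual standard error" that `summary()` reports. This sketch uses hypothetical toy data.

```r
# Hypothetical toy data (NOT the lecture's body fat data)
x <- c(80, 85, 90, 95, 100, 110)
y <- c(12, 15, 19, 22, 24, 31)
n <- length(y)

fit <- lm(y ~ x)

y_hat <- fitted(fit)          # alpha_hat + beta_hat * x_i
e_hat <- y - y_hat            # residuals
SSE   <- sum(e_hat^2)
MSE   <- SSE / (n - 2)        # point estimate of sigma^2

# sqrt(MSE) is the residual standard error, on n - 2 degrees of freedom
stopifnot(isTRUE(all.equal(sqrt(MSE), summary(fit)$sigma)),
          df.residual(fit) == n - 2)
```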

Some R commands
bodyfat.lm = lm(bodyfat ~ abdomin, data = bodyfat)
summary(bodyfat.lm)      # regression output
plot(bodyfat.lm)         # residual plots
anova(bodyfat.lm)        # analysis of variance
regr1.plot(bodyfat.lm)   # in library(HH)

Model Assessment
Residual analysis: graphical exploration of the fitted residuals $\hat\varepsilon_i$
Standardize: $r_i = \hat\varepsilon_i / \sqrt{\mathrm{var}(\hat\varepsilon_i)}$, where $\mathrm{var}(\hat\varepsilon_i) = \hat\sigma^2 (1 - h_{ii})$
  $h_{ii}$: leverage, a measure of potential influence
Check the normality assumption, constant variance, outliers, influence
Treat $\hat\varepsilon_i$ as new data: look at structure, other predictors
Other predictors? Transformations?
Revise the model before making interpretations...
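
The standardization above can be reproduced with `hatvalues()` and compared with `rstandard()`. The sketch below uses hypothetical toy data. The explicit formula for $h_{ii}$ in simple linear regression is a standard result that is not stated on the slide.

```r
# Hypothetical toy data (NOT the lecture's body fat data)
x <- c(80, 85, 90, 95, 100, 110)
y <- c(12, 15, 19, 22, 24, 31)
n <- length(y)

fit <- lm(y ~ x)

h         <- hatvalues(fit)        # leverages h_ii
sigma_hat <- summary(fit)$sigma    # sqrt(MSE)

# r_i = e_i / sqrt(sigma_hat^2 * (1 - h_ii))
r_std <- resid(fit) / (sigma_hat * sqrt(1 - h))
stopifnot(isTRUE(all.equal(r_std, rstandard(fit))))

# Standard result for simple linear regression (not from the slides):
# h_ii = 1/n + (x_i - xbar)^2 / S_xx
h_formula <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)
stopifnot(isTRUE(all.equal(unname(h), h_formula)))
```

Points far from $\bar{x}$ have larger $h_{ii}$, which is why leverage measures only potential influence: it depends on $x$ alone, not on the response.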

Residuals
[Figure: the four diagnostic plots for the fitted model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours at 0.5 and 1). Cases 207, 204 and 39 are labelled in the first three panels; cases 216, 41 and 39 are labelled in the leverage panel.]

Diagnostics
Residuals versus fitted values (or versus $x$)
Normal quantile plot of residuals (checks distributional assumptions of the errors)
Scale-location plot: $\sqrt{|r_i|}$ versus $\hat{Y}_i$, where $r_i$ is the standardized residual. Detects whether the spread of the residuals is constant over the range of fitted values.
Cook's distance plot: shows whether any data points have a large influence on the predicted values of the response variable. Values greater than 1 are considered influential.
Case 39 appears influential...
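
To illustrate the Cook's distance rule of thumb, the sketch below builds a hypothetical data set in which one case (number 21) is deliberately placed far from the others in $x$ and well off the line, then flags it with `cooks.distance()` and refits without it. The data, the planted point and the object names are all assumptions made for illustration; they are unrelated to case 39 in the body fat data.

```r
# Hypothetical data: 20 points near the line y = -43 + 0.67 x (deterministic +/- 1 noise),
# plus one planted point (case 21) with extreme x and a response far below that line
x <- c(seq(80, 110, length.out = 20), 150)
y <- c(-43 + 0.67 * x[1:20] + rep(c(1, -1), 10), 30)

fit <- lm(y ~ x)

d <- cooks.distance(fit)
influential <- which(d > 1)     # rule of thumb from the slide: D_i > 1
print(influential)

# Refit without the flagged case(s) and compare the coefficients
fit_drop <- lm(y ~ x, subset = -influential)
print(rbind(all_cases = coef(fit), without_flagged = coef(fit_drop)))
```

The call `plot(fit, which = 5)` draws the Residuals vs Leverage panel with the Cook's distance contours for the same fit.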

Residuals Without Case 39
lm(bodyfat ~ Abdomen, subset=c(-39))
[Figure: the same four diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance) for the model refitted without case 39. Cases 207, 204 and 180 are labelled in the first three panels; cases 216, 36 and 41 are labelled in the leverage panel.]

Summary
> summary(lm(bodyfat ~ Abdomen, data=bodyfat, subset=c(-39)))
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -42.95774    2.71323  -15.83   <2e-16
Abdomen       0.67195    0.02921   23.01   <2e-16
---
Residual standard error: 4.717 on 249 df
Multiple R-squared: 0.6801
F-statistic: 529.3 on 1 and 249 DF, p-value: < 2.2e-16

Interpretation
For every additional centimeter of abdominal circumference, percent body fat increases by 0.67 percentage points on average.
For every additional inch of abdominal circumference, percent body fat increases by $2.54 \times 0.67 = 1.7$ percentage points on average.
Abdominal circumference explains roughly 68% of the variation in body fat.
Predicted percent body fat for a 34 inch abdomen: $-42.96 + 34 \times 2.54 \times 0.67 = 14.9\%$
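
The arithmetic on this slide can be reproduced in a few lines of R. The sketch uses the rounded coefficients quoted in the interpretation (intercept -42.96, slope 0.67) and, for comparison, the unrounded estimates from the summary output; the variable names are illustrative.

```r
cm_per_inch <- 2.54

# Rounded estimates, as used on the slide
alpha_hat <- -42.96
beta_hat  <- 0.67                                         # percent body fat per cm

slope_per_inch <- cm_per_inch * beta_hat                  # 1.7018
pred_34in      <- alpha_hat + 34 * cm_per_inch * beta_hat # 14.9012

round(slope_per_inch, 1)   # 1.7
round(pred_34in, 1)        # 14.9

# Unrounded estimates from the summary output give a slightly higher prediction
pred_34in_full <- -42.95774 + 34 * cm_per_inch * 0.67195
round(pred_34in_full, 1)   # 15.1
```

The small gap between 14.9 and 15.1 comes only from rounding the slope before multiplying by 86.36 cm.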