
Simple Linear Regression: Basic Ideas (Slides 1-2)

Example: animal studies of side effects. In simple linear regression there is an approximately linear relation between two variables, say y = pressure in the pancreas and x = dose:

    x:   0     5     10    15    20    25
    y:  14.6  24.5  21.8  34.5  35.1  43.0

Model:

    y = \alpha + \beta x + \epsilon,

where x and y are observed; \alpha and \beta are unknown; and \epsilon is a random error with mean 0.

Figure 1: A scatterplot (pressure against dose).

Note: x is a design variable, set by the experimenter.

From the Coleman Report (Slide 3)

y = average 6th-grade verbal score, x = mother's education (years):

    x:  12.38  10.34  14.08  14.20  12.3   11.46
    y:  37.01  26.51  36.51  40.70  37.1   33.40

Figure 2: A scatterplot (verbal score against mother's education).

Note: x is a covariate, measured with y.

Drawing the Line: Least Squares Estimators (Slide 4)

The Problem: Given (x_1, y_1), \dots, (x_n, y_n), find a and b to minimize

    SS(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2.

The Solution:

    b = s_{xy} / s_{xx},    a = \bar{y} - b\bar{x},

where \bar{x} and \bar{y} are the sample means of x_1, \dots, x_n and y_1, \dots, y_n, and

    s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),    s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2.
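As a quick illustration (not part of the original slides), here is a minimal Python sketch, assuming numpy is available, that applies these formulas to the dose-response data above.

```python
# A minimal sketch: least squares estimates for the dose-response data,
# using b = s_xy / s_xx and a = ybar - b * xbar.
import numpy as np

x = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])    # dose
y = np.array([14.6, 24.5, 21.8, 34.5, 35.1, 43.0])  # pressure in the pancreas

xbar, ybar = x.mean(), y.mean()
s_xy = np.sum((x - xbar) * (y - ybar))
s_xx = np.sum((x - xbar) ** 2)

b = s_xy / s_xx        # slope, about 1.066
a = ybar - b * xbar    # intercept, about 15.595
print(a, b)
```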

The Details (Slides 5-6)

Recall:

    SS(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2.

Differentiate:

    \frac{\partial}{\partial a} SS(a, b) = -2 \sum_{i=1}^{n} (y_i - a - b x_i),
    \frac{\partial}{\partial b} SS(a, b) = -2 \sum_{i=1}^{n} (y_i - a - b x_i) x_i.

Solve:

    \frac{\partial}{\partial a} SS(a, b) = 0,    \frac{\partial}{\partial b} SS(a, b) = 0.

From the first equation,

    \frac{\partial}{\partial a} SS(a, b) = -2n(\bar{y} - a - b\bar{x}) = 0,

so a = \bar{y} - b\bar{x}. Substituting into the second,

    \frac{\partial}{\partial b} SS(a, b) = -2 \sum_{i=1}^{n} [y_i - \bar{y} - b(x_i - \bar{x})] x_i
                                         = -2 \Big\{ \sum_{i=1}^{n} (y_i - \bar{y}) x_i - b \sum_{i=1}^{n} (x_i - \bar{x}) x_i \Big\}.

Now \sum (x_i - \bar{x}) = 0 = \sum (y_i - \bar{y}), so

    \sum (y_i - \bar{y}) x_i = \sum (y_i - \bar{y})(x_i - \bar{x}) = s_{xy}, say,
    \sum (x_i - \bar{x}) x_i = \sum (x_i - \bar{x})^2 = s_{xx}, say.

Therefore

    \frac{\partial}{\partial b} SS(a, b) = -2 \{ s_{xy} - b s_{xx} \},

and

    b = s_{xy} / s_{xx},    a = \bar{y} - b\bar{x}.

Coleman Report Example (Slides 7-8)

For the Coleman data, a = 0.1312 and b = 2.8149. The Least Squares Line: y = a + bx.

Figure 3: A scatterplot and least squares line (verbal score against mother's education).
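As a check on the calculus (again, not from the slides), the sketch below, assuming scipy is available, minimizes SS(a, b) numerically and compares the minimizer with the closed-form solution.

```python
# Sketch: minimize SS(a, b) directly and compare with b = s_xy/s_xx, a = ybar - b*xbar.
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
y = np.array([14.6, 24.5, 21.8, 34.5, 35.1, 43.0])

def SS(params):
    a, b = params
    return np.sum((y - a - b * x) ** 2)

a_num, b_num = minimize(SS, x0=[0.0, 0.0]).x           # numerical minimizer

s_xx = np.sum((x - x.mean()) ** 2)
b_cf = np.sum((x - x.mean()) * (y - y.mean())) / s_xx  # closed form
a_cf = y.mean() - b_cf * x.mean()

print((a_num, b_num), (a_cf, b_cf))  # both about (15.595, 1.066)
```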

Calculating a and b (Slides 9-10)

By Machine: using Excel, for example.

By Hand: Recall

    s_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2,

and similarly

    s_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}.

Dose Response:

        x       y       xy      x^2
        0     14.6       0        0
        5     24.5     122.5     25
       10     21.8     218      100
       15     34.5     517.5    225
       20     35.1     702      400
       25     43.0    1075      625
    Sums:  75   173.5   2635    1375

So a and b can be calculated from the sums of the x_i, y_i, x_i y_i, and x_i^2.

The Calculations (Slides 11-12)

    \bar{x} = 75/6 = 12.5,
    \bar{y} = 173.5/6 = 28.92,
    s_{xy} = 2635 - 6(12.5)(28.92) = 466.25,
    s_{xx} = 1375 - 6(12.5)^2 = 437.5,
    b = 466.25/437.5 = 1.066,
    a = 28.92 - 1.066(12.5) = 15.595.

Figure 4: A scatterplot and least squares line (pressure vs. dose; a = 15.595, b = 1.066).
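A small sketch of the same by-hand arithmetic, using only the four column sums from the table (plain Python, no libraries):

```python
# Sketch: a and b from the column sums of the dose-response table.
n = 6
sum_x, sum_y, sum_xy, sum_x2 = 75.0, 173.5, 2635.0, 1375.0

xbar = sum_x / n                  # 12.5
ybar = sum_y / n                  # about 28.92
s_xy = sum_xy - n * xbar * ybar   # 466.25
s_xx = sum_x2 - n * xbar ** 2     # 437.5

b = s_xy / s_xx                   # about 1.066
a = ybar - b * xbar               # about 15.595
print(a, b)
```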

Some Terminology (Slides 13-14)

Least Squares Estimators:

    b = s_{xy} / s_{xx},    a = \bar{y} - b\bar{x}.

Fitted Values (AKA Predicted Values):

    \hat{y}_i = a + b x_i = \bar{y} + b(x_i - \bar{x}).

Residuals:

    e_i = y_i - \hat{y}_i = y_i - \bar{y} - b(x_i - \bar{x}).

Regression Sum of Squares:

    SSR = \sum (\hat{y}_i - \bar{y})^2 = b^2 s_{xx}.

Error Sum of Squares (AKA Residual Sum of Squares):

    SSE = \sum e_i^2.

Total Sum of Squares:

    s_{yy} = \sum (y_i - \bar{y})^2.

Then

    s_{yy} = SSR + SSE,    R^2 = SSR / s_{yy},    100 R^2 = \% explained variation.

Note: SSE = s_{yy} - SSR = s_{yy} - b^2 s_{xx}.

Derivation of s_{yy} = SSR + SSE (Slide 15)

    s_{yy} = \sum (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2
           = \sum e_i^2 + 2 \sum e_i (\hat{y}_i - \bar{y}) + \sum (\hat{y}_i - \bar{y})^2.

The first term is SSE and the third is SSR. The second is

    2 \sum [(y_i - \bar{y}) - b(x_i - \bar{x})] \, b(x_i - \bar{x}) = 2[b s_{xy} - b^2 s_{xx}] = 0.

Inference (Slide 16)

Model: Now suppose

    y_i = \alpha + \beta x_i + \epsilon_i,

where \epsilon_1, \dots, \epsilon_n are independent Normal[0, \sigma^2].

Notes:
a) -\infty < \alpha, \beta < \infty and \sigma^2 > 0 are unknown.
b) If x_1, \dots, x_n are covariates, then the conditions must hold conditionally given x_1, \dots, x_n.
c) The y_i ~ Normal[\alpha + \beta x_i, \sigma^2] are (conditionally) independent.
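To make the decomposition concrete, the following sketch (assuming numpy; values approximate) computes SSR, SSE, s_yy, and R^2 for the dose-response fit and confirms s_yy = SSR + SSE.

```python
# Sketch: sum-of-squares decomposition for the dose-response fit.
import numpy as np

x = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
y = np.array([14.6, 24.5, 21.8, 34.5, 35.1, 43.0])

s_xy = np.sum((x - x.mean()) * (y - y.mean()))
s_xx = np.sum((x - x.mean()) ** 2)
b = s_xy / s_xx
a = y.mean() - b * x.mean()

y_hat = a + b * x                    # fitted values
e = y - y_hat                        # residuals
SSE = np.sum(e ** 2)                 # about 46.0
SSR = b ** 2 * s_xx                  # about 496.9
s_yy = np.sum((y - y.mean()) ** 2)   # about 542.9
R2 = SSR / s_yy                      # about 0.92

print(SSR + SSE, s_yy)               # equal, up to rounding
print(R2)
```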

The Likelihood Function (Slides 17-18)

The likelihood function is

    \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big[-\frac{1}{2\sigma^2}(y_i - \alpha - \beta x_i)^2\Big]
        = (2\pi\sigma^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2\Big],

and the log-likelihood function is

    l(\alpha, \beta, \sigma^2 \mid x, y) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 - \frac{n}{2}[\log(\sigma^2) + \log(2\pi)].

Maximum Likelihood Estimators: \hat{\alpha} and \hat{\beta} must minimize the sum of squares, so MLE = LSE. That is,

    \hat{\beta} = b = s_{xy}/s_{xx},    \hat{\alpha} = a = \bar{y} - b\bar{x}.

The Profile Likelihood Function:

    l(\hat{\alpha}, \hat{\beta}, \sigma^2 \mid x, y) = -\frac{1}{2\sigma^2} SSE - \frac{n}{2}[\log(\sigma^2) + \log(2\pi)].

The MLE of \sigma^2: setting

    \frac{\partial}{\partial \sigma^2} l = \frac{1}{2\sigma^4} SSE - \frac{n}{2\sigma^2} = 0

gives

    \hat{\sigma}^2 = SSE/n.

By contrast, MSE = SSE/(n - 2).

Means and Variances of the Estimators (Slides 19-20)

Unbiasedness: \hat{\alpha} and \hat{\beta} are unbiased; that is,

    E(\hat{\beta}) = \beta,    E(\hat{\alpha}) = \alpha.

Variances:

    \sigma^2_{\hat{\beta}} = \frac{\sigma^2}{s_{xx}},    \sigma^2_{\hat{\alpha}} = \Big[\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\Big] \sigma^2.

Derivation for \hat{\beta}: First,

    s_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x}) y_i,

so

    E(\hat{\beta}) = \frac{1}{s_{xx}} E(s_{xy}) = \frac{1}{s_{xx}} \sum (x_i - \bar{x}) E(y_i)
                   = \frac{1}{s_{xx}} \sum (x_i - \bar{x})(\alpha + \beta x_i)
                   = \frac{1}{s_{xx}} \sum (x_i - \bar{x}) \beta (x_i - \bar{x})
                   = \frac{1}{s_{xx}} \beta s_{xx} = \beta,

since \sum (x_i - \bar{x}) = 0.
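The identity MLE = LSE can also be checked numerically. The sketch below (assuming scipy; the optimizer method and starting point are arbitrary choices) maximizes the log-likelihood over (alpha, beta, log sigma^2) for the dose-response data and compares the result with the least squares estimates, SSE/n, and MSE.

```python
# Sketch: maximize the normal log-likelihood numerically and compare with LSE.
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
y = np.array([14.6, 24.5, 21.8, 34.5, 35.1, 43.0])
n = len(y)

def neg_loglik(params):
    alpha, beta, log_sig2 = params          # optimize log(sigma^2) so sigma^2 > 0
    sig2 = np.exp(log_sig2)
    resid = y - alpha - beta * x
    return np.sum(resid ** 2) / (2 * sig2) + 0.5 * n * (np.log(sig2) + np.log(2 * np.pi))

start = [y.mean(), 0.0, np.log(y.var())]
alpha_hat, beta_hat, log_sig2_hat = minimize(neg_loglik, x0=start, method="Nelder-Mead").x

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
SSE = np.sum((y - a - b * x) ** 2)

print(alpha_hat, beta_hat, np.exp(log_sig2_hat))   # about 15.595, 1.066, SSE/n
print(a, b, SSE / n, SSE / (n - 2))                # LSE, MLE of sigma^2, and MSE
```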

Similarly,

    \sigma^2_{\hat{\beta}} = \frac{1}{s_{xx}^2} \mathrm{Var}\Big[\sum (x_i - \bar{x}) y_i\Big]
                           = \frac{1}{s_{xx}^2} \sum (x_i - \bar{x})^2 \sigma^2
                           = \frac{\sigma^2}{s_{xx}}.

Notes: Unbiasedness requires (only) E(\epsilon_i) = 0. The variance formula also requires E(\epsilon_i^2) = \sigma^2. Similarly for \hat{\alpha}.

Sampling Distributions (Slides 21-22)

    \hat{\alpha} ~ Normal[\alpha, \sigma^2_{\hat{\alpha}}],
    \hat{\beta} ~ Normal[\beta, \sigma^2_{\hat{\beta}}],
    SSE/\sigma^2 ~ \chi^2_{n-2},

and (\hat{\alpha}, \hat{\beta}) is independent of SSE.

Corollary: MSE is unbiased; that is, E(MSE) = \sigma^2.

Note: The proof is similar to the independence of \bar{X} and S^2 in the one-sample normal problem.

Note: Unbiasedness of MSE requires only E(\epsilon_i) = 0 and E(\epsilon_i^2) = \sigma^2.

Confidence Intervals (Slides 23-24)

Studentization:

    \hat{\sigma}^2_{\hat{\beta}} = MSE / s_{xx},    T = \frac{\hat{\beta} - \beta}{\hat{\sigma}_{\hat{\beta}}} ~ t_{n-2}.

If c is the 97.5th percentile of t_{n-2}, for example, then P[-c \le T \le c] = .95, and

    -c \le T = \frac{\hat{\beta} - \beta}{\hat{\sigma}_{\hat{\beta}}} \le c

iff

    \hat{\beta} - c\hat{\sigma}_{\hat{\beta}} \le \beta \le \hat{\beta} + c\hat{\sigma}_{\hat{\beta}}.

Confidence Interval for \beta: \hat{\beta} \pm c\hat{\sigma}_{\hat{\beta}} is a 95% confidence interval for \beta.

Confidence Interval for \alpha: Similarly, \hat{\alpha} \pm c\hat{\sigma}_{\hat{\alpha}} is a 95% confidence interval for \alpha.
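A minimal sketch (assuming scipy is available) of the 95% confidence interval for beta in the dose-response example; the margin of about 0.45 matches the 1.066 ± 0.449 quoted in the testing example below.

```python
# Sketch: 95% confidence interval for beta in the dose-response example.
import numpy as np
from scipy.stats import t

x = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
y = np.array([14.6, 24.5, 21.8, 34.5, 35.1, 43.0])
n = len(y)

s_xx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
a = y.mean() - b * x.mean()
MSE = np.sum((y - a - b * x) ** 2) / (n - 2)

se_b = np.sqrt(MSE / s_xx)           # estimated standard error of beta-hat
c = t.ppf(0.975, df=n - 2)           # about 2.776 for n - 2 = 4 df
print(b - c * se_b, b + c * se_b)    # about 1.066 -/+ 0.45
```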

Example: United Data Services (Slides 25-26)

n = 14, x = units serviced, y = time. With

    c = 2.180,    \hat{\beta} = 15.509,    s_{xx} = 114,    MSE = 29.074,

we get

    \hat{\sigma}^2_{\hat{\beta}} = 29.074/114 = (0.505)^2,

so

    \hat{\beta} \pm c\hat{\sigma}_{\hat{\beta}} = 15.509 \pm 2.18(0.505) = 15.51 \pm 1.10.

(A numerical sketch of this interval and of the test below appears after the Review slide.)

Figure 5: A scatterplot (time against units serviced).

Testing H_0: \beta = 0 (Slide 27)

From the Confidence Interval: Accept H_0 if

    \hat{\beta} - c\hat{\sigma}_{\hat{\beta}} \le 0 \le \hat{\beta} + c\hat{\sigma}_{\hat{\beta}}.

Equivalently, reject H_0 if

    |T_0| = \frac{|\hat{\beta}|}{\hat{\sigma}_{\hat{\beta}}} > c.

Example: Dose Response. Here \hat{\beta} \pm c\hat{\sigma}_{\hat{\beta}} = 1.066 \pm 0.449, which excludes 0, so H_0 is rejected.

Note: This is the GLRT (as in the one-sample problem).

Review (Slide 28)

Simple Linear Regression: Y = \alpha + \beta X + \epsilon.
- Least squares estimators a, b
- Properties of the estimators
- Sampling distributions
- Confidence intervals
- Testing

Today:
- Estimating the expected response
- Predicting a future value
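As referenced above, a short sketch reproducing the United Data Services interval and the equivalent test of H_0: beta = 0, working only from the summary statistics quoted on the slides (plain Python):

```python
# Sketch: United Data Services interval and test of H0: beta = 0,
# from the summary statistics quoted on the slides.
import math

n, beta_hat, s_xx, MSE, c = 14, 15.509, 114.0, 29.074, 2.180

se_beta = math.sqrt(MSE / s_xx)                 # about 0.505
lower = beta_hat - c * se_beta                  # about 14.41
upper = beta_hat + c * se_beta                  # about 16.61
T0 = beta_hat / se_beta                         # about 30.7

print(lower, upper)          # 15.51 -/+ 1.10
print(abs(T0) > c)           # True, so H0: beta = 0 is rejected
```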

Estimating the Expected Response (Slides 29-30)

The expected response at x is

    \mu(x) = \alpha + \beta x = E(Y \mid x).

Fix an x_0 and let \mu_0 = \mu(x_0) = \alpha + \beta x_0, so that

    \mu(x) = \mu_0 + \beta(x - x_0).

(In particular, \mu_0 = \alpha when x_0 = 0.) The natural estimator is

    \hat{\mu}_0 = \hat{\alpha} + \hat{\beta} x_0,

with

    E(\hat{\mu}_0) = \mu_0,    \sigma^2_{\hat{\mu}_0} = \Big[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\Big] \sigma^2,

and estimated variance

    \hat{\sigma}^2_{\hat{\mu}_0} = \Big[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\Big] MSE.

Then \hat{\mu}_0 \pm c\hat{\sigma}_{\hat{\mu}_0} is a 95% confidence interval for \mu_0.

Predicting a Future Value (Slides 31-32)

Now let Y_0 ~ Normal[\mu_0, \sigma^2] be a future observation at x = x_0, independent of the data, and take \hat{Y}_0 = \hat{\mu}_0. Consider the prediction error

    \Delta := Y_0 - \hat{Y}_0 = (Y_0 - \mu_0) - (\hat{\mu}_0 - \mu_0).

Then E(\Delta) = 0 and

    \sigma^2_{\Delta} = \sigma^2 + \sigma^2_{\hat{\mu}_0} = \Big[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\Big] \sigma^2,

so Y_0 - \hat{Y}_0 ~ Normal[0, \sigma^2_{\Delta}]. With

    \hat{\sigma}^2_{\Delta} = \Big[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\Big] MSE,

we have \Delta/\hat{\sigma}_{\Delta} ~ t_{n-2}, so P[-c \le \Delta/\hat{\sigma}_{\Delta} \le c] = .95. Moreover,

    -c \le \Delta/\hat{\sigma}_{\Delta} \le c

iff

    \hat{Y}_0 - c\hat{\sigma}_{\Delta} \le Y_0 \le \hat{Y}_0 + c\hat{\sigma}_{\Delta},

so that

    P[\hat{Y}_0 - c\hat{\sigma}_{\Delta} \le Y_0 \le \hat{Y}_0 + c\hat{\sigma}_{\Delta}] = .95.

The interval \hat{Y}_0 \pm c\hat{\sigma}_{\Delta} is called a 95% prediction interval for Y_0.

Example: United Data Services, continued (Slides 33-34)

Take x_0 = 4, so \mu_0 = \alpha + 4\beta, and

    \hat{\mu}_0 = 4.162 + 4(15.509) = 66.198.

Here \bar{x} = 6, so

    \hat{\sigma}^2_{\hat{\mu}_0} = \Big[\frac{1}{14} + \frac{(4-6)^2}{114}\Big](29.074) = (1.76)^2

and

    \hat{\mu}_0 \pm c\hat{\sigma}_{\hat{\mu}_0} = 66.198 \pm 2.18(1.76) = 66.20 \pm 3.84.

The Prediction Interval: Next,

    \hat{Y}_0 = \hat{\mu}_0 = 66.198,
    \hat{\sigma}^2_{\Delta} = \Big[1 + \frac{1}{14} + \frac{(4-6)^2}{114}\Big](29.074) = (5.672)^2,

so

    \hat{Y}_0 \pm c\hat{\sigma}_{\Delta} = 66.198 \pm 2.18(5.672) = 66.20 \pm 12.36.

Note: the confidence interval is about the average response at x_0 = 4; the prediction interval is about an individual response.
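A sketch reproducing both intervals from the summary statistics on these slides (alpha-hat = 4.162, beta-hat = 15.509, xbar = 6, s_xx = 114, MSE = 29.074, c = 2.180); plain Python, with nothing computed from the raw data.

```python
# Sketch: 95% confidence interval for the mean response and 95% prediction
# interval at x0 = 4 for United Data Services, from summary statistics.
import math

n, alpha_hat, beta_hat = 14, 4.162, 15.509
xbar, s_xx, MSE, c = 6.0, 114.0, 29.074, 2.180
x0 = 4.0

mu0_hat = alpha_hat + beta_hat * x0                  # about 66.198
leverage = 1 / n + (x0 - xbar) ** 2 / s_xx           # 1/14 + 4/114
se_mean = math.sqrt(leverage * MSE)                  # about 1.76
se_pred = math.sqrt((1 + leverage) * MSE)            # about 5.67

print(mu0_hat - c * se_mean, mu0_hat + c * se_mean)  # about 66.20 -/+ 3.84
print(mu0_hat - c * se_pred, mu0_hat + c * se_pred)  # about 66.20 -/+ 12.36
```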