Linear models and their mathematical foundations: Simple linear regression


Steffen Unkel
Department of Medical Statistics, University Medical Center Göttingen, Germany
Winter term 2018/19

Introduction

The simple linear regression model can be written as
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n,$$
where $\beta_0$ and $\beta_1$ are unknown parameters. The designation "simple" indicates that there is only one $x$ to predict the response.

$Y_i$ and $\epsilon_i$ are random variables, and the values of $x_i$ are known constants (the case in which the $x_i$ are random variables is treated later).

Assumptions

To complete the model, we make the following assumptions:

A1: $E(\epsilon_i) = 0$ for all $i = 1, \ldots, n$.
A2: $\mathrm{Var}(\epsilon_i) = \sigma^2$ for all $i = 1, \ldots, n$.
A3: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$.

Occasionally, we will make use of the following additional assumption:

A4: $\epsilon_i \sim N(0, \sigma^2)$ for all $i = 1, \ldots, n$.

Any of these assumptions may fail to hold with real data.

Methods of estimation

Given a random sample of $n$ observations $y_1, \ldots, y_n$ and fixed values $x_1, \ldots, x_n$, one can estimate the parameters $\beta_0$, $\beta_1$, and the error variance $\sigma^2$.

To obtain the estimates $\hat\beta_0$ and $\hat\beta_1$, we may use the method of least squares, which does not require any of the assumptions A1–A4, or maximum likelihood estimation using assumptions A1–A4.

Estimation of $\sigma^2$: least squares does not yield an estimator of $\sigma^2$; maximum likelihood estimation does.

Least squares estimation

The least squares approach seeks estimators $\hat\beta_0$ and $\hat\beta_1$ which minimize the sum of squares of the residuals $y_i - \hat y_i$ of the $n$ observed $y_i$'s from their fitted values $\hat E(y_i) = \hat y_i = \hat\beta_0 + \hat\beta_1 x_i$:
$$\sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2 \;\to\; \min_{\beta_0, \beta_1}.$$

Note that $\hat\beta_0 + \hat\beta_1 x_i$ estimates $\beta_0 + \beta_1 x_i$ and not $\beta_0 + \beta_1 x_i + \epsilon_i$.

To find the solution to this optimization problem, we differentiate the objective function with respect to $\beta_0$ and $\beta_1$, set the resulting equations equal to zero, and solve for the unknowns.
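For completeness, the differentiation step yields the two normal equations (a standard intermediate step, not spelled out on the slide):
$$\frac{\partial}{\partial\beta_0}\sum_{i=1}^{n}(y_i-\beta_0-\beta_1 x_i)^2 = -2\sum_{i=1}^{n}(y_i-\beta_0-\beta_1 x_i)=0,$$
$$\frac{\partial}{\partial\beta_1}\sum_{i=1}^{n}(y_i-\beta_0-\beta_1 x_i)^2 = -2\sum_{i=1}^{n}x_i(y_i-\beta_0-\beta_1 x_i)=0,$$
which give $\sum_i y_i = n\beta_0 + \beta_1\sum_i x_i$ and $\sum_i x_i y_i = \beta_0\sum_i x_i + \beta_1\sum_i x_i^2$, and hence the solution on the next slide.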

Least squares solution

The least squares solution is given by
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar x \bar y}{\sum_{i=1}^{n} x_i^2 - n \bar x^2}$$
and
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x,$$
where $\bar x$ and $\bar y$ are the sample means of $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$.

To verify that the estimators above minimize the objective function of interest, we can examine the second derivatives.
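A minimal numpy sketch of these closed-form estimates; the function name and the toy data are illustrative choices, not part of the slides:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least squares estimates for simple linear regression."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # beta0_hat = y_bar - beta1_hat * x_bar
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Illustrative data only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = least_squares_fit(x, y)
print(b0, b1)  # close to intercept 0 and slope 2 for this toy example
```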

Properties of least squares estimates

Using assumptions A1–A3, we obtain the following means and variances of $\hat\beta_0$ and $\hat\beta_1$:
$$E(\hat\beta_0) = \beta_0, \qquad E(\hat\beta_1) = \beta_1,$$
$$\mathrm{Var}(\hat\beta_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right], \qquad \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}.$$

If $\sum_{i=1}^{n} (x_i - \bar x)^2 \to \infty$ as $n \to \infty$, then $\hat\beta_0$ and $\hat\beta_1$ are also consistent estimators.
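As a brief check of the unbiasedness claim for $\hat\beta_1$ (a standard argument, included here for completeness and not shown on the slide): since $\sum_i (x_i - \bar x)\bar y = 0$, we can write
$$\hat\beta_1 = \sum_{i=1}^{n} k_i Y_i, \qquad k_i = \frac{x_i - \bar x}{\sum_{j=1}^{n} (x_j - \bar x)^2},$$
with $\sum_i k_i = 0$ and $\sum_i k_i x_i = 1$. Assumption A1 then gives
$$E(\hat\beta_1) = \sum_{i=1}^{n} k_i (\beta_0 + \beta_1 x_i) = \beta_1,$$
and A2 with A3 gives $\mathrm{Var}(\hat\beta_1) = \sigma^2 \sum_i k_i^2 = \sigma^2 / \sum_i (x_i - \bar x)^2$, as stated above.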

Estimation of $\sigma^2$

To estimate $\sigma^2$, recall that $\sigma^2 = E\left([Y_i - E(Y_i)]^2\right)$.

We estimate $\sigma^2$ by an average from the sample, that is,
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{n - 2} = \frac{\text{SSE}}{n - 2},$$
where $\text{SSE} = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} \hat\epsilon_i^2$ denotes the residual (or error) sum of squares.

It holds that $E(s^2) = \sigma^2$.

Coefficient of determination

The coefficient of determination, $R^2$, is defined as
$$R^2 = \frac{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}},$$
where $\text{SSR} = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$ is the regression sum of squares and $\text{SST} = \sum_{i=1}^{n} (y_i - \bar y)^2$ is the total sum of squares, which can be partitioned as $\text{SST} = \text{SSR} + \text{SSE}$.

$R^2$ is the square of the sample correlation coefficient between $y$ and $x$:
$$R^2 = \frac{s_{xy}^2}{s_x^2\, s_y^2} = \frac{\left[\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)\right]^2}{\left[\sum_{i=1}^{n} (x_i - \bar x)^2\right]\left[\sum_{i=1}^{n} (y_i - \bar y)^2\right]}.$$
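A short numpy sketch that computes these sums of squares, $s^2$, and $R^2$, and checks the decomposition SST = SSR + SSE; the toy data are illustrative only:

```python
import numpy as np

# Illustrative data only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.9, 10.2, 12.1])
n = x.size

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = beta0_hat + beta1_hat * x          # fitted values

sse = np.sum((y - y_hat) ** 2)             # residual sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)         # regression sum of squares
sst = np.sum((y - y_bar) ** 2)             # total sum of squares

s2 = sse / (n - 2)                         # unbiased estimate of sigma^2
r2 = ssr / sst                             # coefficient of determination

assert np.isclose(sst, ssr + sse)          # SST = SSR + SSE
print(s2, r2)
```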

ANOVA table for simple linear regression

Source of variation | d.f.  | Sum of squares | Mean square            | F statistic
Regression          | 1     | SSR            | MSR = SSR              | F = MSR/MSE
Residual            | n - 2 | SSE            | MSE = SSE/(n - 2) = s² |
Total               | n - 1 | SST            |                        |
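A small sketch of the corresponding F test, written as a function of the sums of squares from a fitted model; scipy is assumed to be available, and the numbers in the example call are made up for illustration:

```python
from scipy import stats

def anova_f_test(ssr, sse, n):
    """F statistic and p-value for the regression ANOVA (1 and n-2 d.f.)."""
    msr = ssr / 1            # regression mean square (1 d.f.)
    mse = sse / (n - 2)      # residual mean square, equals s^2
    f_stat = msr / mse
    p_value = stats.f.sf(f_stat, 1, n - 2)   # upper-tail probability
    return f_stat, p_value

print(anova_f_test(ssr=95.0, sse=1.2, n=6))  # illustrative values only
```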

Confidence intervals for $\beta_0$ and $\beta_1$

Assuming A4, $\epsilon_i \sim N(0, \sigma^2)$, it holds for $j = 0, 1$:
$$\hat\beta_j \sim N(\beta_j, \sigma^2_{\hat\beta_j}), \qquad \frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \sim t(n - 2),$$
where $\hat\sigma_{\hat\beta_j} = \widehat{\mathrm{Var}}(\hat\beta_j)^{1/2}$ is the standard error of $\hat\beta_j$.

$(1 - \alpha) \cdot 100\%$ confidence intervals for $\beta_0$ and $\beta_1$:
$$\left[\hat\beta_j \pm \hat\sigma_{\hat\beta_j}\, t_{1 - \alpha/2}(n - 2)\right], \quad j = 0, 1.$$

For sufficiently large $n$, the quantiles of the $t(n - 2)$ distribution may be replaced by quantiles of the $N(0, 1)$ distribution.
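A minimal sketch of these t-based intervals; scipy is assumed to be available and the function name and toy data are illustrative:

```python
import numpy as np
from scipy import stats

def slr_confidence_intervals(x, y, alpha=0.05):
    """(1 - alpha) confidence intervals for beta0 and beta1 under A1-A4."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    beta0 = y_bar - beta1 * x_bar
    resid = y - (beta0 + beta1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # sqrt of s^2
    se_beta1 = s / np.sqrt(sxx)                     # standard error of beta1
    se_beta0 = s * np.sqrt(1.0 / n + x_bar ** 2 / sxx)
    t_quant = stats.t.ppf(1 - alpha / 2, n - 2)     # t quantile, n-2 d.f.
    return ((beta0 - t_quant * se_beta0, beta0 + t_quant * se_beta0),
            (beta1 - t_quant * se_beta1, beta1 + t_quant * se_beta1))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.8, 6.1, 7.9, 10.2, 12.1]
print(slr_confidence_intervals(x, y))
```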

Hypothesis tests for $\beta_0$ and $\beta_1$

Example: $H_0\colon \beta_1 = 0$ versus $H_1\colon \beta_1 \neq 0$.

Test statistic:
$$T = \frac{\hat\beta_1 - 0}{\hat\sigma_{\hat\beta_1}} = \frac{\hat\beta_1}{\hat\sigma_{\hat\beta_1}} \sim t(n - 2) \quad \text{under } H_0,$$
where $\hat\sigma_{\hat\beta_1} = s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar x)^2}$.

Rejection region (at significance level $\alpha$): $|T| > t_{1 - \alpha/2; n-2}$, where $t_{1 - \alpha/2; n-2}$ is the $1 - \alpha/2$ quantile of a $t$-distribution with $n - 2$ degrees of freedom (d.f.).
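The same test, sketched as a small function of the estimate and its standard error; scipy is assumed to be available and the numbers in the example call are made up:

```python
from scipy import stats

def t_test_beta1(beta1_hat, se_beta1, n, alpha=0.05):
    """Two-sided t test of H0: beta1 = 0 in simple linear regression."""
    t_stat = beta1_hat / se_beta1
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)      # critical value
    p_value = 2 * stats.t.sf(abs(t_stat), n - 2)    # two-sided p-value
    return t_stat, p_value, abs(t_stat) > t_crit    # last entry: reject H0?

print(t_test_beta1(beta1_hat=1.98, se_beta1=0.08, n=6))  # illustrative values
```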

Confidence interval for $\sigma^2$

Note that
$$\frac{(n - 2)\, s^2}{\sigma^2} \sim \chi^2_{n-2}.$$

A $100 \cdot (1 - \alpha)\%$ confidence interval for $\sigma^2$ is given by
$$\left[\frac{(n - 2)\, s^2}{\chi^2_{1 - \alpha/2;\, n-2}},\; \frac{(n - 2)\, s^2}{\chi^2_{\alpha/2;\, n-2}}\right],$$
where $\chi^2_{1 - \alpha/2;\, n-2}$ is the $1 - \alpha/2$ quantile of a $\chi^2$-distribution with $n - 2$ d.f.
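A sketch of this chi-square based interval; scipy is assumed to be available and the example values are illustrative:

```python
from scipy import stats

def sigma2_confidence_interval(s2, n, alpha=0.05):
    """100*(1 - alpha)% confidence interval for sigma^2 based on s^2."""
    lower = (n - 2) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 2)
    upper = (n - 2) * s2 / stats.chi2.ppf(alpha / 2, n - 2)
    return lower, upper

print(sigma2_confidence_interval(s2=0.30, n=6))  # illustrative values only
```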

Conditional normal model

In the simple regression model we have discussed, the values of the predictor variable, $x_1, \ldots, x_n$, have been fixed, known constants.

The conditional normal model with the assumptions A1–A4 is the most common simple linear regression model:
$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2), \quad i = 1, \ldots, n.$$

Thus the population regression function is $E(Y \mid x) = \beta_0 + \beta_1 x$.

By imposing A4, the uncorrelatedness of $Y_1, \ldots, Y_n$ (from A1–A3) is strengthened to independence. Moreover, the exact form of the joint pdf of $Y_1, \ldots, Y_n$ is now specified.

Bivariate normal model

Sometimes it is more reasonable to assume that these values are actually observed values of random variables.

In the bivariate normal model, the observed values $(y_1, x_1), \ldots, (y_n, x_n)$ are realizations of the bivariate random vectors $(Y_1, X_1), \ldots, (Y_n, X_n)$.

The random vectors are assumed to be independent, with
$$(Y_i, X_i) \sim N_2(\mu_y, \mu_x, \sigma_y^2, \sigma_x^2, \rho), \quad i = 1, \ldots, n.$$

The joint pdf of $(Y_1, X_1), \ldots, (Y_n, X_n)$ is the product of the bivariate pdfs.

Bivariate normal model (2)

For a bivariate normal model, the conditional distribution of $Y$ given $X = x$ is normal. The model implies that the population regression function, $E(Y \mid x)$, is a linear function of $x$.

Linear regression analysis is almost always carried out using the conditional distribution of $(Y_1, \ldots, Y_n)$ given $X_1 = x_1, \ldots, X_n = x_n$, rather than the unconditional distribution of $(Y_1, X_1), \ldots, (Y_n, X_n)$.

Inference based on point estimators, intervals, or tests is the same for the conditional normal model and the bivariate normal model, at least with respect to the parts that are relevant for the present course.
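Concretely, by the standard conditional-distribution property of the bivariate normal (stated here for completeness, not spelled out on the slide):
$$Y \mid X = x \;\sim\; N\!\left(\mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \mu_x),\; \sigma_y^2 (1 - \rho^2)\right),$$
so the induced regression line has slope $\beta_1 = \rho\,\sigma_y/\sigma_x$ and intercept $\beta_0 = \mu_y - \beta_1 \mu_x$.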

Regression with errors in variables

A more complicated model with stochastic regressors than the bivariate normal model is the measurement error model or errors in variables (EIV) model.

We observe independent pairs $(Y_i, X_i)$, $i = 1, \ldots, n$, according to
$$Y_i = \beta_0 + \beta_1 \xi_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_\epsilon^2),$$
$$X_i = \xi_i + \delta_i, \quad \delta_i \sim N(0, \sigma_\delta^2).$$

The variables $\xi_i$ and $\eta_i = \beta_0 + \beta_1 \xi_i$ are sometimes called latent variables.

If $\delta_i = 0$, then the model becomes simple linear regression.
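A minimal simulation sketch of this data-generating process; the parameter values and the fixed grid of $\xi_i$ values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (assumptions for this sketch)
beta0, beta1 = 1.0, 2.0
sigma_eps, sigma_delta = 0.5, 0.3
n = 200

# Latent values xi_i; here a fixed grid, as in the functional relationship model
xi = np.linspace(0.0, 10.0, n)

# Y_i = beta0 + beta1 * xi_i + eps_i,  eps_i ~ N(0, sigma_eps^2)
y = beta0 + beta1 * xi + rng.normal(0.0, sigma_eps, size=n)
# X_i = xi_i + delta_i,  delta_i ~ N(0, sigma_delta^2)
x = xi + rng.normal(0.0, sigma_delta, size=n)

# (y, x) is what we actually observe; xi itself stays latent.
```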

Functional and structural relationships

There are two different types of relationship that can be specified in the EIV model: one that specifies a functional linear relationship, and one describing a structural linear relationship.

The different relationship specifications can lead to different estimators with different properties.

Linear functional relationship model

This is the EIV model introduced above, where we have random variables $X_i$ and $Y_i$ with $E(X_i) = \xi_i$ and $E(Y_i) = \eta_i$, and we assume the functional relationship $\eta_i = \beta_0 + \beta_1 \xi_i$.

The $\xi_i$ are fixed, unknown parameters and the $\epsilon_i$ and $\delta_i$ are independent.

The parameters of interest are $\beta_0$ and $\beta_1$, and inference on these parameters is made using the joint distribution of $((Y_1, X_1), \ldots, (Y_n, X_n))$, conditional on $\xi_1, \ldots, \xi_n$.

Linear structural relationship model

Now we assume that $\xi_1, \ldots, \xi_n$ are a random sample from a common population (e.g. $\xi_i \sim N(\xi, \sigma_\xi^2)$). Thus, conditional on $\xi_1, \ldots, \xi_n$, we observe pairs $(Y_i, X_i)$ ($i = 1, \ldots, n$) according to the EIV model introduced above.

As before, the $\epsilon_i$ and $\delta_i$ are independent, but they are also independent of the $\xi_i$.

Inference on $\beta_0$ and $\beta_1$ is made using the joint distribution of $((Y_1, X_1), \ldots, (Y_n, X_n))$, unconditional on $\xi_1, \ldots, \xi_n$.

Orthogonal least squares

Let us try to find the best line through the points $(y_i, x_i)$ ($i = 1, \ldots, n$).

If the $x_i$'s are measured without error, it makes sense to consider minimization of vertical distances (ordinary least squares).

In an EIV model, we instead perform orthogonal (total) least squares, that is, we find the line that minimizes the orthogonal distances of the points to the line. Such a distance measure does not favour the $x$ variable but rather treats both variables equitably.
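A minimal sketch of one standard way to compute such an orthogonal (total) least squares line, via the leading right singular vector of the centred data matrix; the slides do not prescribe a particular algorithm, and the function name and toy data are illustrative:

```python
import numpy as np

def orthogonal_least_squares(x, y):
    """Fit the line minimizing orthogonal distances to the points (x_i, y_i)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Centre the data; the orthogonal LS line passes through (x_bar, y_bar).
    data = np.column_stack([x - x_bar, y - y_bar])
    # The line direction is the right singular vector with the largest
    # singular value (the first principal component direction).
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx              # undefined if the best line is vertical (dx == 0)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy data with noise in both variables (illustrative only)
rng = np.random.default_rng(1)
xi = np.linspace(0, 10, 100)
x = xi + rng.normal(0, 0.3, xi.size)
y = 1.0 + 2.0 * xi + rng.normal(0, 0.5, xi.size)
print(orthogonal_least_squares(x, y))
```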