Linear Regression


1 Introduction

It is often interesting to study the effect of a variable on a response. In ANOVA, the response is a continuous variable and the explanatory variables are discrete/categorical. What if the explanatory variables are also continuous? That is, how do we examine the relationship between a continuous outcome and one or more continuous variables? For example, what is the relationship between a child's height and the parents' heights? Or between gas mileage and vehicle size? To answer such questions, we introduce linear regression. Linear regression studies the association between a dependent (response) variable and one or more independent variables.

Dependent and independent variables (from Wikipedia): dependent and independent variables refer to values that change in relationship to each other. The dependent variables are those that are observed to change in response to the independent variables. The independent variables are those that are deliberately manipulated to invoke a change in the dependent variables.

Synonyms of dependent variable: response variable, outcome variable
Synonyms of independent variable: predictor variable, explanatory variable

Linear regression can be used to (1) describe or quantify the relationship between variables, (2) make inferences, and (3) predict.

2 Least Squares

Consider an independent variable x and a dependent variable y, and denote the observed data by (x_i, y_i), i = 1, 2, ..., n. In the method of least squares, we fit a straight line to the points by choosing the slope and intercept that minimize

S(β0, β1) = Σ_{i=1}^n (y_i − β0 − β1 x_i)²

The least squares estimates (LSE) of (β0, β1) are chosen to minimize the sum of squared
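The minimizing property of the least-squares line can be checked numerically: evaluating S at the least-squares coefficients and at nearby lines shows that any perturbation can only increase the sum of squares. A minimal Python sketch, using made-up toy data (the values are illustrative, not from the text):

```python
def S(beta0, beta1, x, y):
    # Sum of squared vertical deviations from the line y = beta0 + beta1 * x
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))

# Toy data, made up for illustration
x = [1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8]

# Closed-form least-squares estimates (formulas derived in the next section)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# S is a convex quadratic in (beta0, beta1), so every nearby line
# has a sum of squares at least as large as the minimum
s_min = S(b0, b1, x, y)
worse = [S(b0 + d0, b1 + d1, x, y)
         for d0 in (-0.5, 0, 0.5) for d1 in (-0.5, 0, 0.5)]
```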


vertical deviations, or prediction errors.

Let's derive the LSE of β0 and β1. The pair (β̂0, β̂1) that minimizes the sum of squares must satisfy the normal equations

∂S/∂β0 = −2 Σ (y_i − β0 − β1 x_i) = 0
∂S/∂β1 = −2 Σ x_i (y_i − β0 − β1 x_i) = 0

Equivalently,

Σ y_i = n β̂0 + β̂1 Σ x_i    (that is, n ȳ = n β̂0 + n β̂1 x̄)
Σ x_i y_i = β̂0 Σ x_i + β̂1 Σ x_i²

The first equation implies that β̂0 = ȳ − β̂1 x̄. Substituting this into the second equation leads to

β̂1 = [n Σ x_i y_i − (Σ x_i)(Σ y_i)] / [n Σ x_i² − (Σ x_i)²]
    = [Σ x_i y_i − n x̄ ȳ] / [Σ x_i² − n x̄²]
    = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

The main results of LSE:

β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²
β̂0 = ȳ − β̂1 x̄

Example. Consider the example of the overall ratings of 30 companies. The data can be found at https://stat.ethz.ch/r-manual/r-patched/library/datasets/html/attitude.html. The dependent variable (y) is the overall rating and the independent variable (x) is the score for handling of employee complaints. From the data we find

n = 30, x̄ = 66.6, ȳ = 64.63, Σ (x_i − x̄)(y_i − ȳ) = 3879.6, Σ (x_i − x̄)² = 5141.2
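The closed-form formulas above translate directly into code. A self-contained Python sketch (the function name least_squares is ours, for illustration):

```python
def least_squares(x, y):
    """Closed-form LSE for simple linear regression, as derived above:
    beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    beta0_hat = y_bar - beta1_hat * x_bar
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Points that lie exactly on the line y = 2 + 3x are recovered exactly
b0, b1 = least_squares([0, 1, 2, 3, 4], [2, 5, 8, 11, 14])
```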

Using the formula we derived,

β̂1 = 3879.6 / 5141.2 = 0.7546
β̂0 = 64.63 − 0.7546 × 66.6 ≈ 14.37 (fitting the full, unrounded data gives 14.3763)

Try the following code in R:

require(stats); require(graphics) #to access the data set
#the description of the data can be found from
#https://stat.ethz.ch/r-manual/r-patched/library/datasets/html/attitude.html

#sample size
n=dim(attitude)[1]

#check the relationship between complaints and rating. Does it look linear?
plot(rating~complaints, data=attitude)

#Use the least squares formulas we derived in class to find beta1.hat and beta0.hat
x=attitude[,2] #the x-variable
y=attitude[,1] #the y-variable
beta1.hat=sum( (x-mean(x)) * (y-mean(y)) ) / sum( (x-mean(x))^2 )
beta1.hat
beta0.hat= mean(y) - beta1.hat * mean(x)
beta0.hat

#You can also find the LSE by fitting a linear model using the R function "lm"
lm(rating~complaints, data=attitude)

The method of least squares provides an approximate relationship between the independent and dependent variables. But how do we know whether linear regression is appropriate for describing the relationship between the variables? Is the fitted model reasonable and accurate enough to use? How much should we believe the fitted model? How can we test the hypothesis that β1 = 0 (i.e., that there is no linear relationship between x and y)? When β1 ≠ 0, we say there is a regression effect.
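As a quick arithmetic check of the example, the quoted summary statistics reproduce the slope directly; the intercept computed from the rounded summaries is slightly different from the value lm() reports on the full data. A Python sketch of the arithmetic:

```python
# Summary statistics quoted above for the attitude data
n, x_bar, y_bar = 30, 66.6, 64.63
sxy, sxx = 3879.6, 5141.2

beta1_hat = sxy / sxx                   # slope, about 0.7546
beta0_hat = y_bar - beta1_hat * x_bar   # intercept from rounded summaries,
                                        # about 14.37 (full data: 14.3763)
```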

3 Simple Linear Regression (regression with one predictor variable)

The goal of this section is to study how to fit a straight line to data.

3.1 Statistical Model

In the method of least squares, we didn't explicitly model the noise. Consequently, we have not addressed the reliability of the slope and intercept in the presence of noise. The standard (and simplest) statistical model is

y_i = β0 + β1 x_i + ε_i,  i = 1, ..., n

Here the ε_i are independent with mean zero and variance σ², i.e., E(ε_i) = 0 and Var(ε_i) = σ². In this course, we also assume that the errors are normally distributed, i.e., ε_i ~ iid N(0, σ²). The above model contains two components: a systematic component and a random component.

Random component:
1. normal errors
2. constant variance
3. independent errors

3.1.1 Interpretation of regression parameters

β0 and β1 are the model parameters, or regression coefficients. For the intercept, we have

β0 = E(y | x = 0)

For the slope, we have

β1 = E(y | x = x* + 1) − E(y | x = x*)

This says that β1 is the expected change in y associated with a 1-unit increase in x. A graphical interpretation of the regression parameters: [figure omitted]
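The model y_i = β0 + β1 x_i + ε_i and the interpretation of β1 can be illustrated by simulation: generate data from known parameters and confirm that the least-squares fit recovers values close to the truth. A Python sketch (all parameter values here are made up for illustration):

```python
import random

random.seed(1)

# Hypothetical true parameters for the model y_i = beta0 + beta1*x_i + eps_i
beta0_true, beta1_true, sigma = 14.0, 0.75, 2.0

n = 500
x = [random.uniform(0, 100) for _ in range(n)]
# eps_i ~ N(0, sigma^2), independent across observations
y = [beta0_true + beta1_true * xi + random.gauss(0, sigma) for xi in x]

# Least-squares fit using the closed-form formulas
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# b1 estimates the expected change in y per 1-unit increase in x,
# so the fitted line shifts by about b1 when x increases by 1
```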

3.1.2 Estimation of Coefficients

Recall that the sum of squared differences between the observed y_i and their fitted values under the simple linear regression model is

S = Σ_{i=1}^n (y_i − β0 − β1 x_i)² = Σ_{i=1}^n ε_i²

We showed that the LSE of the parameters are:

β̂0 = ȳ − β̂1 x̄
β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

Theorem 1. Under the assumptions of the linear regression model, β̂0 and β̂1 are unbiased estimators of β0 and β1, respectively.

Proof: From the assumptions, E(y_i) = β0 + β1 x_i. Thus

E(ȳ) = (1/n) Σ E[y_i] = (1/n) Σ (β0 + β1 x_i) = β0 + β1 x̄

For the slope,

E(β̂1) = Σ (x_i − x̄) E[y_i − ȳ] / Σ (x_i − x̄)²
       = Σ (x_i − x̄)[β0 + β1 x_i − β0 − β1 x̄] / Σ (x_i − x̄)²
       = β1 Σ (x_i − x̄)(x_i − x̄) / Σ (x_i − x̄)²
       = β1

For the intercept,

E(β̂0) = E(ȳ − β̂1 x̄) = β0 + β1 x̄ − β1 x̄ = β0

Theorem 2. Under the assumptions of the linear regression model,
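The unbiasedness established in Theorem 1 can also be seen by Monte Carlo: over repeated samples from the model with a fixed design, the average of the estimates settles near the true parameters. A Python sketch (the parameter values and replication count are illustrative):

```python
import random

random.seed(0)

# Hypothetical true parameters and a fixed design
beta0_true, beta1_true, sigma = 1.0, 2.0, 1.0
x = [float(i) for i in range(10)]
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

reps = 2000
b0_sum = b1_sum = 0.0
for _ in range(reps):
    # Draw a fresh sample from y_i = beta0 + beta1*x_i + eps_i
    y = [beta0_true + beta1_true * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / len(y)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    b1_sum += b1
    b0_sum += b0

# Averages over replications approximate E(beta0_hat) and E(beta1_hat)
b0_avg = b0_sum / reps
b1_avg = b1_sum / reps
```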