STAT5044: Regression and Anova. Inyoung Kim


2 / 47 Outline 1 Regression; 2 Simple linear regression; 3 Basic concepts in regression; 4 How to estimate unknown parameters; 5 Properties of Least Squares Estimators: Gauss-Markov theorem

3 / 47 Regression A way to model the relationship between a dependent ( ) variable Y and an independent ( ) variable X.

4 / 47 Regression A way to model the relationship between a dependent ( ) variable Y and an independent ( ) variable X. The goal of regression is to understand how the values of Y change as X varies over its range of possible values.

5 / 47 Regression A way to model the relationship between a dependent (response) variable Y and an independent (explanatory) variable X. The goal of regression is to understand how the values of Y change as X varies over its range of possible values, and also to predict Y using X. It is used to answer questions such as: Does changing class size affect the success of students? Can we predict the time of the next eruption of the Old Faithful geyser from the length of the most recent eruption? Do changes in diet result in changes in cholesterol level, and if so, do the results depend on other characteristics such as age, sex, and amount of exercise?

6 / 47 Regression Simple linear regression, polynomial regression, multiple linear regression. Let us start with simple linear regression.

7 / 47 Simple linear regression We have one response variable (Y) and one explanatory variable (X). Regression analysis was first developed by Sir Francis Galton, who studied the relation between the heights of fathers and sons. Galton noted that the heights of sons of both tall and short fathers appeared to revert, or regress, to the mean of the group. He developed a mathematical description of this regression tendency, the precursor of today's regression models. The term regression persists to this day to describe statistical relations between variables.

8 / 47 Basic concepts in regression A regression model is a formal means of expressing the two essential ingredients of a statistical relation: a tendency of the dependent variable Y to vary with the independent variable in a systematic fashion, and a scattering of points around the curve of statistical relationship.

9 / 47 Basic concepts in regression A regression model is a formal means of expressing the two essential ingredients of a statistical relation: a tendency of the dependent variable Y to vary with the independent variable in a systematic fashion (there is a probability distribution of Y for each level of X), and a scattering of points around the curve of statistical relationship.

10 / 47 Basic concepts in regression A regression model is a formal means of expressing the two essential ingredients of a statistical relation: a tendency of the dependent variable Y to vary with the independent variable in a systematic fashion (there is a probability distribution of Y for each level of X), and a scattering of points around the curve of statistical relationship (the means of these probability distributions vary in some systematic fashion with X).

11 / 47 What might be of interest in regression? Regression is a statistical method to estimate the relationship between a response and an explanatory variable using a linear model.

12 / 47 What might be of interest in regression? Regression is a statistical method to estimate the relationship between a response and an explanatory variable using a linear model. Is there a linear relationship? How do we describe the relationship? How do we predict a new value? How do we predict the value of the explanatory variable that causes a specified response?

13 / 47 Simple linear regression Model: Y_i = β_0 + β_1 X_i + ε_i, i = 1,...,n. Y: response variable/dependent variable; X: explanatory variable/independent variable; ε_i: random error with mean E(ε_i) = 0, variance Var(ε_i) = σ², and covariance Cov(ε_i, ε_j) = 0 for i ≠ j.

14 / 47 Simple linear regression Model: Y_i = β_0 + β_1 X_i + ε_i, i = 1,...,n. Y: response variable/dependent variable; X: explanatory variable/independent variable; ε_i: random error with mean E(ε_i) = 0, variance Var(ε_i) = σ², and covariance Cov(ε_i, ε_j) = 0 for i ≠ j. There is one more random variable. What is it?

15 / 47 Simple linear regression Model: Y_i = β_0 + β_1 X_i + ε_i, i = 1,...,n. Y: response variable/dependent variable; X: explanatory variable/independent variable; ε_i: random error with mean E(ε_i) = 0, variance Var(ε_i) = σ², and covariance Cov(ε_i, ε_j) = 0 for i ≠ j. There is one more random variable. What is it? Y. E(Y_i) = ? and Var(Y_i) = ?

16 / 47 Simple linear regression Model: Y_i = β_0 + β_1 X_i + ε_i, i = 1,...,n. Y: response variable/dependent variable; X: explanatory variable/independent variable; ε_i: random error with mean E(ε_i) = 0, variance Var(ε_i) = σ², and covariance Cov(ε_i, ε_j) = 0 for i ≠ j. There is one more random variable. What is it? Y. E(Y_i) = β_0 + β_1 X_i and Var(Y_i) = Var(ε_i) = σ².
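
To make this concrete, here is a minimal simulation sketch (not from the slides; the values β_0 = 1, β_1 = 2, σ = 0.5, and X = 3 are arbitrary assumptions for illustration) showing that Y generated from the model has mean β_0 + β_1 X and variance σ² at a fixed X:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5   # arbitrary true values for illustration
x = 3.0                               # a single fixed level of X
n = 100_000                           # many replicates of Y at this X

# Y = beta0 + beta1 * x + eps, with eps having mean 0 and variance sigma^2
# (normality is just a convenient choice here; the slide only specifies mean and variance)
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)

print(y.mean())  # close to E(Y) = beta0 + beta1 * x = 7.0
print(y.var())   # close to Var(Y) = sigma^2 = 0.25
```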

17 / 47 Simple linear regression Model: Y_i = β_0 + β_1 X_i + ε_i, i = 1,...,n. E(Y_i) = β_0 + β_1 X_i and Var(Y_i) = Var(ε_i) = σ². β_0, β_1: regression coefficient parameters. β_1: the slope of the regression line, which indicates the change in the mean of the probability distribution of Y per unit increase in X. β_0: the intercept of the regression line; if the scope of the model includes X = 0, β_0 gives the mean of the probability distribution of Y at X = 0. What are the known values?

18 / 47 Simple linear regression Model: Y_i = β_0 + β_1 X_i + ε_i, i = 1,...,n. E(Y_i) = β_0 + β_1 X_i and Var(Y_i) = Var(ε_i) = σ². β_0, β_1: regression coefficient parameters. β_1: the slope of the regression line, which indicates the change in the mean of the probability distribution of Y per unit increase in X. β_0: the intercept of the regression line; if the scope of the model includes X = 0, β_0 gives the mean of the probability distribution of Y at X = 0. What are the known values? What are the unknown values?

19 / 47 Simple regression model To estimate the linear relationship between Y and X, what do we need to do?

20 / 47 Simple regression model To estimate the linear relationship between Y and X, what do we need to do? How do we estimate the unknowns?

21 / 47 Simple regression model Goal: fit a straight line to the points on a scatterplot; find the intercept and slope such that ŷ_i = b_0 + b_1 x_i fits the data as well as possible. We find b_0 and b_1 using the least squares estimation (LSE) method; that is, we find b_0 and b_1 to minimize Σ_i e_i². Notation: residual e_i = y_i − ŷ_i.

22 / 47 Notation and Definition Fitted value: Ŷ_i = b_0 + b_1 X_i, where b_0 = β̂_0 and b_1 = β̂_1. Residual: e_i = Y_i − Ŷ_i. S_xy = Σ_i (x_i − x̄)(y_i − ȳ); S_xx = Σ_i (x_i − x̄)²; S_yy = Σ_i (y_i − ȳ)².

23 / 47 Notation and Definition Fitted value: Ŷ_i = b_0 + b_1 X_i, where b_0 = β̂_0 and b_1 = β̂_1. Residual: e_i = Y_i − Ŷ_i. S_xy = Σ_i (x_i − x̄)(y_i − ȳ); S_xx = Σ_i (x_i − x̄)²; S_yy = Σ_i (y_i − ȳ)². What is the difference between the residual (e_i) and the error (ε_i)?

24 / 47 Interpreting a regression line (or least squares line): 1. The slope of the line estimates the average increase in y for each one-unit increase in x. 2. The intercept of the line is the value of y when x = 0; interpreting the intercept in the context of the data only makes sense if 0 is included in the range of measured x-values. 3. The line estimates the average y for a specific value of x. It can also be used as a prediction of the value of y for an individual with a specific value of x.

25 / 47 Interpreting a regression line (or least squares line): Note: the regression line is created based on the least squares criterion: when we use a line to predict the values of y, the sum of squared differences between the observed values of y and the predicted values is smaller for the least squares line than it is for any other line.

26 / 47 How to estimate parameters The method of least squares (or method of ordinary least squares): estimate β_0 and β_1 to minimize Q = Σ_i (Y_i − β_0 − β_1 X_i)². The values of β_0 and β_1 that minimize Q can be derived by differentiating Q with respect to β_0 and β_1: ∂Q/∂β_0 = −2 Σ_i (Y_i − β_0 − β_1 X_i) and ∂Q/∂β_1 = −2 Σ_i X_i (Y_i − β_0 − β_1 X_i). We then set these partial derivatives equal to zero, using b_0 and b_1 to denote the particular values of β_0 and β_1, respectively, that minimize Q.

27 / 47 How to estimate parameters We then obtain the following equations, called the normal equations: Σ_i Y_i = n b_0 + b_1 Σ_i X_i and Σ_i X_i Y_i = b_0 Σ_i X_i + b_1 Σ_i X_i².

28 / 47 Least Squares Estimators β̂_0 = b_0 = Ȳ − b_1 X̄; β̂_1 = b_1 = Σ_i (X_i − X̄)(Y_i − Ȳ) / Σ_i (X_i − X̄)².

29 / 47 Notation and Definition Fitted value: Ŷ_i = b_0 + b_1 X_i. Residual: e_i = Y_i − Ŷ_i. S_xy = Σ_i (x_i − x̄)(y_i − ȳ); S_xx = ?; S_yy = ?

30 / 47 Notation and Definition Fitted value: Ŷ_i = b_0 + b_1 X_i. Residual: e_i = Y_i − Ŷ_i. S_xy = Σ_i (x_i − x̄)(y_i − ȳ); S_xx = Σ_i (x_i − x̄)²; S_yy = Σ_i (y_i − ȳ)².

31 / 47 Least Squares Estimators β̂_0 = b_0 = Ȳ − b_1 X̄ = Ȳ − β̂_1 X̄; β̂_1 = b_1 = Σ_i (X_i − X̄)(Y_i − Ȳ) / Σ_i (X_i − X̄)² = S_xy / S_xx.
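
As a sketch of these formulas in code (not from the slides; the simulated data and true values β_0 = 1, β_1 = 2 are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)  # simulated data

# b1 = Sxy / Sxx and b0 = ybar - b1 * xbar, as on the slide
xbar, ybar = x.mean(), y.mean()
Sxy = np.sum((x - xbar) * (y - ybar))
Sxx = np.sum((x - xbar) ** 2)
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

print(b0, b1)  # should be close to the true values 1.0 and 2.0
```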

32 / 47 Properties of Least Squares Estimators Gauss-Markov theorem: under the conditions of the regression model, the least squares estimators b_0 and b_1 are unbiased and have minimum variance among all unbiased linear estimators. What does this theorem mean? NOTE: unbiased estimator: a statistic is an unbiased estimator of a parameter if its expectation equals the parameter, i.e., E(b_0) = β_0 and E(b_1) = β_1.

33 / 47 Properties of Least Squares Estimators Gauss-Markov theorem: under the conditions of the regression model, the least squares estimators b_0 and b_1 are unbiased and have minimum variance among all unbiased linear estimators. This theorem means that, among all linear estimators that are unbiased, b_0 and b_1 have the smallest variability in repeated samples in which the X levels remain unchanged. NOTE: unbiased estimator: E(b_0) = β_0 and E(b_1) = β_1.

34 / 47 Properties of Fitted Regression Line E(Ŷ) = β_0 + β_1 X = E(Y); Σ_i e_i = 0; Σ_i Y_i = Σ_i Ŷ_i.

35 / 47 Estimation of σ² Define SSE = Σ_i e_i² = Σ_i (y_i − ŷ_i)², the error sum of squares (SSE), which has n − 2 (why?) degrees of freedom. SSE/(n − 2) is an unbiased estimator of σ². Notation: MSE = SSE/(n − 2), which is called the mean squared error.
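
A quick numerical check of the last two slides (a sketch under the same arbitrary simulation assumptions as above): the residuals sum to zero, the fitted values sum to the observed values, and MSE = SSE/(n − 2) comes out near the true σ².

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)  # true sigma^2 = 0.25

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

print(np.isclose(e.sum(), 0.0))          # property: sum of residuals is 0
print(np.isclose(y.sum(), yhat.sum()))   # property: sum of Y_i = sum of fitted values
print(np.sum(e ** 2) / (n - 2))          # MSE, an unbiased estimate of sigma^2
```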

36 / 47 Simple linear regression with normal error assumption Model: Y_i = β_0 + β_1 X_i + ε_i. Y_i: response variable/dependent variable; X_i: explanatory variable/independent variable; ε_i: iid N(0, σ²), i = 1,...,n, with Cov(ε_i, ε_j) = 0 for i ≠ j. β_0, β_1: regression coefficient parameters. β_1: the slope of the regression line, which indicates the change in the mean of the probability distribution of Y per unit increase in X. β_0: the Y intercept of the regression line; if the scope of the model includes X = 0, β_0 gives the mean of the probability distribution of Y at X = 0. Y_i ~ N(?, ?) and Y_i − (β_0 + β_1 X_i) ~ N(?, ?).

37 / 47 Simple linear regression with normal error assumption Model: Y_i = β_0 + β_1 X_i + ε_i. Y_i: response variable/dependent variable; X_i: explanatory variable/independent variable; ε_i: iid N(0, σ²), i = 1,...,n, with Cov(ε_i, ε_j) = 0 for i ≠ j. β_0, β_1: regression coefficient parameters. β_1: the slope of the regression line, which indicates the change in the mean of the probability distribution of Y per unit increase in X. β_0: the Y intercept of the regression line; if the scope of the model includes X = 0, β_0 gives the mean of the probability distribution of Y at X = 0. Y_i ~ N(?, ?) and Y_i − (β_0 + β_1 X_i) ~ N(?, ?). Question: how do we estimate β_0 and β_1?

38 / 47 Simple linear regression with normal error assumption We estimate β_0 and β_1 using maximum likelihood estimation (MLE). How do we calculate the MLE?

39 / 47 Simple linear regression with normal error assumption We estimate β_0 and β_1 using maximum likelihood estimation (MLE). How do we calculate the MLE? Calculate the likelihood function; take the first derivative of the likelihood function with respect to β_0 and β_1; check whether the second derivative of the likelihood function is less than zero.

40 / 47 Maximum Likelihood Estimation The likelihood function is the function of the parameters given the data. The likelihood function L(β_0, β_1, σ²) given the sample observations Y_1,...,Y_n is L(β_0, β_1, σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp[−(Y_i − β_0 − β_1 X_i)² / (2σ²)].

41 / 47 Maximum Likelihood Estimation The log-likelihood is ℓ(β_0, β_1, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)².
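
A sketch of computing the MLE numerically (not from the slides; the simulated data and true values are arbitrary assumptions), by minimizing the negative of the log-likelihood above with scipy:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)

def neg_log_lik(theta):
    b0, b1, log_sigma2 = theta                 # log-parameterize so sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    resid = y - b0 - b1 * x
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / (2 * sigma2)

fit = minimize(neg_log_lik, x0=np.zeros(3))    # numerical maximization of the likelihood
b0_mle, b1_mle = fit.x[:2]
sigma2_mle = np.exp(fit.x[2])
print(b0_mle, b1_mle, sigma2_mle)              # sigma2_mle matches SSE/n, not SSE/(n-2)
```

The numerical answers for β_0 and β_1 coincide with the least squares estimators, while the σ² estimate divides SSE by n rather than n − 2, matching the comparison in the table on the next slides.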

42 / 47 MLE and LSE
Parameter            LSE      MLE
Normal assumption?   ?        ?
β_0                  ?        ?
β_1                  ?        ?
σ²                   SSE/?    SSE/?

43 / 47 MLE and LSE
Parameter            LSE                       MLE
Normal assumption?   No                        Yes
β_0                  same estimator (LSE = MLE)
β_1                  same estimator (LSE = MLE)
σ²                   SSE/(n − 2) (unbiased)    SSE/n (biased)

44 / 47 What assumptions do we have in simple linear regression?

45 / 47 What assumptions do we have in simple linear regression? Independence.

46 / 47 What assumptions do we have in simple linear regression? Independence. Equal variance.

47 / 47 What assumptions do we have in simple linear regression? Independence. Equal variance. Normality (the normal assumption is needed to make inferences, including tests and confidence intervals).
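
As an illustrative sketch (not from the slides), these assumptions are commonly checked with residual diagnostics; the simulated data and plotting choices below are assumptions for demonstration:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(b0 + b1 * x, e)                 # residuals vs fitted: check equal variance
ax1.axhline(0.0, linestyle="--")
ax1.set_xlabel("fitted values")
ax1.set_ylabel("residuals")
stats.probplot(e, dist="norm", plot=ax2)    # normal Q-Q plot: check normality
plt.show()
```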