Summer School in Statistics for Astronomers V, June 1 - June 6, 2009. Regression. Mosuk Chow, Statistics Department, Penn State University. Adapted from notes prepared by R. L. Karandikar.

Mean and variance
Recall that we defined the expectation (or mean) of a random variable $X$ as
$$\mu = E(X) = \begin{cases} \sum_i x_i P(X = x_i) & \text{for discrete } X \\ \int x f(x)\,dx & \text{for continuous } X \text{ with density } f(x) \end{cases}$$
and the variance of a random variable $X$ as
$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2 = \begin{cases} \sum_i x_i^2 P(X = x_i) - \mu^2 & \text{for discrete } X \\ \int x^2 f(x)\,dx - \mu^2 & \text{for continuous } X \end{cases}$$

If $X$ and $Y$ are random variables with means $\mu_X$ and $\mu_Y$, then the covariance of $X$ and $Y$ is defined by
$$\mathrm{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X \mu_Y.$$
The correlation coefficient $\rho(X, Y)$ of $X$ and $Y$ is defined by
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

Properties of mean and variance
$$E(aX + b) = a\,E(X) + b, \qquad \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$$
$$E(aX + bY + c) = a\,E(X) + b\,E(Y) + c$$
$$\mathrm{Var}(aX + bY + c) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$

It may be shown that for any $n \times n$ matrix $A$ and $n \times 1$ vector $b$,
$$E(AY + b) = A\,E(Y) + b, \qquad \mathrm{Cov}(AY + b) = A\,\mathrm{Cov}(Y)\,A^T,$$
which is the basic result used in regression.
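A quick numerical check of this rule (not part of the original notes): the Python sketch below simulates a random vector $Y$ and verifies that the sample mean and covariance of $AY + b$ match $A\,E(Y) + b$ and $A\,\mathrm{Cov}(Y)\,A^T$. The matrix A, vector b, and covariance Sigma are arbitrary choices for illustration.

```python
import numpy as np

# Verify E(AY + b) = A E(Y) + b and Cov(AY + b) = A Cov(Y) A^T by simulation.
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])                    # E(Y), arbitrary
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, -0.4],
                  [0.0, -0.4, 1.5]])               # Cov(Y), arbitrary positive definite
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0],
              [2.0, 0.0, 1.0]])                    # arbitrary 3 x 3 matrix
b = np.array([5.0, 0.0, -1.0])                     # arbitrary 3 x 1 vector

Y = rng.multivariate_normal(mu, Sigma, size=200_000)  # each row is one draw of Y
Z = Y @ A.T + b                                        # Z = AY + b for each draw

print(Z.mean(axis=0), A @ mu + b)                      # sample mean vs. theory
print(np.cov(Z, rowvar=False))                         # sample covariance ...
print(A @ Sigma @ A.T)                                 # ... vs. theory
```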

Hubble's data (1929)
In 1929 Edwin Hubble investigated the relationship between distance and radial velocity of extra-galactic nebulae (celestial objects). It was hoped that some knowledge of this relationship might give clues as to the way the universe was formed and what may happen later. His findings revolutionized astronomy and are the source of much research today. Given here is the data which Hubble used for 24 nebulae.

X = Distance (in megaparsecs) from Earth
Y = The recession velocity (in km/sec)

  X       Y     X       Y     X       Y     X       Y
0.032    170  0.034    290  0.214   -130  0.263    -70
0.275   -185  0.275   -220  0.450    200  0.500    290
0.500    270  0.630    200  0.800    300  0.900    -30
0.900    650  0.900    150  0.900    500  1.000    920
1.100    450  1.100    500  1.400    500  1.700    960
2.000    500  2.000    850  2.000    800  2.000   1090

Source: lib.stat.cmu.edu/dasl/datafiles/hubble.html

From this data set Hubble obtained the relation
$$\text{Recession Velocity} = H_0 \times \text{Distance},$$
where $H_0$ is Hubble's constant, thought to be about 75 km/sec/Mpc.

Back to Hubble's data: [scatterplot of Recession Velocity (km/sec) vs. Distance (megaparsecs) for the 24 nebulae].
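The scatterplot can be reproduced with a short matplotlib sketch (not part of the original notes); the arrays below are typed in from the data table above.

```python
import matplotlib.pyplot as plt

# Hubble's 1929 data, typed in from the table above.
distance = [0.032, 0.034, 0.214, 0.263, 0.275, 0.275, 0.45, 0.5, 0.5, 0.63,
            0.8, 0.9, 0.9, 0.9, 0.9, 1.0, 1.1, 1.1, 1.4, 1.7,
            2.0, 2.0, 2.0, 2.0]
velocity = [170, 290, -130, -70, -185, -220, 200, 290, 270, 200,
            300, -30, 650, 150, 500, 920, 450, 500, 500, 960,
            500, 850, 800, 1090]

plt.scatter(distance, velocity)
plt.xlabel("Distance (megaparsecs)")
plt.ylabel("Recession Velocity (km/sec)")
plt.title("Scatterplot of Recession Velocity vs Distance")
plt.show()
```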

The ML Method for Linear Regression Analysis
Scatterplot data: $(x_1, y_1), \ldots, (x_n, y_n)$
Basic assumption: the $x_i$ are non-random measurements; the $y_i$ are observations on $Y$, a random variable.
Statistical model: $Y_i = \alpha + \beta x_i + \epsilon_i$, $i = 1, \ldots, n$
Errors $\epsilon_1, \ldots, \epsilon_n$: a random sample from $N(0, \sigma^2)$
Parameters: $\alpha$, $\beta$, $\sigma^2$
$Y_i \sim N(\alpha + \beta x_i, \sigma^2)$: the $Y_i$ are independent, but not identically distributed, because they have differing means.

The likelihood function is the joint density function of the observed data $Y_1, \ldots, Y_n$:
$$L(\alpha, \beta, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(Y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{\sum_{i=1}^{n}(Y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right]$$
Use partial derivatives to maximize $L$ over all $\alpha$, $\beta$, and $\sigma^2 > 0$ (wise advice: maximize $\ln L$).
The ML estimators are:
$$\hat\beta = \frac{\sum_{i=1}^{n}(x_i - \bar x)(Y_i - \bar Y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}, \qquad \hat\alpha = \bar Y - \hat\beta \bar x, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat\alpha - \hat\beta x_i)^2$$
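These closed-form estimators are easy to code directly; a minimal numpy sketch (the function name is my own choice, not from the notes):

```python
import numpy as np

def fit_simple_linear(x, y):
    """ML / least-squares estimates for the model y_i = alpha + beta * x_i + eps_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    beta_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    alpha_hat = ybar - beta_hat * xbar
    sigma2_hat = np.mean((y - alpha_hat - beta_hat * x) ** 2)  # ML version: divide by n
    return alpha_hat, beta_hat, sigma2_hat
```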

Using this on Hubble's data we get $\hat\beta = 454.16$ and $\hat\alpha = -40.78$, where the intercept term is not significant. If we use regression through the origin, the result is $\hat\beta = 423.94$. The result from Hubble's historical data set is very different from the current estimate of Hubble's constant, due to various issues of measurement error, censoring, etc. Those issues will be discussed in a subsequent lecture.
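As a sketch (using the data arrays typed in from the table above), the following should approximately reproduce these estimates; the through-origin slope uses the closed form $\sum_i x_i Y_i / \sum_i x_i^2$.

```python
import numpy as np

# Hubble's data (the same 24 points as in the table above).
x = np.array([0.032, 0.034, 0.214, 0.263, 0.275, 0.275, 0.45, 0.5, 0.5, 0.63,
              0.8, 0.9, 0.9, 0.9, 0.9, 1.0, 1.1, 1.1, 1.4, 1.7,
              2.0, 2.0, 2.0, 2.0])
y = np.array([170, 290, -130, -70, -185, -220, 200, 290, 270, 200,
              300, -30, 650, 150, 500, 920, 450, 500, 500, 960,
              500, 850, 800, 1090], dtype=float)

# Fit with intercept: beta_hat should come out near 454, alpha_hat near -41.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
print(alpha_hat, beta_hat)

# Regression through the origin: slope = sum(x*y) / sum(x^2), near 424.
print(np.sum(x * y) / np.sum(x ** 2))
```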

How do we obtain the least squares estimates?
Let $Y_i$ be the response for the $i$th data point and let $x_i$ be the $p$-dimensional row vector of predictors for the $i$th data point, $i = 1, \ldots, n$. There are $p - 1$ predictors plus an intercept. We assume that
$$Y_i = x_i \beta + e_i,$$
where $\beta$, an unknown parameter, is a $p \times 1$ column vector, $e_i \sim N_1(0, \sigma^2)$, and the $e_i$ are independent. Note that $\sigma^2$ is another parameter for this model.

We further assume that the predictors are linearly independent, and only linear independence is required: for example, the second predictor could be the square of the first, the third the cube of the first, and so on, so this model includes polynomial regression.

We often write this model in matrix form. Let
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad e = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix},$$
so that $Y$ and $e$ are $n \times 1$ and $X$ is $n \times p$, with rows $x_i = (1, x_{i,1}, \ldots, x_{i,p-1})$. The assumed linear independence of the predictors implies that the columns of $X$ are linearly independent and hence $\mathrm{rank}(X) = p$.

The normal model can be stated more compactly as
$$Y = X\beta + e, \qquad e \sim N_n(0, \sigma^2 I),$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})^T$. The design matrix $X$ has dimension $n \times p$:
$$X = \begin{pmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p-1} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p-1} \end{pmatrix}$$

Therefore, using the formula for the multivariate normal density function, we see that the joint density of $Y$ is
$$f_{\beta,\sigma^2}(y) = (2\pi)^{-n/2}\,|\sigma^2 I|^{-1/2} \exp\left\{-\tfrac{1}{2}(y - X\beta)^T (\sigma^2 I)^{-1} (y - X\beta)\right\} = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\|y - X\beta\|^2\right\}$$

Therefore the likelihood for this model is
$$L_Y(\beta, \sigma^2) = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\|Y - X\beta\|^2\right\}$$
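Equivalently, the log-likelihood is $-\tfrac{n}{2}\ln(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\|Y - X\beta\|^2$. A minimal Python sketch of that function (the function name is my own choice):

```python
import numpy as np

def log_likelihood(beta, sigma2, X, Y):
    """Log-likelihood of the normal linear model Y = X beta + e, e ~ N_n(0, sigma2 * I).

    X is n x p, Y has length n, beta has length p, sigma2 is a positive scalar.
    """
    n = len(Y)
    resid = Y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)
```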

Estimation of $\beta$
First, we note that the assumption on the $X$ matrix implies that $X^T X$ is invertible. The ordinary least squares (OLS) estimator of $\beta$ is found by minimizing
$$q(\beta) = \sum_{i=1}^{n} (Y_i - x_i \beta)^2 = \|Y - X\beta\|^2.$$
The formula for the OLS estimator of $\beta$ is
$$\hat\beta = (X^T X)^{-1} X^T Y.$$
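In practice the estimator is usually computed with a least-squares solver rather than by explicitly inverting $X^T X$. A sketch on simulated data (the sample sizes and "true" coefficients below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3                                  # two predictors plus an intercept
Z = rng.normal(size=(n, p - 1))                # predictor values
X = np.column_stack([np.ones(n), Z])           # design matrix with a leading column of 1s
beta_true = np.array([2.0, 1.5, -0.7])         # arbitrary "true" coefficients
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS estimate; lstsq minimizes ||Y - X beta||^2 without forming (X^T X)^{-1} explicitly.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)                                 # should be close to beta_true

# The textbook formula gives the same answer (fine for small p, less stable numerically).
print(np.linalg.inv(X.T @ X) @ X.T @ Y)
```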

Note that
$$E\hat\beta = (X^T X)^{-1} X^T E Y = (X^T X)^{-1} X^T X \beta = \beta,$$
$$\mathrm{Cov}(\hat\beta) = (X^T X)^{-1} X^T (\sigma^2 I) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1} = \sigma^2 M.$$
Therefore
$$\hat\beta \sim N_p(\beta, \sigma^2 M), \qquad M = (X^T X)^{-1}.$$

Properties of the OLS estimator
1. (Gauss-Markov) For the non-normal model, the OLS estimator is the best linear unbiased estimator (BLUE), i.e., it has smaller variance than any other linear unbiased estimator.
2. For the normal model, the OLS estimator is the best unbiased estimator, i.e., it has smaller variance than any other unbiased estimator.
3. Typically, the OLS estimator is consistent, i.e., $\hat\beta \to \beta$ in probability as $n \to \infty$.

The unbiased estimator of $\sigma^2$
In regression we typically estimate $\sigma^2$ by
$$\hat\sigma^2 = \|Y - X\hat\beta\|^2 / (n - p),$$
which is called the unbiased estimator of $\sigma^2$. We first state the distribution of $\hat\sigma^2$:
$$\frac{(n - p)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}, \quad \text{independently of } \hat\beta.$$
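A minimal sketch of this estimator (the helper name is my own; X, Y, and beta_hat are as in the OLS example above):

```python
import numpy as np

def sigma2_unbiased(X, Y, beta_hat):
    """Unbiased estimator of sigma^2: ||Y - X beta_hat||^2 / (n - p)."""
    n, p = X.shape
    resid = Y - X @ beta_hat
    return (resid @ resid) / (n - p)
```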

Properties of $\hat\sigma^2$
1. For the general model, $\hat\sigma^2$ is unbiased.
2. For the normal model, $\hat\sigma^2$ is the best unbiased estimator.
3. $\hat\sigma^2$ is consistent.

Interval estimators and tests
We first discuss inference about $\beta_i$, the $i$th component of $\beta$. Note that $\hat\beta_i$, the $i$th component of the OLS estimator, is the estimator of $\beta_i$. Further,
$$\mathrm{Var}(\hat\beta_i) = \sigma^2 M_{ii},$$
which implies that the standard error of $\hat\beta_i$ is $\hat\sigma_{\hat\beta_i} = \hat\sigma \sqrt{M_{ii}}$. Therefore a $1 - \alpha$ confidence interval for $\beta_i$ is
$$\beta_i \in \left(\hat\beta_i - t^{\alpha/2}_{n-p}\,\hat\sigma_{\hat\beta_i},\ \hat\beta_i + t^{\alpha/2}_{n-p}\,\hat\sigma_{\hat\beta_i}\right).$$
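A sketch of these intervals in Python, assuming scipy for the $t_{n-p}$ quantile (the function name is my own); the same standard errors feed the t-statistic on the next slide:

```python
import numpy as np
from scipy import stats

def beta_confints(X, Y, alpha=0.05):
    """1 - alpha confidence intervals for each beta_i, following the formulas above."""
    n, p = X.shape
    M = np.linalg.inv(X.T @ X)
    beta_hat = M @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - p)
    se = np.sqrt(sigma2_hat * np.diag(M))        # standard errors sigma_hat * sqrt(M_ii)
    tq = stats.t.ppf(1 - alpha / 2, df=n - p)    # t_{n-p}^{alpha/2}
    return np.column_stack([beta_hat - tq * se, beta_hat + tq * se])
```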

To test the null hypothesis $\beta_i = c$ against one- and two-sided alternatives we use the t-statistic
$$t = \frac{\hat\beta_i - c}{\hat\sigma_{\hat\beta_i}} \sim t_{n-p}.$$

Now consider inference for $\delta = a^T \beta$. Let
$$\hat\delta = a^T \hat\beta \sim N_1(\delta, \sigma^2 a^T M a).$$
Therefore $\hat\delta$ is the estimator of $\delta$, and
$$\mathrm{Var}(\hat\delta) = \sigma^2 a^T M a,$$
so that the standard error of $\hat\delta$ is $\hat\sigma_{\hat\delta} = \hat\sigma \sqrt{a^T M a}$.

Therefore the confidence interval for $\delta$ is
$$\delta \in \left(\hat\delta - t^{\alpha/2}_{n-p}\,\hat\sigma_{\hat\delta},\ \hat\delta + t^{\alpha/2}_{n-p}\,\hat\sigma_{\hat\delta}\right),$$
and the test statistic for testing $\delta = c$ is
$$\frac{\hat\delta - c}{\hat\sigma_{\hat\delta}} \sim t_{n-p} \quad \text{under the null hypothesis.}$$
There are tests and confidence regions for vector generalizations of these procedures.

Let $x_0$ be a row vector of predictors for a new response $Y_0$, and let $\mu_0 = x_0 \beta = EY_0$. Then $\hat\mu_0 = x_0 \hat\beta$ is the obvious estimator of $\mu_0$, and
$$\mathrm{Var}(\hat\mu_0) = \sigma^2 x_0 M x_0^T, \qquad \hat\sigma_{\hat\mu_0} = \hat\sigma \sqrt{x_0 M x_0^T},$$
and therefore a confidence interval for $\mu_0$ is
$$\mu_0 \in \left(\hat\mu_0 - t^{\alpha/2}_{n-p}\,\hat\sigma_{\hat\mu_0},\ \hat\mu_0 + t^{\alpha/2}_{n-p}\,\hat\sigma_{\hat\mu_0}\right).$$

A $1 - \alpha$ prediction interval for $Y_0$ is an interval such that
$$P\left(a(Y) \le Y_0 \le b(Y)\right) = 1 - \alpha.$$
A $1 - \alpha$ prediction interval for $Y_0$ is
$$Y_0 \in \left(\hat\mu_0 - t^{\alpha/2}_{n-p}\sqrt{\hat\sigma^2 + \hat\sigma^2_{\hat\mu_0}},\ \hat\mu_0 + t^{\alpha/2}_{n-p}\sqrt{\hat\sigma^2 + \hat\sigma^2_{\hat\mu_0}}\right).$$
The derivation of this interval is based on the fact that
$$\mathrm{Var}(Y_0 - \hat\mu_0) = \sigma^2 + \sigma^2_{\hat\mu_0}.$$
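A sketch computing both intervals at a new predictor vector x0 (the function name is my own; the formulas are the ones above):

```python
import numpy as np
from scipy import stats

def mean_and_prediction_interval(X, Y, x0, alpha=0.05):
    """Confidence interval for mu_0 = x0 beta and prediction interval for a new Y_0."""
    n, p = X.shape
    M = np.linalg.inv(X.T @ X)
    beta_hat = M @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - p)
    mu0_hat = x0 @ beta_hat
    var_mu0 = sigma2_hat * (x0 @ M @ x0)          # estimate of Var(mu0_hat)
    tq = stats.t.ppf(1 - alpha / 2, df=n - p)
    half_ci = tq * np.sqrt(var_mu0)               # half-width for the mean response
    half_pi = tq * np.sqrt(sigma2_hat + var_mu0)  # half-width for a new observation
    return (mu0_hat - half_ci, mu0_hat + half_ci), (mu0_hat - half_pi, mu0_hat + half_pi)
```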

$R^2$
Let
$$T^2 = \sum_i \left(Y_i - \bar Y\right)^2, \qquad S^2 = \|Y - X\hat\beta\|^2$$
be the numerators of the variance estimators for the intercept-only model and the regression model, respectively. We think of these as measuring the variation under the two models. Then the coefficient of determination $R^2$ is defined by
$$R^2 = \frac{T^2 - S^2}{T^2}.$$
Note that $0 \le R^2 \le 1$.

Note that $T^2 - S^2$ is the amount of variation in the intercept-only model that has been explained by including the extra predictors of the regression model, and $R^2$ is the proportion of the variation in the intercept-only model that has been explained by including the additional predictors.
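A one-function sketch of this definition (the helper name is my own; X includes the column of 1s as above):

```python
import numpy as np

def r_squared(X, Y):
    """Coefficient of determination R^2 = (T^2 - S^2) / T^2."""
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    S2 = np.sum((Y - X @ beta_hat) ** 2)   # residual sum of squares, regression model
    T2 = np.sum((Y - Y.mean()) ** 2)       # total sum of squares, intercept-only model
    return (T2 - S2) / T2
```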