Least Squares Estimation

Least Squares Estimation

Using the least squares estimator for $\beta$ we can obtain predicted values and compute residuals:
\[
\hat{Y} = Z\hat{\beta} = Z(Z'Z)^{-1}Z'Y, \qquad
\hat{\epsilon} = Y - \hat{Y} = Y - Z(Z'Z)^{-1}Z'Y = [I - Z(Z'Z)^{-1}Z']Y.
\]
The usual decomposition into sums of squares and cross-products can be shown to be
\[
\underbrace{Y'Y}_{\text{Total SSCP}} \;=\; \underbrace{\hat{Y}'\hat{Y}}_{\text{Reg SSCP}} \;+\; \underbrace{\hat{\epsilon}'\hat{\epsilon}}_{\text{Error SSCP}},
\]
and the error sums of squares and cross-products can be written as
\[
\hat{\epsilon}'\hat{\epsilon} = Y'Y - \hat{Y}'\hat{Y} = Y'[I - Z(Z'Z)^{-1}Z']Y.
\]
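
As a quick illustration, here is a minimal R sketch (ours, not from the original notes; simulated data, illustrative object names) of the estimator and the SSCP decomposition:

## Minimal sketch of multivariate least squares with simulated data.
set.seed(1)
n <- 50; r <- 2; p <- 3
Z <- cbind(1, matrix(rnorm(n * r), n, r))        # n x (r+1) design, leading column of ones
beta <- matrix(runif((r + 1) * p), r + 1, p)     # (r+1) x p coefficient matrix
Y <- Z %*% beta + matrix(rnorm(n * p), n, p)     # n x p responses

betahat <- solve(crossprod(Z), crossprod(Z, Y))  # (Z'Z)^{-1} Z'Y
Yhat    <- Z %*% betahat                         # predicted values
ehat    <- Y - Yhat                              # residuals

## Total SSCP = Regression SSCP + Error SSCP
all.equal(crossprod(Y), crossprod(Yhat) + crossprod(ehat))   # TRUE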

Properties of Estimators

For the multivariate regression model and with $Z$ of full rank $r + 1 < n$:
\[
E(\hat{\beta}) = \beta, \qquad \mathrm{Cov}(\hat{\beta}_{(i)}, \hat{\beta}_{(k)}) = \sigma_{ik}(Z'Z)^{-1}, \quad i, k = 1, \dots, p.
\]
The estimated residuals $\hat{\epsilon}_{(i)}$ satisfy $E(\hat{\epsilon}_{(i)}) = 0$ and $E(\hat{\epsilon}_{(i)}'\hat{\epsilon}_{(k)}) = (n - r - 1)\sigma_{ik}$, and therefore
\[
E(\hat{\epsilon}) = 0, \qquad E(\hat{\epsilon}'\hat{\epsilon}) = (n - r - 1)\Sigma.
\]
$\hat{\beta}$ and the residuals $\hat{\epsilon}$ are uncorrelated.
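
A small Monte Carlo sketch (ours, with a made-up $\Sigma$) makes the unbiasedness claim concrete: averaging $\hat{\epsilon}'\hat{\epsilon}$ over replicates and dividing by $n - r - 1$ should recover $\Sigma$:

## Monte Carlo check (illustrative) that E[ehat' ehat] = (n - r - 1) * Sigma.
set.seed(2)
n <- 40; r <- 2; p <- 2
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
R <- chol(Sigma)                                  # Sigma = R'R
Z <- cbind(1, matrix(rnorm(n * r), n, r))
H <- Z %*% solve(crossprod(Z)) %*% t(Z)           # hat matrix
acc <- matrix(0, p, p)
B <- 2000
for (b in 1:B) {
  E <- matrix(rnorm(n * p), n, p) %*% R           # errors with covariance Sigma
  ehat <- (diag(n) - H) %*% E                     # residuals (beta cancels out)
  acc <- acc + crossprod(ehat)
}
acc / (B * (n - r - 1))                           # should be close to Sigma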

Prediction

For a given set of predictors $z_0' = [\,1\;\; z_{01}\; \cdots\; z_{0r}\,]$ we can simultaneously estimate the mean responses $z_0'\beta$ for all $p$ response variables as $z_0'\hat{\beta}$. The least squares estimator of the mean responses is unbiased: $E(z_0'\hat{\beta}) = z_0'\beta$. The estimation errors $z_0'\hat{\beta}_{(i)} - z_0'\beta_{(i)}$ and $z_0'\hat{\beta}_{(k)} - z_0'\beta_{(k)}$ for the $i$th and $k$th response variables have covariances
\[
E[z_0'(\hat{\beta}_{(i)} - \beta_{(i)})(\hat{\beta}_{(k)} - \beta_{(k)})'z_0]
= z_0'\left[E(\hat{\beta}_{(i)} - \beta_{(i)})(\hat{\beta}_{(k)} - \beta_{(k)})'\right]z_0
= \sigma_{ik}\, z_0'(Z'Z)^{-1}z_0.
\]

Prediction

A single observation at $z = z_0$ can also be estimated unbiasedly by $z_0'\hat{\beta}$, but the forecast errors $(Y_{0i} - z_0'\hat{\beta}_{(i)})$ and $(Y_{0k} - z_0'\hat{\beta}_{(k)})$ may be correlated:
\begin{align*}
E(Y_{0i} - z_0'\hat{\beta}_{(i)})(Y_{0k} - z_0'\hat{\beta}_{(k)})
&= E\big(\epsilon_{0i} - z_0'(\hat{\beta}_{(i)} - \beta_{(i)})\big)\big(\epsilon_{0k} - z_0'(\hat{\beta}_{(k)} - \beta_{(k)})\big) \\
&= E(\epsilon_{0i}\epsilon_{0k}) + z_0'E(\hat{\beta}_{(i)} - \beta_{(i)})(\hat{\beta}_{(k)} - \beta_{(k)})'z_0 \\
&= \sigma_{ik}\big(1 + z_0'(Z'Z)^{-1}z_0\big).
\end{align*}
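
Both covariance expressions share the scalar factor $z_0'(Z'Z)^{-1}z_0$; the forecast case only adds 1 to it. A minimal R helper (ours; Z and z0 are assumed to be the design matrix and a new predictor vector with a leading 1):

cov_factors <- function(Z, z0) {
  ## z0' (Z'Z)^{-1} z0: multiply by sigma_ik for the estimation-error
  ## covariance; add 1 first for the forecast-error covariance.
  c0 <- drop(t(z0) %*% solve(crossprod(Z)) %*% z0)
  c(mean = c0, forecast = 1 + c0)
}
## e.g. cov_factors(Z, c(1, 0.5, -1)) for a model with r = 2 predictors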

Likelihood Ratio Tests

If in the multivariate regression model we assume that $\epsilon \sim N_p(0, \Sigma)$, and if $\mathrm{rank}(Z) = r + 1$ and $n \ge (r + 1) + p$, then the least squares estimator is the MLE of $\beta$ and has a normal distribution with
\[
E(\hat{\beta}) = \beta, \qquad \mathrm{Cov}(\hat{\beta}_{(i)}, \hat{\beta}_{(k)}) = \sigma_{ik}(Z'Z)^{-1}.
\]
The MLE of $\Sigma$ is
\[
\hat{\Sigma} = \frac{1}{n}\hat{\epsilon}'\hat{\epsilon} = \frac{1}{n}(Y - Z\hat{\beta})'(Y - Z\hat{\beta}),
\]
and the sampling distribution of $n\hat{\Sigma}$ is $W_{p,\,n-r-1}(\Sigma)$.

Likelihood Ratio Tests

Computing the least squares estimator (the MLE) of the regression coefficients in the multivariate multiple regression model is no more difficult than in the univariate multiple regression model, since the $\hat{\beta}_{(i)}$ are obtained one at a time. The same set of predictors must be used for all $p$ response variables. Goodness-of-fit assessment and model diagnostics are usually carried out for one response regression at a time.

Likelihood Ratio Tests

As in the univariate model, we can construct a likelihood ratio test to decide whether a set of $r - q$ predictors $z_{q+1}, z_{q+2}, \dots, z_r$ is associated with the $p$ responses. The appropriate hypothesis is
\[
H_0: \beta_{(2)} = 0, \qquad \text{where } \beta = \begin{bmatrix} \beta_{(1)} \\ \beta_{(2)} \end{bmatrix}, \quad \beta_{(1)}: (q+1)\times p, \;\; \beta_{(2)}: (r-q)\times p.
\]
If we partition $Z = [\,Z_{(1)} \mid Z_{(2)}\,]$ accordingly, with $Z_{(1)}: n\times(q+1)$ and $Z_{(2)}: n\times(r-q)$, then $E(Y) = Z_{(1)}\beta_{(1)} + Z_{(2)}\beta_{(2)}$.

Likelihood Ratio Tests

Under $H_0$, $Y = Z_{(1)}\beta_{(1)} + \epsilon$. The likelihood ratio test consists in rejecting $H_0$ if $\Lambda$ is small, where
\[
\Lambda = \frac{\max_{\beta_{(1)},\Sigma} L(\beta_{(1)}, \Sigma)}{\max_{\beta,\Sigma} L(\beta, \Sigma)}
= \frac{L(\hat{\beta}_{(1)}, \hat{\Sigma}_1)}{L(\hat{\beta}, \hat{\Sigma})}
= \left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_1|}\right)^{n/2}.
\]
Equivalently, we reject $H_0$ for large values of
\[
-2\ln\Lambda = -n\ln\left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_1|}\right)
= -n\ln\frac{|n\hat{\Sigma}|}{|n\hat{\Sigma} + n(\hat{\Sigma}_1 - \hat{\Sigma})|}.
\]

Likelihood Ratio Tests

For large $n$,
\[
-\left[n - r - 1 - \tfrac{1}{2}(p - r + q + 1)\right]\ln\left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_1|}\right) \sim \chi^2_{p(r-q)},
\]
approximately. As always, the degrees of freedom equal the difference in the number of free parameters under $H_0$ and under $H_1$.
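
A hedged R sketch (ours, with simulated data and illustrative object names) of the whole full-versus-reduced computation, including the Bartlett-type chi-square above:

## Likelihood ratio test for dropping r - q predictors (illustrative sketch).
set.seed(3)
n <- 60; p <- 2; r <- 3; q <- 1
Z <- cbind(1, matrix(rnorm(n * r), n, r))
Y <- Z[, 1:2] %*% matrix(c(1, 2, -1, 0.5), 2, 2) +   # only the first predictor matters
     matrix(rnorm(n * p), n, p)
res_full <- qr.resid(qr(Z), Y)                # residuals under the full model
res_red  <- qr.resid(qr(Z[, 1:(q + 1)]), Y)   # residuals under H0
Sig  <- crossprod(res_full) / n               # MLE of Sigma, full model
Sig1 <- crossprod(res_red) / n                # MLE of Sigma, reduced model
stat <- -(n - r - 1 - 0.5 * (p - r + q + 1)) * log(det(Sig) / det(Sig1))
pchisq(stat, df = p * (r - q), lower.tail = FALSE)   # approximate p-value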

Other Tests

Most software (R, SAS) reports additional test statistics. With $\hat{\Sigma}_0$ denoting the MLE of $\Sigma$ under the null model:

1. Wilks' lambda: $\Lambda = \dfrac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}$
2. Pillai's trace criterion: $\mathrm{trace}[(\hat{\Sigma}_0 - \hat{\Sigma})\hat{\Sigma}_0^{-1}]$
3. Lawley-Hotelling's trace: $\mathrm{trace}[(\hat{\Sigma}_0 - \hat{\Sigma})\hat{\Sigma}^{-1}]$
4. Roy's maximum root test: largest eigenvalue of $\hat{\Sigma}_0\hat{\Sigma}^{-1}$

Note that Wilks' lambda is directly related to the likelihood ratio test. Also, $\Lambda = \prod_{i=1}^{p}(1 + l_i)^{-1}$, where the $l_i$ are the roots of $|(\hat{\Sigma}_0 - \hat{\Sigma}) - l\,\hat{\Sigma}| = 0$.
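
In R, all four criteria can be computed from the eigenvalues $l_i$; a sketch (ours), taking the hypothesis SSCP $H = n(\hat{\Sigma}_0 - \hat{\Sigma})$ and error SSCP $E = n\hat{\Sigma}$ as inputs:

manova_stats <- function(H, E) {
  ## Eigenvalues of E^{-1} H, i.e. the roots l_i defined above.
  l <- Re(eigen(solve(E, H))$values)
  c(Wilks            = prod(1 / (1 + l)),
    Pillai           = sum(l / (1 + l)),
    Lawley_Hotelling = sum(l),
    Roy              = max(l))   # some software reports 1 + max(l) instead
}
## e.g. manova_stats(H = n * (Sig1 - Sig), E = n * Sig), with Sig and Sig1
## as in the simulation sketch above.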

Distribution of Wilks' Lambda

Let $n_d$ be the degrees of freedom of $\hat{\Sigma}$, i.e., $n_d = n - r - 1$, and let $j = r - q$ be the hypothesis degrees of freedom. Define
\[
r' = \frac{pj}{2} - 1, \qquad s = \sqrt{\frac{p^2j^2 - 4}{p^2 + j^2 - 5}}, \qquad k = n_d - \frac{1}{2}(p - j + 1).
\]
Then
\[
\frac{1 - \Lambda^{1/s}}{\Lambda^{1/s}} \cdot \frac{ks - r'}{pj} \;\sim\; F_{pj,\;ks - r'}.
\]
The distribution is exact for $p = 1$ or $p = 2$, or for $j = 1$ or $j = 2$. In other cases, it is approximate.
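
As an R function (our reconstruction of the approximation above; here j is the hypothesis degrees of freedom r - q and nd = n - r - 1):

wilks_F <- function(Lambda, p, j, nd) {
  ## Rao's F approximation for Wilks' Lambda; s = 1 in the degenerate 0/0 case.
  s  <- if (p^2 + j^2 == 5) 1 else sqrt((p^2 * j^2 - 4) / (p^2 + j^2 - 5))
  rp <- p * j / 2 - 1
  k  <- nd - (p - j + 1) / 2
  Fstat <- ((1 - Lambda^(1 / s)) / Lambda^(1 / s)) * (k * s - rp) / (p * j)
  c(F = Fstat, df1 = p * j, df2 = k * s - rp,
    p.value = pf(Fstat, p * j, k * s - rp, lower.tail = FALSE))
}
## Check against the car output later in these notes:
## wilks_F(0.4405021, p = 2, j = 3, nd = 11) gives F = 1.689 on (6, 20) df.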

Likelihood Ratio Tests

Note that
\[
(Y - Z_{(1)}\hat{\beta}_{(1)})'(Y - Z_{(1)}\hat{\beta}_{(1)}) - (Y - Z\hat{\beta})'(Y - Z\hat{\beta}) = n(\hat{\Sigma}_1 - \hat{\Sigma}),
\]
and therefore the likelihood ratio test is also a test of extra sums of squares and cross-products. If $\hat{\Sigma}_1 \approx \hat{\Sigma}$, then the extra predictors do not contribute to reducing the size of the error sums of squares and cross-products; this translates into a small value of $-2\ln\Lambda$, and we fail to reject $H_0$.

Example - Exercise 7.25

Amitriptyline is an antidepressant suspected to have serious side effects. Data on $Y_1$ = total TCAD plasma level and $Y_2$ = amount of the drug present in the total plasma level were measured on 17 patients who overdosed on the drug. [We divided both responses by 1000.] Potential predictors are:

$z_1$ = gender (1 = female, 0 = male)
$z_2$ = amount of antidepressant taken at the time of overdose
$z_3$ = PR wave measurement
$z_4$ = diastolic blood pressure
$z_5$ = QRS wave measurement

Example - Exercise 7.25

We first fit a full multivariate regression model with all five predictors. We then fit a reduced model with only $z_1, z_2$, and performed a likelihood ratio test to decide whether $z_3, z_4, z_5$ contribute information about the two response variables that is not provided by $z_1$ and $z_2$. From the output:
\[
\hat{\Sigma} = \frac{1}{17}\begin{bmatrix} 0.8700 & 0.7657 \\ 0.7657 & 0.9407 \end{bmatrix}, \qquad
\hat{\Sigma}_1 = \frac{1}{17}\begin{bmatrix} 1.8004 & 1.5462 \\ 1.5462 & 1.6207 \end{bmatrix}.
\]

Example - Exercise 7.25

We wish to test $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ against $H_1$: at least one of them is not zero, using a likelihood ratio test with $\alpha = 0.05$. The two determinants are $|\hat{\Sigma}| = 0.0008$ and $|\hat{\Sigma}_1| = 0.0018$. Then, for $n = 17$, $p = 2$, $r = 5$, $q = 2$ we have:
\[
-\left(n - r - 1 - \tfrac{1}{2}(p - r + q + 1)\right)\ln\left(\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_1|}\right)
= -\left(17 - 5 - 1 - \tfrac{1}{2}(2 - 5 + 2 + 1)\right)\ln\left(\frac{0.0008}{0.0018}\right) = 8.92.
\]

Example - Exercise 7.25

The critical value is $\chi^2_{2(5-2)}(0.05) = 12.59$. Since $8.92 < 12.59$, we fail to reject $H_0$. The last three predictors do not provide much information about changes in the means of the two response variables beyond what gender and dose provide.

Note that we have used the MLEs of $\Sigma$ under the two hypotheses; that is, we use $n$ as the divisor for the matrix of sums of squares and cross-products of the errors. We do not use the usual unbiased estimator, obtained by dividing the matrix $E$ (or $W$) by $n - r - 1$, the error degrees of freedom.

What we have not done but should: before relying on the results of these analyses, we should carefully inspect the residuals and carry out the usual diagnostic tests.
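
The arithmetic is easy to verify in R:

## Reproduce the test statistic and critical value from the example.
n <- 17; p <- 2; r <- 5; q <- 2
-(n - r - 1 - 0.5 * (p - r + q + 1)) * log(0.0008 / 0.0018)   # 8.92
qchisq(0.95, df = p * (r - q))                                # 12.59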

Prediction

If the model $Y = Z\beta + \epsilon$ was fit to the data and found to be adequate, we can use it for prediction. Suppose we wish to predict the mean response at some value $z_0$ of the predictors. We know that
\[
z_0'\hat{\beta} \sim N_p\!\left(z_0'\beta,\; z_0'(Z'Z)^{-1}z_0\,\Sigma\right).
\]
We can then compute a Hotelling $T^2$ statistic as
\[
T^2 = \left(\frac{z_0'\hat{\beta} - z_0'\beta}{\sqrt{z_0'(Z'Z)^{-1}z_0}}\right)\left(\frac{n}{n-r-1}\hat{\Sigma}\right)^{-1}\left(\frac{z_0'\hat{\beta} - z_0'\beta}{\sqrt{z_0'(Z'Z)^{-1}z_0}}\right)'.
\]

Confidence Ellipsoids and Intervals

Then, a $100(1-\alpha)\%$ confidence region for $z_0'\beta$ is given by all $z_0'\beta$ that satisfy
\[
(z_0'\hat{\beta} - z_0'\beta)\left(\frac{n}{n-r-1}\hat{\Sigma}\right)^{-1}(z_0'\hat{\beta} - z_0'\beta)'
\le z_0'(Z'Z)^{-1}z_0\left[\frac{p(n-r-1)}{n-r-p}F_{p,\,n-r-p}(\alpha)\right].
\]
The simultaneous $100(1-\alpha)\%$ confidence intervals for the means of each response, $E(Y_i) = z_0'\beta_{(i)}$, $i = 1, 2, \dots, p$, are
\[
z_0'\hat{\beta}_{(i)} \pm \sqrt{\frac{p(n-r-1)}{n-r-p}F_{p,\,n-r-p}(\alpha)}\;\sqrt{z_0'(Z'Z)^{-1}z_0\left(\frac{n}{n-r-1}\hat{\sigma}_{ii}\right)}.
\]

Confidence Ellipsoids and Intervals

We might also be interested in predicting (forecasting) a single $p$-dimensional response at $z = z_0$, i.e., $Y_0 = z_0'\beta + \epsilon_0$. The point predictor of $Y_0$ is still $z_0'\hat{\beta}$. The forecast error
\[
Y_0 - z_0'\hat{\beta} = (z_0'\beta - z_0'\hat{\beta}) + \epsilon_0
\]
is distributed as $N_p\big(0, (1 + z_0'(Z'Z)^{-1}z_0)\Sigma\big)$. The $100(1-\alpha)\%$ prediction ellipsoid consists of all values of $Y_0$ such that
\[
(Y_0 - z_0'\hat{\beta})\left(\frac{n}{n-r-1}\hat{\Sigma}\right)^{-1}(Y_0 - z_0'\hat{\beta})'
\le \big(1 + z_0'(Z'Z)^{-1}z_0\big)\left[\frac{p(n-r-1)}{n-r-p}F_{p,\,n-r-p}(\alpha)\right].
\]

Confidence Ellipsoids and Intervals

The simultaneous prediction intervals for the $p$ response variables are
\[
z_0'\hat{\beta}_{(i)} \pm \sqrt{\frac{p(n-r-1)}{n-r-p}F_{p,\,n-r-p}(\alpha)}\;\sqrt{\big(1 + z_0'(Z'Z)^{-1}z_0\big)\left(\frac{n}{n-r-1}\hat{\sigma}_{ii}\right)},
\]
where $\hat{\beta}_{(i)}$ is the $i$th column of $\hat{\beta}$ (the estimated regression coefficients corresponding to the $i$th response) and $\hat{\sigma}_{ii}$ is the $i$th diagonal element of $\hat{\Sigma}$.
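
The two interval types differ only in the scalar factor, so one helper covers both. A hedged R sketch (ours; betahat is the (r+1) x p coefficient matrix, Sigmahat the MLE of Sigma, and z0 includes the leading 1):

## Simultaneous 100(1 - alpha)% intervals at z0 for all p responses.
## type = "mean" for E(Y_i); type = "prediction" for a new observation.
mv_intervals <- function(betahat, Sigmahat, Z, z0, alpha = 0.05,
                         type = c("mean", "prediction")) {
  type <- match.arg(type)
  n <- nrow(Z); r <- ncol(Z) - 1; p <- ncol(betahat)
  c0 <- drop(t(z0) %*% solve(crossprod(Z)) %*% z0)   # z0' (Z'Z)^{-1} z0
  if (type == "prediction") c0 <- 1 + c0
  radius <- sqrt(p * (n - r - 1) / (n - r - p) * qf(1 - alpha, p, n - r - p))
  center <- drop(t(betahat) %*% z0)                  # z0' betahat, one per response
  half   <- radius * sqrt(c0 * (n / (n - r - 1)) * diag(Sigmahat))
  cbind(lower = center - half, upper = center + half)
}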

Example - Exercise 7.25 (cont'd)

We consider the reduced model with $r = 2$ predictors for $p = 2$ responses that we fitted earlier. We are interested in the 95% confidence ellipsoid for $E(Y_{01}, Y_{02})$ for women ($z_{01} = 1$) who have taken an overdose of the drug equal to 1,000 units ($z_{02} = 1000$). From our previous results we know that:
\[
\hat{\beta} = \begin{bmatrix} 0.0567 & -0.2413 \\ 0.5071 & 0.6063 \\ 0.00033 & 0.00032 \end{bmatrix}, \qquad
\hat{\Sigma} = \begin{bmatrix} 0.1059 & 0.0910 \\ 0.0910 & 0.0953 \end{bmatrix}.
\]
SAS IML code to compute the various pieces of the confidence region is given next.

Example - Exercise 7.25 (cont'd)

proc iml ;
reset noprint ;
n = 17 ; p = 2 ; r = 2 ;
tmp = j(n,1,1) ;                /* column of ones for the intercept */
use one ;
read all var{z1 z2} into ztemp ;
close one ;
Z = tmp || ztemp ;              /* horizontal concatenation */
z0 = {1, 1, 1000} ;
ZpZinv = inv(Z`*Z) ;
z0ZpZz0 = z0`*ZpZinv*z0 ;

Example - Exercise 7.25 (cont'd)

betahat = {0.0567 -0.2413, 0.5071 0.6063, 0.00033 0.00032} ;
sigmahat = {0.1059 0.0910, 0.0910 0.0953} ;
betahatz0 = betahat`*z0 ;
scale = n/(n-r-1) ;
varinv = inv(sigmahat)/scale ;  /* ((n/(n-r-1)) * sigmahat)^{-1} */
print z0ZpZz0 ;
print betahatz0 ;
print varinv ;
quit ;

Example - Exercise 7.25 (cont'd)

From the output, we get:
\[
z_0'(Z'Z)^{-1}z_0 = 0.100645, \qquad
z_0'\hat{\beta} = \begin{bmatrix} 0.8938 \\ 0.685 \end{bmatrix}, \qquad
\left(\frac{n}{n-r-1}\hat{\Sigma}\right)^{-1} = \begin{bmatrix} 43.33 & -41.37 \\ -41.37 & 48.15 \end{bmatrix}.
\]
Further, $p(n-r-1)/(n-r-p) = 2.1538$ and $F_{2,13}(0.05) = 3.81$. Therefore, the 95% confidence ellipsoid for $z_0'\beta$ is given by all values of $z_0'\beta$ that satisfy
\[
\left(z_0'\beta - \begin{bmatrix} 0.8938 & 0.685 \end{bmatrix}\right)\begin{bmatrix} 43.33 & -41.37 \\ -41.37 & 48.15 \end{bmatrix}\left(z_0'\beta - \begin{bmatrix} 0.8938 & 0.685 \end{bmatrix}\right)'
\le 0.100645\,(2.1538 \times 3.81).
\]

Example - Exercise 7.25 (cont'd)

The simultaneous confidence intervals for the expected responses $E(Y_{01})$ and $E(Y_{02})$ are, respectively,
\[
0.8938 \pm \sqrt{2.1538 \times 3.81}\,\sqrt{0.100645 \times (17/14) \times 0.1059} = 0.8938 \pm 0.3259,
\]
\[
0.685 \pm \sqrt{2.1538 \times 3.81}\,\sqrt{0.100645 \times (17/14) \times 0.0953} = 0.685 \pm 0.309.
\]
Note: if we had been using $s_{ii}$ (computed as the $i$,$i$th element of $E$ divided by $n - r - 1$) instead of $\hat{\sigma}_{ii}$ as an estimator of $\sigma_{ii}$, we would not be multiplying by 17 and dividing by 14 in the expressions above.

Example - Exercise 7.25 (cont'd)

If we wish to construct simultaneous prediction intervals for a single response at $z = z_0$, we just use $(1 + z_0'(Z'Z)^{-1}z_0)$ instead of $z_0'(Z'Z)^{-1}z_0$. From the output, $(1 + z_0'(Z'Z)^{-1}z_0) = 1.100645$, so the 95% simultaneous prediction intervals for the forecasts $(Y_{01}, Y_{02})$ are given by
\[
0.8938 \pm \sqrt{2.1538 \times 3.81}\,\sqrt{1.100645 \times (17/14) \times 0.1059} = 0.8938 \pm 1.078,
\]
\[
0.685 \pm \sqrt{2.1538 \times 3.81}\,\sqrt{1.100645 \times (17/14) \times 0.0953} = 0.685 \pm 1.022.
\]
As anticipated, the prediction intervals for single forecasts are wider than the confidence intervals for the mean responses at the same set of predictor values.
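
These margins can be reproduced numerically without the raw data, since only $z_0'(Z'Z)^{-1}z_0$ and the diagonal of $\hat{\Sigma}$ enter the formulas (a quick check, ours):

## Reproduce the example's interval half-widths.
n <- 17; r <- 2; p <- 2
c0 <- 0.100645
radius <- sqrt(p * (n - r - 1) / (n - r - p) * qf(0.95, p, n - r - p))
sig <- c(0.1059, 0.0953)                               # diagonal of Sigmahat
radius * sqrt(c0 * (n / (n - r - 1)) * sig)            # 0.3259 0.309  (mean)
radius * sqrt((1 + c0) * (n / (n - r - 1)) * sig)      # 1.078  1.022  (forecast)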

Example - Exercise 7.25 (cont'd)

The same analysis can be carried out in R with the car package:

ami$gender <- as.factor(ami$gender)
library(car)
fit.lm <- lm(cbind(TCAD, drug) ~ gender + antidepressant + PR + dbp + QRS,
             data = ami)
fit.manova <- Manova(fit.lm)

Type II MANOVA Tests: Pillai test statistic
               Df test stat approx F num Df den Df   Pr(>F)
gender          1   0.65521   9.5015      2     10 0.004873 **
antidepressant  1   0.69097  11.1795      2     10 0.002819 **
PR              1   0.34649   2.6509      2     10 0.119200
dbp             1   0.32381   2.3944      2     10 0.141361
QRS             1   0.29184   2.0606      2     10 0.178092
---

Example - Exercise 7.25 (cont'd)

C <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1),
            nrow = 3)   # rows select the PR, dbp and QRS coefficients
newfit <- linearHypothesis(model = fit.lm, hypothesis.matrix = C)

Sum of squares and products for the hypothesis:
          TCAD      drug
TCAD 0.9303481 0.7805177
drug 0.7805177 0.6799484

Sum of squares and products for error:
          TCAD      drug
TCAD 0.8700083 0.7656765
drug 0.7656765 0.9407089

Multivariate Tests:
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            3 0.6038599 1.585910      6     22 0.19830
Wilks             3 0.4405021 1.688991      6     20 0.17553
Hotelling-Lawley  3 1.1694286 1.754143      6     18 0.16574
Roy               3 1.0758181 3.944666      3     11 0.03906 *
---

Dropping predictors

fit1.lm <- update(fit.lm, . ~ . - PR - dbp - QRS)

Coefficients:
                    TCAD       drug
(Intercept)    0.0567201 -0.2413479
gender1        0.5070731  0.6063097
antidepressant 0.0003290  0.0003243

fit1.manova <- Manova(fit1.lm)

Type II MANOVA Tests: Pillai test statistic
               Df test stat approx F num Df den Df    Pr(>F)
gender          1   0.45366   5.3974      2     13   0.01966 *
antidepressant  1   0.77420  22.2866      2     13 6.298e-05 ***

Predict at a new covariate

new <- data.frame(gender = levels(ami$gender)[2], antidepressant = 1)
predict(fit1.lm, newdata = new)

       TCAD     drug
1 0.5641221 0.365286

Drop predictors, add interaction term

fit2.lm <- update(fit.lm, . ~ . - PR - dbp - QRS + gender:antidepressant)
fit2.manova <- Manova(fit2.lm)
predict(fit2.lm, newdata = new)

       TCAD      drug
1 0.4501314 0.2600822

anova(fit.lm, fit1.lm, test = "Wilks")

Analysis of Variance Table

Model 1: cbind(TCAD, drug) ~ gender + antidepressant + PR + dbp + QRS
Model 2: cbind(TCAD, drug) ~ gender + antidepressant
  Res.Df Df Gen.var.  Wilks approx F num Df den Df Pr(>F)
1     11    0.043803
2     14  3 0.051856 0.4405    1.689      6     20 0.1755

anova(fit2.lm, fit1.lm, test = "Wilks")

Analysis of Variance Table

Model 1: cbind(TCAD, drug) ~ gender + antidepressant + gender:antidepressant
Model 2: cbind(TCAD, drug) ~ gender + antidepressant

  Res.Df Df Gen.var.   Wilks approx F num Df den Df   Pr(>F)
1     13    0.034850
2     14  1 0.051856 0.38945   9.4065      2     12 0.003489 **
---