Lecture 13 Extra Sums of Squares

Lecture 13: Extra Sums of Squares
STAT 512, Spring 2011
Background Reading -- KNNL: 7.1-7.4
13-1

Topic Overview
- Extra Sums of Squares (defined)
- Using and interpreting R-squared and partial R-squared
- Getting ESS and partial R-squared from SAS
- General Linear Test (review Section 2.8)
  - Testing a single beta_k = 0
  - Testing several beta_k = 0
  - Other general linear tests
13-2

Extra Sums of Squares
An ESS measures the marginal reduction in the error sum of squares when a group of predictor variables is added to the model.
Examples:
- SSR(X1, X2, X3) is the total variation explained by X1, X2, and X3 in a model.
- SSR(X1 | X2) is the additional variation explained by X1 when added to a model already containing X2.
- SSR(X1, X4 | X2, X3) is the additional variation explained by X1 and X4 when added to a model already containing X2 and X3.
13-3

Extra Sums of Squares (2)
We can also view ESS in terms of SSEs: an ESS represents the part of the SSE that is explained by an added group of variables and was not previously explained by the rest.
Examples:
- SSR(X1 | X2) = SSE(X2) - SSE(X1, X2)
- SSR(X1, X4 | X2, X3) = SSE(X2, X3) - SSE(X1, X2, X3, X4)
13-4
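A minimal numeric sketch of this identity (the SSE values below are made up for illustration; they are not from the CS data):

```python
# Extra sum of squares as a reduction in SSE:
#   SSR(X1 | X2) = SSE(X2) - SSE(X1, X2)
sse_x2 = 140.0     # hypothetical SSE with only X2 in the model
sse_x1_x2 = 110.0  # hypothetical SSE after also adding X1

ssr_x1_given_x2 = sse_x2 - sse_x1_x2  # variation newly explained by X1
print(ssr_x1_given_x2)  # 30.0
```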

Extra Sums of Squares (3)
[Figure: bar diagram partitioning SS_TOT into SSR(X2, X3), SSR(X1, X4 | X2, X3), and the remaining SSE.]
13-5

Decomposition of SSR (Type I)
The regression SS can be partitioned into pieces (in any order):

SSR(X1, X2, X3, X4) = SSR(X1) + SSR(X2 | X1) + SSR(X3 | X1, X2) + SSR(X4 | X1, X2, X3)

This particular breakdown is called Type I sums of squares (variables added in order).
13-6
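As a numeric check of this partition, the Type I SS from the GLM output for the CS example (slide 13-11) can be summed; a minimal sketch in Python:

```python
# Type I SS for the CS example, in the order the variables enter the model
# (values from the GLM Type I output, slide 13-11).
type1_ss = {"hsm": 25.810, "hss": 1.237, "hse": 0.665, "satm": 0.699, "satv": 0.233}

# Type I SS partition the regression SS, so they sum to the model SSR
# (28.64 on slide 13-10, up to rounding).
ssr = sum(type1_ss.values())
print(round(ssr, 2))  # 28.64
```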

Extended ANOVA Table
The single row for Model (Regression) becomes p - 1 rows, in terms of Type I SS and MS:

SOURCE   DF    Sum of Sq        Mean Square
X1       1     SSR(X1)          MSR(X1)
X2       1     SSR(X2|X1)       MSR(X2|X1)
X3       1     SSR(X3|X1,X2)    MSR(X3|X1,X2)
ERROR    n-4   SSE(X1,X2,X3)    MSE(X1,X2,X3)
Total    n-1   SSTO

This decomposition can be obtained in SAS.
13-7

Type III SS
Type III sums of squares refer to variables added last. These do NOT add up to the SSR:

SSR(X1 | X2, X3, X4), SSR(X2 | X1, X3, X4), SSR(X3 | X1, X2, X4), SSR(X4 | X1, X2, X3)

Type III SS can also be obtained from SAS; they lead to variable-added-last tests.
13-8

Getting ESS from SAS
New procedure: GLM (stands for general linear model). GLM is quite similar to REG, but can also handle ANOVA when we get there.
Computer Science example:

proc glm data=cs;
  model gpa = hsm hss hse satm satv / clparm alpha=0.05;
run;

Note: the output gives far more decimals than needed; it is fine to round to something reasonable.
13-9

GLM Output

Source   DF    SS       MS     F Value   Pr > F
Model      5   28.64    5.73    11.69    <.0001
Error    218   106.82   0.49
Total    223   135.46

R-Square   Coeff Var   Root MSE   gpa Mean
0.2114     26.56       0.7000     2.635

This is the standard output we are used to. The F-test is for the overall model; it answers the question of whether any important variables are involved.
13-10

GLM Output (2)

Source   DF   Type I SS   MS       F Value   Pr > F
hsm       1   25.810      25.810    52.67    <.0001
hss       1    1.237       1.237     2.52    0.1135
hse       1    0.665       0.665     1.36    0.2452
satm      1    0.699       0.699     1.43    0.2337
satv      1    0.233       0.233     0.47    0.4915

Type I = variables added in order; these SS add up to the SSR on the previous slide. The F-tests test each variable given the previous variables already in the model.
13-11

GLM Output (3)

Source   DF   Type III SS   MS      F Value   Pr > F
hsm       1   6.772         6.772    13.82    0.0003
hss       1   0.442         0.442     0.90    0.3432
hse       1   0.957         0.957     1.95    0.1637
satm      1   0.928         0.928     1.89    0.1702
satv      1   0.233         0.233     0.47    0.4915

Type III = variables added last. The F-tests test each variable given that all of the other variables are already in the model.
13-12

Coefficients of Partial Determination
Recall: R-squared is the coefficient of determination and may be interpreted as the percentage of the total variation that has been explained by the model. Example: R-squared = 0.87 means 87% of the total SS has been explained by the regression model (of however many variables).
We can also consider the amount of the remaining variation explained by a variable, given the other variables already in the model; this is called partial determination.
13-13

Coef. of Partial Determination (2)
Notation: R-squared_{Y1|23} represents the percentage of the leftover variation in Y (after regressing on X2 and X3) that is explained by X1. Mathematically,

R-squared_{Y1|23} = [SSE(X2, X3) - SSE(X1, X2, X3)] / SSE(X2, X3) = SSR(X1 | X2, X3) / SSE(X2, X3)

The subscripts after the bar (|) represent the variables already in the model.
13-14

Example
Suppose the total sum of squares is 100, and X1 explains 60. Of the remaining 40, X2 then explains 20; of the remaining 20, X3 explains 5. Then

R-squared_{Y2|1} = 20/40 = 0.50
R-squared_{Y3|12} = 5/20 = 0.25
13-15
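The arithmetic above can be sketched directly:

```python
# Sequential partial coefficients of determination for the worked example:
# SSTO = 100; X1 explains 60; X2 explains 20 of the remaining 40;
# X3 explains 5 of the remaining 20.
ss_total = 100.0
ssr_x1 = 60.0
ssr_x2_given_1 = 20.0
ssr_x3_given_12 = 5.0

sse_x1 = ss_total - ssr_x1            # 40: variation left after X1
r2_y2_1 = ssr_x2_given_1 / sse_x1     # 20/40 = 0.50
sse_x12 = sse_x1 - ssr_x2_given_1     # 20: variation left after X1 and X2
r2_y3_12 = ssr_x3_given_12 / sse_x12  # 5/20 = 0.25
print(r2_y2_1, r2_y3_12)  # 0.5 0.25
```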

Coefficient of Partial Correlation
- The square root of the coefficient of partial determination.
- Given a plus/minus sign according to the corresponding regression coefficient.
- Can be useful in model selection (Chapter 9), but has no clear interpretation like R-squared.
Notation: r_{Y1|23}
13-16

Getting Partial R-squared from SAS
PROC REG can produce these, along with the sums of squares. (In REG, the Type III SS are actually denoted Type II; there is no difference between the two types for ordinary regression, but there is for ANOVA, so we'll discuss this later.)
CS example:

proc reg data=cs;
  model gpa = hsm hss hse satm satv / ss1 ss2 pcorr1 pcorr2;
run;
13-17

REG Output

                                Squared    Squared
                                Partial    Partial
Variable    DF  SS(I)   SS(II)  Corr(I)    Corr(II)
Intercept    1  1555    0.327   .          .
hsm          1  25.8    6.772   0.19053    0.05962
hss          1   1.2    0.442   0.01128    0.00412
hse          1   0.7    0.957   0.00614    0.00888
satm         1   0.7    0.928   0.00648    0.00861
satv         1   0.2    0.233   0.00217    0.00217

Example: HSE explains 0.6% of the remaining variation after HSM and HSS are in the model.
13-18

REG Output (2)
We can get any partial coefficient of determination we want, but we may have to rearrange the model to do it. Example: if we want HSE given HSM, we need to list HSM and HSE as the first and second variables in the model. Any desired Type I SS can be obtained in the same way.
13-19

REG Output (3)

Variable    DF  SS(I)   SS(II)  Corr(I)    Corr(II)
Intercept    1  1555    0.327   .          .
hsm          1  25.8    6.772   0.19053    0.05962
hse          1   1.5    0.957   0.01362    0.00412

Interpretation: once HSM is in the model, HSE explains only 1.36% of the remaining variation (SSE = 109).
13-20

General Linear Test
Compare two models:
- Full model: all variables / parameters.
- Reduced model: apply the NULL hypothesis to the full model.
Example: 4 variables, H0: beta_2 = beta_3 = 0

FULL:    Y = beta_0 + beta_1 X1 + beta_2 X2 + beta_3 X3 + beta_4 X4 + error
REDUCED: Y = beta_0 + beta_1 X1 + beta_4 X4 + error

The F-statistic is

F = [ (SSE(R) - SSE(F)) / (df_R - df_F) ] / [ SSE(F) / df_F ]
13-21

General Linear Test (2)
The numerator of the F-test is the difference in SSEs (i.e., the extra SS associated with the added variables) divided by the number of variables being added (the numerator d.f.). The denominator is the MSE for the full model. For the example, the test statistic is

F = [ SSR(X2, X3 | X1, X4) / 2 ] / MSE(X1, X2, X3, X4)

Compare to the F-distribution on 2 and n - 5 degrees of freedom.
13-22

Alternative Hypotheses
The alternative is simply that the null is false. Most of the time, the alternative will be that at least one of the variables in the null group is important.
We are often looking to fail to reject when performing a test like this; our goal is to eliminate unnecessary variables. This means power / sample size must be a consideration! If our sample size is too small, we may incorrectly remove variables.
13-23

CS Example
Test whether HSS, SATM, and SATV as a group are important when added to a model containing HSM and HSE.
- SSE for the HSM/HSE model is 108.16 on 221 degrees of freedom.
- SSE for the full model is 106.82 on 218 degrees of freedom; MSE is 0.49.
The F statistic is

F = [ (108.16 - 106.82) / 3 ] / 0.49 = 0.91
13-24
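The same computation can be sketched in Python (numbers taken from the slide; the result matches the F statistic above up to rounding):

```python
# General linear (partial F) test for dropping HSS, SATM, SATV
# from the CS example model.
sse_reduced, df_reduced = 108.16, 221  # model with HSM, HSE only
sse_full, df_full = 106.82, 218        # full five-variable model
mse_full = 0.49                        # MSE of the full model (rounded on the slide)

num_df = df_reduced - df_full          # 3 variables dropped
f_stat = ((sse_reduced - sse_full) / num_df) / mse_full
print(round(f_stat, 2))  # 0.91
```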

CS Example (2)
F < 1, so there is no need to even look up a critical value: fail to reject. With 224 data points, we likely have the power required to conclude that the three variables are not useful in a model that already contains HSM and HSE.
We can obtain this test in SAS using a test statement:

proc reg data=cs;
  model gpa = hsm hss hse satm satv;
  TEST1: test hss=0, satm=0, satv=0;
run;
13-25

TEST1 Results

                     Mean
Source        DF     Square     F Value   Pr > F
Numerator      3     0.44672     0.91     0.4361
Denominator  218     0.49000

The p-value is 0.4361. (As long as it's > 0.1 and the sample size is reasonably large, we can discard the additional variables.)
13-26

CS Example (3)
We can also obtain the numbers we need from the Type I / Type III sums of squares. Consider slides 13-18 and 13-19. How would we test:
- the importance of HSS in addition to the rest?
- the importance of the SATs added to the HS variables?
- the importance of HSE after HSM/HSS?
- the importance of HSE after HSM?
The numbers needed for any partial F-test can be obtained by arranging the variables correctly.
13-27

Upcoming in Lecture 14
- More examples with general linear tests
- Diagnostics and remedial measures
13-28