

EXST704 - Regression Techniques

Using F tests instead of t-tests

We can also test the hypothesis $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \ne 0$ with an F test.

$$F = \frac{\text{MSRegression}}{\text{MSError}}$$

This test is mathematically identical to the previous test of $H_0: \beta_1 = 0$ done with the t-test (see the demonstration in the text, which only demonstrates that $F = t^2$ for $H_0: \beta_1 = 0$). The probabilities are identical.

More generally, the first column in the F tables (1 numerator degree of freedom) is equivalent to the square of t, i.e.

$$t^2_{\alpha/2;\ d.f.} = F_{\alpha;\ 1,\ d.f.}$$

The text mentions that the t-test has the advantage that it can test one-tailed hypotheses, while the F cannot. Also, the t readily tests hypotheses other than $\beta_1 = 0$. This can also be done with a non-central F test, but this is more difficult.

SAS has a TEST statement in PROC REG which produces an F value for testing values other than 0, but you should know that:

1) The t-test you would do is the same.
2) The SAS test is a two-tailed test. The t-test can be either one- or two-tailed.
3) The P value (P>F) given by SAS for the F value from the "TEST" statement is exactly the same as it would be for the two-tailed t-test.
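The identity $F = t^2$ for $H_0: \beta_1 = 0$ can be checked numerically. The following is a minimal sketch in Python (not part of the course's SAS material); the six data points are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration data (NOT the course data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sum of squares of X and corrected cross-product of X and Y.
ssx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / ssx              # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate

# Error mean square: deviations from the fitted line, n - 2 d.f.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

# Regression mean square: 1 d.f., so MS equals SS.
ms_reg = b1 ** 2 * ssx

f_value = ms_reg / mse                 # F test of H0: beta1 = 0
t_value = b1 / (mse / ssx) ** 0.5      # t test of H0: beta1 = 0

print(f"F = {f_value:.4f}, t^2 = {t_value ** 2:.4f}")
```

The two printed numbers agree up to floating-point rounding, because $t^2 = b_1^2 / (\text{MSE}/\sum(X - \bar{X})^2)$ is algebraically the same quantity as MSRegression/MSError.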

Expected Mean Squares for Regression

Recall from ANOVA (simple CRD, balanced) that

$$E(\text{MSE}) = \sigma^2$$
$$E(\text{MSTreatments}) = \sigma^2 + n\sigma^2_\tau$$

where $\sigma^2$ is the residual variance, $\sigma^2_\tau$ is the treatment variance, and n is the number of replicates in each treatment. The quantity that we wanted to test was $\sigma^2_\tau$. The test used was

$$F = \frac{\sigma^2 + n\sigma^2_\tau}{\sigma^2}$$

We can see that

1) F will be 1 if $\sigma^2_\tau = 0$. This would be the null hypothesis.
2) Power (the ability to detect a difference which exists) increases as we increase n (sample size) or $\sigma^2_\tau$ (the treatment differences), or as we reduce $\sigma^2$ (the random error term).

Likewise for regression,

$$E(\text{MSE}) = \sigma^2 \quad \text{(this is deviations from regression)}$$
$$E(\text{MSRegression}) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$$

The test used was

$$F = \frac{\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2}{\sigma^2}$$

1) F will be 1 if $\beta_1 = 0$. This would be the null hypothesis. Also, since $\beta_1$ is squared, this will be a two-tailed test. For one-tailed tests use the t-test.
2) Power (the ability to detect a difference which exists) increases as we increase $\beta_1$ (the regression coefficient) or $\sum (X_i - \bar{X})^2$ (the corrected SS of X), or as we reduce $\sigma^2$ (the random error term).
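The behaviour of the regression ratio can be illustrated with a small Python sketch. The function name `expected_ms_ratio` and all parameter values below are hypothetical; the function simply evaluates $(\sigma^2 + \beta_1^2 \sum(X - \bar{X})^2)/\sigma^2$, the ratio of the two expected mean squares.

```python
def expected_ms_ratio(beta1, ssx, sigma2):
    """Ratio E(MSRegression) / E(MSE) = (sigma^2 + beta1^2 * SSX) / sigma^2."""
    return (sigma2 + beta1 ** 2 * ssx) / sigma2

# Hypothetical parameter values, chosen only to show the direction of change.
null_case = expected_ms_ratio(beta1=0.0, ssx=10.0, sigma2=2.0)      # H0 true
bigger_slope = expected_ms_ratio(beta1=1.0, ssx=10.0, sigma2=2.0)   # larger beta1
more_spread = expected_ms_ratio(beta1=1.0, ssx=40.0, sigma2=2.0)    # larger SSX
less_noise = expected_ms_ratio(beta1=1.0, ssx=10.0, sigma2=0.5)     # smaller sigma^2
negative_slope = expected_ms_ratio(beta1=-1.0, ssx=10.0, sigma2=2.0)

print(null_case, bigger_slope, more_spread, less_noise, negative_slope)
```

A slope of -1 gives the same ratio as a slope of +1, which is why the F test cannot distinguish the direction of the relationship.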

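Note that power increases as $\sum (X_i - \bar{X})^2$ increases. This occurs as we

a) increase the distance from each $X_i$ to $\bar{X}$ (where is the best place to put X?), but only if we know that the relationship is a straight line;

[Figure: two plots of $Y_i$ against $X_i$]

b) increase n, since more squared differences are added to $\sum (X_i - \bar{X})^2$.

Also note that the term $\beta_1^2 \sum (X_i - \bar{X})^2$ will be positive, since $\beta_1$ is squared and the SSX will be positive. Therefore the F test rejects only for large values of F, that is, it uses one tail of the F distribution, even though the alternative $\beta_1 \ne 0$ is two-sided.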
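A short Python sketch makes the two effects concrete. The helper `corrected_ss` and the three designs are hypothetical examples, not taken from the course data.

```python
def corrected_ss(xs):
    """Corrected sum of squares of X: sum of (x - x_bar)^2."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

# Three hypothetical designs over the same range of X (0 to 5).
evenly_spaced = [0, 1, 2, 3, 4, 5]     # n = 6, spread across the range
at_extremes = [0, 0, 0, 5, 5, 5]       # n = 6, all points at the two ends
doubled = evenly_spaced * 2            # n = 12, same spacing, twice the data

print(corrected_ss(evenly_spaced))   # 17.5
print(corrected_ss(at_extremes))     # 37.5
print(corrected_ss(doubled))         # 35.0
```

Placing all observations at the extremes maximizes the corrected sum of squares for a fixed n, but such a design gives no information about curvature, which is why it is only appropriate when the relationship is known to be a straight line.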

EXAMPLE: Using SAS to test hypotheses about $\beta_0$ and $\beta_1$

EXST704 - EXAMPLE 1 Program Statements

    *************************************************************;
    *** EXST704 Example 1 using PC-SAS                        ***;
    *** Problem from Neter, Wasserman & Kuttner 1989,.19      ***;
    *************************************************************;
    OPTIONS LS=80 PS=61 NOCENTER NODATE NONUMBER;
    DATA ONE; INFILE CARDS MISSOVER;
       TITLE1 'EXST704 - EXAMPLE 1';
       INPUT X Y;
    CARDS;
    raw data here
    ;
    PROC SORT; BY X Y;
    PROC PRINT; TITLE2 'Raw Data Listing';
    PROC REG;
       TITLE2 'Regression Models done with SAS REG procedure';
       MODEL Y = X / XPX I P CLM;
       TEST X = 5;
    RUN;

SAS output (numeric entries are not present in this copy):

    Model: MODEL1

    Model Crossproducts X'X X'Y Y'Y
    X'X         INTERCEP    X    Y
    INTERCEP
    X
    Y

    X'X Inverse, Parameter Estimates, and SSE
                INTERCEP    X    Y
    INTERCEP
    X
    Y

    EXST704 - EXAMPLE 1
    Regression Models done with SAS REG procedure

    Dependent Variable: Y
    Analysis of Variance
                      Sum of     Mean
    Source     DF    Squares    Square    F Value    Prob>F
    Model
    Error
    C Total

    Root MSE          R-square
    Dep Mean          Adj R-sq
    C.V.

    Parameter Estimates
                   Parameter    Standard    T for H0:
    Variable  DF    Estimate       Error    Parameter=0    Prob > |T|
    INTERCEP
    X

Note: $8.528^2 = 72.727$

Output from the PROC REG "TEST" option for "TEST X = 5;" (the numerator, F value and Prob>F are not present in this copy):

    Dependent Variable: Y
    Numerator:           DF: 1    F value:
    Denominator:  2.2    DF: 8    Prob>F:

Notes:

1) The t test of the parameter estimate (t = 8.528) is equal to the square root of the F test of the model: $F = 72.727$; $\sqrt{F} = \sqrt{72.727} = 8.528$. These are the same test.

2) The value for the standard error of $b_1$ is

$$\text{Var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}, \quad \text{estimated by} \quad s^2_{b_1} = \frac{\text{MSE}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{2.2}{20 - \frac{10^2}{10}} = \frac{2.2}{10} = 0.22$$

$$s_{b_1} = \sqrt{0.22} = 0.469$$

which is also equal to the square root of $\text{MSE} \cdot c_{11}$ from the $(X'X)^{-1}$ matrix, where MSE = 2.2 and $c_{11} = 0.1$.

3) The value for the standard error of $b_0$ is

$$\text{Var}(b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}, \quad \text{estimated by} \quad s^2_{b_0} = \frac{\text{MSE} \sum X_i^2}{n \sum (X_i - \bar{X})^2} = \frac{2.2 \times 20}{10 \times 10} = 0.44$$

$$s_{b_0} = \sqrt{0.44} = 0.663$$

4) The TEST option was used to test the hypothesis $H_0: \beta_1 = 5$. The alternative would be the two-tailed alternative $H_1: \beta_1 \ne 5$. The F value produced by the option should be the square of t, so that $|t| = \sqrt{F} = 2.13$, where

$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{b_1 - 5}{s_{b_1}}$$
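The arithmetic in notes 2) to 4) can be reproduced with a short Python sketch. The summary quantities below are assumptions read or back-calculated from this page (n = 10, ΣX = 10, ΣX² = 20, MSE = 2.2); the slope estimate `b1 = 4.0` in particular is an assumption, inferred from the reported t of about 8.53 and the standard error of $b_1$, and does not appear explicitly in the surviving text.

```python
import math

# Summary quantities (assumptions taken or back-calculated from the notes).
n = 10          # number of observations
sum_x = 10.0    # sum of X
sum_x2 = 20.0   # sum of X squared
mse = 2.2       # error mean square, 8 d.f.
b1 = 4.0        # ASSUMED slope estimate (back-calculated, see lead-in)

# Corrected sum of squares of X.
ssx = sum_x2 - sum_x ** 2 / n

# Estimated variances and standard errors of the coefficients.
var_b1 = mse / ssx
var_b0 = mse * sum_x2 / (n * ssx)
se_b1 = math.sqrt(var_b1)
se_b0 = math.sqrt(var_b0)

# t for H0: beta1 = 0, and t and F for H0: beta1 = 5 (as in TEST X = 5).
t_zero = b1 / se_b1
t_five = (b1 - 5.0) / se_b1
f_five = t_five ** 2

print(f"Var(b1) = {var_b1:.2f}, s(b1) = {se_b1:.3f}")
print(f"Var(b0) = {var_b0:.2f}, s(b0) = {se_b0:.3f}")
print(f"t (beta1 = 0) = {t_zero:.3f}, F = {t_zero ** 2:.3f}")
print(f"t (beta1 = 5) = {t_five:.3f}, F = {f_five:.3f}")
```

Under these assumptions the t for $H_0: \beta_1 = 5$ is negative because the estimate lies below the hypothesized value; squaring it to obtain F discards the sign, which is why the TEST statement is inherently two-tailed.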

EXST704 - EXAMPLE 1: Vial breakage regressed on number of airline transfers.

Example of confidence limits for the regression line at various values of X. A missing value was included with an X value of 4.

    Regression Models done with SAS REG procedure

          Dep Var    Predict    Std Err    Lower95%    Upper95%
    Obs         Y      Value    Predict        Mean        Mean    Residual
    (rows not present in this copy)

    Sum of Residuals
    Sum of Squared Residuals
    Predicted Resid SS (Press)

Example of confidence limits for a new point at various values of X. A missing value was included with an X value of 4.

    Regression Models done with SAS REG procedure

          Dep Var    Predict    Std Err    Lower95%    Upper95%
    Obs         Y      Value    Predict     Predict     Predict    Residual
    (rows not present in this copy)

    Sum of Residuals
    Sum of Squared Residuals
    Predicted Resid SS (Press)    5.859

Summary of the results due to the assumptions made

(a) If $S^2 = \text{MSE}$, then $E(S^2) = \sigma^2$.

(b) Distributions

(1) $b$ is distributed $N[\beta,\ \sigma^2 (X'X)^{-1}]$. We do not assume $\text{Cov}(b_i, b_j) = 0$ as with the Y's. More later.

(2) $\dfrac{\text{MSReg}}{\text{MSE}}$ is distributed F(df MSReg, df MSE). For multiple regression this is a joint test, so the distribution has a noncentrality parameter which is zero when $\beta_1, \beta_2, \ldots, \beta_k$ all equal zero (when $H_0$ is true).

(3) In particular, $\dfrac{b_i - \beta_i}{S\sqrt{c_{ii}}}$ is distributed t(df Error), where $c_{ii}$ is the Gaussian multiplier from $(X'X)^{-1}$.

(c) What if the distribution of Y is not normal?

1) If the departure is small and the distribution is still reasonably symmetric, then the regression coefficients will be approximately normal and the effect on confidence intervals and tests of hypotheses will be small.

2) Even if the departure from normality is great, the regression coefficients have a property called asymptotic normality, such that under most conditions their distribution approaches normality as the sample size increases.

Later we will also discuss transformations which will "normalize" the data, aiding in meeting this assumption.

Variance of $\hat{Y}$, the estimate of $E(Y)$, for the simple linear model $\hat{Y} = b_0 + b_1 X$

Sampling distribution of $\hat{Y}$: as with the $b$'s, $\hat{Y}$ is a linear combination of the $Y_i$ and is normal, with

$$E(\hat{Y}) = E(Y)$$
$$\text{Var}(\hat{Y}) = \sigma^2 \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$$

In practice $\sigma^2$ would be estimated by MSE.

Note that the variance for $\hat{Y}$ is very similar to the variance of $b_0$. This is because $b_0$ is a special case of $\hat{Y}$ where X = 0.

Also note that the value of the numerator of the second term will increase as the distance between X and $\bar{X}$ increases. This is because the regression line is most stable at $\bar{X}$, and uncertainty increases as we get farther from $\bar{X}$.

Sampling distribution of $\dfrac{\hat{Y} - E(Y)}{s_{\hat{Y}}}$: as with the other normally distributed statistics examined, this will follow Student's t distribution with n - 2 degrees of freedom. The t distribution can be used either for testing a hypothesis about $E(Y)$ or for placing a confidence interval on $E(Y)$.
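Both remarks about this variance (it is smallest at $\bar{X}$, and at X = 0 it equals the variance of $b_0$) can be checked with a brief Python sketch. The function names `var_y_hat` and `var_b0`, the X values, and the MSE of 1.5 are hypothetical and used only for illustration.

```python
def var_y_hat(x0, xs, mse):
    """Estimated variance of the fitted mean at x0: MSE * [1/n + (x0 - x_bar)^2 / SSX]."""
    n = len(xs)
    x_bar = sum(xs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    return mse * (1.0 / n + (x0 - x_bar) ** 2 / ssx)

def var_b0(xs, mse):
    """Estimated variance of the intercept: MSE * sum(X^2) / (n * SSX)."""
    n = len(xs)
    x_bar = sum(xs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    return mse * sum(x * x for x in xs) / (n * ssx)

# Hypothetical design and error mean square.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # x_bar = 2, SSX = 10
mse = 1.5

at_mean = var_y_hat(2.0, xs, mse)   # smallest variance, equals MSE / n
at_zero = var_y_hat(0.0, xs, mse)   # same as the variance of b0
at_four = var_y_hat(4.0, xs, mse)   # same distance from x_bar as X = 0

print(at_mean, at_zero, at_four, var_b0(xs, mse))
```

Because the second term depends only on the squared distance from $\bar{X}$, points equally far above and below the mean of X have the same variance.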

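Example: From the vial breakage regressed on number of airline transfers example

Place a confidence interval on the regression line for the amount of breakage for 3 transfers.

$$s^2_{\hat{Y}} = \text{MSE} \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right] = 2.2 \left[ \frac{1}{10} + \frac{(3 - 1)^2}{10} \right] = 2.2 \times \frac{5}{10} = 1.1$$

$$s_{\hat{Y}} = \sqrt{1.1} = 1.0488$$

Since $t_{\alpha/2;\ n-2} = t_{0.025;\ 8} = 2.306$, then

$$P\left(\hat{Y}_{X=3} - t_{\alpha/2;\ n-2}\, s_{\hat{Y}} \le E(Y) \le \hat{Y}_{X=3} + t_{\alpha/2;\ n-2}\, s_{\hat{Y}}\right) = 1 - \alpha$$
$$P(22.2 - 2.306 \times 1.0488 \le E(Y) \le 22.2 + 2.306 \times 1.0488) = 1 - \alpha$$
$$P(19.781 \le E(Y) \le 24.619) = 0.95$$

Check this against the SAS output.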
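The interval can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the inputs (fitted value 22.2 at X = 3, $\bar{X}$ = 1, n = 10, corrected SS of X = 10, MSE = 2.2) are taken or back-calculated from the example, the function name `mean_response_ci` is hypothetical, and the critical value 2.306 is the tabled $t_{0.025}$ with 8 d.f., hard-coded because the Python standard library has no t quantile function.

```python
import math

def mean_response_ci(y_hat, x0, x_bar, n, ssx, mse, t_crit):
    """Confidence interval for E(Y) at x0, given summary quantities."""
    se = math.sqrt(mse * (1.0 / n + (x0 - x_bar) ** 2 / ssx))
    half_width = t_crit * se
    return y_hat - half_width, y_hat + half_width, se

# Assumed inputs from the vial breakage example (see lead-in).
lower, upper, se = mean_response_ci(
    y_hat=22.2,     # fitted breakage at X = 3 (assumption)
    x0=3.0,
    x_bar=1.0,
    n=10,
    ssx=10.0,
    mse=2.2,
    t_crit=2.306,   # tabled t, alpha/2 = 0.025, 8 d.f.
)

print(f"s(Y-hat) = {se:.4f}")
print(f"95% CI for E(Y): ({lower:.3f}, {upper:.3f})")
```

Passing summary quantities rather than raw data keeps the sketch aligned with the hand calculation; with the raw data available, the same function could be fed values computed from it.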
