Chapter 2 Supplemental Text Material


S2-1. Models for the Data and the t-Test

The model presented in the text, equation (2-23), is more properly called a means model. Since the mean is a location parameter, this type of model is also sometimes called a location model. There are other ways to write the model for a t-test. One possibility is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n_i$$

where $\mu$ is a parameter that is common to all observed responses (an overall mean) and $\tau_i$ is a parameter that is unique to the $i$th factor level. Sometimes we call $\tau_i$ the $i$th treatment effect. This model is usually called the effects model. Since the means model is

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n_i$$

we see that the $i$th treatment or factor level mean is $\mu_i = \mu + \tau_i$; that is, the mean response at factor level $i$ is equal to an overall mean plus the effect of the $i$th factor. We will use both types of models to represent data from designed experiments. Most of the time we will work with effects models, because it is the traditional way to present much of this material. However, there are situations where the means model is useful, and even more natural.

S2-2. Estimating the Model Parameters

Because models arise naturally in examining data from designed experiments, we frequently need to estimate the model parameters. We often use the method of least squares for parameter estimation. This procedure chooses values for the model parameters that minimize the sum of the squares of the errors $\varepsilon_{ij}$. We will illustrate this procedure for the means model. For simplicity, assume that the sample sizes for the two factor levels are equal; that is, $n_1 = n_2 = n$. The least squares function that must be minimized is

$$L = \sum_{i=1}^{2} \sum_{j=1}^{n} \varepsilon_{ij}^2 = \sum_{i=1}^{2} \sum_{j=1}^{n} (y_{ij} - \mu_i)^2$$

Now

$$\frac{\partial L}{\partial \mu_1} = -2 \sum_{j=1}^{n} (y_{1j} - \mu_1) \quad \text{and} \quad \frac{\partial L}{\partial \mu_2} = -2 \sum_{j=1}^{n} (y_{2j} - \mu_2)$$

and equating these partial derivatives to zero yields the least squares normal equations

$$n\hat{\mu}_1 = \sum_{j=1}^{n} y_{1j}, \qquad n\hat{\mu}_2 = \sum_{j=1}^{n} y_{2j}$$

The solution to these equations gives the least squares estimators of the factor level means. The solution is $\hat{\mu}_1 = \bar{y}_1$ and $\hat{\mu}_2 = \bar{y}_2$; that is, the sample averages at each factor level are the estimators of the factor level means.

This result should be intuitive, as we learn early on in basic statistics courses that the sample average usually provides a reasonable estimate of the population mean. However, as we have just seen, this result can be derived easily from a simple location model using least squares. It also turns out that if we assume that the model errors are normally and independently distributed, the sample averages are the maximum likelihood estimators of the factor level means. That is, if the observations are normally distributed, least squares and maximum likelihood produce exactly the same estimators of the factor level means. Maximum likelihood is a more general method of parameter estimation that usually produces parameter estimates that have excellent statistical properties.

We can also apply the method of least squares to the effects model. Assuming equal sample sizes, the least squares function is

$$L = \sum_{i=1}^{2} \sum_{j=1}^{n} \varepsilon_{ij}^2 = \sum_{i=1}^{2} \sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2$$

and the partial derivatives of $L$ with respect to the parameters are

$$\frac{\partial L}{\partial \mu} = -2 \sum_{i=1}^{2} \sum_{j=1}^{n} (y_{ij} - \mu - \tau_i), \quad \frac{\partial L}{\partial \tau_1} = -2 \sum_{j=1}^{n} (y_{1j} - \mu - \tau_1), \quad \text{and} \quad \frac{\partial L}{\partial \tau_2} = -2 \sum_{j=1}^{n} (y_{2j} - \mu - \tau_2)$$

Equating these partial derivatives to zero results in the following least squares normal equations:

$$2n\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 = \sum_{i=1}^{2} \sum_{j=1}^{n} y_{ij}$$
$$n\hat{\mu} + n\hat{\tau}_1 = \sum_{j=1}^{n} y_{1j}$$
$$n\hat{\mu} + n\hat{\tau}_2 = \sum_{j=1}^{n} y_{2j}$$

Notice that if we add the last two of these normal equations we obtain the first one. That is, the normal equations are not linearly independent and so they do not have a unique solution. This has occurred because the effects model is overparameterized. This situation occurs frequently; that is, the effects model for an experiment will always be an overparameterized model.

One way to deal with this problem is to add another linearly independent equation to the normal equations. The most common way to do this is to use the equation $\hat{\tau}_1 + \hat{\tau}_2 = 0$. This is, in a sense, an intuitive choice as it essentially defines the factor effects as deviations from the overall mean $\mu$. If we impose this constraint, the solution to the normal equations is

$$\hat{\mu} = \bar{y}, \qquad \hat{\tau}_i = \bar{y}_i - \bar{y}, \quad i = 1, 2$$

That is, the overall mean is estimated by the average of all $2n$ sample observations, while each individual factor effect is estimated by the difference between the sample average for that factor level and the average of all observations.

This is not the only possible choice for a linearly independent constraint for solving the normal equations. Another possibility is to simply set the overall mean equal to a constant, such as for example $\hat{\mu} = 0$. This results in the solution

$$\hat{\mu} = 0, \qquad \hat{\tau}_i = \bar{y}_i, \quad i = 1, 2$$

Yet another possibility is $\hat{\tau}_2 = 0$, producing the solution

$$\hat{\mu} = \bar{y}_2, \qquad \hat{\tau}_1 = \bar{y}_1 - \bar{y}_2, \qquad \hat{\tau}_2 = 0$$

There are an infinite number of possible constraints that could be used to solve the normal equations. An obvious question is which solution we should use. It turns out that it really doesn't matter. For each of the three solutions above (indeed for any solution to the normal equations) we have

$$\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_i, \quad i = 1, 2$$

That is, the least squares estimator of the mean of the $i$th factor level will always be the sample average of the observations at that factor level. So even if we cannot obtain unique estimates for the parameters in the effects model, we can obtain unique estimators of a function of these parameters that we are interested in. We say that the mean of the $i$th factor level is estimable. A function of the model parameters that can be uniquely estimated regardless of the constraint selected to solve the normal equations is called an estimable function. This is discussed in more detail in Chapter 3.

S2-3. A Regression Model Approach to the t-Test

The two-sample t-test can be presented from the viewpoint of a simple linear regression model. This is a very instructive way to think about the t-test, as it fits in nicely with the general notion of a factorial experiment with factors at two levels, such as the golf experiment described in Chapter 1. This type of experiment is very important in practice, and is discussed extensively in subsequent chapters.

In the t-test scenario, we have a factor $x$ with two levels, which we can arbitrarily call "low" and "high". We will use $x = -1$ to denote the low level of this factor and $x = +1$ to denote the high level of this factor. Figure 2-3.1 below is a scatter plot (from Minitab) of the portland cement mortar tension bond strength data from Chapter 2.

[Figure 2-3.1. Scatter plot of tension bond strength data. Vertical axis: Bond Strength; horizontal axis: Factor Level (x).]

We will fit a simple linear regression model to this data, say

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij}$$

where $\beta_0$ and $\beta_1$ are the intercept and slope, respectively, of the regression line and the regressor or predictor variable is $x_{1j} = -1$ and $x_{2j} = +1$. The method of least squares can be used to estimate the slope and intercept in this model. Assuming that we have equal sample sizes $n$ for each factor level, the least squares normal equations are:

$$2n\hat{\beta}_0 = \sum_{i=1}^{2} \sum_{j=1}^{n} y_{ij}$$
$$2n\hat{\beta}_1 = \sum_{j=1}^{n} y_{2j} - \sum_{j=1}^{n} y_{1j}$$

The solution to these equations is

$$\hat{\beta}_0 = \bar{y}, \qquad \hat{\beta}_1 = \frac{1}{2}(\bar{y}_2 - \bar{y}_1)$$

Note that the least squares estimator of the intercept is the average of all the observations from both samples, while the estimator of the slope is one-half of the difference between the sample averages at the "high" and "low" levels of the factor $x$. Below is the output from the linear regression procedure in Minitab for the tension bond strength data.

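    Predictor       Coef     StDev        T       P
    Constant      17.343    0.0636   272.86   0.000
    Factor L       0.579   0.06356     9.11   0.000

    S = 0.2843    R-Sq = 82.2%    R-Sq(adj) = 81.2%

    Analysis of Variance

    Source            DF       SS       MS       F       P
    Regression         1   6.7048   6.7048   82.98   0.000
    Residual Error    18   1.4544   0.0808
    Total             19   8.1592

Notice that the estimate of the slope (given in the column labeled "Coef" and the row labeled "Factor L" above) is $0.579 \approx \frac{1}{2}(\bar{y}_2 - \bar{y}_1) = \frac{1}{2}(17.92 - 16.76) = 0.58$ and the estimate of the intercept is $17.343 \approx \frac{1}{2}(\bar{y}_2 + \bar{y}_1) = \frac{1}{2}(17.92 + 16.76) = 17.34$. (The difference is due to rounding the manual calculations for the sample averages to two decimal places.) Furthermore, notice that the t-statistic associated with the slope is equal to 9.11, exactly the same value we gave in Table 2-2 in the text. Now in simple linear regression, the t-test on the slope is actually testing the hypotheses

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

and this is equivalent to testing $H_0: \mu_1 = \mu_2$.

It is easy to show that the t-test statistic used for testing that the slope equals zero in simple linear regression is identical to the usual two-sample t-test. Recall that to test the above hypotheses in simple linear regression the t-statistic is

$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$

where $S_{xx} = \sum_{i=1}^{2} \sum_{j=1}^{n} (x_{ij} - \bar{x})^2$ is the corrected sum of squares of the $x$'s. Now in our specific problem, $\bar{x} = 0$, $x_{1j} = -1$ and $x_{2j} = +1$, so $S_{xx} = 2n$. Therefore, since we have already observed that the estimate of $\sigma$ is just $S_p$,

$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{\frac{1}{2}(\bar{y}_2 - \bar{y}_1)}{S_p \sqrt{\dfrac{1}{2n}}} = \frac{\bar{y}_2 - \bar{y}_1}{S_p \sqrt{\dfrac{2}{n}}}$$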

This is the usual two-sample t-test statistic for the case of equal sample sizes.

S2-4. Constructing Normal Probability Plots

While we usually generate normal probability plots using a computer software program, occasionally we have to construct them by hand. Fortunately, it is relatively easy to do, since specialized normal probability plotting paper is widely available. This is just graph paper with the vertical (or probability) scale arranged so that if we plot the cumulative normal probabilities $(j - 0.5)/n$ on that scale versus the rank-ordered observations $y_{(j)}$, a graph equivalent to the computer-generated normal probability plot will result. The table below shows the calculations for the unmodified portland cement mortar bond strength data.

     j    y_(j)    (j - 0.5)/10    z_(j)
     1    17.50        0.05        -1.64
     2    17.63        0.15        -1.04
     3    17.75        0.25        -0.67
     4    17.86        0.35        -0.39
     5    17.90        0.45        -0.13
     6    17.96        0.55         0.13
     7    18.00        0.65         0.39
     8    18.15        0.75         0.67
     9    18.22        0.85         1.04
    10    18.25        0.95         1.64

Now if we plot the cumulative probabilities from the next-to-last column of this table versus the rank-ordered observations from the second column on normal probability paper, we will produce a graph that is identical to Figure 2-11a in the text.

A normal probability plot can also be constructed on ordinary graph paper by plotting the standardized normal z-scores $z_{(j)}$ against the ranked observations, where the standardized normal z-scores are obtained from

$$P(Z \le z_j) = \Phi(z_j) = \frac{j - 0.5}{n}$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution. For example, if $(j - 0.5)/n = 0.05$, then $\Phi(z_j) = 0.05$ implies that $z_j = -1.64$. The last column of the above table displays the values of the normal z-scores. Plotting these values against the ranked observations on ordinary graph paper will produce a normal probability plot equivalent to Figure 2-11a. As noted in the text, many statistics computer packages present the normal probability plot this way.
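The hand calculation in Section S2-4 can be sketched in a few lines of Python. This is only an illustration, not the procedure any particular package uses: the function name `normal_plot_points` is hypothetical, and the standard library's `statistics.NormalDist` stands in for a table of the standard normal distribution. The data are the ten unmodified mortar observations from the table above.

```python
from statistics import NormalDist

# Unmodified portland cement mortar bond strengths (unsorted, as observed).
unmodified = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def normal_plot_points(data):
    """Return rows (j, y_(j), (j - 0.5)/n, z_j) for a normal probability plot.

    Hypothetical helper: z_j solves Phi(z_j) = (j - 0.5)/n.
    """
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    rows = []
    for j, y in enumerate(sorted(data), start=1):
        p = (j - 0.5) / n              # cumulative probability for rank j
        z = std_normal.inv_cdf(p)      # standardized normal z-score
        rows.append((j, y, p, z))
    return rows


rows = normal_plot_points(unmodified)
for j, y, p, z in rows:
    print(f"{j:2d}  {y:6.2f}  {p:4.2f}  {z:6.2f}")
```

Plotting the last column against the second column on ordinary graph paper (or with any plotting tool) gives the z-score form of the normal probability plot described above.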

S2-5. More About Checking Assumptions in the t-Test

We noted in the text that a normal probability plot of the observations was an excellent way to check the normality assumption in the t-test. Instead of plotting the observations, an alternative is to plot the residuals from the statistical model. Recall that the means model is

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n_i$$

and that the estimates of the parameters (the factor level means) in this model are the sample averages. Therefore, we could say that the fitted model is

$$\hat{y}_{ij} = \bar{y}_i, \qquad i = 1, 2 \quad \text{and} \quad j = 1, 2, \ldots, n_i$$

That is, an estimate of the $ij$th observation is just the average of the observations in the $i$th factor level. The difference between the observed value of the response and the predicted (or fitted) value is called a residual, say

$$e_{ij} = y_{ij} - \bar{y}_i, \qquad i = 1, 2$$

The table below computes the values of the residuals from the portland cement mortar tension bond strength data, where $e_{1j} = y_{1j} - \bar{y}_1 = y_{1j} - 16.76$ and $e_{2j} = y_{2j} - \bar{y}_2 = y_{2j} - 17.92$.

    Observation j    y_1j     e_1j     y_2j     e_2j
          1         16.85     0.09    17.50    -0.42
          2         16.40    -0.36    17.63    -0.29
          3         17.21     0.45    18.25     0.33
          4         16.35    -0.41    18.00     0.08
          5         16.52    -0.24    17.86    -0.06
          6         17.04     0.28    17.75    -0.17
          7         16.96     0.20    18.22     0.30
          8         17.15     0.39    17.90    -0.02
          9         16.59    -0.17    17.96     0.04
         10         16.57    -0.19    18.15     0.23

The figure below is a normal probability plot of these residuals from Minitab.

[Figure: Normal Probability Plot of the Residuals (response is Bond Str). Vertical axis: Normal Score; horizontal axis: Residual.]

As noted in Section S2-3 above, we can compute the t-test statistic using a simple linear regression model approach. Most regression software packages will also compute a table or listing of the residuals from the model. The residuals from the Minitab regression model fit obtained previously are as follows:

    Obs   Factor Level   Bond Str       Fit   StDev Fit   Residual   St Resid
      1       -1.0         16.85    16.7640     0.0899      0.0860      0.32
      2       -1.0         16.40    16.7640     0.0899     -0.3640     -1.35
      3       -1.0         17.21    16.7640     0.0899      0.4460      1.65
      4       -1.0         16.35    16.7640     0.0899     -0.4140     -1.54
      5       -1.0         16.52    16.7640     0.0899     -0.2440     -0.90
      6       -1.0         17.04    16.7640     0.0899      0.2760      1.02
      7       -1.0         16.96    16.7640     0.0899      0.1960      0.73
      8       -1.0         17.15    16.7640     0.0899      0.3860      1.43
      9       -1.0         16.59    16.7640     0.0899     -0.1740     -0.65
     10       -1.0         16.57    16.7640     0.0899     -0.1940     -0.72
     11        1.0         17.50    17.9220     0.0899     -0.4220     -1.56
     12        1.0         17.63    17.9220     0.0899     -0.2920     -1.08
     13        1.0         18.25    17.9220     0.0899      0.3280      1.22
     14        1.0         18.00    17.9220     0.0899      0.0780      0.29
     15        1.0         17.86    17.9220     0.0899     -0.0620     -0.23
     16        1.0         17.75    17.9220     0.0899     -0.1720     -0.64
     17        1.0         18.22    17.9220     0.0899      0.2980      1.11
     18        1.0         17.90    17.9220     0.0899     -0.0220     -0.08
     19        1.0         17.96    17.9220     0.0899      0.0380      0.14
     20        1.0         18.15    17.9220     0.0899      0.2280      0.85

The column labeled "Fit" contains the averages of the two samples, computed to four decimal places. The residuals in the sixth column of this table are the same (apart from rounding) as we computed manually.
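As a rough check on Sections S2-3 and S2-5, the regression estimates, the slope t-statistic, and the residuals can be recomputed directly from the twenty observations. The sketch below uses only the Python standard library; all variable names are illustrative. The standardized residual formula $e/(s\sqrt{1-h})$ with leverage $h = 1/n$ is an assumption about how the "St Resid" column is defined; it is not stated in the text, although it does reproduce the tabulated values.

```python
from math import sqrt
from statistics import mean

# Tension bond strength data; x = -1 for the first sample, x = +1 for the second.
modified = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
unmodified = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]

n = len(modified)  # equal sample sizes, as assumed in the text
ybar1, ybar2 = mean(modified), mean(unmodified)

# Least squares estimates for y = b0 + b1 * x + error with x coded -1 / +1.
b0 = (ybar1 + ybar2) / 2   # average of all observations
b1 = (ybar2 - ybar1) / 2   # half the difference of the sample averages

# Residuals: observation minus fitted value (the sample average of its level).
resid1 = [y - ybar1 for y in modified]
resid2 = [y - ybar2 for y in unmodified]

# Error mean square on 2n - 2 degrees of freedom; its root equals S_p here.
sse = sum(e * e for e in resid1 + resid2)
mse = sse / (2 * n - 2)
s = sqrt(mse)

# Slope t-statistic with S_xx = 2n, and the pooled two-sample t-statistic.
t_slope = b1 / sqrt(mse / (2 * n))
t_two_sample = (ybar2 - ybar1) / (s * sqrt(2 / n))

# Assumed definition of a standardized residual: e / (s * sqrt(1 - h)),
# where every observation has leverage h = 1/n in this balanced design.
std_resid1 = [e / (s * sqrt(1 - 1 / n)) for e in resid1]

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, s = {s:.4f}")
print(f"t (slope) = {t_slope:.2f}, t (two-sample) = {t_two_sample:.2f}")
```

The two t-statistics agree to floating-point precision, which is the numerical counterpart of the algebraic identity derived in Section S2-3.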

S2-6. Some More Information About the Paired t-Test

The paired t-test examines the difference between two variables and tests whether the mean of those differences differs from zero. In the text we show that the mean of the differences $\mu_d$ is identical to the difference of the means in two independent samples, $\mu_1 - \mu_2$. However, the variance of the differences is not the same as would be observed if there were two independent samples. Let $\bar{d}$ be the sample average of the differences. Then

$$V(\bar{d}) = V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2) - 2\,\mathrm{Cov}(\bar{y}_1, \bar{y}_2) = \frac{2\sigma^2(1 - \rho)}{n}$$

assuming that both populations have the same variance $\sigma^2$ and that $\rho$ is the correlation between the two random variables $y_1$ and $y_2$. The quantity $S_d^2/n$ estimates the variance of the average difference $\bar{d}$.

In many paired experiments a strong positive correlation is expected to exist between $y_1$ and $y_2$ because both factor levels have been applied to the same experimental unit. When there is positive correlation within the pairs, the denominator for the paired t-test will be smaller than the denominator for the two-sample or independent t-test. If the two-sample test is applied incorrectly to paired samples, the procedure will generally understate the significance of the data.

Note also that while for convenience we have assumed that both populations have the same variance, the assumption is really unnecessary. The paired t-test is valid when the variances of the two populations are different.
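The effect of within-pair correlation on the test denominator can be illustrated with a small sketch. The paired measurements below are hypothetical values made up for this illustration (they do not come from the text), and the variable names are likewise illustrative. The code checks the sample analogue of the variance identity above, $S_d^2 = S_1^2 + S_2^2 - 2S_{12}$, and compares the paired and two-sample denominators.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical paired data: two factor levels applied to the same six units.
y1 = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
y2 = [10.6, 11.9, 10.1, 12.8, 11.0, 11.9]

n = len(y1)
d = [a - b for a, b in zip(y1, y2)]  # within-pair differences

s1_sq, s2_sq = variance(y1), variance(y2)
m1, m2 = mean(y1), mean(y2)
cov12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / (n - 1)

# Sample analogue of V(y1 - y2) = V(y1) + V(y2) - 2 Cov(y1, y2).
s_d_sq = variance(d)
identity_gap = abs(s_d_sq - (s1_sq + s2_sq - 2 * cov12))

# Denominators of the two test statistics.
paired_denominator = sqrt(s_d_sq / n)
pooled_sp = sqrt((s1_sq + s2_sq) / 2)            # equal sample sizes
two_sample_denominator = pooled_sp * sqrt(2 / n)

print(f"sample covariance     = {cov12:.4f}")
print(f"paired denominator    = {paired_denominator:.4f}")
print(f"two-sample denominator = {two_sample_denominator:.4f}")
```

Because the made-up pairs are strongly positively correlated, the paired denominator comes out much smaller than the two-sample one, which is why applying the two-sample test to such data would understate the significance.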