Matrix Representation of Data in Experiment

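Consider a very simple model for the responses $y_{ij}$:

$$y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n$$

(note that for simplicity we are assuming the two (2) groups are of equal sample size $n$). In matrix form,

$$Y = X\beta + \epsilon, \qquad \beta = (\mu, \tau_1, \tau_2)',$$

where $X$ is $2n \times 3$. The first column of the matrix $X$ is all 1's, the second column is 1 if the observation is in group 1 (0 otherwise), and the third column is 1 if the observation is in group 2 (0 otherwise), i.e.

$$X = \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix}$$

Since the sum of the last two columns of $X$ is the first column of $X$, we have that $\operatorname{rank}(X) = 2$.

We seek a vector $\hat\beta = (\hat\mu, \hat\tau_1, \hat\tau_2)'$ of estimates such that

$$SSE = S_e = \sum_{i=1}^{2} \sum_{j=1}^{n} (y_{ij} - \hat y_{ij})^2$$

is minimized. Now

$$SSE = \sum_{i=1}^{2} \sum_{j=1}^{n} (y_{ij} - \hat\mu - \hat\tau_i)^2, \qquad \text{where } \hat y_{ij} = \hat\mu + \hat\tau_i.$$

We must therefore solve

$$\frac{\partial SSE}{\partial \hat\mu} = 0, \qquad \frac{\partial SSE}{\partial \hat\tau_1} = 0, \qquad \frac{\partial SSE}{\partial \hat\tau_2} = 0,$$

i.e. 3 normal equations (N.E.s).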
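The rank deficiency of $X$ can be checked numerically. The sketch below is an illustration only: the group size `n = 4`, the random seed and the simulated responses are assumptions, not values from the notes. It builds the two-group design matrix, confirms that its rank is 2, and shows that two different estimate vectors attain the same minimum SSE.

```python
import numpy as np

n = 4  # hypothetical group size; the notes leave n general

# Design matrix for y_ij = mu + tau_i + eps_ij with two groups of size n:
# column 0 is all ones, columns 1 and 2 indicate group 1 and group 2.
X = np.column_stack([
    np.ones(2 * n),
    np.repeat([1.0, 0.0], n),
    np.repeat([0.0, 1.0], n),
])

rank_X = np.linalg.matrix_rank(X)  # column 1 + column 2 equals column 0

# Made-up responses, for illustration only.
rng = np.random.default_rng(0)
Y = np.concatenate([rng.normal(10.0, 1.0, n), rng.normal(12.0, 1.0, n)])


def sse(beta):
    """Residual sum of squares for a candidate (mu, tau_1, tau_2)."""
    resid = Y - X @ beta
    return float(resid @ resid)


# lstsq returns the minimum-norm minimiser of the SSE.
beta_a, *_ = np.linalg.lstsq(X, Y, rcond=None)

# X @ (1, -1, -1) = 0, so shifting along this direction leaves the
# fitted values (and hence the SSE) unchanged: a second minimiser.
beta_b = beta_a + 5.0 * np.array([1.0, -1.0, -1.0])

print(rank_X, sse(beta_a), sse(beta_b))
```

Both vectors give the same fitted values, so the individual estimates are not unique, while the difference of the two group effects is the same in both.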

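These become

$$y_{..} = \sum_{i=1}^{2} \sum_{j=1}^{n} y_{ij} = 2n\hat\mu + n\hat\tau_1 + n\hat\tau_2 \qquad (1)$$

$$y_{1.} = \sum_{j=1}^{n} y_{1j} = n\hat\mu + n\hat\tau_1 \qquad (2)$$

$$y_{2.} = \sum_{j=1}^{n} y_{2j} = n\hat\mu + n\hat\tau_2 \qquad (3)$$

Notice the pattern these normal equations follow; we will see similar patterns throughout this course. We can summarize these 3 normal equations using one matrix equation, $X'Y = X'X\hat\beta$, i.e.

$$\begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \end{pmatrix} = \begin{pmatrix} 2n & n & n \\ n & n & 0 \\ n & 0 & n \end{pmatrix} \begin{pmatrix} \hat\mu \\ \hat\tau_1 \\ \hat\tau_2 \end{pmatrix}$$

NOTE: $X'X$ is singular, so we cannot invert it. (We can use a pseudo-inverse, but it is not unique.)

We are concerned with $\tau_1 - \tau_2$. Dividing (2) $-$ (3) by $n$ gives us

$$\hat\tau_1 - \hat\tau_2 = \bar y_{1.} - \bar y_{2.}$$

Now we can write

$$\sum_{i=1}^{2} \sum_{j=1}^{n} y_{ij}^2 = Y'Y = Y' I_{2n \times 2n} Y \qquad \text{and} \qquad y_{..}^2 = Y' J_{2n \times 2n} Y,$$

where $J_{2n \times 2n}$ is a square matrix of order $2n$ with all entries being 1's. Now we can write

$$Y'Y = Y'A_1Y + Y'A_2Y + Y'A_3Y,$$

where

$$A_1 = \frac{1}{2n} J_{2n \times 2n}, \qquad A_2 = \frac{1}{2n} \begin{pmatrix} J_{n \times n} & -J_{n \times n} \\ -J_{n \times n} & J_{n \times n} \end{pmatrix},$$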

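$$A_3 = \begin{pmatrix} I_{n \times n} - \frac{1}{n} J_{n \times n} & 0 \\ 0 & I_{n \times n} - \frac{1}{n} J_{n \times n} \end{pmatrix}$$

Now note that $A_i^2 = A_i$, $i = 1, 2, 3$, i.e. they are all idempotent. Also $A_i A_j = 0$, $i \neq j = 1, 2, 3$, so their associated quadratic forms (q.f.s) are independently distributed.

So $H_0: \tau_1 = \tau_2$ is tested using

$$F = \frac{Y'A_2Y}{Y'A_3Y / (2n - 2)} \sim F_{1,\, 2n-2 \text{ d.f.}} \quad \text{under } H_0.$$

These quadratic forms in a normal random variable (r.v.), when divided by $\sigma^2$, are distributed as chi-squared ($\chi^2$, noncentral in general) with degrees of freedom (d.f.) equal to the rank of the matrix associated with the q.f. If independent, their ratios (each divided by its d.f.) follow an F distribution.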
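The properties claimed for $A_1$, $A_2$ and $A_3$ can be verified directly. The following is a minimal sketch under stated assumptions: `n = 5` and the simulated responses are hypothetical, and the matrices are built as $A_1 = J_{2n}/(2n)$, $A_2 = \frac{1}{2n}\begin{pmatrix} J & -J \\ -J & J \end{pmatrix}$ and $A_3 = \operatorname{diag}(I - J/n,\; I - J/n)$. It checks idempotence, mutual orthogonality and the ranks, then compares the F statistic from the quadratic forms with the one computed from the usual between-group and within-group sums of squares.

```python
import numpy as np

n = 5  # hypothetical group size
I_n, J_n, Z_n = np.eye(n), np.ones((n, n)), np.zeros((n, n))

A1 = np.ones((2 * n, 2 * n)) / (2 * n)                        # overall mean
A2 = np.block([[J_n, -J_n], [-J_n, J_n]]) / (2 * n)           # between groups
A3 = np.block([[I_n - J_n / n, Z_n], [Z_n, I_n - J_n / n]])   # within groups
A = [A1, A2, A3]

# Idempotent, mutually orthogonal, and summing to the identity.
idempotent = all(np.allclose(M @ M, M) for M in A)
orthogonal = all(np.allclose(A[i] @ A[j], 0.0)
                 for i in range(3) for j in range(3) if i != j)
sums_to_identity = np.allclose(A1 + A2 + A3, np.eye(2 * n))

# The rank of each matrix is the d.f. of its quadratic form.
ranks = [int(np.linalg.matrix_rank(M, tol=1e-10)) for M in A]

# Made-up responses, for illustration only.
rng = np.random.default_rng(1)
Y = np.concatenate([rng.normal(10.0, 1.0, n), rng.normal(11.0, 1.0, n)])

# F statistic from the quadratic forms.
F = (Y @ A2 @ Y) / ((Y @ A3 @ Y) / (2 * n - 2))

# The same statistic from the familiar sums of squares.
y1, y2 = Y[:n], Y[n:]
ss_between = n * ((y1.mean() - Y.mean()) ** 2 + (y2.mean() - Y.mean()) ** 2)
ss_within = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
F_direct = ss_between / (ss_within / (2 * n - 2))

print(ranks, F, F_direct)
```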

Linear Models and Quadratic Forms

Main Results:

$$Y'Y = Y'A_1Y + Y'A_2Y + Y'A_3Y + \cdots$$

a. the $Y'A_iY$, with proper divisors, are distributed as $\chi^2$ with d.f. $= \operatorname{rank}(A_i)$
b. the quadratic forms $Y'A_iY$ are independent
c. the corresponding ratios have F distributions

Linear Models: express $Y$ as a linear combination of the $\beta_j$'s, i.e.

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i \qquad \text{for } i = 1, \ldots, n,$$

i.e.

$$Y = X\beta + \epsilon.$$

Here $Y$ is an $n \times 1$ vector of observations, $X$ ($n \times p$) is a matrix of known constants ($X$ is called the design matrix), $\beta$ is a $p \times 1$ vector of parameters and $\epsilon$ is an $n \times 1$ vector of random errors.

Note:
1. In experimental design, $X$ consists of only 1's and 0's.
2. There is no restriction that the model be a linear function of the $x_i$ (the model is linear in the parameters $\beta_j$).
3. There is no restriction that the $x_i$ be independent.

Note: Study of the experimental situation must motivate the model. A linear model involves a model equation with associated assumptions that state the nature of the random component and the restrictions the parameters must satisfy.

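Distribution Assumptions:

Assumptions regarding the $\epsilon_i$ in $y_i = \sum_j x_{ij}\beta_j + \epsilon_i$:

1. $E(\epsilon_i) = 0 \;\; \forall i$, i.e. $E(\epsilon) = 0$
2. $E(\epsilon_i \epsilon_j) = 0, \; i \neq j$ (uncorrelated)
3. $E(\epsilon_i^2) = \sigma^2 \;\; \forall i$ (homogeneous variance)
4. Together, 2 and 3 give $E(\epsilon\epsilon') = \sigma^2 I$

Hence $E(Y) = X\beta$. Note that at this point we make no assumption of a normal distribution.

If $Z$ is a vector of r.v.s, the mean vector is $E(Z)$ and the variance-covariance matrix is

$$\operatorname{cov}(Z) = V, \qquad \operatorname{cov}(z_i, z_j) = v_{ij}.$$

If $A$ is a matrix such that $AZ$ exists, then

$$E(AZ) = A\,E(Z), \qquad \operatorname{cov}(AZ) = A V A'.$$

5. Often we assume normality. We assume the errors have a multivariate normal distribution with $E(\epsilon) = 0$ and $\operatorname{cov}(\epsilon) = \sigma^2 I$, i.e. $\epsilon \sim N(0, \sigma^2 I)$ or, equivalently, $Y \sim N(X\beta, \sigma^2 I)$.

The assumption of normality is important for testing and estimation but not for least squares estimation or for subdivision of the sum of squares (SS). The method of maximum likelihood, under the assumption of normality, gives the same estimators of estimable functions as the method of least squares.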
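The rules $E(AZ) = A\,E(Z)$ and $\operatorname{cov}(AZ) = AVA'$ are purely algebraic, so the sample mean vector and sample covariance matrix obey the same identities exactly. The sketch below illustrates this with simulated data; the matrix `A`, the sample size and the seed are arbitrary choices made for the example, and the check concerns sample moments rather than the population statement itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# 3 variables (rows) observed 200 times (columns); made-up data.
Z = rng.normal(size=(3, 200))
Z[1] += 0.5 * Z[0]  # introduce some correlation between components

# An arbitrary 2 x 3 matrix, so that A @ Z is defined.
A = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 2.0]])

mean_Z = Z.mean(axis=1)   # sample analogue of E(Z)
V = np.cov(Z)             # sample analogue of cov(Z); rows are variables

W = A @ Z                 # transformed observations
mean_W = W.mean(axis=1)
cov_W = np.cov(W)

print(np.allclose(mean_W, A @ mean_Z), np.allclose(cov_W, A @ V @ A.T))
```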

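Method of Least Squares

Estimate $\beta_j$ by choosing the estimator $\hat\beta_j$ which minimizes the sum of squares of the residuals ($S_e = SSE$):

$$S_e = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = (Y - \hat Y)'(Y - \hat Y) = e'e, \qquad \text{where } \hat y_i = \sum_{j=1}^{p} x_{ij}\hat\beta_j.$$

Solve the set of $p$ equations obtained by setting

$$\frac{\partial S_e}{\partial \hat\beta_j} = 0, \qquad j = 1, \ldots, p.$$

(These equations are called the normal equations.) In general, the $j$th equation is

$$\sum_{i=1}^{n} x_{ij} y_i = \sum_{s=1}^{p} \hat\beta_s \sum_{i=1}^{n} x_{is} x_{ij},$$

i.e. the normal equations are written in matrix notation as

$$X'Y = X'X\hat\beta.$$

$X'X$ is $p \times p$ symmetric, and $r(X'X) = r(X) =$ the number of linearly independent columns of $X$.

A. The Full Rank Case (this is the usual case in multiple regression but not in experimental design)

If the $p$ columns of $X$ are linearly independent, $r(X) = r(X'X) = p$; therefore $(X'X)^{-1}$ exists and the normal equations have a unique set of solutions given by

$$\hat\beta = (X'X)^{-1} X'Y.$$

Then

$$E(\hat\beta) = E\left[(X'X)^{-1}X'Y\right] = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X\beta = \beta$$

and

$$\operatorname{cov}(\hat\beta) = \operatorname{cov}\left[(X'X)^{-1}X'Y\right] = (X'X)^{-1}X'(\sigma^2 I)\left[(X'X)^{-1}X'\right]' = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$

Note: We should choose experimental points with this in mind, i.e. choose $X$ so as to minimize the average variance of the $\hat\beta_s$'s.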
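For the full rank case, the normal equations can be solved numerically as in the sketch below. The design matrix, the "true" parameter vector `beta`, the error standard deviation `sigma` and the seed are all hypothetical values chosen for illustration. The code solves $X'X\hat\beta = X'Y$ with a linear solver rather than forming the inverse explicitly, and evaluates $\sigma^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
sigma = 0.5                          # hypothetical error standard deviation
beta = np.array([2.0, -1.0, 0.5])    # hypothetical true parameters

# Full-rank design: an intercept column plus two made-up regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ beta + rng.normal(0.0, sigma, n)

XtX = X.T @ X
full_rank = np.linalg.matrix_rank(X) == p

# Solve the normal equations X'X b = X'Y.
beta_hat = np.linalg.solve(XtX, X.T @ Y)

# cov(beta_hat) = sigma^2 (X'X)^{-1}; the inverse is needed explicitly here.
cov_beta_hat = sigma ** 2 * np.linalg.inv(XtX)

# The algebra behind unbiasedness: (X'X)^{-1} X' (X beta) = beta.
beta_from_mean = np.linalg.solve(XtX, X.T @ (X @ beta))

print(beta_hat)
print(np.diag(cov_beta_hat))
```

The diagonal of `cov_beta_hat` holds the variances of the individual estimates, which is the quantity the note about choosing experimental points refers to.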

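The sum of squares of error is given by

$$S_e = (Y - \hat Y)'(Y - \hat Y) = Y'Y - Y'\hat Y - \hat Y'Y + \hat Y'\hat Y, \qquad \text{where } \hat Y = X\hat\beta.$$

But

$$Y'\hat Y = Y'X\hat\beta = Y'X(X'X)^{-1}X'Y = \hat Y'Y$$

and

$$\hat Y'\hat Y = (X\hat\beta)'(X\hat\beta) = \hat\beta'X'X\hat\beta = \left[(X'X)^{-1}X'Y\right]' X'X (X'X)^{-1}X'Y = Y'X(X'X)^{-1}X'X(X'X)^{-1}X'Y = Y'X(X'X)^{-1}X'Y = Y'\hat Y \;\text{ or }\; \hat Y'Y.$$

Thus

$$S_e = Y'Y - \hat Y'\hat Y = Y'\left[I - X(X'X)^{-1}X'\right]Y$$

or

$$Y'Y = \hat Y'\hat Y + S_e,$$

where $\hat Y'\hat Y$ represents the sum of squares due to regression. Thus we have that the total (uncorrected) sum of squares is given by

$$Y'Y = Y'\left[X(X'X)^{-1}X'\right]Y + Y'\left[I - X(X'X)^{-1}X'\right]Y = S_r + S_e.$$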
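The decomposition $Y'Y = S_r + S_e$ can be checked numerically. This is a sketch with simulated inputs; the dimensions, the seed and the name `H` for the matrix $X(X'X)^{-1}X'$ are assumptions of the example, not notation from the notes.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 3

# Made-up full-rank design matrix and responses.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = rng.normal(size=n)

# H = X (X'X)^{-1} X', computed with a solve instead of an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y                      # fitted values X beta_hat

S_r = Y @ H @ Y                    # sum of squares due to regression
S_e = Y @ (np.eye(n) - H) @ Y      # sum of squares of error
total = Y @ Y                      # total (uncorrected) sum of squares

print(total, S_r + S_e)
```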

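Now

$$E(S_r) = E\left\{Y'\left[X(X'X)^{-1}X'\right]Y\right\} = \beta'X'X\beta + p\sigma^2$$

$$E(S_e) = E\left\{Y'\left[I - X(X'X)^{-1}X'\right]Y\right\} = (n - p)\sigma^2$$

Since $X'X$ is positive definite, $\beta'X'X\beta$ is nonnegative and equals zero only if $\beta = 0$.

Under the assumption of normality,

$$\frac{S_r}{\sigma^2} \sim \text{noncentral } \chi^2_{p \text{ d.f.}}, \qquad \frac{S_e}{\sigma^2} \sim \text{central } \chi^2_{n-p \text{ d.f.}},$$

and the two q.f.s are independently distributed. Therefore, to test $H_0: \beta = 0$ we have the test statistic

$$F = \frac{S_r / p}{S_e / (n - p)} \sim \text{noncentral } F_{p,\, n-p \text{ d.f.}},$$

which reduces to a central $F_{p,\, n-p}$ under $H_0$.

This result uses the following regarding quadratic forms in normal random variables. If $Y \sim N(\mu, \sigma^2 I)$, then

- $Y'AY / \sigma^2 \sim \chi^2_{m \text{ d.f.}}$ (noncentral in general) iff $A$ is idempotent of rank $m$;
- $Y'AY$ and $Y'BY$ are independently distributed iff $AB = 0$.

Note that we have

$$S_e = Y'\left[I - X(X'X)^{-1}X'\right]Y = Y'BY, \qquad S_r = Y'\left[X(X'X)^{-1}X'\right]Y = Y'AY.$$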
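The two expectations follow from the standard identity $E(Y'MY) = \sigma^2 \operatorname{tr}(M) + E(Y)'\,M\,E(Y)$, which holds when $\operatorname{cov}(Y) = \sigma^2 I$. The sketch below evaluates that identity for $A = X(X'X)^{-1}X'$ and $B = I - A$ and confirms the conditions used for the distributional results. The design matrix, `beta`, `sigma2` and the seed are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
sigma2 = 2.0                          # hypothetical error variance
beta = np.array([1.0, 0.5, -2.0])     # hypothetical parameter vector

# Made-up full-rank design matrix.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

A = X @ np.linalg.solve(X.T @ X, X.T)  # matrix of the q.f. S_r
B = np.eye(n) - A                      # matrix of the q.f. S_e
mean_Y = X @ beta                      # E(Y) under the model

# E(Y'MY) = sigma^2 tr(M) + E(Y)' M E(Y)
E_S_r = sigma2 * np.trace(A) + mean_Y @ A @ mean_Y
E_S_e = sigma2 * np.trace(B) + mean_Y @ B @ mean_Y

# Conditions behind the chi-squared and independence results.
A_idempotent = np.allclose(A @ A, A)
B_idempotent = np.allclose(B @ B, B)
AB_zero = np.allclose(A @ B, 0.0)
rank_A = int(np.linalg.matrix_rank(A, tol=1e-10))
rank_B = int(np.linalg.matrix_rank(B, tol=1e-10))

print(E_S_r, E_S_e, rank_A, rank_B)
```

Because $B X = 0$, the mean term drops out of `E_S_e`, which is why $S_e/\sigma^2$ is central whatever the value of $\beta$.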