The experimental unit of a study is the object on which measurements are taken.

Contents

4 Analysis of Variance (ANOVA)
  4.1 Introduction
    4.1.1 Terminology
    4.1.2 Data
  4.2 One-Way ANOVA
    4.2.1 The Model
    4.2.2 Identifiability
    4.2.3 Model Assumptions
    4.2.4 Inference
    4.2.5 Checking Model Assumptions
    4.2.6 Nonparametric Test
    4.2.7 Contrasts
  4.3 Two-Way ANOVA
    4.3.1 Model
    4.3.2 Parameter Estimation
    4.3.3 Hypothesis Testing
    4.3.4 Randomized Block Design
    4.3.5 Nonparametric Test

4 Analysis of Variance (ANOVA)

4.1 Introduction

The goal is to extend our previous results from two samples to more than two samples. In direct extension of the two-sample case, we can imagine that we collect samples from $T \ge 2$ populations. Another scenario that fits in this framework is the case where a single sample of size $n$ is randomly assigned to $T$ treatments. In this case, we imagine $T$ hypothetical populations. The $t$th population is hypothetically subjected to treatment $t$. The units randomly assigned to treatment $t$ are like a random sample from the $t$th population.

4.1.1 Terminology

Definition: experimental unit
The experimental unit of a study is the object on which measurements are taken.

Experimental units can be people, computers, animals, classes, universities, etc. Let's consider some examples.

- Rats are randomly fed different iron supplements in their food, and iron retention in their bodies is measured 3 days later. The experimental unit is the rat.

- Rats live in cages, and all rats in the same cage eat from a communal food bowl. Food bowls are randomly dosed with different iron supplements, and all rats are measured for iron retention 3 days later. The experimental unit is the cage, because rats within a cage cannot be assigned to different treatments.
- A scientist wants to determine whether childhood exposure to TV commercials impacts obesity. They enroll random families in the study, recording TV usage of the household periodically over several years. During the study, they also record the height and weight of the children in the family. The experimental unit is the collection of all children in the household, since the TV usage of individual children cannot be determined.
- When testing chemicals for teratogenesis (the propensity to cause birth defects), pregnant rats are subjected to high doses and the pups are scored for birth defects. Because the individual pups cannot be assigned different treatments, the litter is the experimental unit.
- A rabbit is sacrificed and the retinas are used for scientific experiments on neurons (horrible thing, but true). To avoid waste, both retinas are used in separate, independent experiments. The experimental units are the retinas, but one should worry about the forced blocking on rabbit (see the discussion of blocking later). If, within each retina, individual amacrine cells are located and independently tested, then the experimental unit is the amacrine cell. There is blocking at two hierarchical levels: the rabbit, and the retina within the rabbit.

Definition: factor
A factor is a variable that is controlled in an experiment. Distinct values of the factor are called levels.

Some examples of factors are: (1) time exposed to a teratogenic chemical, (2) dose of a teratogenic chemical, (3) temperature of a greenhouse, (4) initial number of infectious agents in an agent-based model of an epidemic, etc.

Definition: treatment
A treatment is a specific combination of factor levels to which experimental units may be exposed in an experiment.

An example is the dose and time of exposure to a teratogenic chemical.

ANOVA analyses can be classified in several ways.

- A one-way ANOVA considers a set of treatments produced by varying one factor. A two-way ANOVA considers a set of treatments produced by varying two factors simultaneously. For example, a teratogenic study that varies both the dose of and the exposure time to the chemical varies two factors, dose and time. One can expand to multi-way ANOVA by including additional factors.
- If the populations (or treatments) included in the study are selected by the experimenter and inferences are to be made only about those populations, then the model is called a fixed effects model. If instead the populations are representative of a large collection of populations, many of which are not sampled, and the experimenter wishes to infer properties of all populations, then the model is called a random effects model. For example, if three antidepressant drugs and a control are applied to four random groups of patients in order to determine which of these drugs can reduce depression symptoms, then the mean treatment responses are fixed effects associated with these three treatments only. In contrast, if three representative antidepressant drugs are applied to three random groups of patients to determine the side effects of taking antidepressants in general, then the observed means for the three drugs are random variables representative of the kinds of effects caused by any antidepressant drug, even those not included in the study.

We will focus on fixed effects ANOVA in these notes.

4.1.2 Data

The data associated with ANOVA might be summarized in a table of the following form:

| Treatment 1 | Treatment 2 | Treatment 3 | $\cdots$ | Treatment $k$ |
|---|---|---|---|---|
| $y_{11}$ | $y_{21}$ | $y_{31}$ | $\cdots$ | $y_{k1}$ |
| $y_{12}$ | $y_{22}$ | $y_{32}$ | $\cdots$ | $y_{k2}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ |
| $y_{1n_1}$ | $y_{2n_2}$ | $y_{3n_3}$ | $\cdots$ | $y_{kn_k}$ |

Example: heights of singers in a choir
Suppose, for example, that you are studying the heights of singers in a choir. Your data table is below. [Find the original data at the Data & Story Library.]

| Soprano | Alto | Tenor | Bass |
|---|---|---|---|
| [height values not preserved in this transcription] | | | |

The treatments are actual populations in this case, i.e. the different singer types (soprano, alto, tenor, bass). In working with this data, you might be interested in determining whether the mean heights of all treatments are the same. Specifically, you might expect a significant difference in the mean heights of basses and sopranos, because most, if not all, of the former are male, and the latter are female.

First Step
The first step in any data analysis is to plot the data. Dr. Zhu introduced you to the R functions boxplot() and stripchart(), which are particularly useful in this context.

Statistics
We now introduce some notation and common statistics that are computed from ANOVA-type data. The sample treatment mean is

$$\bar{Y}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}.$$

The overall sample mean is obtained as

$$\bar{Y}_{\cdot\cdot} = \frac{1}{\sum_{i=1}^{k} n_i} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}.$$

Intuitively, it should be clear that large differences in the sample treatment means may indicate that the treatments affect the quantity being measured. Thus, it should come as no surprise that variation in the sample means is exactly the signal used to reject the null hypothesis of no treatment effects. The details are below.

4.2 One-Way ANOVA

4.2.1 The Model

Cell Means Model

$$Y_{ij} = \theta_i + \epsilon_{ij}, \qquad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, n_i,$$

where $\theta_i$ are unknown population parameters, $\epsilon_{ij}$ are random errors, $k$ is the number of distinct populations, and $n_i$ is the sample size in the $i$th population. Note that the sample sizes need not be equal.

If we assume $E[\epsilon_{ij}] = 0$, then the expected value of the data is

$$E[Y_{ij}] = \theta_i, \qquad j = 1, 2, \ldots, n_i.$$

Thus, the mean of the data depends only on the treatment. In particular, we conclude that the parameter $\theta_i$ is the population mean of population $i$. Since we focus on fixed effects models, this collection of $k$ populations contains the only ones of interest. Therefore, the $\theta_i$ are viewed as unknown constants.

Alternative Parameterization
Often, you will see another parameterization of the one-way ANOVA model,

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}.$$

Then,

$$E[Y_{ij}] = \mu + \alpha_i,$$

where $\mu$ is the grand mean and $\alpha_i$ is the unique effect of treatment $i$. Notice the expected difference between two measurements from treatments $i$ and $i'$ is

$$E[Y_{ij} - Y_{i'l}] = \mu + \alpha_i - (\mu + \alpha_{i'}) = \alpha_i - \alpha_{i'},$$

a difference in effects. Note that there are $k + 1$ parameters in this model formulation, which leads to identifiability problems, discussed in the next section.

4.2.2 Identifiability

Recall our overall framework. We have a population, and associated with it are unknown population parameter(s) $\theta$. We assume there is some probability model that describes data $X$ sampled from this population. The probability model defines the pdf $f_\theta(x)$ (or pmf for discrete outcomes) for the data.

Definition: identifiable
A population parameter $\theta$ is identifiable if distinct $\theta$ correspond to distinct pdfs (or pmfs for discrete random variables). That is, if $\theta \ne \theta'$, then the pdfs of the data, $f_\theta(x)$ and $f_{\theta'}(x)$, are distinct functions.

For example, if $\mu_1 \ne \mu_2$, then the corresponding normal pdfs are not the same:

$$f_{\mu_1}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu_1)^2}{2\sigma^2}\right] \ne \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu_2)^2}{2\sigma^2}\right] = f_{\mu_2}(x),$$

indicating that the population mean of normally distributed random variables is identifiable.

1. Identifiability is a property of the model (not of the estimates of the population parameter), so solving identifiability problems involves changing the model.
2. If a model is not identifiable, then estimation of or inference on its population parameters is not possible.

Alternative Parameterization is Overparameterized
In the alternative formulation, there are $k + 1$ parameters but only $k$ sample means available from the data. The extra parameter makes the model unidentifiable: more than one choice of $(\mu, \alpha_1, \ldots, \alpha_k)$ can lead to the same pdf. One restriction on the parameters must be added to make the model identifiable. There are multiple choices for that restriction, and they change the way the parameters are interpreted.

- $\sum_i \alpha_i = 0$ means that we can interpret the $\alpha_i$ as deviations from the overall mean attributable to each population.
- $\alpha_1 = 0$ might be useful if population 1 is the control group and we want to interpret the $\alpha_i$, $i > 1$, as deviations from no treatment.

4.2.3 Model Assumptions

Assumptions
1. $E[\epsilon_{ij}] = 0$, $\mathrm{Var}(\epsilon_{ij}) = \sigma_i^2 < \infty$ for all $i, j$, and $\mathrm{Cov}(\epsilon_{ij}, \epsilon_{kl}) = 0$ for all $i, j, k, l$ with $i \ne k$ or $j \ne l$.
2. $\epsilon_{ij} \sim N(0, \sigma_i^2)$, independent.
3. Homoscedasticity: $\sigma_i^2 = \sigma^2$ for all $i$.

Comments:
- Assumption 2 is required for hypothesis testing and confidence intervals. Without assumption 2, we are limited to estimation. With assumption 1 about the variance, we can find the estimate with minimum variance.
- Non-normality can lead to difficulties, but there are solutions for other kinds of distributions. We will not discuss them much here.
- We can use the CLT to get approximate normality of the sample means if $n_i$ is large enough and the real distribution is fairly symmetric.
- The analysis is more robust to violations of 3 when the sample sizes are equal, $n_i = n$ for all treatments.
- Robustness to violations of 2 depends on the extent to which 3 is true. For this reason, people will often transform the $Y$ random variables to achieve 3 so that they do not need to worry so much about the normality of their data.
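The overparameterization discussed in Section 4.2.2 can be made concrete with a small sketch. The cell means below are made-up illustrative values, and the helper names (`cell_means`, `sum_to_zero`, `baseline`) are hypothetical, not part of the notes. The sketch shows two different parameter vectors that imply identical population means, and then how each of the two restrictions picks out a single parameter vector.

```python
# Hypothetical population means for k = 3 treatments (illustrative values only).
theta = [10.0, 12.0, 17.0]

def cell_means(mu, alpha):
    """Means mu + alpha_i implied by the alternative parameterization."""
    return [mu + a for a in alpha]

# Two different (mu, alpha_1, ..., alpha_k) that give exactly the same means,
# so the data cannot distinguish between them without a restriction.
params_a = (0.0, [10.0, 12.0, 17.0])
params_b = (5.0, [5.0, 7.0, 12.0])
same_means = cell_means(*params_a) == cell_means(*params_b) == theta

def sum_to_zero(theta):
    """Restriction sum(alpha) = 0: mu is the average of the population means."""
    mu = sum(theta) / len(theta)
    return mu, [t - mu for t in theta]

def baseline(theta):
    """Restriction alpha_1 = 0: mu is the mean of population 1 (e.g. a control)."""
    mu = theta[0]
    return mu, [t - mu for t in theta]

mu_s, alpha_s = sum_to_zero(theta)   # effects are deviations from the overall mean
mu_b, alpha_b = baseline(theta)      # effects are deviations from population 1
```

Under either restriction the fitted means are unchanged; only the interpretation of $\mu$ and the $\alpha_i$ differs.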

4.2.4 Inference

Estimating $\mu$ and $\alpha_i$
First, we address the problem of estimating the parameters $\mu$ and $\alpha_i$ (we address estimation of $\sigma^2$ in the context of hypothesis testing for ANOVA). It should be intuitively clear that

$$\hat{\mu} = \bar{Y}_{\cdot\cdot} \qquad \text{and} \qquad \hat{\alpha}_i = \bar{Y}_{i\cdot} - \hat{\mu}$$

are good estimators. You can also show these are the maximum likelihood estimates of the parameters given the model assumptions defined in the last section. We will discuss estimation of $\sigma^2$ when we discuss inference.

Classic ANOVA Hypothesis

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_k \quad \text{or} \quad H_0: \alpha_1 = \cdots = \alpha_k = 0$$
$$H_A: \theta_i \ne \theta_j \text{ for some } i \ne j \quad \text{or} \quad H_A: \alpha_i \ne 0 \text{ for some } i$$

This hypothesis is often not so interesting. Take the example of comparing several treatments. One may often include a control as a treatment to make sure that the experiment runs as planned. One knows before even collecting data that the control should have a different outcome compared to the rest, which means this classic $H_0$ will always be rejected. We might still like to know whether $\theta_2 \ne \theta_3$. We will come back to this problem later.

Partitioning Variance
Often, ANOVA is presented as a way of partitioning the variance. The total variability can be summarized as the total sum of squares

$$SS_{tot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2.$$

Note that this is just $N - 1$ times the combined sample variance, where $N = \sum_{i=1}^{k} n_i$. By adding and subtracting the sample means $\bar{Y}_{i\cdot}$, we can partition the total variance into parts:

$$SS_{tot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot}) \right]^2.$$

Expand the quadratic and recognize that the cross-term becomes 0 because

$$\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = n_i \bar{Y}_{i\cdot} - n_i \bar{Y}_{i\cdot} = 0$$

to find

$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2.$$

Interpreting each part of this sum, we have

$$SS_{tot} = SS_B + SS_W,$$

where $SS_B$ is the sum-of-squares due to variation between treatments and $SS_W$ is the sum-of-squares due to error within treatments.

- There are $N$ observations, so there are $N - 1$ d.f. for $SS_{tot}$.
- There are $k$ treatments, so there are $k - 1$ d.f. for $SS_B$.
- Within the $i$th treatment, there are $n_i - 1$ d.f., for a total of $\sum_i (n_i - 1) = N - k$ d.f. within treatments for $SS_W$.
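The partition $SS_{tot} = SS_B + SS_W$ can be checked numerically. The following is a minimal sketch using made-up data with unequal sample sizes; the function name `one_way_sums_of_squares` is hypothetical.

```python
def one_way_sums_of_squares(groups):
    """Return (SS_tot, SS_B, SS_W) for a list of samples, one list per treatment."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)           # overall sample mean
    means = [sum(g) / len(g) for g in groups]          # sample treatment means
    ss_tot = sum((y - grand_mean) ** 2 for y in all_obs)
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return ss_tot, ss_b, ss_w

# Illustrative data only: k = 3 treatments with n = (3, 2, 4) observations.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 6.0]]
ss_tot, ss_b, ss_w = one_way_sums_of_squares(groups)
degrees_of_freedom = (len(groups) - 1, sum(len(g) for g in groups) - len(groups))
```

Up to floating-point rounding, `ss_tot` equals `ss_b + ss_w`, and the two entries of `degrees_of_freedom` are $k - 1$ and $N - k$.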

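Estimating $\sigma^2$
We will now show how all the sums-of-squares, $SS_{tot}$, $SS_B$, and $SS_W$, are estimates of the population variance $\sigma^2$ under the ANOVA null hypothesis. This fact will also allow us to propose statistics with sampling distributions for testing the null hypothesis.

Lemma 13. Suppose $X_i$ are independent random variables with $E[X_i] = \mu_i$ and $\mathrm{Var}(X_i) = \sigma^2$, $i = 1, \ldots, n$. Then,

$$E\left[(X_i - \bar{X})^2\right] = (\mu_i - \bar{\mu})^2 + \frac{n - 1}{n} \sigma^2,$$

where $\bar{\mu} = \frac{1}{n} \sum_{i=1}^{n} \mu_i$.

Proof. Recall that for any random variable $Z$, $E[Z^2] = (E[Z])^2 + \mathrm{Var}(Z)$ by the definition of variance, and $E[X_i - \bar{X}] = \mu_i - \bar{\mu}$ by linearity of expectation. The only part missing is

$$\mathrm{Var}(X_i - \bar{X}) = \mathrm{Var}(X_i) + \mathrm{Var}(\bar{X}) - 2\,\mathrm{Cov}(X_i, \bar{X}) = \sigma^2 + \frac{\sigma^2}{n} - 2\,\mathrm{Cov}\Big(X_i, \frac{1}{n} \sum_j X_j\Big) = \sigma^2 + \frac{\sigma^2}{n} - \frac{2\sigma^2}{n} = \frac{n - 1}{n} \sigma^2.$$

Putting the parts back into the formula for $E[Z^2]$, the lemma is proved.

Theorem 14. Given the ANOVA assumptions, the restriction $\sum_i \alpha_i = 0$, and assuming $n_i = n$ for all $i$,

$$E[SS_W] = k(n - 1)\sigma^2, \qquad E[SS_B] = n \sum_{i=1}^{k} \alpha_i^2 + (k - 1)\sigma^2.$$

Proof.

$$E[SS_W] = \sum_{i=1}^{k} E\Big[\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2\Big]$$
$$= \sum_{i=1}^{k} E[(n - 1) S_i^2] \qquad \text{(definition of sample variance)}$$
$$= \sum_{i=1}^{k} (n - 1)\sigma^2 \qquad \text{(constant variance assumption; sample variance unbiased)}$$
$$= k(n - 1)\sigma^2.$$

The second result uses the lemma:

$$E[SS_B] = n \sum_{i=1}^{k} E\left[(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2\right] = n \sum_{i=1}^{k} \Big[\alpha_i^2 + \frac{k - 1}{kn} \sigma^2\Big] = n \sum_{i=1}^{k} \alpha_i^2 + (k - 1)\sigma^2,$$

where the second step results because $E[\bar{Y}_{i\cdot}] = E[Y_{ij}] = \mu + \alpha_i$, $\mathrm{Var}(\bar{Y}_{i\cdot}) = \sigma^2 / n$, $\bar{\mu} = \frac{1}{k} \sum_{i=1}^{k} (\mu + \alpha_i) = \mu$, and $\mu_i - \bar{\mu} = \alpha_i$.

The expectations just derived suggest two ways to estimate the population variance $\sigma^2$. Most naturally, we define the pooled sample variance

$$S_p^2 := \frac{SS_W}{k(n - 1)}.$$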
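Theorem 14 can be sanity-checked by simulation. The sketch below is only an illustration under assumed values ($k = 3$, $n = 5$, $\sigma = 1$, effects $(-1, 0, 1)$, $\mu = 0$, fixed seed); the function name `simulate_expected_ss` is hypothetical. It averages $SS_W$ and $SS_B$ over many simulated data sets and compares them with the formulas.

```python
import random

def simulate_expected_ss(alpha, n, sigma, reps, seed=0):
    """Monte Carlo averages of SS_W and SS_B with equal sample sizes n and mu = 0."""
    rng = random.Random(seed)
    k = len(alpha)
    total_w = total_b = 0.0
    for _ in range(reps):
        groups = [[a + rng.gauss(0.0, sigma) for _ in range(n)] for a in alpha]
        means = [sum(g) / n for g in groups]
        grand = sum(means) / k  # equal n, so the grand mean is the mean of the means
        total_w += sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
        total_b += n * sum((m - grand) ** 2 for m in means)
    return total_w / reps, total_b / reps

alpha = [-1.0, 0.0, 1.0]        # assumed effects, summing to zero
n, sigma = 5, 1.0
k = len(alpha)
mean_ss_w, mean_ss_b = simulate_expected_ss(alpha, n, sigma, reps=20000)

# Values predicted by Theorem 14.
expected_ss_w = k * (n - 1) * sigma ** 2
expected_ss_b = n * sum(a ** 2 for a in alpha) + (k - 1) * sigma ** 2
```

The simulated averages should agree with the theoretical values up to Monte Carlo error, which shrinks as `reps` grows.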

Under the ANOVA assumption of constant variance, $S_p^2$ uses all the data to estimate the population variance $\sigma^2$. (The denominator becomes $N - k$ for unequal sample sizes.) This is the multi-sample extension of the pooled sample variance from the two-sample t-test with equal variances.

When the ANOVA null hypothesis is true, so $\alpha_i = 0$ for all $i$, then $\frac{SS_B}{k - 1}$ also estimates the population variance. By the way, so does $\frac{SS_{tot}}{kn - 1}$.

It is the difference between $E[SS_B]$ and $E[SS_W]$ that forms the basis of a statistical test of the null hypothesis. If $\alpha_i \ne 0$ for some $i$, then

$$E\left[\frac{SS_B}{k - 1}\right] > E\left[\frac{SS_W}{k(n - 1)}\right].$$

It is this signal we'll capture in a statistic. The only trick then will be to choose the statistic and determine its distribution under the null. To help us toward that goal, we consider the following theorem.

Theorem 15. If $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$, then

$$\frac{SS_W}{\sigma^2} \sim \chi^2_{k(n-1)},$$

and if $\alpha_i = 0$ for all $i$, then

$$\frac{SS_B}{\sigma^2} \sim \chi^2_{k-1},$$

independent of $SS_W / \sigma^2$.

Proof. We will not prove this result, but the proof follows the same kind of reasoning that gave us the chi-squared distribution in the one- or two-sample case. This theorem is a special case of Cochran's Theorem, applied by recognizing that $SS_{tot} / \sigma^2 \sim \chi^2_{kn-1}$ when $H_0$ is true, by the single-sample result. (To be clear, under $H_0$, the multiple samples are all part of one big single sample from the same population.)

We state Cochran's theorem for its general applicability to partitioned sums-of-squares.

Theorem 16 (Cochran's Theorem). Let $Z_i \overset{iid}{\sim} N(0, 1)$ for $i = 1, \ldots, \nu$ and

$$\sum_{i=1}^{\nu} Z_i^2 = Q_1 + Q_2 + \cdots + Q_s$$

with $s \le \nu$, where each $Q_j$ is a quadratic form in the $Z_i$ of rank $\nu_j$. Then, $Q_1, Q_2, \ldots, Q_s$ are independent $\chi^2$ random variables with $\nu_1, \nu_2, \ldots, \nu_s$ d.f., respectively, if and only if $\nu = \nu_1 + \nu_2 + \cdots + \nu_s$.

F Test for Testing the Classical ANOVA Hypothesis
These estimates of $\sigma^2$ provide the basis of the F test for the classic hypothesis. Define the statistic

$$F = \frac{SS_B / (k - 1)}{SS_W / (k(n - 1))}.$$

As already argued, this statistic should be close to 1 if $H_0$ is correct. Otherwise, it should tend to exceed 1. We take this moment to define a new distribution called the F distribution.

Definition: F distribution

Given independent chi-square random variables $U \sim \chi^2_m$ and $V \sim \chi^2_n$, then

$$W = \frac{U / m}{V / n} \sim F(m, n)$$

is said to have an F distribution with $m$ and $n$ degrees of freedom.

The pdf of the F distribution is given by

$$f(w) = \frac{\Gamma\left(\frac{m + n}{2}\right)}{\Gamma\left(\frac{m}{2}\right) \Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} w^{m/2 - 1} \left(1 + \frac{m w}{n}\right)^{-(m + n)/2}.$$

Thus, if $H_0$ is correct, then Theorem 15 shows us $F \sim F(k - 1, k(n - 1))$, and in the case with variable sample sizes $F \sim F(k - 1, N - k)$. If the alternative hypothesis is correct, then we expect $SS_B / (k - 1)$ to overestimate the population variance, so large values of the statistic $F$ indicate problems with $H_0$. Rejection is therefore according to a one-tailed test, when

$$F > F_{k-1, N-k}(1 - \alpha).$$

In R, the functions pf(), qf(), and friends are for the F distribution.

The ANOVA Table
The one-way ANOVA analysis is summarized in the ANOVA table.

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Between treatments | $SS_B$ | $k - 1$ | $MS_B = \frac{SS_B}{k - 1}$ | $F = \frac{MS_B}{MS_W}$ |
| Within treatments | $SS_W$ | $N - k$ | $MS_W = \frac{SS_W}{N - k}$ | |
| Total | $SS_{tot}$ | $N - 1$ | | |

Tukey Method
If the ANOVA null hypothesis is rejected, then there is some effect $\alpha_i \ne 0$ for some population. It becomes important to figure out which effects are non-zero, or which population means differ significantly. Recalling $\theta_i = \mu + \alpha_i$, our model assumptions yield

$$\bar{Y}_{i\cdot} \sim N\left(\theta_i, \frac{\sigma^2}{n_i}\right),$$

so we can construct CI using t statistics if we can produce an estimate of $\sigma^2$. Previously, we argued that $S_p^2$ always estimates $\sigma^2$, even when the null hypothesis is not satisfied, so it is natural to form a CI for $\theta_i$ of

$$\bar{Y}_{i\cdot} \pm t_{N-k}(1 - \alpha/2) \frac{S_p}{\sqrt{n_i}}.$$

The degrees of freedom used are the degrees of freedom of $SS_W$, which is used to compute $S_p^2$. A way to remember the degrees of freedom is to realize there are $N$ observations, but $k$ d.f. are lost to estimate the sample means in order to compute the sum of squared deviations in $SS_W$.

The above CI is directly relevant to testing $H_0: \alpha_i = 0$. You can figure out the test statistic and its sampling distribution. Notice that there are $k$ such tests we might need to run.

As for computing CI for mean differences, e.g. $\theta_i - \theta_j$, we recognize that testing $H_0: \alpha_i = \alpha_j$ is equivalent. For this case, the statistic

$$\frac{\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}}{S_p \sqrt{1/n_i + 1/n_j}} \sim t_{N-k}$$

is useful, but there are $\binom{k}{2}$ pairs of means we could test.

In both cases above, it is unwise to run that many tests without correcting the type I error rate $\alpha$. The objective of Tukey's method is to estimate CI for all pairwise mean differences $\theta_i - \theta_j$ that simultaneously have the desired coverage.
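As a rough sketch of the ANOVA table and of the pairwise statistic above, the following computes the mean squares, the F statistic, $S_p$, and all $\binom{k}{2}$ pairwise t statistics for made-up data. The function name `one_way_anova` and the data are illustrative assumptions. P-values are not computed here, since they require F and t distribution functions such as R's pf() and pt().

```python
from itertools import combinations
from math import sqrt

def one_way_anova(groups):
    """Entries of the one-way ANOVA table plus uncorrected pairwise t statistics."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_b = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_b = ss_b / (k - 1)          # between-treatment mean square
    ms_w = ss_w / (n_total - k)    # within-treatment mean square, i.e. S_p^2
    s_p = sqrt(ms_w)
    pairwise_t = {
        (i + 1, j + 1): (means[i] - means[j]) / (s_p * sqrt(1 / sizes[i] + 1 / sizes[j]))
        for i, j in combinations(range(k), 2)
    }
    return {"df": (k - 1, n_total - k), "MS_B": ms_b, "MS_W": ms_w,
            "F": ms_b / ms_w, "pairwise_t": pairwise_t}

# Illustrative data only: three treatments, three observations each.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

Here `result["F"]` would be compared with the $F(k - 1, N - k)$ quantile, and each entry of `result["pairwise_t"]` with a $t_{N-k}$ quantile, subject to the multiple-testing caveat just described.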

Recall that $(\bar{Y}_{i\cdot} - \theta_i) \sim N(0, \sigma^2 / n)$ if $\sigma^2$ is constant across populations and the sample size is a constant $n$. Define the statistic

$$SR = \max_{i,j} \frac{\left| (\bar{Y}_{i\cdot} - \theta_i) - (\bar{Y}_{j\cdot} - \theta_j) \right|}{S_p / \sqrt{n}}.$$

Under the ANOVA model, $SR$ follows a studentized range distribution with parameters $k$ and $k(n - 1)$. Unusually large values of $SR$ suggest that the proposed population means $\theta_i$ are not the true population means. We will not write a formula for the studentized range distribution, but suppose $q_{k,k(n-1)}(1 - \alpha)$ is its quantile. Then,

$$P\left[ \left| (\bar{Y}_{i\cdot} - \theta_i) - (\bar{Y}_{j\cdot} - \theta_j) \right| \le q_{k,k(n-1)}(1 - \alpha) \frac{S_p}{\sqrt{n}} \text{ for all } i \ne j \right] = 1 - \alpha.$$

When we hypothesize $\theta_i = \theta_j = \mu$, the confidence interval for $\alpha_i - \alpha_j$ is

$$\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot} \pm q_{k,k(n-1)}(1 - \alpha) \frac{S_p}{\sqrt{n}}.$$

If the CI does not contain 0, then we reject $H_0: \alpha_i = \alpha_j$ with p-value $< \alpha$. The key advantage of the Tukey method is that if $H_0: \alpha_i = \alpha_l$ is also rejected, then the p-value for that conclusion is also $< \alpha$, because the intervals hold simultaneously. If separate t tests at the $\alpha$ level are used for these analyses, the CIs would be narrower, more nulls would be rejected, and the probability of at least one type I error among the tests would exceed $\alpha$. Another solution to this problem is to use Bonferroni corrected $\alpha$ values on the separate t tests.

Example: We consider the following data with sample means computed from 7 treatments, each based on 10 measurements. Suppose we are given the pooled sample standard deviation $S_p = 0.061$, which we could read off an ANOVA table as the square root of $MS_W$.

| Lab | Mean |
|---|---|
| [lab means not preserved in this transcription] | |

The quantile for the Tukey statistic is given in R as qtukey(0.95, nmeans=7, df=63), so

$$q_{7,63}(0.95)\, S_p / \sqrt{10} = \text{[value not preserved in this transcription]}.$$

We can examine all absolute pairwise differences, and any difference that exceeds this threshold allows us to reject the corresponding null that there is no difference with $\alpha = 0.05$. We find populations 1 and 4, 1 and 5, 1 and 6, and 3 and 4 have significantly different treatment effects.

If we, incorrectly, performed uncorrected two-sample t-tests, a significant difference is anything larger than

$$t_{63}(0.975)\, S_p \sqrt{\frac{2}{10}} = \text{[value not preserved in this transcription]}.$$

If we perform the two-sample t-tests with Bonferroni correction over the $\binom{7}{2} = 21$ pairs, a significant difference is found for every pairwise distance exceeding

$$t_{63}\left(1 - \frac{0.05}{2 \cdot 21}\right) S_p \sqrt{\frac{2}{10}} = \text{[value not preserved in this transcription]}.$$

You might also be interested to see the TukeyHSD() function in R.
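To see why uncorrected pairwise tests are a problem in the example with 7 treatments, the sketch below counts the pairs, computes the Bonferroni per-test level, and gives a rough idea of how large the family-wise error rate could become. The last quantity assumes the 21 tests are independent, which they are not (they share sample means and $S_p$), so it is only an illustration of the order of magnitude.

```python
from math import comb

k = 7          # number of treatments in the example
alpha = 0.05   # desired family-wise type I error rate

n_pairs = comb(k, 2)                 # number of pairwise comparisons
bonferroni_alpha = alpha / n_pairs   # per-test level under the Bonferroni correction

# Illustration only: probability of at least one false rejection if every pairwise
# test were run at level alpha AND the tests were independent (an assumption that
# does not hold exactly here).
fwer_if_independent = 1 - (1 - alpha) ** n_pairs
```

The Bonferroni level `bonferroni_alpha` is what each two-sided t-test would use, which is why the corresponding t quantile is taken at $1 - 0.05 / (2 \cdot 21)$.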

4.2.5 Checking Model Assumptions

Definition: residual
The residual is the difference between the observation and its model-estimated mean. In this case,

$$r_{ij} = Y_{ij} - \hat{\mu} - \hat{\alpha}_i = Y_{ij} - \bar{Y}_{i\cdot}.$$

By assumption of the model, the residuals should be approximately normally distributed. One can check this assumption with probability plots for $r_{ij}$ or other tests of normality that we have discussed. The ANOVA model additionally assumes constant variance, and we have not yet discussed methods for checking variance, though some problems can be identified from the boxplots.

Testing Common Variance
The F test itself suggests a way to test equal variance in two-sample tests. If we have two independent samples

$$X_1, \ldots, X_{n_x} \overset{iid}{\sim} N(\mu_x, \sigma_x^2), \qquad Y_1, \ldots, Y_{n_y} \overset{iid}{\sim} N(\mu_y, \sigma_y^2),$$

then under $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$ we know that

$$\frac{(n_x - 1) S_x^2}{\sigma^2} \sim \chi^2_{n_x - 1} \qquad \text{and} \qquad \frac{(n_y - 1) S_y^2}{\sigma^2} \sim \chi^2_{n_y - 1}$$

are independent, so the statistic

$$\frac{S_x^2}{S_y^2} \sim F(n_x - 1, n_y - 1).$$

In this case, the statistic may be unusually small or unusually large, but should be around 1 if the hypothesis $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$ is correct. Thus, a two-tailed test can be used to find samples with significantly different variances.

Testing Common Variance: Multiple Samples
To test the null hypothesis that all population variances are equal across more than two samples, i.e.

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2,$$

we can use Bartlett's test or Levene's test. We will not spend time deriving these tests, but only show you how to use them.

Bartlett's Test. See bartlett.test() in R. The downside of this test is that it relies on the normality assumption.

Levene's Test. Perform a second ANOVA on the absolute residuals, $|r_{ij}|$, testing the classic ANOVA hypothesis of constant means (or no effects).
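The recipe for Levene's test described above (a one-way ANOVA on the absolute residuals) can be sketched as follows. The function names and the data are illustrative assumptions, and only the F statistic is computed; the p-value would come from the $F(k - 1, N - k)$ distribution, for example via R's pf().

```python
def f_statistic(groups):
    """One-way ANOVA F statistic MS_B / MS_W for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))

def levene_statistic(groups):
    """F statistic of a second ANOVA run on the absolute residuals |r_ij|."""
    abs_residuals = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    return f_statistic(abs_residuals)

# Illustrative data only: equal means but visibly different spreads.
groups = [[9.0, 10.0, 11.0], [5.0, 10.0, 15.0], [10.0, 10.0, 10.0]]
levene_f = levene_statistic(groups)
```

A large value of `levene_f` relative to the F quantile indicates that the mean absolute residuals, and hence the spreads, differ between treatments.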

4.2.6 Nonparametric Test

Kruskal-Wallis Test
If you find that your data do not satisfy the ANOVA assumptions, there is an alternative test that is related to the rank sum test. Let $R_{ij}$ be the rank of $Y_{ij}$ in the combined sample. Handle ties as for the rank sum test. Define

$$\bar{R}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} R_{ij} \qquad \text{and} \qquad \bar{R}_{\cdot\cdot} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} R_{ij} = \frac{N + 1}{2}$$

and

$$SS_B = \sum_{i=1}^{k} n_i \left( \bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot} \right)^2.$$

Then, it should be clear that the larger $SS_B$, the more evidence there is against the hypothesis

$$H_0: \text{same probability distribution for all } k \text{ groups}.$$

As for the rank sum test, the statistic is most sensitive to changes in location of the distributions.

For small samples, you can use R's function kruskal.test() to compute the p-value. For larger samples, it turns out that

$$K = \frac{12\, SS_B}{N(N + 1)} \sim \chi^2_{k-1}$$

has an asymptotic chi-square distribution. The conditions for good asymptotics are $I = 3$ with $n_i \ge 5$, or $I > 3$ with $n_i \ge$ [value not preserved in this transcription], where $I = k$ is the number of groups.

4.2.7 Contrasts

The following is optional material. It was not discussed in class, but it covers a very common aspect of ANOVA.

Definition: contrast
Let $t = (t_1, \ldots, t_k)$ be a vector of random variables, their realizations, parameters, or statistics. Let $a = (a_1, \ldots, a_k)$ be constants. Then

$$\sum_{i=1}^{k} a_i t_i$$

is a linear combination of the $t_i$'s. If $\sum_i a_i = 0$, then the linear combination is called a contrast.

We can write the classical ANOVA hypothesis in terms of contrasts.

Theorem 17. $\theta_1 = \cdots = \theta_k$ if and only if $\sum_i a_i \theta_i = 0$ for all $a \in A$, where

$$A = \Big\{ a = (a_1, \ldots, a_k) : \sum_i a_i = 0 \Big\}.$$

Proof. The forward implication is obvious: with common value $\theta$,

$$\sum_i a_i \theta_i = \theta \sum_i a_i = 0.$$

The reverse implication is also quite easy. Consider $a^{(1)} = (1, -1, 0, \ldots, 0) \in A$. This one shows $\theta_1 = \theta_2$. Similarly, $a^{(2)} = (0, 1, -1, 0, \ldots, 0)$ shows $\theta_2 = \theta_3$. In general, the set $a^{(1)}, a^{(2)}, \ldots, a^{(k-1)}$ spans the space $A$. Therefore, all possible equalities encoded in $\theta_1 = \cdots = \theta_k$ are implied by combining these vectors appropriately.
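The spanning claim in the proof of Theorem 17 can be illustrated numerically. In the sketch below (function names hypothetical, contrast chosen arbitrarily), any contrast $a$ is rewritten as $\sum_m c_m a^{(m)}$ with coefficients given by the running sums $c_m = a_1 + \cdots + a_m$; this particular choice of coefficients is our own illustration and is not taken from the notes.

```python
def adjacent_differences(k):
    """Return a^(1), ..., a^(k-1): +1 in position m and -1 in position m + 1."""
    vectors = []
    for m in range(k - 1):
        a = [0] * k
        a[m], a[m + 1] = 1, -1
        vectors.append(a)
    return vectors

def decompose_contrast(a):
    """Coefficients c_m such that a = sum_m c_m * a^(m); requires sum(a) == 0."""
    assert abs(sum(a)) < 1e-12, "coefficients must sum to zero to form a contrast"
    coeffs, running = [], 0
    for a_i in a[:-1]:
        running += a_i          # c_m is the running sum a_1 + ... + a_m
        coeffs.append(running)
    return coeffs

a = [1, 1, -2, 0]               # an arbitrary contrast on k = 4 means
basis = adjacent_differences(len(a))
coeffs = decompose_contrast(a)
rebuilt = [sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(len(a))]

# With equal population means, every contrast of the means is zero.
theta_equal = [3.5, 3.5, 3.5, 3.5]
contrast_value = sum(a_i * t for a_i, t in zip(a, theta_equal))
```

Because `rebuilt` reproduces `a`, the contrast is a combination of the adjacent differences, so $\theta_1 = \theta_2$, $\theta_2 = \theta_3$, and so on together force $\sum_i a_i \theta_i = 0$.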

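Inference on Contrasts
Under the ANOVA assumptions, we have $Y_{ij} \sim N(\theta_i, \sigma^2)$ and

$$\bar{Y}_{i\cdot} \sim N(\theta_i, \sigma^2 / n_i).$$

Also, for any $a$,

$$\sum_{i=1}^{k} a_i \bar{Y}_{i\cdot} \sim N\left( \sum_{i=1}^{k} a_i \theta_i, \; \sigma^2 \sum_{i=1}^{k} \frac{a_i^2}{n_i} \right),$$

with mean and variance

$$E\Big[ \sum_i a_i \bar{Y}_{i\cdot} \Big] = \sum_i a_i \theta_i, \qquad \mathrm{Var}\Big( \sum_i a_i \bar{Y}_{i\cdot} \Big) = \sigma^2 \sum_{i=1}^{k} \frac{a_i^2}{n_i}.$$

t-test for Generic Contrast
But of course, we don't usually know $\sigma^2$. Instead, we use

$$S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i\cdot} \right)^2,$$

which is unbiased for $\sigma^2$ ($\sigma_i^2$ with heteroscedasticity) and also has distribution

$$\frac{(n_i - 1) S_i^2}{\sigma^2} \sim \chi^2_{n_i - 1}.$$

If assumption 3 of homoscedasticity applies, then we can pool sample variances to get a better estimate of $\sigma^2$. Namely, with $N = \sum_i n_i$, we use the pooled sample variance

$$S_p^2 = \frac{1}{N - k} \sum_{i=1}^{k} (n_i - 1) S_i^2 = \frac{1}{N - k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i\cdot} \right)^2.$$

Because the $S_i^2$ are independent, we also have

$$\frac{(N - k) S_p^2}{\sigma^2} \sim \chi^2_{N-k}.$$

Also, because $S_p^2$ is independent of the $\bar{Y}_{i\cdot}$, we have that the statistic

$$\frac{\sum_i a_i \bar{Y}_{i\cdot} - \sum_i a_i \theta_i}{S_p \sqrt{\sum_i a_i^2 / n_i}} \sim t_{N-k},$$

which allows confidence intervals of the usual form

$$\sum_i a_i \bar{Y}_{i\cdot} - t_{N-k,\alpha/2}\, S_p \sqrt{\sum_i \frac{a_i^2}{n_i}} \;\le\; \sum_i a_i \theta_i \;\le\; \sum_i a_i \bar{Y}_{i\cdot} + t_{N-k,\alpha/2}\, S_p \sqrt{\sum_i \frac{a_i^2}{n_i}},$$

where $t_{N-k,\alpha/2}$ denotes the upper $\alpha/2$ critical value, i.e. $t_{N-k}(1 - \alpha/2)$.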
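A minimal sketch of the contrast t statistic follows, using made-up data and a hypothetical function name `contrast_t`. It returns the estimated contrast, its standard error $S_p \sqrt{\sum_i a_i^2 / n_i}$, and the t statistic for a hypothesized contrast value; the critical value $t_{N-k,\alpha/2}$ would come from a t table or R's qt().

```python
from math import sqrt

def contrast_t(groups, a, hypothesized=0.0):
    """Estimate, standard error, and t statistic (N - k d.f.) for a contrast a."""
    assert abs(sum(a)) < 1e-12, "coefficients must sum to zero to form a contrast"
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    s_p = sqrt(ss_w / (sum(sizes) - k))                       # pooled standard deviation
    estimate = sum(a_i * m for a_i, m in zip(a, means))
    std_error = s_p * sqrt(sum(a_i ** 2 / n for a_i, n in zip(a, sizes)))
    return estimate, std_error, (estimate - hypothesized) / std_error

# Illustrative data only; the contrast compares the average of the first two
# treatments with the third.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
estimate, std_error, t_stat = contrast_t(groups, [0.5, 0.5, -1.0])
```

The confidence interval is then `estimate` plus or minus the critical value times `std_error`.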

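4.3 Two-Way ANOVA

4.3.1 Model

Two-Way ANOVA Model
In the two-way ANOVA, the experimenter simultaneously controls two factors, e.g. dosage level and exposure time. Each combination of factor levels is a treatment, and forms a cell in the two-way layout. Suppose we observe a constant $K$ observations per cell, $I$ levels of factor one, and $J$ levels of factor two. Then the two-way ANOVA model is

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + \epsilon_{ijk}, \qquad \epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2),$$

where

- $\alpha_i$ is the effect of factor one, level $i$, with $\sum_{i=1}^{I} \alpha_i = 0$;
- $\beta_j$ is the effect of factor two, level $j$, with $\sum_{j=1}^{J} \beta_j = 0$;
- $\delta_{ij}$ is the interaction effect of factor one, level $i$, and factor two, level $j$, with $\sum_{i=1}^{I} \delta_{ij} = \sum_{j=1}^{J} \delta_{ij} = 0$.

4.3.2 Parameter Estimation

As we motivated the estimates for the one-way ANOVA, we can use

$$\mu_{ij} = E[Y_{ijk}] = \mu + \alpha_i + \beta_j + \delta_{ij}$$
$$\mu_{i\cdot} = \frac{1}{J} \sum_j E[Y_{ijk}] = \frac{1}{J} \sum_j (\mu + \alpha_i + \beta_j + \delta_{ij}) = \mu + \alpha_i$$
$$\mu_{\cdot j} = \frac{1}{I} \sum_i E[Y_{ijk}] = \frac{1}{I} \sum_i (\mu + \alpha_i + \beta_j + \delta_{ij}) = \mu + \beta_j$$

to justify

$$\hat{\mu} = \bar{Y}_{\cdot\cdot\cdot}$$
$$\hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$$
$$\hat{\beta}_j = \bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot}$$
$$\hat{\delta}_{ij} = \bar{Y}_{ij\cdot} - (\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j) = \bar{Y}_{ij\cdot} - \bar{Y}_{\cdot\cdot\cdot} - (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}) - (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot}) = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot},$$

but it is again possible to show that these are maximum likelihood estimates under the assumption of normally distributed errors and fixed effects. The likelihood of a single observation is

$$L(Y_{ijk}; \mu, \alpha_i, \beta_j, \delta_{ij}, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2} (Y_{ijk} - \mu - \alpha_i - \beta_j - \delta_{ij})^2 \right].$$

Because of the independence of observations, the log likelihood of all the data $Y$ is

$$l(Y; \mu, \alpha_i, \beta_j, \delta_{ij}, \sigma^2) = -\frac{IJK}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_i \sum_j \sum_k (Y_{ijk} - \mu - \alpha_i - \beta_j - \delta_{ij})^2.$$

Maximizing simultaneously over all the parameters yields the intuitive estimates above. Again, we leave estimation of $\sigma^2$ until later, as it is intimately related to hypothesis testing.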
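The estimates of Section 4.3.2 can be computed directly from cell, row, and column means. The sketch below assumes a balanced layout stored as a nested list `y[i][j]` holding the $K$ observations in cell $(i, j)$; the data and the function name `two_way_estimates` are illustrative assumptions.

```python
def two_way_estimates(y):
    """Return (mu_hat, alpha_hat, beta_hat, delta_hat) for balanced data y[i][j][k]."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / K for j in range(J)] for i in range(I)]    # cell means
    row = [sum(cell[i]) / J for i in range(I)]                         # factor-one level means
    col = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]    # factor-two level means
    mu = sum(row) / I                                                  # grand mean
    alpha = [r - mu for r in row]
    beta = [c - mu for c in col]
    delta = [[cell[i][j] - row[i] - col[j] + mu for j in range(J)] for i in range(I)]
    return mu, alpha, beta, delta

# Illustrative data only: I = 2, J = 2, K = 2.
y = [[[1.0, 3.0], [4.0, 6.0]],
     [[5.0, 7.0], [12.0, 14.0]]]
mu_hat, alpha_hat, beta_hat, delta_hat = two_way_estimates(y)
```

By construction the estimated effects satisfy the same sum-to-zero restrictions as the parameters: `alpha_hat` and `beta_hat` sum to zero, and every row and column of `delta_hat` sums to zero.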

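4.3.3 Hypothesis Testing

Sums-of-Squares
As we did for one-way ANOVA, we can break down the total sum-of-squares into components,

$$SS_{tot} = SS_A + SS_B + SS_{AB} + SS_E,$$

where $SS_A$ measures the variation in the factor one means, $SS_B$ measures the variation in the factor two means, $SS_{AB}$ quantifies the strength of the interaction effects, and $SS_E$ is analogous to the within sum-of-squares, measuring the measurement error (hence subscript $E$). In terms of the data, this partition of the sum-of-squares is

$$\sum_{i,j,k} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 = JK \sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 + IK \sum_j (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 + K \sum_{i,j} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 + \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{ij\cdot})^2,$$

which can be proven by expanding

$$Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot} = (Y_{ijk} - \bar{Y}_{ij\cdot}) + (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}) + (\bar{Y}_{\cdot j \cdot} - \bar{Y}_{\cdot\cdot\cdot}) + (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j \cdot} + \bar{Y}_{\cdot\cdot\cdot}),$$

squaring both sides, and dropping cross-terms (because they sum to 0).

As before, we first work out the expectations of each of these partitioned sums-of-squares.

Theorem 18. Under the two-way ANOVA model with $\epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$,

$$E[SS_A] = (I - 1)\sigma^2 + JK \sum_i \alpha_i^2$$
$$E[SS_B] = (J - 1)\sigma^2 + IK \sum_j \beta_j^2$$
$$E[SS_{AB}] = (I - 1)(J - 1)\sigma^2 + K \sum_{i,j} \delta_{ij}^2$$
$$E[SS_E] = IJ(K - 1)\sigma^2$$

Proof. You can use Lemma 13 to prove the result for $SS_A$ and $SS_B$. For $SS_{AB}$, we apply the lemma to

$$E[SS_{tot}] = E\Big[ \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 \Big] = \sum_{i,j,k} \Big[ \frac{IJK - 1}{IJK} \sigma^2 + (\alpha_i + \beta_j + \delta_{ij})^2 \Big] = (IJK - 1)\sigma^2 + JK \sum_i \alpha_i^2 + IK \sum_j \beta_j^2 + K \sum_{i,j} \delta_{ij}^2,$$

which uses all the identities like $\sum_i \alpha_i = 0$ to simplify the result. Then, because $E[SS_{AB}] = E[SS_{tot}] - E[SS_A] - E[SS_B] - E[SS_E]$, we are done.

Next comes distributional information for the sums-of-squares.

Theorem 19. Under the two-way ANOVA model with $\epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$,

- $SS_E / \sigma^2 \sim \chi^2_{IJ(K-1)}$.
- If $H_A: \alpha_i = 0 \; \forall i$, then $SS_A / \sigma^2 \sim \chi^2_{I-1}$.
- If $H_B: \beta_j = 0 \; \forall j$, then $SS_B / \sigma^2 \sim \chi^2_{J-1}$.
- If $H_{AB}: \delta_{ij} = 0 \; \forall i, j$, then $SS_{AB} / \sigma^2 \sim \chi^2_{(I-1)(J-1)}$.

And $SS_A$, $SS_B$, $SS_{AB}$, and $SS_E$ are all independent of each other.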
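The two-way partition of the total sum-of-squares can be verified numerically. This is a minimal sketch for a balanced layout stored as `y[i][j]` (a list of $K$ observations per cell); the data and the function name `two_way_sums_of_squares` are illustrative assumptions.

```python
def two_way_sums_of_squares(y):
    """Return SS_A, SS_B, SS_AB, SS_E, and SS_tot for balanced data y[i][j][k]."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / K for j in range(J)] for i in range(I)]
    row = [sum(cell[i]) / J for i in range(I)]
    col = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]
    mu = sum(row) / I
    cells = [(i, j) for i in range(I) for j in range(J)]
    return {
        "SS_A": J * K * sum((r - mu) ** 2 for r in row),
        "SS_B": I * K * sum((c - mu) ** 2 for c in col),
        "SS_AB": K * sum((cell[i][j] - row[i] - col[j] + mu) ** 2 for i, j in cells),
        "SS_E": sum((obs - cell[i][j]) ** 2 for i, j in cells for obs in y[i][j]),
        "SS_tot": sum((obs - mu) ** 2 for i, j in cells for obs in y[i][j]),
    }

# Illustrative data only: I = 2, J = 2, K = 2.
y = [[[1.0, 3.0], [4.0, 6.0]],
     [[5.0, 7.0], [12.0, 14.0]]]
ss = two_way_sums_of_squares(y)
```

The four components add up to `ss["SS_tot"]`, mirroring the identity obtained by dropping the cross-terms.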

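Estimating $\sigma^2$
We can see that $SS_E$ can be used to estimate the population variance $\sigma^2$, so

$$\hat{\sigma}^2 = \frac{SS_E}{IJ(K - 1)} := S_p^2.$$

Under appropriate null hypotheses, the other sums-of-squares also estimate $\sigma^2$.

Testing Hypotheses
As before, this realization motivates the F tests, which are ratios of mean squares.

Theorem 20. Under the two-way ANOVA model with $\epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$,

- For testing $H_A$, $F = MS_A / MS_E \sim F(I - 1, IJ(K - 1))$.
- For testing $H_B$, $F = MS_B / MS_E \sim F(J - 1, IJ(K - 1))$.
- For testing $H_{AB}$, $F = MS_{AB} / MS_E \sim F((I - 1)(J - 1), IJ(K - 1))$.

ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Factor one | $SS_A$ | $I - 1$ | $MS_A = \frac{SS_A}{I - 1}$ | $F = \frac{MS_A}{MS_E}$ |
| Factor two | $SS_B$ | $J - 1$ | $MS_B = \frac{SS_B}{J - 1}$ | $F = \frac{MS_B}{MS_E}$ |
| Interaction | $SS_{AB}$ | $(I - 1)(J - 1)$ | $MS_{AB} = \frac{SS_{AB}}{(I - 1)(J - 1)}$ | $F = \frac{MS_{AB}}{MS_E}$ |
| Error | $SS_E$ | $IJ(K - 1)$ | $MS_E = \frac{SS_E}{IJ(K - 1)}$ | |
| Total | $SS_{tot}$ | $IJK - 1$ | | |

Reduced Model: No Interaction
Notice that if the interaction effects are assumed to be $\delta_{ij} = 0$, then the ANOVA table reduces and the error degrees of freedom change.

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Factor one | $SS_A$ | $I - 1$ | $MS_A = \frac{SS_A}{I - 1}$ | $F = \frac{MS_A}{MS_E}$ |
| Factor two | $SS_B$ | $J - 1$ | $MS_B = \frac{SS_B}{J - 1}$ | $F = \frac{MS_B}{MS_E}$ |
| Error | $SS_E$ | $IJK - I - J + 1$ | $MS_E = \frac{SS_E}{IJK - I - J + 1}$ | |
| Total | $SS_{tot}$ | $IJK - 1$ | | |

Confidence Intervals
Tukey's method can be extended to the two-way ANOVA, but we will focus on uncorrected CI here. Suppose we want a CI for $\alpha_i - \alpha_{i'}$. Then the relevant statistic is

$$\bar{Y}_{i\cdot\cdot} - \bar{Y}_{i'\cdot\cdot}.$$

Because the samples are independent, we have

$$\mathrm{Var}(\bar{Y}_{i\cdot\cdot}) = \mathrm{Var}(\bar{Y}_{i'\cdot\cdot}) = \frac{\sigma^2}{JK} \qquad \text{and} \qquad \mathrm{Var}(\bar{Y}_{i\cdot\cdot} - \bar{Y}_{i'\cdot\cdot}) = \frac{2\sigma^2}{JK},$$

and the CI are

$$\bar{Y}_{i\cdot\cdot} - \bar{Y}_{i'\cdot\cdot} \pm t_{IJ(K-1)}(1 - \alpha/2) \sqrt{\frac{2\, SS_E}{I J^2 K (K - 1)}}.$$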
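The two-way ANOVA table (with interaction) can be assembled from the sums-of-squares as in the sketch below. The input values are made up for illustration and the function name `two_way_f_statistics` is hypothetical; the function also returns the standard error used in the CI for $\alpha_i - \alpha_{i'}$. P-values and t quantiles are left to functions such as R's pf() and qt().

```python
from math import sqrt

def two_way_f_statistics(ss_a, ss_b, ss_ab, ss_e, I, J, K):
    """F ratios MS / MS_E for the full two-way model, plus the SE of a row-mean difference."""
    ms_e = ss_e / (I * J * (K - 1))              # estimate of sigma^2
    table = {
        "A": {"df": I - 1, "F": (ss_a / (I - 1)) / ms_e},
        "B": {"df": J - 1, "F": (ss_b / (J - 1)) / ms_e},
        "AB": {"df": (I - 1) * (J - 1), "F": (ss_ab / ((I - 1) * (J - 1))) / ms_e},
        "E": {"df": I * J * (K - 1), "MS": ms_e},
    }
    # sqrt(2 * MS_E / (J K)) equals sqrt(2 SS_E / (I J^2 K (K - 1))).
    se_row_difference = sqrt(2 * ms_e / (J * K))
    return table, se_row_difference

# Illustrative sums-of-squares for a balanced layout with I = J = K = 2.
table, se_diff = two_way_f_statistics(72.0, 50.0, 8.0, 8.0, I=2, J=2, K=2)
```

Each F ratio is compared with the F distribution whose numerator d.f. is the row's `df` and whose denominator d.f. is that of the error row.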

4.3.4 Randomized Block Design

Experimental Design
We now take a moment to discuss experimental design because one of the most common experimental designs produces a two-way ANOVA.

Definition: completely randomized design (CRD)
Given $T$ treatments and $n$ experimental units, the completely randomized design results if the EU are randomly divided into $T$ groups with $n_1, \ldots, n_T$ EU in each, such that all EU in group $t$ receive treatment $t$.

As we have discussed, the randomization of the CRD is a good thing because it ensures that no confounding factors that might also affect the response are introduced by the experimenter in assigning treatments.

Definition: randomized block design (RBD)
The RBD consists of $B$ blocks of $T$ EU each, with treatments randomly assigned such that each treatment appears exactly once in each block.

The RBD is an extension of the matched pair design. If the experimenter can identify a confounding factor, e.g. weight of subject, computer lab containing the computer, etc., that might affect the measured response, then it is a good idea to use a RBD to block (the analogue of pairing in the context of more than 2 samples) on the confounding factor.

Example: Suppose four treatments are to be applied to 8 experimental units. In a CRD, we would probably randomly choose $n_1 = \cdots = n_4 = 2$ EU per treatment. The problem is that EU will often vary tremendously in their response to even the same treatment. Thus, $n = 2$ may not be enough EU to see small treatment effects amongst large subject effects. Suppose the treatments can be applied sequentially to the same EU (i.e. there is nothing irreversible about the treatments, for example no surgeries). Then, a lot of power can be gained by blocking on EU. Each EU is subject to all four treatments, applied in random order (here is where the randomization enters). A random RBD for four EU is shown below, where T is treatment. Columns are subjects and rows give the order in which the treatments are applied.

| Subject 1 | Subject 2 | Subject 3 | Subject 4 |
|---|---|---|---|
| T2 | T4 | T1 | T1 |
| T1 | T2 | T3 | T4 |
| T4 | T1 | T2 | T3 |
| T3 | T3 | T4 | T2 |

The treatment order may have an effect as well. One can also block on the timing of treatment, so that each treatment is observed exactly once at each position in the sequence. Efficient designs for this multi-dimensional blocking are Latin square designs. An example is shown below. Notice the treatment orders are no longer random, but do vary from subject to subject.

| Subject 1 | Subject 2 | Subject 3 | Subject 4 |
|---|---|---|---|
| T1 | T4 | T3 | T2 |
| T2 | T1 | T4 | T3 |
| T3 | T2 | T1 | T4 |
| T4 | T3 | T2 | T1 |
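The two layouts above can be generated programmatically. The sketch below is an illustration with hypothetical function names: `random_rbd` gives each subject all treatments in an independently shuffled order, and `cyclic_latin_square` builds one particular Latin square by cyclic shifts (other Latin squares exist, and in practice one would usually also randomize which subject gets which sequence).

```python
import random

def random_rbd(n_subjects, n_treatments, seed=None):
    """One treatment order per subject, each a random permutation of 1..T."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_subjects):
        order = list(range(1, n_treatments + 1))
        rng.shuffle(order)
        orders.append(order)
    return orders

def cyclic_latin_square(n):
    """Entry [s][t] is the treatment given to subject s at time t (cyclic shifts)."""
    return [[(t - s) % n + 1 for t in range(n)] for s in range(n)]

def is_latin_square(square):
    """Check that every treatment appears once per subject and once per time slot."""
    n = len(square)
    full = set(range(1, n + 1))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({square[s][t] for s in range(n)} == full for t in range(n))
    return rows_ok and cols_ok

rbd = random_rbd(4, 4, seed=1)
latin = cyclic_latin_square(4)
```

With four treatments, `latin` lists the sequences T1 T2 T3 T4, T4 T1 T2 T3, T3 T4 T1 T2, and T2 T3 T4 T1 for subjects 1 to 4, which matches the columns of the Latin square example above.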

RBD as Two-Way ANOVA
The RBD is very popular and it leads to the following model

$$Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$$

where $\alpha_i$ is the treatment effect, $\beta_j$ is the block effect (of little interest), and $\epsilon_{ij}$ are the usual errors. Because $K = 1$, we drop the subscript $k$.

I should note that RBDs often are mixed effects models, that is, models where the $\alpha_i$ are fixed effects and the $\beta_j$ are random effects. Take our example. We probably don't want to just make inference for the blocks (subjects) in our study, but to extrapolate to the population of EU. In this case, the $\beta_j$ are random effects. Fortunately, the hypothesis tests for the $\alpha_i$ are unchanged from the fixed effects models we've been discussing.

4.3.5 Nonparametric Test

Friedman's Test
The one-way ANOVA assumptions about the errors also apply to the two-way ANOVA. If these assumptions are suspect for your dataset, then nonparametric methods may be warranted. Friedman's test for the RBD is a generalization of the sign test for paired samples. Within each block $j$, rank the measurements $Y_{1j}, \ldots, Y_{Ij}$ of the $I$ treatments to obtain $R_{1j}, \ldots, R_{Ij}$. Then, with $J = B$ blocks, compute

$$SS_A = J \sum_{i=1}^{I} \left( \bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot} \right)^2,$$

a measure of the difference in ranks across treatments, where $\bar{R}_{i\cdot}$ is the mean rank of treatment $i$ and $\bar{R}_{\cdot\cdot} = (I + 1)/2$. R's function friedman.test() can be used to compute p-values using this statistic, but for large samples, the statistic

$$Q = \frac{12\, SS_A}{I(I + 1)} \sim \chi^2_{I-1}$$

approximately.
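Both rank-based statistics in these notes, the Kruskal-Wallis $K$ of Section 4.2.6 and Friedman's $Q$ above, can be sketched with one shared ranking helper. The function names and data are illustrative assumptions. Ties receive average ranks, and no tie correction is applied to the statistics, so results may differ slightly from kruskal.test() and friedman.test() in R when ties are present.

```python
def midranks(values):
    """Ranks 1..n, with tied values given the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1      # mean of 1-based ranks i+1, ..., j+1
        for m in range(i, j + 1):
            ranks[order[m]] = average
        i = j + 1
    return ranks

def kruskal_wallis_K(groups):
    """K = 12 SS_B / (N (N + 1)), with SS_B computed from ranks in the combined sample."""
    pooled = [y for g in groups for y in g]
    N = len(pooled)
    ranks = midranks(pooled)
    grand = (N + 1) / 2
    ss_b, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        ss_b += len(g) * (sum(r) / len(r) - grand) ** 2
    return 12 * ss_b / (N * (N + 1))

def friedman_Q(table):
    """Q = 12 SS_A / (I (I + 1)); table[i][j] is treatment i in block j, ranked within blocks."""
    I, J = len(table), len(table[0])
    rank_sums = [0.0] * I
    for j in range(J):
        block_ranks = midranks([table[i][j] for i in range(I)])
        for i in range(I):
            rank_sums[i] += block_ranks[i]
    grand = (I + 1) / 2
    ss_a = J * sum((total / J - grand) ** 2 for total in rank_sums)
    return 12 * ss_a / (I * (I + 1))

# Illustrative data only.
K_stat = kruskal_wallis_K([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
Q_stat = friedman_Q([[1.0, 2.0, 1.0, 3.0], [2.0, 3.0, 4.0, 5.0], [3.0, 9.0, 7.0, 8.0]])
```

For large samples, `K_stat` is compared with a $\chi^2_{k-1}$ quantile and `Q_stat` with a $\chi^2_{I-1}$ quantile.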

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Chapter 11: I = 2 samples independent samples paired samples Chapter 12: I 3 samples of equal size J one-way layout two-way layout

Chapter 11: I = 2 samples independent samples paired samples Chapter 12: I 3 samples of equal size J one-way layout two-way layout Serk Sagtov, Chalmers and GU, February 0, 018 Chapter 1. Analyss of varance Chapter 11: I = samples ndependent samples pared samples Chapter 1: I 3 samples of equal sze one-way layout two-way layout 1

More information

Topic- 11 The Analysis of Variance

Topic- 11 The Analysis of Variance Topc- 11 The Analyss of Varance Expermental Desgn The samplng plan or expermental desgn determnes the way that a sample s selected. In an observatonal study, the expermenter observes data that already

More information

Topic 23 - Randomized Complete Block Designs (RCBD)

Topic 23 - Randomized Complete Block Designs (RCBD) Topc 3 ANOVA (III) 3-1 Topc 3 - Randomzed Complete Block Desgns (RCBD) Defn: A Randomzed Complete Block Desgn s a varant of the completely randomzed desgn (CRD) that we recently learned. In ths desgn,

More information

ANOVA. The Observations y ij

ANOVA. The Observations y ij ANOVA Stands for ANalyss Of VArance But t s a test of dfferences n means The dea: The Observatons y j Treatment group = 1 = 2 = k y 11 y 21 y k,1 y 12 y 22 y k,2 y 1, n1 y 2, n2 y k, nk means: m 1 m 2

More information

Statistics for Economics & Business

Statistics for Economics & Business Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

F statistic = s2 1 s 2 ( F for Fisher )

F statistic = s2 1 s 2 ( F for Fisher ) Stat 4 ANOVA Analyss of Varance /6/04 Comparng Two varances: F dstrbuton Typcal Data Sets One way analyss of varance : example Notaton for one way ANOVA Comparng Two varances: F dstrbuton We saw that the

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 13 The Smple Lnear Regresson Model and Correlaton 1999 Prentce-Hall, Inc. Chap. 13-1 Chapter Topcs Types of Regresson Models Determnng the Smple Lnear

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Lecture 4 Hypothesis Testing

Lecture 4 Hypothesis Testing Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VIII LECTURE - 34 ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS EFFECTS MODEL Dr Shalabh Department of Mathematcs and Statstcs Indan

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov,

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov, UCLA STAT 3 ntroducton to Statstcal Methods for the Lfe and Health Scences nstructor: vo Dnov, Asst. Prof. of Statstcs and Neurology Chapter Analyss of Varance - ANOVA Teachng Assstants: Fred Phoa, Anwer

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

STAT 511 FINAL EXAM NAME Spring 2001

STAT 511 FINAL EXAM NAME Spring 2001 STAT 5 FINAL EXAM NAME Sprng Instructons: Ths s a closed book exam. No notes or books are allowed. ou may use a calculator but you are not allowed to store notes or formulas n the calculator. Please wrte

More information

17 Nested and Higher Order Designs

17 Nested and Higher Order Designs 54 17 Nested and Hgher Order Desgns 17.1 Two-Way Analyss of Varance Consder an experment n whch the treatments are combnatons of two or more nfluences on the response. The ndvdual nfluences wll be called

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

7.1. Single classification analysis of variance (ANOVA) Why not use multiple 2-sample 2. When to use ANOVA

7.1. Single classification analysis of variance (ANOVA) Why not use multiple 2-sample 2. When to use ANOVA Sngle classfcaton analyss of varance (ANOVA) When to use ANOVA ANOVA models and parttonng sums of squares ANOVA: hypothess testng ANOVA: assumptons A non-parametrc alternatve: Kruskal-Walls ANOVA Power

More information

Lecture 6 More on Complete Randomized Block Design (RBD)

Lecture 6 More on Complete Randomized Block Design (RBD) Lecture 6 More on Complete Randomzed Block Desgn (RBD) Multple test Multple test The multple comparsons or multple testng problem occurs when one consders a set of statstcal nferences smultaneously. For

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

experimenteel en correlationeel onderzoek

experimenteel en correlationeel onderzoek expermenteel en correlatoneel onderzoek lecture 6: one-way analyss of varance Leary. Introducton to Behavoral Research Methods. pages 246 271 (chapters 10 and 11): conceptual statstcs Moore, McCabe, and

More information

Unit 8: Analysis of Variance (ANOVA) Chapter 5, Sec in the Text

Unit 8: Analysis of Variance (ANOVA) Chapter 5, Sec in the Text Unt 8: Analyss of Varance (ANOVA) Chapter 5, Sec. 13.1-13. n the Text Unt 8 Outlne Analyss of Varance (ANOVA) General format and ANOVA s F-test Assumptons for ANOVA F-test Contrast testng Other post-hoc

More information

MD. LUTFOR RAHMAN 1 AND KALIPADA SEN 2 Abstract

MD. LUTFOR RAHMAN 1 AND KALIPADA SEN 2 Abstract ISSN 058-71 Bangladesh J. Agrl. Res. 34(3) : 395-401, September 009 PROBLEMS OF USUAL EIGHTED ANALYSIS OF VARIANCE (ANOVA) IN RANDOMIZED BLOCK DESIGN (RBD) ITH MORE THAN ONE OBSERVATIONS PER CELL HEN ERROR

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS M. Krshna Reddy, B. Naveen Kumar and Y. Ramu Department of Statstcs, Osmana Unversty, Hyderabad -500 007, Inda. nanbyrozu@gmal.com, ramu0@gmal.com

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Two-factor model. Statistical Models. Least Squares estimation in LM two-factor model. Rats

Two-factor model. Statistical Models. Least Squares estimation in LM two-factor model. Rats tatstcal Models Lecture nalyss of Varance wo-factor model Overall mean Man effect of factor at level Man effect of factor at level Y µ + α + β + γ + ε Eε f (, ( l, Cov( ε, ε ) lmr f (, nteracton effect

More information

Expected Value and Variance

Expected Value and Variance MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

STATISTICS QUESTIONS. Step by Step Solutions.

STATISTICS QUESTIONS. Step by Step Solutions. STATISTICS QUESTIONS Step by Step Solutons www.mathcracker.com 9//016 Problem 1: A researcher s nterested n the effects of famly sze on delnquency for a group of offenders and examnes famles wth one to

More information

Chapter 12 Analysis of Covariance

Chapter 12 Analysis of Covariance Chapter Analyss of Covarance Any scentfc experment s performed to know somethng that s unknown about a group of treatments and to test certan hypothess about the correspondng treatment effect When varablty

More information

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε Chapter 3 Secton 3.1 Model Assumptons: Multple Regresson Model Predcton Equaton Std. Devaton of Error Correlaton Matrx Smple Lnear Regresson: 1.) Lnearty.) Constant Varance 3.) Independent Errors 4.) Normalty

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

PHYS 450 Spring semester Lecture 02: Dealing with Experimental Uncertainties. Ron Reifenberger Birck Nanotechnology Center Purdue University

PHYS 450 Spring semester Lecture 02: Dealing with Experimental Uncertainties. Ron Reifenberger Birck Nanotechnology Center Purdue University PHYS 45 Sprng semester 7 Lecture : Dealng wth Expermental Uncertantes Ron Refenberger Brck anotechnology Center Purdue Unversty Lecture Introductory Comments Expermental errors (really expermental uncertantes)

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

Statistics Chapter 4

Statistics Chapter 4 Statstcs Chapter 4 "There are three knds of les: les, damned les, and statstcs." Benjamn Dsrael, 1895 (Brtsh statesman) Gaussan Dstrbuton, 4-1 If a measurement s repeated many tmes a statstcal treatment

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 14 Multiple Regression Models

Statistics for Managers Using Microsoft Excel/SPSS Chapter 14 Multiple Regression Models Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 14 Multple Regresson Models 1999 Prentce-Hall, Inc. Chap. 14-1 Chapter Topcs The Multple Regresson Model Contrbuton of Indvdual Independent Varables

More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

U-Pb Geochronology Practical: Background

U-Pb Geochronology Practical: Background U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result

More information

ANSWERS CHAPTER 9. TIO 9.2: If the values are the same, the difference is 0, therefore the null hypothesis cannot be rejected.

ANSWERS CHAPTER 9. TIO 9.2: If the values are the same, the difference is 0, therefore the null hypothesis cannot be rejected. ANSWERS CHAPTER 9 THINK IT OVER thnk t over TIO 9.: χ 2 k = ( f e ) = 0 e Breakng the equaton down: the test statstc for the ch-squared dstrbuton s equal to the sum over all categores of the expected frequency

More information

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

Now we relax this assumption and allow that the error variance depends on the independent variables, i.e., heteroskedasticity

Now we relax this assumption and allow that the error variance depends on the independent variables, i.e., heteroskedasticity ECON 48 / WH Hong Heteroskedastcty. Consequences of Heteroskedastcty for OLS Assumpton MLR. 5: Homoskedastcty var ( u x ) = σ Now we relax ths assumpton and allow that the error varance depends on the

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression Chapter - Multple Regresson Chapter - Multple Regresson Multple Regresson Model The equaton that descrbes how the dependent varable y s related to the ndependent varables x, x,... x p and an error term

More information

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9 Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,

More information

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition) Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes

More information

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and σ 2 e = variance of (Y X).

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and σ 2 e = variance of (Y X). 11.4.1 Estmaton of Multple Regresson Coeffcents In multple lnear regresson, we essentally solve n equatons for the p unnown parameters. hus n must e equal to or greater than p and n practce n should e

More information

LINEAR REGRESSION ANALYSIS. MODULE VIII Lecture Indicator Variables

LINEAR REGRESSION ANALYSIS. MODULE VIII Lecture Indicator Variables LINEAR REGRESSION ANALYSIS MODULE VIII Lecture - 7 Indcator Varables Dr. Shalabh Department of Maematcs and Statstcs Indan Insttute of Technology Kanpur Indcator varables versus quanttatve explanatory

More information

18.1 Introduction and Recap

18.1 Introduction and Recap CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE II LECTURE - GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 3.

More information

Solutions Homework 4 March 5, 2018

Solutions Homework 4 March 5, 2018 1 Solutons Homework 4 March 5, 018 Soluton to Exercse 5.1.8: Let a IR be a translaton and c > 0 be a re-scalng. ˆb1 (cx + a) cx n + a (cx 1 + a) c x n x 1 cˆb 1 (x), whch shows ˆb 1 s locaton nvarant and

More information

Introduction to Analysis of Variance (ANOVA) Part 1

Introduction to Analysis of Variance (ANOVA) Part 1 Introducton to Analss of Varance (ANOVA) Part 1 Sngle factor The logc of Analss of Varance Is the varance explaned b the model >> than the resdual varance In regresson models Varance explaned b regresson

More information
