PASS Sample Size Software

Chapter 57

Introduction

This procedure performs power analysis for random effects designs in which the outcome (response) is continuous. Thus, as with the analysis of variance (ANOVA), the procedure is used to test hypotheses comparing various group means. Unlike ANOVA, this procedure relaxes the strict assumptions regarding the variances of the groups. Random effects models are commonly used to analyze longitudinal (repeated measures) data.

This procedure extends many of the classical statistical techniques to the case when the variances are not equal, such as

  Two-sample designs (extending the t-test)
  One-way layout designs (extending one-way ANOVA)
  Factorial designs (extending factorial GLM)
  Split-plot designs (extending split-plot GLM)
  Repeated-measures designs (extending repeated-measures GLM)
  Cross-over designs (extending GLM)

Types of Linear Mixed Models

Several linear mixed model subtypes exist that are characterized by the random effects, fixed effects, and covariance structure they involve. These include fixed effects models, random effects models, and covariance pattern models.

Fixed Effects Models

A fixed effects model is a model where only fixed effects are included in the model. An effect (or factor) is fixed if the levels in the study represent all levels of interest of the factor, or at least all levels that are important for inference (e.g., treatment, dose, etc.). No random components are present. The general linear model is a fixed effects model. Fixed effects models can include interactions. The fixed effects can be estimated and tested using the F-test.

The fixed effects in the model include those factors for which means, standard errors, and confidence intervals will be estimated and tests of hypotheses will be performed. Other variables for which the model is to be adjusted (that are not important for estimation or hypothesis testing) can also be included in the model as fixed factors.

Random Effects Models

A random effects model includes both fixed and random terms in the model. An effect (or factor) is random if the levels of the factor represent a random subset of a larger group of levels (e.g., patients). The random effects are not tested, but are included to make the model more realistic.

Longitudinal Data Models

Longitudinal data arise when more than one response is measured on each subject in the study. Responses are often measured over time at fixed time points. A time point is fixed if it is pre-specified. Various variance-matrix structures can be employed to model the variance and correlation among repeated measurements.

Types of Factors

Between-Subject Factors

Between-subject factors are those that separate the experimental subjects into groups. If twelve subjects are randomly assigned to three treatment groups (four subjects per group), treatment is a between-subject factor.

Within-Subject Factors

Within-subject factors are those for which multiple levels of the factor are measured on the same subject, often at several time points. If each subject is measured at the low, medium, and high level of the treatment, treatment is a within-subject factor.

Technical Details

What is a Mixed Model?

In a general linear model (GLM), a random sample of individuals is drawn. Treatments are applied to each individual and an outcome is measured. The data so obtained are analyzed using an analysis of variance table that includes an F-test.

A mathematical model may be formulated that underlies each analysis of variance. This model expresses the response variable as the sum of population parameters and a residual. For example, a common linear model for a two-factor experiment is

$$Y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$$

where $i = 1, 2, \ldots, I$ (the number of levels of factor A), $j = 1, 2, \ldots, J$ (the number of levels of factor B), and $k = 1, 2, \ldots, K$ (the number of subjects in the study). This model expresses the value of the response variable, Y, as the sum of five components:

  $\mu$ is the mean.
  $a_i$ is the contribution of the i-th level of factor A.
  $b_j$ is the contribution of the j-th level of factor B.
  $(ab)_{ij}$ is the combined contribution (or interaction) of the i-th level of factor A and the j-th level of factor B.
  $e_{ijk}$ is the contribution of the k-th individual. This is often called the residual.

In this example, the linear model is made up of fixed effects only. An effect is fixed if the levels in the study represent all levels of the factor that are of interest, or at least all levels that are important for inference (e.g., treatment, dose, etc.).

The following assumptions are made when using the F-test in a general linear model.

1. The response variable is continuous.
2. The individuals are independent.
3. The $e_{ijk}$ follow the normal probability distribution with mean equal to zero.
4. The variances of the $e_{ijk}$ are equal for all values of i, j, and k.

The Linear Mixed Model (LMM)

The linear mixed model (LMM) is a natural extension of the general linear model. Mixed models extend linear models by allowing for the addition of random effects, where the levels of the factor represent a random subset of a larger group of all possible levels (e.g., time of administration, clinic, etc.). For example, the two-factor linear model above could be augmented to include random effects such as an adjustment for each patient, since a patient may be assumed to be a random realization from a distribution of patients.

The general form of the mixed model in matrix notation is

$$y = X\beta + Zu + \varepsilon$$

where

  $y$ is the vector of responses.
  $X$ is the known design matrix of the fixed effects.
  $\beta$ is the unknown vector of fixed effects parameters to be estimated.
  $Z$ is the known design matrix of the random effects.
  $u$ is the unknown vector of random effects.
  $\varepsilon$ is the unobserved vector of random errors.

We assume

$$u \sim N(0, G), \qquad \varepsilon \sim N(0, R), \qquad \mathrm{Cov}[u, \varepsilon] = 0$$

where

  $G$ is the variance-covariance matrix of $u$.
  $R$ is the variance-covariance matrix of the errors $\varepsilon$.

The variance-covariance matrix of y, denoted V, is

$$V = \mathrm{Var}[y] = \mathrm{Var}[X\beta + Zu + \varepsilon] = 0 + \mathrm{Var}[Zu + \varepsilon] = ZGZ' + R$$

Individual Subject Formulation

Because of the size of the matrices that are involved in mixed model analysis, it is useful for computational purposes to reduce the dimensionality of the problem by analyzing the data one subject at a time. Because the data from different subjects are statistically independent, the log-likelihood of the data can be summed over the subjects, according to the formulas below. Before we look at the likelihood functions, we examine the linear mixed model for a particular subject:

$$y_i = X_i\beta + Z_i u_i + \varepsilon_i, \qquad i = 1, \ldots, N$$

where

  $y_i$ is the $n_i \times 1$ vector of responses for subject i.
  $X_i$ is the $n_i \times p$ design matrix of fixed effects for subject i (p is the number of columns in X).
  $\beta$ is the $p \times 1$ vector of regression parameters.

  $Z_i$ is the $n_i \times q$ design matrix of the random effects for subject i.
  $u_i$ is the $q \times 1$ vector of random effects for subject i, which has means of zero and covariance matrix $G_{sub}$.
  $\varepsilon_i$ is the $n_i \times 1$ vector of errors for subject i, with zero mean and covariance $R_i$.
  $n_i$ is the number of repeated measurements on subject i.
  $N$ is the number of subjects.

The following definitions will also be useful.

  $e_i$ is the vector of residuals for subject i ($e_i = y_i - X_i\beta$).
  $V_i = \mathrm{Var}[y_i] = Z_i G_{sub} Z_i' + R_i$

To see how the individual subject mixed model formulation relates to the general form, we have

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}, \quad Z = \begin{bmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_N \end{bmatrix}, \quad u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}$$

In order to test the parameters in β, which is typically the goal in LMM analysis, the unknown parameters (β, G, and R) must be estimated. Estimates for β require estimates of G and R. In order to estimate G and R, the structure of G and R must be specified. Details of the specific structures for G and R are discussed later.

The following assumptions are made when using the F-test in an LMM.

1. The response variable is continuous.
2. The individuals are independent.
3. The responses follow the normal probability distribution with mean Xβ and variance structure given by V.

A distinct (and arguably the most important) advantage of LMM over the GLM is flexibility in random error and random effect variance component modeling (note that the equal-variance assumption of GLM is not necessary for LMM). LMM allows you to model both heterogeneous variances and correlations among observations through the specification of the covariance matrix structures for u and ε. The variance matrix estimates are obtained using maximum likelihood (ML) or, more commonly, restricted maximum likelihood (REML). The fixed effects in the mixed model are tested using F-tests.

Structure of the Variance-Covariance Matrix

The G Matrix

The G matrix is the variance-covariance matrix for the random effects u. Typically, when the G matrix is used to specify the variance-covariance structure of y, the structure for R is simply $\sigma^2 I$. Caution should be used when both G and R are specified as complex structures, since large numbers of sometimes redundant covariance elements can result.
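The per-subject variance $V_i = Z_i G_{sub} Z_i' + R_i$ can be illustrated with a small numeric sketch. The example below is not taken from PASS; the function names, the random-intercept design (q = 1), and the variance values 2.0 and 1.0 are assumptions chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def subject_covariance(z, g_sub, r):
    """Return V_i = Z_i G_sub Z_i' + R_i for one subject."""
    zgz = matmul(matmul(z, g_sub), transpose(z))
    n = len(r)
    return [[zgz[i][j] + r[i][j] for j in range(n)] for i in range(n)]


# Hypothetical subject: three repeated measurements, one random intercept.
z = [[1.0], [1.0], [1.0]]          # Z_i: n_i x q design matrix (q = 1)
g_sub = [[2.0]]                    # assumed random-intercept variance
r = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # R_i = 1.0 * I

v = subject_covariance(z, g_sub, r)
for row in v:
    print(row)
```

With a single random intercept and a diagonal R, every off-diagonal entry of V equals the random-effect variance and every diagonal entry equals that variance plus the residual variance.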

The G matrix is made up of N symmetric $G_{sub}$ matrices,

$$G = \begin{bmatrix} G_{sub} & 0 & \cdots & 0 \\ 0 & G_{sub} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & G_{sub} \end{bmatrix}$$

The dimension of $G_{sub}$ is $q \times q$, where q is the number of random effects for each subject.

Structures of G sub

The structure of the $G_{sub}$ matrix in this procedure is diagonal.

The R Matrix

The R matrix is the variance-covariance matrix for errors, ε. When the R matrix is used to specify the variance-covariance structure of y, the $G_{sub}$ matrix is not used. The full R matrix is made up of N symmetric R sub-matrices,

$$R = \begin{bmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_N \end{bmatrix}$$

where $R_1, R_2, \ldots, R_N$ are all of the same structure.

Structures of R

There are many possible structures for the sub-matrices that make up the R matrix. The R sub structures that can be specified in PASS are listed below. Each structure is available in a Homogeneous, a Heterogeneous, and a Correlation form.

  Diagonal
  Compound Symmetry
  AR(1)
  Toeplitz
  Toeplitz()
  Banded() (Note: This is the same as Toeplitz().)
  Banded()
  Unstructured
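To make a few of the listed structures concrete, the sketch below builds Diagonal, Compound Symmetry, and AR(1) sub-matrices. The parameterizations used here (a common variance `sigma2`, a correlation `rho`, and per-time standard deviations `sds` for the heterogeneous case) are the usual textbook forms and are assumptions; they are not copied from the PASS example matrices, which did not survive in this copy.

```python
def diagonal(n, sigma2):
    """Homogeneous diagonal: sigma2 on the diagonal, zeros elsewhere."""
    return [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]


def compound_symmetry(n, sigma2, rho):
    """Homogeneous compound symmetry: equal variances, one common correlation."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(n)] for i in range(n)]


def ar1(n, sigma2, rho):
    """Homogeneous AR(1): correlation decays as rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def ar1_heterogeneous(sds, rho):
    """Heterogeneous AR(1): a separate standard deviation per time point."""
    n = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Illustrative values only.
for row in ar1(3, 1.0, 0.5):
    print(row)
```

The Correlation forms can be read as the same matrices with all variances fixed at one, so only the correlation parameters remain.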

Partitioning the Variance-Covariance Structure with Groups

In the case where it is expected that the variance-covariance parameters are different across groups of a between-subjects factor, a different set of R or G parameters can be specified for each group. This produces a set of variance-covariance parameters that is different for each level of the chosen group variable, but each set has the same structure.

Likelihood Formulas

There are two types of likelihood estimation methods that are generally considered in mixed model estimation: maximum likelihood (ML) and restricted maximum likelihood (REML). REML is generally favored over ML because the variance estimates using REML are unbiased for small sample sizes, whereas ML estimates are unbiased only asymptotically (see Littell et al., 2006 or Demidenko, 2004). Both estimation methods are available in PASS.

Maximum Likelihood

The general form of the -2 log-likelihood ML function is

$$-2L_{ML}(\beta, G, R) = \ln|V| + e'V^{-1}e + N_T \ln(2\pi)$$

The equivalent individual subject form is

$$-2L_{ML}(\beta, G, R) = \sum_{i=1}^{N} \left( \ln|V_i| + e_i'V_i^{-1}e_i \right) + N_T \ln(2\pi)$$

where $N_T$ is the total number of observations, or

$$N_T = \sum_{i=1}^{N} n_i$$

Restricted Maximum Likelihood

The general form of the -2 log-likelihood REML function is

$$-2L_{REML}(\beta, G, R) = \ln|V| + e'V^{-1}e + \ln|X'V^{-1}X| + (N_T - p)\ln(2\pi)$$

The equivalent individual subject form is

$$-2L_{REML}(\beta, G, R) = \sum_{i=1}^{N} \left[ \ln|V_i| + e_i'V_i^{-1}e_i \right] + \ln\left|\sum_{i=1}^{N} X_i'V_i^{-1}X_i\right| + (N_T - p)\ln(2\pi)$$

where, again, $N_T$ is the total number of observations, $N_T = \sum_{i=1}^{N} n_i$, and p is the number of columns in X or $X_i$.
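The individual subject forms above can be evaluated by summing over subjects. The following is a minimal sketch, assuming each subject is supplied as a tuple `(y_i, X_i, V_i)` of plain Python lists and that every `V_i` is positive definite. The function names and the use of a Cholesky factor to obtain $\ln|V_i|$ and $e_i'V_i^{-1}e_i$ are illustrative choices, not a description of the PASS or NCSS implementation.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L' = a (a must be positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def forward_solve(l, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(l[i][k] * x[k] for k in range(i))) / l[i][i])
    return x


def log_det(l):
    """ln|a| computed from the Cholesky factor L of a."""
    return 2.0 * sum(math.log(l[i][i]) for i in range(len(l)))


def neg2_loglik(subjects, beta, reml=False):
    """-2 log-likelihood in individual subject form (ML by default, REML if reml=True)."""
    p = len(beta)
    n_total = 0
    total = 0.0
    a = [[0.0] * p for _ in range(p)]           # A = sum_i X_i' V_i^{-1} X_i
    for y, x, v in subjects:
        l = cholesky(v)
        e = [y[r] - sum(x[r][c] * beta[c] for c in range(p)) for r in range(len(y))]
        w = forward_solve(l, e)                 # w = L^{-1} e, so w'w = e' V^{-1} e
        total += log_det(l) + sum(t * t for t in w)
        cols = [forward_solve(l, [row[c] for row in x]) for c in range(p)]
        for r in range(p):
            for c in range(p):
                a[r][c] += sum(s * t for s, t in zip(cols[r], cols[c]))
        n_total += len(y)
    if reml:
        return total + log_det(cholesky(a)) + (n_total - p) * math.log(2.0 * math.pi)
    return total + n_total * math.log(2.0 * math.pi)


# Hypothetical data: one subject, two observations, intercept-only model, V_i = I.
subjects = [([1.0, 3.0], [[1.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]])]
ml_value = neg2_loglik(subjects, [2.0])
reml_value = neg2_loglik(subjects, [2.0], reml=True)
```

In this toy case the residuals are (-1, 1), so the ML value is 2 + 2 ln(2π), and the REML value adds ln|X'V⁻¹X| = ln 2 while using (N_T - p) = 1 in the constant term.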

Estimating and Testing Fixed Effects Parameters

The estimation phase in the analysis of a mixed model produces variance and covariance parameter estimates of the elements of G and R, giving $\hat{R}$ and $\hat{G}$, and hence, $\hat{V}$. The REML and ML solutions for $\hat{\beta}$ are given by

$$\hat{\beta} = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}^{-1} y$$

with estimated variance-covariance

$$\hat{\Sigma} = \widehat{\mathrm{var}}(\hat{\beta}) = (X'\hat{V}^{-1}X)^{-1}$$

See, for example, Brown and Prescott (2006), Muller and Stewart (2006), or Demidenko (2004) for more details of the estimating equations.

Hypothesis tests and confidence intervals for β are formed using a linear combination matrix (or vector) L. Although you don't have to specify L, it is important that you understand its function.

L Matrix Details

L matrices specify linear combinations of β corresponding to means or hypothesis tests of interest. Essentially, the L matrix defines the mean or test. The number of columns in each L matrix is the same as the number of elements of β. For estimating a particular mean, the L matrix consists of a single row. For hypothesis tests, the number of rows of L varies according to the test. Below are some examples of L matrices that arise in common analyses:

L Matrix for Testing a Single Factor (Food with 4 levels) in a Single-Factor Model

[Table: one row per effect (Intercept, Food HighIron, Food LowIron, Food None, Food Salicyl) and one column per contrast (L1, L2, L3); cell values not recoverable.]

L Matrix for a Single Mean (LowIron) of a Single Factor (4 levels) in a Single-Factor Model

[Table: one row per effect (Intercept, Food HighIron, Food LowIron, Food None, Food Salicyl) and a single column L1; cell values not recoverable.]

L Matrix for Testing a Single Factor (Drug with 3 levels) in a Two-Factor Model with Interaction

[Table: one row per effect (Intercept; Drug Kerlosin, Laposec, Placebo; six Time levels; eighteen Drug*Time combinations) and one column per contrast (L1, L2); cell values not recoverable.]

Kenward and Roger Fixed Effects Hypothesis Tests

Hypothesis tests have the general form

$$H_0: L\beta = 0$$

where L is a linear contrast matrix of rank h corresponding to the desired comparisons to be made in the hypothesis test. Let d be the denominator degrees of freedom and q be the number of variance-covariance parameters, which is the dimension of W (defined below). The Kenward and Roger (1997) test statistic for testing $H_0$ is currently the most recommended method of specifying the F-ratio and its degrees of freedom. The F-ratio is

$$F_{h,d} = \frac{\lambda}{h}\, \hat{\beta}' L' \left( L \hat{C}^{*} L' \right)^{-1} L \hat{\beta}$$

where

$$C^{*} = C + 2C \left\{ \sum_{r=1}^{q} \sum_{s=1}^{q} W_{rs} \left( Q_{rs} - P_r C P_s - \tfrac{1}{4} S_{rs} \right) \right\} C$$

$$C = (X'V^{-1}X)^{-1}$$

$$Q_{rs} = X'V^{-1}\dot{V}_r V^{-1}\dot{V}_s V^{-1}X = \sum_{i=1}^{N} X_i'V_i^{-1}\dot{V}_{ri}V_i^{-1}\dot{V}_{si}V_i^{-1}X_i$$

$$P_r = X'V^{-1}\dot{V}_r V^{-1}X = \sum_{i=1}^{N} X_i'V_i^{-1}\dot{V}_{ri}V_i^{-1}X_i$$

$$S_{rs} = X'V^{-1}\ddot{V}_{rs}V^{-1}X = \sum_{i=1}^{N} X_i'V_i^{-1}\ddot{V}_{rsi}V_i^{-1}X_i$$

$$\dot{V}_r = \frac{\partial V}{\partial \theta_r}, \qquad \ddot{V}_{rs} = \frac{\partial^2 V}{\partial \theta_r\, \partial \theta_s}$$

W is the estimated asymptotic variance-covariance matrix of the covariance parameter estimates, computed from the inverse of the Hessian matrix H, where $\{H\}_{rs} = \{\text{Hessian}\}_{rs}$.

The remaining quantities are

$$T = L'(LCL')^{-1}L$$

$$a_1 = \sum_{r=1}^{q}\sum_{s=1}^{q} W_{rs}\, \mathrm{tr}(TCP_rC)\, \mathrm{tr}(TCP_sC), \qquad a_2 = \sum_{r=1}^{q}\sum_{s=1}^{q} W_{rs}\, \mathrm{tr}(TCP_rCTCP_sC)$$

$$a_3 = \frac{a_1 + 6a_2}{2h}, \qquad g = \frac{(h+1)a_1 - (h+4)a_2}{(h+2)a_2}$$

$$c_1 = \frac{g}{3h + 2(1-g)}, \qquad c_2 = \frac{h - g}{3h + 2(1-g)}, \qquad c_3 = \frac{h + 2 - g}{3h + 2(1-g)}$$

$$e = \left(1 - \frac{a_2}{h}\right)^{-1}, \qquad v = \frac{2}{h} \cdot \frac{1 + c_1 a_3}{(1 - c_2 a_3)^2 (1 - c_3 a_3)}, \qquad \rho = \frac{v}{2e^2}$$

$$d = 4 + \frac{h + 2}{h\rho - 1}, \qquad \lambda = \frac{d}{e(d - 2)}$$

Solution Algorithms

Methods for Finding Likelihood Solutions (Newton-Raphson, Fisher Scoring, MIVQUE, and Differential Evolution)

There are four techniques in the procedure for determining the maximum likelihood or restricted maximum likelihood solution (optimum): Newton-Raphson, Fisher Scoring, MIVQUE, and Differential Evolution. The general steps for the Newton-Raphson, Fisher Scoring, and Differential Evolution techniques are (let θ be the overall covariance parameter vector):

1. Roughly estimate θ according to the specified structure for each.
2. Evaluate the likelihood of the model given the data and the estimates of θ.
3. Improve upon the estimates of θ using a search algorithm. (Improvement is defined as an increase in likelihood.)
4. Iterate until maximum likelihood is reached, according to some convergence criterion.
5. Use the final θ estimates to estimate β.

Newton-Raphson and Fisher Scoring

The differences in the techniques revolve around the initial estimates in Step 1, and the improvements in estimates made in Step 3. For the Newton-Raphson and Fisher Scoring techniques, Step 3 occurs as follows:

a. With the estimated θ, compute the gradient vector g, and the Hessian matrix H.

b. Compute $d = -H^{-1}g$.
c. Let λ = 1.
d. Compute new estimates for θ, iteratively, using $\theta_i = \theta_{i-1} + \lambda d$.
e. If $\theta_i$ is a valid set of covariance parameters and improves the likelihood, continue to f. Otherwise, reduce λ by half and return to Step d.
f. Check for convergence. If the convergence criteria (small change in -2 log-likelihood) are met, stop. If the convergence criteria are not met, go back to Step a.

The gradient vector g, and the Hessian matrix H, used for the Newton-Raphson and Fisher Scoring techniques for solving the REML equations are shown in the following table:

REML Gradient (g) and Hessian (H)

  Technique         Gradient (g)       Hessian (H)
  Newton-Raphson    g1 + g2 + g3       H1 + H2 + H3
  Fisher Scoring    g1 + g2 + g3       -H1 + H3

The gradient vector g, and the Hessian matrix H, used for the Newton-Raphson and Fisher Scoring techniques for solving the ML equations are shown in the following table:

ML Gradient (g) and Hessian (H)

  Technique         Gradient (g)       Hessian (H)
  Newton-Raphson    g1 + g2            H1 + H2
  Fisher Scoring    g1 + g2            -H1

where g1, g2, g3, H1, H2, and H3 are defined as in Wolfinger, Tobias, and Sall (1994).

Definitions

$$\dot{V}_{ri} = \frac{\partial V_i}{\partial \theta_r}, \qquad \ddot{V}_{rsi} = \frac{\partial^2 V_i}{\partial \theta_r\, \partial \theta_s}, \qquad e_i = y_i - X_i\hat{\beta}$$

$$A = \sum_{i=1}^{N} X_i'V_i^{-1}X_i, \qquad C = A^{-1}$$

Likelihoods

$$l_1 = \sum_{i=1}^{N} \ln|V_i|, \qquad l_2 = \sum_{i=1}^{N} e_i'V_i^{-1}e_i, \qquad l_3 = \ln\left|\sum_{i=1}^{N} X_i'V_i^{-1}X_i\right| = \ln|A|$$

First Derivatives

$$g_{1r} = \frac{\partial l_1}{\partial \theta_r} = \sum_{i=1}^{N} \mathrm{tr}\left(V_i^{-1}\dot{V}_{ri}\right)$$

$$g_{2r} = \frac{\partial l_2}{\partial \theta_r} = -\sum_{i=1}^{N} e_i'V_i^{-1}\dot{V}_{ri}V_i^{-1}e_i$$

$$g_{3r} = \frac{\partial l_3}{\partial \theta_r} = -\mathrm{tr}\left( C \sum_{i=1}^{N} X_i'V_i^{-1}\dot{V}_{ri}V_i^{-1}X_i \right)$$

Second Derivatives

$$H_{1rs} = \frac{\partial^2 l_1}{\partial \theta_r\, \partial \theta_s} = \sum_{i=1}^{N} \left\{ \mathrm{tr}\left(V_i^{-1}\ddot{V}_{rsi}\right) - \mathrm{tr}\left(V_i^{-1}\dot{V}_{ri}V_i^{-1}\dot{V}_{si}\right) \right\}$$

The second-derivative components $H_{2rs}$ and $H_{3rs}$ (the second derivatives of $l_2$ and $l_3$) are given in Wolfinger, Tobias, and Sall (1994).

MIVQUE

The MIVQUE estimates of θ, in both REML and ML estimation, are found by solving a linear system in θ built from the Hessian and gradient components defined above. See Wolfinger, Tobias, and Sall (1994) for details.

Differential Evolution

The differential evolution techniques used in this procedure for the ML and REML optimization are described in Price, Storn, and Lampinen (2005). This algorithm is very slow, but it is also very robust. As it stands now, it is too slow to be used. However, as computers become faster, this algorithm will become more viable.

Specifying the Minimum Detectable Difference

The four main parameters of a power analysis are the sample size, the effect size, the significance level, and the power level. Other extraneous parameters, such as the variance, must also be specified. This section describes the specification of the effect size, or minimum detectable difference (MDD) as we choose to call it in this chapter.

Power is defined as the probability of rejecting the null hypothesis of zero difference when the actual difference is a given amount. As the size of the actual difference increases, so does the power. The MDD (or minimum effect size) is the smallest difference among the population means that will be detected by an experiment at the specified settings of the other parameters.

Typically, a longitudinal design includes a between-subjects factor, a within-subject factor, and their interaction. An MDD must be specified for each. As the number of factors grows, the number of interactions grows, and the number of MDDs that must be specified also grows. It becomes crucial that you specify these values in a meaningful and accurate way.

In the Repeated Measures module, PASS only requires the standard deviation of the group means. Unfortunately, this is a quantity that researchers have very little experience with. It seldom appears on any of the standard reports that are produced by commercial software. It is seldom present in the written reports of analyses. A different method is used to specify the MDD in PASS.

In this routine, the MDD is specified as the difference between the smallest and largest effects. For example, suppose that a factor has three levels whose smallest and largest means differ by 8. The detectable difference is then 8. This is a simple quantity that is easy to interpret.

The Effect Pattern for Factors

When there are more than two means, the minimum detectable difference does not uniquely define a set of means that can be simulated. For example, sets of three means that share the same smallest and largest values but differ in the middle value all have identical MDDs, although the means themselves are quite different.

The following method for defining the pattern is quite informative:

1. Set the first (low) value to -0.5.
2. Set the last (high) value to 0.5.
3. Set each value in between to (Mean - Min) / MDD - 0.5.

Using these steps, each set of means may be reduced to an MDD and a pattern.

[Table: Original means, MDD, Pattern, and MDD x Pattern for three example sets of means; most cell values are not recoverable. The third set, whose middle mean equals its largest mean, has the pattern -0.5, 0.5, 0.5.]

Thus, each set of means can be easily reduced to two components: the MDD and a pattern. This is the method that PASS uses to supply the sets of means. Using this method, it is easy to compare various sizes of means. You simply enter different values for the MDD, keeping the pattern the same.

The Effect Pattern for Interactions

Specifying the structure of the interactions is a little more problematic. Often, you are not interested in the interaction. When the interaction is of interest, you may have only a vague idea of its structure. Part of your analysis will be to investigate the effect of the interaction, with little or no knowledge of its pattern beforehand.

Specifying the MDD for the interaction is somewhat intuitive. The MDD defines the largest difference among the interaction effects. A difficulty still arises in that there is a very large number of possible patterns, many of which are useful. For planning purposes, we have decided to use a standard pattern in PASS. The interaction pattern used in PASS is defined as the Kronecker product of the factor patterns that make up the interaction, scaled so that the largest value is 0.5 and the smallest value is -0.5.
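The three-step reduction of a set of factor means to an MDD and a pattern, described under The Effect Pattern for Factors, can be sketched as follows. The function name and the example means (10, 14, 18) are hypothetical and were chosen only to show the arithmetic; using the minimum and maximum of the list, rather than its first and last entries, is an assumption that lets the sketch handle means given in any order.

```python
def to_mdd_and_pattern(means):
    """Reduce a list of means to (MDD, pattern), with the pattern in [-0.5, 0.5]."""
    lo, hi = min(means), max(means)
    mdd = hi - lo
    if mdd == 0:
        # All means equal: there is no detectable difference and no pattern to scale.
        return 0.0, [0.0] * len(means)
    return mdd, [(m - lo) / mdd - 0.5 for m in means]


# Hypothetical means for a three-level factor.
mdd, pattern = to_mdd_and_pattern([10.0, 14.0, 18.0])
print(mdd, pattern)

# Multiplying the pattern by the MDD gives the effects (deviations) used to rebuild the means.
effects = [mdd * p for p in pattern]
print(effects)
```

Keeping the pattern fixed and changing only the MDD rescales the effects without changing their relative spacing, which is the comparison the text describes.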
Example

Suppose that a two-factor interaction is made up of a three-level factor A with a pattern of -0.5, 0, 0.5 and a two-level factor B with a pattern of -0.5, 0.5. The interaction pattern would be found as follows.

The Kronecker product of these two patterns is 0.25, -0.25, 0, 0, -0.25, 0.25. Rescaling so that the minimum is -0.5 and the maximum is 0.5 is achieved by doubling the values. The final interaction pattern is 0.5, -0.5, 0, 0, -0.5, 0.5. This pattern compares the difference due to factor B across the levels of factor A.

Suppose that an MDD is set for A, for B, and for AB, and that an overall mean is specified. The six cell means would be found by adding the individual effects for each cell, as follows.

[Table: one column per cell (A1B1, A1B2, A2B1, A2B2, A3B1, A3B2) and one row per term (A, B, AB, Total, Overall, Cell Mean); cell values not recoverable.]
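The Kronecker-product construction in this example can be checked with a short sketch. The rescaling below multiplies by 0.5 divided by the largest absolute value, which reproduces the doubling in the example; treating that as the general rule is an assumption. The MDDs (8, 5, and 2) and the overall mean (10) used to build cell means are illustrative values only, and the function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two vectors (patterns)."""
    return [x * y for x in a for y in b]


def interaction_pattern(*patterns):
    """Kronecker product of factor patterns, rescaled so the largest magnitude is 0.5."""
    prod = [1.0]
    for p in patterns:
        prod = kron(prod, p)
    biggest = max(abs(v) for v in prod)
    if biggest == 0:
        return prod
    return [v * 0.5 / biggest for v in prod]


def cell_means(overall, mdd_a, pat_a, mdd_b, pat_b, mdd_ab):
    """Cell mean = overall mean + A effect + B effect + AB effect, cells ordered A1B1, A1B2, ..."""
    pat_ab = interaction_pattern(pat_a, pat_b)
    means = []
    for i, pa in enumerate(pat_a):
        for j, pb in enumerate(pat_b):
            ab = mdd_ab * pat_ab[i * len(pat_b) + j]
            means.append(overall + mdd_a * pa + mdd_b * pb + ab)
    return means


pat_a = [-0.5, 0.0, 0.5]
pat_b = [-0.5, 0.5]
print(kron(pat_a, pat_b))                  # raw Kronecker product
print(interaction_pattern(pat_a, pat_b))   # rescaled interaction pattern
print(cell_means(10.0, 8.0, pat_a, 5.0, pat_b, 2.0))
```

Because both factor patterns are symmetric about zero, a single multiplicative factor is enough to place the smallest value at -0.5 and the largest at 0.5.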

The cell means are then used to simulate the data for power and sample size calculation. This set of cell means has the MDDs and patterns specified.

Specifying the Simulated Variance-Covariance Matrix

As stated above, the variance of y is V = ZGZ' + R. In this PASS module, ZGZ' is called the random component and R is called the residual component. Since V is block-diagonal (with one block for each subject), it is specified by specifying the random and residual components for one subject and repeating those components for each subject.

ZGZ'

When the random component is included, the model is called a random effects model. For a design with four time points, the structure of ZGZ' is

$$ZGZ' = g\,[J] = \begin{bmatrix} g & g & g & g \\ g & g & g & g \\ g & g & g & g \\ g & g & g & g \end{bmatrix}$$

This structure only requires a single value: g.

R

The structure of R is quite flexible. Since it is a variance-covariance matrix, the only stipulation is that it must be non-negative definite. Possible choices for R when the variance is constant are

  Diagonal
  Constant (Compound Symmetric)
  AR(1)
  List

Possible choices for R when the variance is allowed to vary are

  Diagonal
  Constant
  AR(1)
  List

ZGZ' + R

When ZGZ' is included and R is set to diagonal as recommended, the value of V becomes

$$V = \begin{bmatrix} \sigma^2 + g & g & g & g \\ g & \sigma^2 + g & g & g \\ g & g & \sigma^2 + g & g \\ g & g & g & \sigma^2 + g \end{bmatrix}$$

This is the compound symmetric pattern that is assumed in the repeated measures analysis of variance (RMANOVA). This model is often used to compare LMM with RMANOVA.

Power Calculations using Computer Simulation

Computer simulation allows us to estimate the power that is actually achieved by a test procedure in situations such as LMM that are not mathematically tractable. Computer simulation was once limited to mainframe computers. But, in recent years, as computer speeds have increased, simulation studies can be completed on desktop and laptop computers in a reasonable period of time. The simulations can still be time consuming, so we have proposed some steps below that will significantly shorten the time needed to obtain answers.

The Simulation

It is important that you understand how the simulation is set up. There are three main tabs (panels or windows) that contain parameters that you will need to set. These are the Data tab, the Covariance tab, and the Fitted Model tab.

Data Tab

The Data tab contains all the parameters associated with the sample size, the effect size, and the significance level (alpha). The effect size parameters define the experimental design.

Covariance Tab

The Covariance tab specifies the parameters used to define the covariance of the data that is generated. Note that the covariance of the data you generate does not have to match the covariance model that you use to fit the data. In fact, since you seldom know even the structure of the true covariance matrix, it is more realistic to generate data using one type of covariance matrix and then fit the generated data with a different covariance structure.

Fitted Model Tab

The Fitted Model tab specifies the covariance matrix that is actually fit to your data. As stated above, the fitted model does not need to coincide with the model used to generate the data. It may be more realistic if it does not.

Steps in Conducting a Simulation Analysis

Simulation Steps

The steps to a simulation study are

1. Specify the design that will be studied. Enter the sample size, MDDs, and variance-covariance matrix. Specify the covariance matrix of the model that is fitted to the data.
2. Generate random samples from the design specified. Calculate the F-tests from the simulated data and determine if the null hypothesis is accepted or rejected. Tabulate the number of rejections and use this to calculate the test's power.
3. Repeat step 2 several hundred or more times, tabulating the number of times the simulated data leads to a rejection of the null hypothesis. The power is the proportion of simulated samples in step 2 that lead to rejection.
4. Additionally, you can run a separate simulation to determine if the significance level (alpha) of the F-test matches the significance level you have selected. This is done by setting the MDDs to zero.

Saving Simulation Time

Simulations for large models or large sample sizes can take several hours to run. The simulation time can be reduced by running the simulation in two steps.

1. Specify a reasonable range for the group sample sizes, for example five values spread across the range you consider plausible. Set the number of simulations to a small value. Although this is not a large enough simulation size to give you definitive results, you can study the confidence intervals for power provided in the reports and plots to determine a reduced range of sample sizes.
2. Reduce the range of the sample size values, increase the number of simulations to a much larger value, and rerun the simulations. These simulations may run for a while, so be prepared for running times of several minutes or hours. The power values that come from these simulations should be very precise.

Generating the Random Numbers

The simulation proceeds by generating the normal random deviates in groups that are the size of the number of time points. That is, a set of normals is generated for a single subject. This set of normals is transformed into a set of normals having the desired covariance structure using the commonly known technique of multiplying the generated unit normals by the square root of the variance-covariance matrix. The square root is taken using the Cholesky decomposition. The resulting response vector has the desired covariance matrix.

Symbolically, suppose there are t time points and further suppose that the desired variance-covariance matrix of the data to be simulated is given by V. Find a matrix W such that V = WW'. Note that, by construction, W is lower triangular. If we generate t unit random normal deviates and place them in a vector z, the vector y = Wz has variance-covariance matrix WW' = V. Finally, the appropriate cell mean (based on the minimum detectable differences) is added to y to obtain the simulation data with the desired properties.

Once all of the data required for a complete experiment is generated, the mixed model is solved using the mixed model algorithm coded for NCSS's mixed model procedure.
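The generation step described above (find W with V = WW', draw unit normals z, form y = Wz, add the cell means) can be sketched in a few lines. This is an illustration under stated assumptions rather than the PASS code: the function names are hypothetical, the covariance matrix and cell means are made-up values, and Python's `random.Random.gauss` stands in for whatever normal generator the software actually uses.

```python
import math
import random


def cholesky(v):
    """Return lower-triangular W with W W' = v (v must be positive definite)."""
    n = len(v)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = v[i][j] - sum(w[i][k] * w[j][k] for k in range(j))
            w[i][j] = math.sqrt(s) if i == j else s / w[j][j]
    return w


def transform(w, z, cell_means):
    """Return y = W z + cell means for one subject."""
    return [cell_means[i] + sum(w[i][k] * z[k] for k in range(i + 1))
            for i in range(len(z))]


def simulate_subject(v, cell_means, rng):
    """Simulate one subject's responses with covariance v and the given cell means."""
    w = cholesky(v)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(v))]   # t unit normal deviates
    return transform(w, z, cell_means)


# Hypothetical two-time-point design.
v = [[4.0, 2.0], [2.0, 3.0]]
rng = random.Random(12345)
print(simulate_subject(v, [10.0, 12.0], rng))
```

Repeating `simulate_subject` for every subject in every group produces one simulated experiment; the power estimate is then the fraction of such experiments in which the fitted model's F-test rejects.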

Procedure Options

This section describes the options that are specific to this procedure. These are located on the first three tabs. To find out more about using the Reports and Plot tabs, go to the Procedure Window chapter.

Design: Effect Size Tab

This tab contains most of the parameters and options necessary to define the sample size, significance level, effect size, and simulation size.

Solve For

This procedure always solves for the power, so there is no specific Solve For option.

Sample Size

n (Subjects Per Group)

Specify one or more values for the number of subjects per group. The total sample size is the sum of the individual group sizes across all groups. You can specify a list of values. The items in the list may be separated with commas or blanks. The interpretation of the list depends on the =n's check box. When the =n's box is checked, a separate analysis is calculated for each value of n. When the =n's box is not checked, PASS uses the n's as the actual group sizes. In this case, the number of items entered must match the number of groups in the design, which is equal to the product of the number of levels of A and B.

You can also enter the sample sizes in columns of the spreadsheet. The column contains the group sample sizes, one per row. Columns are indicated by adding an equals sign to the left of the first entry. For example, if you have entered a set of unequal group sample sizes in a spreadsheet column, you would enter an equals sign followed by that column's name here. Multiple columns are specified by listing their names after the equals sign.

= n's

This option controls whether or not the number of subjects per group is to be equal for all groups. When checked, the number of subjects per group is equal for all groups. A list of values such as 5 10 15 represents three designs: one with five per group, one with ten per group, and one with fifteen per group. A simulation is conducted for each value.

When this option is not checked, the n's are assumed to be unequal. A list of values represents the size of the individual groups. For example, 5 10 15 represents a single, three-group design with five in the first group, ten in the second group, and fifteen in the third group. If four values are needed, but only two are entered, the last value is carried forward.

Alpha

This option specifies the probability of a type-I error (alpha) for each factor and interaction. A type-I error occurs when you reject the null hypothesis of zero effects when in fact they are zero. Since they are probabilities, alpha values must be between zero and one. Routinely, the value of 0.05 is used for alpha. This value may be interpreted as meaning that about one F-test in twenty will falsely reject the null hypothesis.

Simulations

This option specifies the number of iterations, M, used in each simulation. Larger numbers of iterations result in longer run times but more accurate results.

The precision of the simulated power estimates can be determined by recognizing that they follow the binomial distribution. Thus, confidence intervals may be constructed for power estimates. The following table gives an estimate of the precision that is achieved for various simulation sizes when the power is either 0.5 or 0.95. The table values are interpreted as follows: a 95% confidence interval of the true power is given by the power reported by the simulation plus and minus the Precision amount given in the table.

[Table: Simulation Size M, Precision when Power = 0.5, and Precision when Power = 0.95; cell values not recoverable.]

Notice that the precision improves as the simulation size grows, and that beyond a certain simulation size only a small amount of additional precision is achieved.

Because of the long run time needed to obtain an estimate of power, it is crucial that you set this parameter carefully. We suggest the following two-step procedure.

Step 1
Set the number of simulations to a small value and run examples for a realistic range of sample sizes. Study the results (especially the confidence intervals) to determine a range of sample sizes you want to investigate more carefully.

Step 2
Reduce the range of N to just one or two values and set the number of simulations to a large amount.

Step 3 (Optional)
Take the rest of the day off. The simulation may take a while!

Effect Size

Specify Effects Using

Indicate which options are used to specify the Effect Size. The possible choices are Means in Spreadsheet Columns or the rest of the options on this panel.

Means in Spreadsheet Columns

Specify spreadsheet columns containing a hypothesized means matrix that represents the minimum detectable differences among the means. Under the null hypothesis, this matrix is all zeros.

The between-subject factors (A & B) are represented across the columns of the spreadsheet and the within-subject factors (C & D) are represented down the rows. The number of columns specified must equal the number of groups, which is equal to the number of levels in A times the number of levels in B. The number of rows must equal the number of time points, which is equal to the number of levels in C multiplied by the number of levels in D.

For example, suppose you are designing an experiment that is to have two between factors (A & B) and two within factors (C & D). Suppose each of the four factors has two levels. The columns of the spreadsheet would represent

  A1B1 A1B2 A2B1 A2B2

The rows of the spreadsheet would represent

  C1D1
  C1D2
  C2D1
  C2D2

Example

To see how this option works, consider a table of hypothesized means for an experiment with one between factor (A) having two groups and one within factor (C) having three time periods. The means are entered in two spreadsheet columns (one per group), with three rows (one per time period).

[Table: hypothesized means, two columns by three rows; cell values not recoverable.]

By subtracting the appropriate means, a table of effects results.

[Table: column means and effects, row means and effects, and cell effects; cell values not recoverable.]

Effect Size - Factors Separating Subjects into Groups (Between)

These options specify the effect sizes of the between-subjects factors (A and B).

Levels

Specify the number of levels (categories) in this factor. Typical values range up to 8. Set this to a blank to ignore the factor in the design.

Effects Pattern

This option specifies the pattern of the means for this factor. This pattern is multiplied by the Detectable Difference to form the factor means used in the simulation. For example, suppose that the pattern of a five-category factor is -0.5 -0.25 0 0.25 0.5 and the detectable difference is 10. The resulting effects are -5 -2.5 0 2.5 5. The power reported by the simulation is the power to detect a difference of 10 (5 - (-5)) when the pattern of means is a linear growth pattern. Note that the power depends on both the pattern and the detectable difference.

Possible choices are described next. The original displays each choice using an example pattern for a factor with five categories.

List of Means
Enter a list of means directly into the List of Means box to the right.

Linear Up or Down
First Effect High or Low
Last Effect High or Low
First Half High or Low
Zig Zag High or Low
[The five-category example values for each of these patterns are not recoverable.]

Interaction Effect Patterns
The effect patterns of interactions are formed as the Kronecker product of the individual factor patterns.

Detectable Difference
This is the difference between the largest and smallest effects associated with this factor or interaction. This represents the minimum detectable difference between any two levels (or factor-level combinations). The actual means used in the simulation are found by multiplying the Effects Pattern by this value.

Each value specified here results in a separate simulation. Note that, because of report and plot labeling, it is best if you only put multiple values in one of these boxes at a time. Otherwise, the plots will be labeled using the less informative term Combination.

Example
Suppose that you selected Linear Up as the Effects Pattern of a five-level factor and set the detectable difference to 8. Further suppose that the Baseline Mean (below) is set to [...]. The resulting means for this factor would be: [...] 6 8 [...]. Note that the maximum difference between any two of these means is 8.

List of Means
You can specify a list of means directly instead of specifying the pattern and detectable difference. When you do so, the effects pattern and detectable difference are calculated from the means you specify. When you specify too many values for the number of levels in the factor, the extra values are ignored. When you specify too few values, the last value you specify is copied forward.

Using Columns in the Spreadsheet
You can specify the list of means in columns of the spreadsheet. When you do this, you enter the column name(s) here using the equal sign. For example, if you stored two sets of means, one in C5 and the other in C6, you would enter =C5 C6 here (or you could select the columns by pressing the button to the right). Once you have entered values into a spreadsheet, it is up to you to load that spreadsheet each time you run the simulation. Note that a separate simulation is run for each column you specify.
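As a rough illustration of how an Effects Pattern and a Detectable Difference combine into factor means, and how an interaction pattern arises as a Kronecker product of factor patterns, consider the Python sketch below. It is not PASS code. The function names are hypothetical, and rescaling each pattern so that its smallest entry is 0 and its largest is 1 is an assumption made here so that the scaled effects span exactly the Detectable Difference; the program may anchor the pattern differently.

```python
def standardize(pattern):
    """Rescale a pattern so its range is 1 (assumption: anchored at the minimum)."""
    lo, hi = min(pattern), max(pattern)
    if hi == lo:
        return [0.0 for _ in pattern]
    return [(p - lo) / (hi - lo) for p in pattern]


def factor_means(pattern, detectable_difference, baseline=0.0):
    """Multiply the standardized pattern by the detectable difference, then add a baseline."""
    return [baseline + detectable_difference * p for p in standardize(pattern)]


def kron(a, b):
    """Kronecker product of two pattern vectors."""
    return [x * y for x in a for y in b]


def interaction_effects(pattern_a, pattern_b, detectable_difference):
    """Interaction pattern = standardized Kronecker product, scaled by the difference."""
    return [detectable_difference * p for p in standardize(kron(pattern_a, pattern_b))]


# Hypothetical inputs: a five-level linear pattern and a 2 x 3 interaction.
main_means = factor_means([1, 2, 3, 4, 5], 8.0)
ab_effects = interaction_effects([-1, 1], [-1, 0, 1], 4.0)
print(main_means)
print(ab_effects)
```

Whatever anchoring is used, the largest minus the smallest scaled value equals the Detectable Difference, which is the property the text relies on.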

Effect Size - Factors with Multiple Levels Within a Subject (Within)
These options specify the effect sizes of the within-subject factors (C and D).

Levels
Specify the number of levels (categories) in this factor. Typical values are from [...] to 8. Set this to a blank (or [...]) to ignore the factor in the design. Note that the number of time points is calculated as the product of the number of levels of factors C and D.

The rest of these options behave as described in the Between-Subject section above. Possible choices are displayed next using examples that assume that the factor has five categories.

Effect Size - Interactions
These options specify the effect sizes of any interactions that are specified.

Interaction Check Box
Check this box to include the corresponding interaction in the model. In most situations, you should include all appropriate interactions since these have an impact on the degrees of freedom of the F-tests. You can set their Detectable Difference near zero if you want to ignore them. Occasionally, you will want to limit the number of interactions that you include in the model.

Note that the model is forced to be hierarchical. This means that if you include the three-way interaction ABC in the model, you must also include A, B, C, AB, AC, and BC.

Detectable Difference
This is the difference between the largest and smallest effects associated with this term. This represents the minimum detectable difference between any two factor-level combinations. The actual means used in the simulation are found by multiplying the effects pattern by this value.

The effect pattern is created by forming the Kronecker product of the effect patterns of each factor in the interaction. The result is standardized so that the maximum difference in the pattern is 1. Each item in the pattern is multiplied by the Detectable Difference. The cell means are found by adding the appropriate effects for the main effects, interaction effects, and baseline mean.

Each value specified here results in a separate simulation. Note that, because of report and plot labeling, it is best if you only put multiple values in one of these boxes at a time. Otherwise, the plots will be labeled using the less informative term Combination.

Design: Covariance Tab
This tab specifies $V(Y) = ZGZ' + R$, which is used to form the simulated data. Note that the covariance matrix specified here does not have to match the covariance of the fitted model. This allows you to study the robustness of the fitted model to errors in the covariance specification. There are two matrices that can be specified: the random effects (G) and the residual structure (R).

Group Option

Grouping Factor
Specify a grouping factor: either Factor A or B. This causes a unique covariance matrix to be generated for each level of the selected factor.

None
No group extension of the covariance matrix is done. Grouping is ignored in the formation of G and R.

A (or B)
The structures of G and R are extended to include separate matrices for each level of factor A (or B). The number of groups is equal to the number of levels of A (or B). Individual values for each group are entered for g, σ² for R, and ρ. These individual group values can be entered as a list in the box or in a column of the spreadsheet.

G (Covariance of Random Effects)

Include G
Specify whether to include G (the variance of the random effects term u) in the covariance model.

Not Checked
Do not include G in the model, that is, all elements of G are zero. Hence, V(Y) is specified using only R.

Checked
Include the diagonal matrix G, and hence ZGZ', in the model. The diagonal elements of G are all set to g.

Variances (Diagonal Elements of G)

g (Subject σ²)
This is used to generate the diagonal elements of G (all off-diagonal elements are zero). It is the variance of a subject (random effect). Since the value is a variance, it must be positive. It is usually obtained from a previous run of similar data through a mixed model.

When a Grouping Factor (A or B) is used, a separate value must be entered for each level of the group factor. This can be done by entering a list or by entering the values in a column of the spreadsheet and specifying that column here using the =C type syntax.

For example, if there were [...] categories in factor A and factor A was selected as the Grouping Factor, you might enter [...] 5 in this box, or you could enter =C here and make a similar entry in column C of the spreadsheet. Assuming that there are [...] time points, the G matrix would be

[matrix not recoverable]
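To make the role of G in $V(Y) = ZGZ' + R$ concrete, the following Python sketch builds the covariance of one subject's measurements. It is illustrative only: the helper names and numeric values are hypothetical, and it assumes the simplest case of a single random subject effect, so that Z is a column of ones, G is the 1 x 1 matrix [g], and R is diagonal.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def marginal_covariance(z, g, r):
    """Return V = Z G Z' + R."""
    return matadd(matmul(matmul(z, g), transpose(z)), r)


n_times = 3
subject_variance = 2.0    # hypothetical value of g
residual_variance = 1.0   # hypothetical value of sigma^2 for R

z = [[1.0] for _ in range(n_times)]   # assumption: random intercept only
g = [[subject_variance]]
r = [[residual_variance if i == j else 0.0 for j in range(n_times)]
     for i in range(n_times)]

v = marginal_covariance(z, g, r)
for row in v:
    print(row)
```

Under these assumptions every off-diagonal element of V equals g and every diagonal element equals g plus the residual variance, which shows how a subject effect alone induces correlation among repeated measurements.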

R (Covariance of Residuals)

Specify R Using
Specify how you want to specify R. Three choices are available:

Variances Only
This option indicates that R includes only diagonal elements (no autocorrelations).

Variances and Autocorrelations
This option indicates that R includes both diagonal (variance) and off-diagonal (autocorrelation) parameters.

R in Spreadsheet
R is to be read in from the spreadsheet.

Times
This option specifies the time points at which measurements of the subjects are made. Often, measurements are made at equidistant points through time, but this is not always the case. The number of time points is the product of the number of levels of all within factors.

The time metric influences the values of the variances as well as the correlations between two measurements on the same individual. The autocorrelations are based on the formula $\mathrm{Corr}(Y_i, Y_j) = \rho^{|t_i - t_j|}$, where $\rho$ is the base autocorrelation and $t_i$ and $t_j$ are two points in the time metric list. This formula allows you to easily specify many different types of autocorrelation structures.

The syntax for entering the time metric is given next:

STEP START INC
Measurements are made at time intervals of length INC, beginning at START. For example, "STEP 0 2" would generate the time series: 0, 2, 4, 6, etc.

RANGE MIN MAX
A set of equally spaced time points is generated from the MIN value to the MAX value. This setting is very useful when you want to investigate the impact of increasing or decreasing the number of measurements per subject during the same period of time. For example, if the study will last five weeks, you might want to determine whether the power of the statistical tests will increase if you take [...] measurements rather than 5.

LIST
You may enter a list of values separated by blanks or commas. For example, you could enter [...] if times were weeks, 1/7 week (day 1), 1 week, [...] weeks, [...] weeks.

Variances (Diagonal Elements of R)

Pattern
This option specifies how the diagonal (variance) elements of R are specified. Possible choices are

Constant (σ² for R)
All variances are set to the value of σ² for R. When a Grouping Factor is used, a list of group variances can be entered in the box or read in from the spreadsheet using the =C syntax to designate which column. In this case, each row of the spreadsheet provides the variance for the corresponding group.

Set of σ² for R values proportional to Times
The values of σ² for R range from the first value below to the second value. The values between these two endpoints are constructed so that they have the same relative magnitude as the Times entry.

For example, suppose there are four levels of factor C (the within factor) and the Times values are [...], [...], 9, [...]. Further suppose that the two values of σ² for R are [...] and [...]. The diagonal elements of R will be [...], [...], 9, and [...].

When a Grouping Factor is used, lists of the first and last values of σ² for R can be entered in the box or read in from the spreadsheet using the same syntax described above. For example, suppose there are [...] levels in factor C and two in factor A. If the Grouping Factor is set to A, the entry in the first σ² for R box is [...] 5, and the entry in the second is 7 8, the resulting value of R would be

[matrix not recoverable]

σ² List
A list of variances, one per time point, is specified. The list can be entered in the box or specified as rows of the spreadsheet using the equals sign, e.g. =C.

σ² for R (first)
The (first) value of σ² for R is specified here.

When a Grouping Factor is used, a list of variances can be entered in the box or read in from the spreadsheet. The column containing the group variance entries is specified with an equals sign, e.g. =C. When this option is used, each row of the spreadsheet provides σ² for R for the corresponding group.

σ² for R (last)
This is the final value in the specification of the set of variances. It is used to specify the final diagonal element in the linear sequence. The intermediate values of σ² for R are calculated so that their order and magnitude are proportional to the Time values.

When a Grouping Factor is used, a list of variances can be read in from the spreadsheet. A column containing the group variances is specified with an equals sign, e.g. =C. When this option is used, each row of the spreadsheet provides the final diagonal element of the corresponding group. An example is given in the discussion of Pattern above.

σ² List
A list of σ² values, one per time point, is specified here. The number of items in the list must match max(1, levels of C) x max(1, levels of D). The list can also be specified as a column of the spreadsheet using the equals sign, e.g. =C.
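The Times and variance options described above can be tied together in a short Python sketch. It is a sketch under stated assumptions rather than the program's algorithm: the function names and inputs are hypothetical, the "proportional to Times" variances are taken to be a linear interpolation between the first and last values along the time metric, and the covariance between two time points is taken to be the autocorrelation $\rho^{|t_i - t_j|}$ from the Times formula multiplied by the two standard deviations.

```python
import math


def step_times(start, inc, count):
    """Times generated by the STEP START INC syntax, truncated to `count` points."""
    return [start + inc * k for k in range(count)]


def interpolate_variances(times, first, last):
    """Assumption: variances move linearly from `first` to `last` along the time metric."""
    t_min, t_max = min(times), max(times)
    if t_max == t_min:
        return [first for _ in times]
    return [first + (last - first) * (t - t_min) / (t_max - t_min) for t in times]


def residual_covariance(times, variances, rho):
    """R[i][j] = rho**|t_i - t_j| * sqrt(var_i * var_j)."""
    n = len(times)
    return [[(rho ** abs(times[i] - times[j])) * math.sqrt(variances[i] * variances[j])
             for j in range(n)] for i in range(n)]


# Hypothetical settings: four measurements two time units apart.
times = step_times(0, 2, 4)
variances = interpolate_variances(times, 1.0, 4.0)
r = residual_covariance(times, variances, 0.5)
print(times)
print(variances)
for row in r:
    print(row)
```

Because the exponent is the time difference rather than the index difference, unequally spaced entries in the Times list automatically produce weaker correlations between measurements that are further apart.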

Autocorrelations (Off-Diagonal Elements of R)

Pattern
This option specifies the pattern of the autocorrelations in R. The three options are

Constant (ρ)
All autocorrelations are the same and equal to the value of ρ.

1st Order: ρ, ρ², ρ³, ...
The value of ρ is used to generate a first order autocorrelation series. This pattern reduces the magnitude of the autocorrelation at each successive step by multiplying the value at the previous step by ρ. Thus the pattern is ρ, ρ², ρ³, etc. Note that the exponent in this series is equal to the difference between the corresponding time points. For example, if the time points are [...], [...], 9, [...], the successive exponents of ρ are [...], 6, and [...].

List
A list of ρ values, separated by blanks or commas, is used to specify the autocorrelation pattern across the time points within a subject.

ρ (Autocorrelation)
This is the value of ρ between two measurements made on a subject at two time points that differ by one time unit. A value near 0 indicates low correlation. A value near 1 indicates high correlation. Its use depends on the selection in the Pattern option.

The possible values range from -1 to 1. However, in this situation, a positive value is usually assumed, so the more realistic range is 0 to 1.

If Time >
This is the maximum time difference between two measurement points before the autocorrelation is set to the value in the box to the left. For example, you might wish to specify a constant ρ of [...] for the first two time periods, and then, for any two measures that are greater than two time values apart, switch to zero. If you have specified a Constant autocorrelation pattern, and set this value to 2, the resulting autocorrelation pattern of R would be

[matrix not recoverable]

When you want to ignore this value, set it to a large number such as [...].

Then ρ =
This is the second ρ value. It is used when the time difference between two measurement points is larger than the value to the left.

ρ List
This is a list of autocorrelations, one for each time point. The number of autocorrelations must match the number of time values, which is equal to the product of the number of levels for factors C and D.

Since this is a type of correlation, possible values range from -1 to 1. However, positive values are usually assumed, so the realistic range is 0 to 1. A value near 0 indicates low correlation. A value near 1 indicates high correlation.

You can alternatively enter the list of autocorrelations in a column of the spreadsheet and specify that column here. When the program finds an equals sign in this box, it reads in the values of the designated column. Blanks are converted to zeros. For example, if you have entered the autocorrelation list in column one, you would enter =C1 here.

If there are not enough values entered, the last value is copied forward.

R in Spreadsheet Columns
This option designates which columns on the spreadsheet hold R. The number of columns and number of rows with entries must match the number of time periods at which the subjects are measured. The matrix must be positive definite. Press the button at the right to select the columns from the spreadsheet.

Fitted Model Tab
This tab controls the variance-covariance model that is actually fit during the solution of each simulation sample.

G (Random Effects Component)

G (Random Effects Component)
Specify whether to include G (the random effects component) in the fitted model. Possible choices are

None
Do not include G in the fitted model.

G (Random Effects for Subjects)
Include G in the fitted model. In this case, R should be set to Diagonal (which does not include autocorrelations).

Groups
Specify a grouping factor: either factor A or B. When selected, a set of variance and autocorrelation parameters is fit for each level of this factor.

R (Residual Component)

Pattern
Specify the type of R matrix (Residual Component Pattern) to be generated. The default type is the Diagonal matrix. When terms are specified in the Random Model, this option should be set to Diagonal.

A brief summary of the various structures follows.

$\rho$ = autocorrelation
$\rho_i$ = ith autocorrelation
$\rho_{ij}$ = autocorrelation between the ith and jth time points
$\sigma_i^2$ = $\sigma^2$ for the ith time point

[Example matrices for the following structures are not recoverable: Diagonal; Compound Symmetry; AR(1); Toeplitz([...]); Banded([...]); Unstructured.]
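For readers who want to see what the homogeneous structures named above look like numerically, the Python sketch below builds each one from a common variance and a rule for the correlation at each lag. These are the standard textbook forms rather than output from the program. The function names are hypothetical, and the way the Toeplitz and Banded arguments are interpreted here (a list of lag correlations, and a band width in lags) is an assumption that may not match the number shown in parentheses in the program.

```python
def build(n, sigma2, corr_at_lag):
    """n x n covariance with sigma2 on the diagonal and sigma2 * corr(lag) elsewhere."""
    return [[sigma2 * (1.0 if i == j else corr_at_lag(abs(i - j))) for j in range(n)]
            for i in range(n)]


def diagonal(n, sigma2):
    """No autocorrelation."""
    return build(n, sigma2, lambda lag: 0.0)


def compound_symmetry(n, sigma2, rho):
    """The same correlation rho between every pair of time points."""
    return build(n, sigma2, lambda lag: rho)


def ar1(n, sigma2, rho):
    """First-order autoregressive: correlation rho**lag."""
    return build(n, sigma2, lambda lag: rho ** lag)


def toeplitz(n, sigma2, rhos):
    """A separate correlation per lag; lags beyond len(rhos) get zero (assumption)."""
    return build(n, sigma2, lambda lag: rhos[lag - 1] if lag <= len(rhos) else 0.0)


def banded(n, sigma2, rho, width):
    """Constant rho within `width` lags of the diagonal, zero outside (assumption)."""
    return build(n, sigma2, lambda lag: rho if lag <= width else 0.0)


for row in ar1(3, 2.0, 0.5):
    print(row)
```

An Unstructured matrix has no generating rule: every variance and covariance is a separate parameter, so it would simply be supplied element by element.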

Heterogeneous covariance structures allow for nonconstant values of $\sigma^2$, e.g. Diagonal - Heterogeneous =

[matrix not recoverable]

Force Positive Covariances
When checked, this option forces all values in the R and G matrices to be non-negative. When this option is not checked, covariances can be negative. Usually, negative covariances are okay and should be allowed. However, some Residual Component patterns such as Compound Symmetry assume that covariances (autocorrelations) are positive.

Solution Options

Likelihood Type
Specify the type of likelihood equation to be solved. The options are:

MLE
The Maximum Likelihood solution has become less popular.

REML (recommended)
The Restricted Maximum Likelihood solution is recommended. It is the default in other software programs (such as SAS).

Solution Method
Specify the method to be used to solve the likelihood equations. The options are:

Newton-Raphson
This is an implementation of the popular 'gradient search' procedure for maximizing the likelihood equations. Whenever possible, we recommend that you use this method.

Fisher-Scoring
This is an intermediate step in the Newton-Raphson procedure. However, when the Newton-Raphson fails to converge, you may want to stop with this procedure.

MIVQUE
This non-iterative method is used to provide starting values for the Newton-Raphson method. For large problems, you may want to investigate the model using this method since it is much faster.

Differential Evolution
This grid search technique will often find a solution when the other methods fail to converge. However, it is painfully slow, often requiring hours to converge, and so should only be used as a last resort.
