Reports of the Institute of Biostatistics

No 0 / 2007
Leibniz University of Hannover
Natural Sciences Faculty
Title: IUT for multiple endpoints
Authors: Mario Hasler

1 Introduction

Some of the focus in new drug development has shifted to developing new medicines which may not necessarily be more effective but have other advantages compared to currently marketed drugs, such as reduced toxicity. One application is to show safety of a new treatment on multiple endpoints compared to a reference. A rigorous claim is to declare global safety if and only if each endpoint is safe. Two-sided hypotheses are appropriate for most endpoints because the direction of a harmful effect is not known a priori. That is, each endpoint must neither undershoot a given lower limit of the reference nor overshoot a given upper limit of this reference. Because it is often hard to fix uniform absolute safety thresholds jointly for all endpoints, ratios (not only differences) to control shall be considered, too. The equivalence thresholds must be set a priori, but they are relative, e.g. in percent, which gives an easy interpretation. For example, the new treatment will be declared safe if each endpoint neither undershoots a lower limit of 80% of the reference nor overshoots an upper limit of 125% of this reference.

Much work has been done on the assessment of bioequivalence or therapeutic equivalence between two treatments on a univariate endpoint. But there is limited research on the assessment of equivalence on multiple endpoints. The traditional way to treat this problem, the intersection-union test (IUT), is known to be conservative in many situations. Against this background, the question arises whether there are tests that do not have this weak point. In fact, there are some improved tests based on the IUT, but most of them only hold for special cases. On the other hand, different approaches exist, like Hotelling's $T^2$-test, which uses a sum-of-squares test statistic for the differences in the means to show equivalence on multiple endpoints. A short list of recommended literature is: Bloch et al. [2], Berger and Hsu [1], Casella and Berger [3], Hochberg and Tamhane [5], Wu et al. [8].
These tests either do not exploit the complete type I error (they have level $\alpha$, not size $\alpha$) or they are not applicable for ratios. For the union-intersection test (UIT), a multivariate t-distribution can be derived for the global test statistic; the idea was to do the same for the intersection-union test (IUT). The traditional IUT becomes less conservative for high correlations and, hence, is very conservative for low or negative ones. A multivariate approach, taking correlations into account, was assumed to avoid this handicap. The expected advantage was to get a size-$\alpha$ test this way.

2 Union-intersection and intersection-union method

2.1 Union-intersection method

The union-intersection (UI) method of test construction might be useful when the null hypothesis can be conveniently expressed as an intersection of a family of hypotheses, that is,
\[ H_0 = \bigcap_{i=1}^{k} H_{0i}. \]

Suppose that a suitable test is available for each $H_{0i}: \theta \in \Theta_i$ versus $H_{1i}: \theta \in \Theta_i^c$. We can then write
\[ H_0: \theta \in \bigcap_{i=1}^{k} \Theta_i. \]
Say the rejection region for the test of $H_{0i}$ is $\{x : T_i(x) \in R_i\}$. Hence, according to Roy (1953), the rejection region for the union-intersection test of $H_0$ is
\[ \bigcup_{i=1}^{k} \{x : T_i(x) \in R_i\}. \]
This means that the global null hypothesis $H_0$ is rejected if and only if at least one of its component local null hypotheses $H_{0i}$ is rejected. For example, a new drug is tested and said to be hazardous if at least one endpoint is hazardous. Depending on the test direction, the local rejection region for each of the individual tests may be
\[ \{x : T_i(x) > c\} \]
with a common $c$ for each individual test. The global rejection region of the UIT is
\[ \bigcup_{i=1}^{k} \{x : T_i(x) > c\} = \{x : \max_{i=1,\dots,k} T_i(x) > c\}. \]
Thus, the test statistic for testing $H_0$ is
\[ T(x) = \max_{i=1,\dots,k} T_i(x). \]
For the inverse test direction, the local rejection region for each of the $H_{0i}$ is
\[ \{x : T_i(x) < c\}. \]
Analogous considerations lead to the test statistic
\[ T(x) = \min_{i=1,\dots,k} T_i(x). \]

2.2 Intersection-union method

In contrast to the union-intersection (UI) method of test construction, the intersection-union (IU) method is useful if the null hypothesis can be conveniently expressed as a union of a family of hypotheses, that is,
\[ H_0 = \bigcup_{i=1}^{k} H_{0i}. \]

Again, supposing that a suitable test is available for each $H_{0i}: \theta \in \Theta_i$ versus $H_{1i}: \theta \in \Theta_i^c$, we can then write
\[ H_0: \theta \in \bigcup_{i=1}^{k} \Theta_i. \]
The rejection region for the test of $H_{0i}$ is $\{x : T_i(x) \in R_i\}$. Hence, the rejection region for the intersection-union test of $H_0$ is
\[ \bigcap_{i=1}^{k} \{x : T_i(x) \in R_i\}. \]
This means that the global null hypothesis $H_0$ is rejected if and only if each of its component local null hypotheses $H_{0i}$ is rejected. For example, a new drug is tested and said to be safe if each endpoint is safe.

Theorem: Let $\alpha_i$ be the size of the test of $H_{0i}$ with rejection region $R_i$ ($i = 1,\dots,k$). Then the IUT with rejection region $R = \bigcap_{i=1}^{k} R_i$ is a level-$\alpha$ test, that is, its size is at most $\alpha$, with $\alpha = \max_{i=1,\dots,k} \alpha_i$.

Proof: Let $\theta \in \bigcup_{i=1}^{k} \Theta_i$. Then $\theta \in \Theta_i$ for some $i$ and
\[ P_\theta(X \in R) \le P_\theta(X \in R_i) \le \alpha_i \le \alpha. \]
q.e.d.

Suppose the test direction for which the local rejection region for each of the individual tests is
\[ \{x : T_i(x) > c\} \]
with a common $c$ for each individual test. Then the global rejection region of the IUT is
\[ \bigcap_{i=1}^{k} \{x : T_i(x) > c\} = \{x : \min_{i=1,\dots,k} T_i(x) > c\}, \]
and thus the test statistic for testing $H_0$ is
\[ T(x) = \min_{i=1,\dots,k} T_i(x). \]
Again, the inverse test direction leads to the local rejection region for each of the $H_{0i}$, which is now
\[ \{x : T_i(x) < c\}, \]
and we obtain the test statistic
\[ T(x) = \max_{i=1,\dots,k} T_i(x). \]
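The contrast between the two constructions can be sketched in a few lines of Python. This is only an illustration of the max/min decision rules for the test direction $T_i(x) > c$; the function names, the local statistics and the critical value are hypothetical and not taken from the report.

```python
from typing import Sequence


def uit_rejects(local_stats: Sequence[float], crit: float) -> bool:
    """Union-intersection test: reject the global H0 if at least one
    local statistic exceeds the common critical value."""
    return max(local_stats) > crit


def iut_rejects(local_stats: Sequence[float], crit: float) -> bool:
    """Intersection-union test: reject the global H0 only if every
    local statistic exceeds the common critical value."""
    return min(local_stats) > crit


# Hypothetical local statistics for k = 3 endpoints and a hypothetical
# common critical value c; neither is taken from the report.
stats = [2.4, 1.9, 0.7]
c = 1.66
print(uit_rejects(stats, c))  # True: the largest statistic exceeds c
print(iut_rejects(stats, c))  # False: the smallest statistic does not
```

With the same data, the UIT rejects as soon as one endpoint is significant, whereas the IUT requires all endpoints to be significant.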

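3 Test procedure

3.1 Assumptions

For $i = 1,\dots,k$ and $j = 1,\dots,n_1$, let $X_{ij}$ denote the outcomes for $k$ endpoints of an experimental treatment. Suppose that these random variables follow a $k$-variate normal distribution with mean vector $\mu_X = (\mu_{X1},\dots,\mu_{Xk})'$ and unknown covariance matrix $\Sigma_X$. In the same manner, let the outcomes $Y_{ij}$ ($j = 1,\dots,n_2$) of a reference treatment be $k$-variate normally distributed with parameters $\mu_Y = (\mu_{Y1},\dots,\mu_{Yk})'$ and $\Sigma_Y$. Suppose that $X_{ij}$ and $Y_{ij}$ are mutually independent and $\Sigma_X = \Sigma_Y = \Sigma$. In this way, the experimental and the reference treatment are presumed to have the same variation for each single endpoint. Let $\bar X = (\bar X_1,\dots,\bar X_k)'$, $\bar Y = (\bar Y_1,\dots,\bar Y_k)'$ and $\hat\Sigma_X$, $\hat\Sigma_Y$ be the sample mean vectors and the sample covariance matrices for both treatments, respectively, with
\[ \bar X_i = \frac{1}{n_1}\sum_{j=1}^{n_1} X_{ij}, \qquad \bar Y_i = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_{ij}. \]
The pooled sample covariance matrix $\hat\Sigma$ is given by
\[ \hat\Sigma = \frac{(n_1 - 1)\hat\Sigma_X + (n_2 - 1)\hat\Sigma_Y}{n_1 + n_2 - 2} \]
with the elements
\[ \hat\sigma_{ij} = \widehat{\mathrm{Cov}}_{ij} = \frac{(n_1 - 1)\widehat{\mathrm{Cov}}(X_i, X_j) + (n_2 - 1)\widehat{\mathrm{Cov}}(Y_i, Y_j)}{n_1 + n_2 - 2} \qquad (1 \le i, j \le k), \]
where $\widehat{\mathrm{Cov}}(X_i, X_j)$ and $\widehat{\mathrm{Cov}}(Y_i, Y_j)$ are the estimates for the covariances between endpoints $i$ and $j$ (here both indices refer to endpoints). This is not the same weighting as the one used by Bloch et al. [2]. With this definition, however, the diagonal elements are
\[ \hat\sigma_{ii} = \hat\sigma_i^2 = \frac{(n_1 - 1)S_{X_i}^2 + (n_2 - 1)S_{Y_i}^2}{n_1 + n_2 - 2} \qquad (i = 1,\dots,k) \]
with
\[ S_{X_i}^2 = \frac{1}{n_1 - 1}\sum_{j=1}^{n_1} (X_{ij} - \bar X_i)^2, \qquad S_{Y_i}^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2} (Y_{ij} - \bar Y_i)^2, \]
which are needed in the following test procedure. From the pooled sample covariance matrix $\hat\Sigma$, we then derive the estimate $\hat R$ of the common correlation matrix of the data.

The objective is to compare the new experimental treatment with the reference, and to consider it safe if each endpoint is safe. This means an intersection-union test. We first consider the one-sided test problem and later the equivalence test problem.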
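The pooling step described above can be sketched as follows. This is a minimal pure-Python illustration under the stated assumption of a common covariance matrix; the function names and the data layout (one row per subject, one column per endpoint) are choices made for this example only.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def sample_cov(data: Matrix) -> Matrix:
    """Unbiased sample covariance matrix; data[j][i] is endpoint i of subject j."""
    n, k = len(data), len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(k)]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
            for b in range(k)
        ]
        for a in range(k)
    ]


def pooled_cov(x: Matrix, y: Matrix) -> Matrix:
    """Pooled covariance matrix, weighting each group by its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sx, sy = sample_cov(x), sample_cov(y)
    k = len(sx)
    return [
        [((n1 - 1) * sx[a][b] + (n2 - 1) * sy[a][b]) / (n1 + n2 - 2) for b in range(k)]
        for a in range(k)
    ]


def cov_to_corr(s: Matrix) -> Matrix:
    """Correlation matrix derived from a covariance matrix."""
    sd = [sqrt(s[i][i]) for i in range(len(s))]
    return [[s[a][b] / (sd[a] * sd[b]) for b in range(len(s))] for a in range(len(s))]


# Hypothetical toy data with k = 2 endpoints, n1 = 3 and n2 = 4 subjects.
x = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sigma_hat = pooled_cov(x, y)
r_hat = cov_to_corr(sigma_hat)
print(sigma_hat)
print(r_hat)
```

The diagonal of `sigma_hat` holds the pooled variances that enter the local t-statistics, and `r_hat` plays the role of the estimated common correlation matrix.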

3.2 Test for differences in means

The new experimental treatment is declared safe if and only if each endpoint does not undershoot a given fixed limit of the reference. This results in the component local tests
\[ H_{0i}: \mu_{Xi} - \mu_{Yi} \le \delta_i \quad \text{vs.} \quad H_{1i}: \mu_{Xi} - \mu_{Yi} > \delta_i \tag{1} \]
with a relevant threshold $\delta_i$. The global null hypothesis of the underlying intersection-union test (IUT) is
\[ H_0 = \bigcup_{i=1}^{k} H_{0i}. \]
Figure 1 shows the parameter space of a test for the case of $k = 2$ endpoints. The rejection region for the test of $H_{0i}$ is
\[ \{x_i, y_i : T_i(x_i, y_i) > c\} \]
with the t-test statistics
\[ T_i = \frac{\bar X_i - \bar Y_i - \delta_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \tag{2} \]
a common quantile $c$ for each individual test and the pooled estimators $\hat\sigma_i^2$ for $\sigma_i^2$. Under the marginal assumptions of $H_{0i}$, that is, $\mu_{Xi} - \mu_{Yi} - \delta_i = 0$, the test statistics $T_i$ are t-distributed with $\nu = n_1 + n_2 - 2$ degrees of freedom. The global rejection region of the IUT is
\[ \bigcap_{i=1}^{k} \{x_i, y_i : T_i(x_i, y_i) > c\} = \{x, y : \min_{i=1,\dots,k} \{T_i(x_i, y_i)\} > c\}, \]
and thus the test statistic for testing $H_0$ is
\[ T(x, y) = \min_{i=1,\dots,k} \{T_i(x_i, y_i)\}. \tag{3} \]
Under the marginal assumptions of all $H_{0i}$ (their intersection), the test statistics $T_i$ approximately follow a joint $k$-variate t-distribution with $\nu$ degrees of freedom and a correlation matrix depending on the correlation matrix of the data, $R$. But because the global null hypothesis is a union, and not an intersection, of its local hypotheses, the margin of this global null hypothesis is not unique, although uniqueness would be necessary for deriving a joint $k$-variate t-distribution under $H_0$. So we indeed have to take quantiles $c = t_{\nu, 1-\alpha}$ of a univariate t-distribution. The decision rule is to reject $H_0$ and to conclude global safety if
\[ T(x, y) > t_{\nu, 1-\alpha}. \tag{4} \]
If safety is declared if and only if each endpoint does not overshoot a given fixed limit of the reference, the component local tests are
\[ H_{0i}: \mu_{Xi} - \mu_{Yi} \ge \delta_i \quad \text{vs.} \quad H_{1i}: \mu_{Xi} - \mu_{Yi} < \delta_i \tag{5} \]

with a relevant threshold $\delta_i$. Figure 2 shows the parameter space of a test for the case of $k = 2$ endpoints. The rejection region for the test of $H_{0i}$ is
\[ \{x_i, y_i : T_i(x_i, y_i) < c\} \]
with the t-test statistics according to Equation (2). The global rejection region of the IUT is
\[ \bigcap_{i=1}^{k} \{x_i, y_i : T_i(x_i, y_i) < c\} = \{x, y : \max_{i=1,\dots,k} \{T_i(x_i, y_i)\} < c\}. \]
The test statistic for testing $H_0$ is now
\[ T(x, y) = \max_{i=1,\dots,k} \{T_i(x_i, y_i)\}. \tag{6} \]
The decision rule now is to reject $H_0$ if
\[ T(x, y) < t_{\nu, \alpha}, \tag{7} \]
which corresponds with $T(x, y) < -t_{\nu, 1-\alpha}$.

Figure 1: Parameter space of the test by differences for non-inferiority with $k = 2$ endpoints.
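The one-sided procedures above can be sketched as follows: a pooled two-sample t-statistic shifted by the threshold is computed per endpoint, and the minimum (non-inferiority) or maximum (non-superiority) is compared with a univariate quantile. This is a sketch under the stated assumptions, not the report's own code. The function names and the toy data are hypothetical, and the critical value `crit` (standing for $t_{\nu,1-\alpha}$) is passed in by the caller rather than computed.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_sd(x: Sequence[float], y: Sequence[float]) -> float:
    """Pooled standard deviation of one endpoint across both groups."""
    n1, n2 = len(x), len(y)
    return sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def t_diff(x: Sequence[float], y: Sequence[float], delta: float) -> float:
    """Local t-statistic for the difference in means, shifted by the threshold delta."""
    n1, n2 = len(x), len(y)
    return (mean(x) - mean(y) - delta) / (pooled_sd(x, y) * sqrt(1 / n1 + 1 / n2))


def iut_noninferiority(xs, ys, deltas, crit: float) -> bool:
    """Declare global safety only if the smallest local statistic exceeds crit."""
    return min(t_diff(x, y, d) for x, y, d in zip(xs, ys, deltas)) > crit


def iut_nonsuperiority(xs, ys, deltas, crit: float) -> bool:
    """Declare global safety only if the largest local statistic is below -crit."""
    return max(t_diff(x, y, d) for x, y, d in zip(xs, ys, deltas)) < -crit


# Hypothetical toy data: two endpoints, three observations per group.
xs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
ys = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(t_diff(xs[0], ys[0], -1.0))
print(iut_noninferiority(xs, ys, deltas=[-1.0, -1.0], crit=1.0))
```

Passing the quantile as an argument keeps the sketch free of third-party dependencies; in practice it would come from a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.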

Figure 2: Parameter space of the test by differences for non-superiority with $k = 2$ endpoints.

Now, the new experimental treatment is declared safe if and only if each endpoint neither undershoots a given fixed lower limit of the reference nor overshoots a given fixed upper limit of the reference. This results in the component local tests for
\[ H_{0i}: \mu_{Xi} - \mu_{Yi} \le \delta_i^{(1)} \ \text{or} \ \mu_{Xi} - \mu_{Yi} \ge \delta_i^{(2)} \quad \text{vs.} \quad H_{1i}: \mu_{Xi} - \mu_{Yi} > \delta_i^{(1)} \ \text{and} \ \mu_{Xi} - \mu_{Yi} < \delta_i^{(2)} \tag{8} \]
with relevant thresholds $\delta_i^{(1)} < \delta_i^{(2)}$. The global null hypothesis of the underlying intersection-union test is
\[ H_0 = \bigcup_{i=1}^{k} H_{0i} = \bigcup_{i=1}^{k} \{H_{0i}^{(1)} \cup H_{0i}^{(2)}\} \]
with $H_{0i}^{(1)}: \mu_{Xi} - \mu_{Yi} \le \delta_i^{(1)}$ and $H_{0i}^{(2)}: \mu_{Xi} - \mu_{Yi} \ge \delta_i^{(2)}$. The global test on equivalence is an IUT because the null hypothesis can be expressed as a union of a family of hypotheses. Each local test itself is an IUT, too, because it is made up of two one-sided tests with contrary directions. In rewriting $H_0$ by
\[ H_0 = \bigcup_{i=1}^{k} H_{0i}^{(1)} \cup \bigcup_{i=1}^{k} H_{0i}^{(2)} = H_0^{(1)} \cup H_0^{(2)}, \]

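we reorganize the test problem. $H_0^{(1)}$ and $H_0^{(2)}$ now represent the two one-sided IUTs with contrary directions that we have already considered. The test for the global $H_0$ is still an IUT because the null hypothesis is again a union of two hypotheses. Figure 3 shows the parameter space of a test for the case of $k = 2$ endpoints. The rejection region for the test of $H_{0i}$ is
\[ \{x_i, y_i : T_i^{(1)}(x_i, y_i) > c^{(1)}\} \cap \{x_i, y_i : T_i^{(2)}(x_i, y_i) < c^{(2)}\} \]
with the t-test statistics
\[ T_i^{(1)} = \frac{\bar X_i - \bar Y_i - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad T_i^{(2)} = \frac{\bar X_i - \bar Y_i - \delta_i^{(2)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \tag{9} \]
the quantiles $c^{(1)}$ and $c^{(2)}$ for the individual tests and the pooled estimators $\hat\sigma_i^2$ for $\sigma_i^2$. Under the marginal assumptions of $H_{0i}^{(1)}$, the test statistics $T_i^{(1)}$ are t-distributed with $n_1 + n_2 - 2$ degrees of freedom. Under the marginal assumptions of $H_{0i}^{(2)}$, the test statistics $T_i^{(2)}$ are t-distributed with $n_1 + n_2 - 2$ degrees of freedom. From the considerations above, it follows that the rejection region for this IUT is
\[ \{x, y : \min_{i=1,\dots,k} T_i^{(1)}(x_i, y_i) > c^{(1)}\} \cap \{x, y : \max_{i=1,\dots,k} T_i^{(2)}(x_i, y_i) < c^{(2)}\}. \]
We now rewrite the test hypotheses of Equation (8) as follows,
\[ H_{0i}: \mu_{Xi} - \mu_{Yi} \le \delta_i^{(1)} \ \text{or} \ \mu_{Yi} - \mu_{Xi} \le -\delta_i^{(2)} \quad \text{vs.} \quad H_{1i}: \mu_{Xi} - \mu_{Yi} > \delta_i^{(1)} \ \text{and} \ \mu_{Yi} - \mu_{Xi} > -\delta_i^{(2)}. \tag{10} \]
All the considerations above stay the same, but the pair of test statistics according to Equation (9) changes into
\[ T_i^{(1)} = \frac{\bar X_i - \bar Y_i - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad \tilde T_i^{(2)} = \frac{\bar Y_i - \bar X_i + \delta_i^{(2)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. \tag{11} \]
The test statistics $T_i^{(2)}$ and $\tilde T_i^{(2)}$ now have converse test directions and hence $T_i^{(1)}$ and $\tilde T_i^{(2)}$ have the same. Herewith, the rejection region can be transformed into
\[ \{x, y : \min_{i=1,\dots,k} T_i^{(1)}(x_i, y_i) > c^{(1)}\} \cap \{x, y : \min_{i=1,\dots,k} \tilde T_i^{(2)}(x_i, y_i) > -c^{(2)}\}. \]
As mentioned above, the test for the global $H_0$ is an IUT because the null hypothesis is a union of two hypotheses. But the local null hypotheses $H_{0i}^{(1)}$ and $H_{0i}^{(2)}$ exclude each other: when $H_{0i}^{(1)}$ is true, $H_{0i}^{(2)}$ cannot be. Hence, we cannot assume the marginal assumptions of both $H_{0i}^{(1)}$ and $H_{0i}^{(2)}$. There is no unique margin for the global null hypothesis. The following relations can be shown,
\[ E(T_i^{(1)} \mid H_{0i}^{(2)}) = \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \tag{12} \]
\[ E(\tilde T_i^{(2)} \mid H_{0i}^{(1)}) = \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. \tag{13} \]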

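Figure 3: Parameter space of a test by differences on equivalence for $k = 2$ endpoints; the alternative hypothesis $H_1$ is an intersection of two one-sided alternative hypotheses $H_1^{(1)}$ and $H_1^{(2)}$.

These relations are easy to see in writing the test statistics $T_i^{(1)}$ and $\tilde T_i^{(2)}$ in terms of each other,
\[ T_i^{(1)} = \frac{\bar X_i - \bar Y_i - \delta_i^{(1)} + \delta_i^{(2)} - \delta_i^{(2)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = -\frac{\bar Y_i - \bar X_i + \delta_i^{(2)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} + \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = -\tilde T_i^{(2)} + \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \]
\[ \tilde T_i^{(2)} = \frac{\bar Y_i - \bar X_i + \delta_i^{(2)} + \delta_i^{(1)} - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = -\frac{\bar X_i - \bar Y_i - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} + \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = -T_i^{(1)} + \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. \]
Therefore, under the marginal assumption of $H_{0i}^{(1)}$, the test statistic $\tilde T_i^{(2)}$ follows a non-central univariate t-distribution with $n_1 + n_2 - 2$ degrees of freedom and non-centrality parameter
\[ \theta_i = \frac{\delta_i^{(2)} - \delta_i^{(1)}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. \tag{14} \]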

The test statistic $T_i^{(1)}$ follows the same distribution but under the marginal assumption of $H_{0i}^{(2)}$. For this reason, we need two test statistics for testing $H_0$, namely
\[ T^{(1)}(x, y) = \min_{i=1,\dots,k} \{T_i^{(1)}(x_i, y_i)\}, \qquad \tilde T^{(2)}(x, y) = \min_{i=1,\dots,k} \{\tilde T_i^{(2)}(x_i, y_i)\}. \tag{15} \]
Again, under the marginal assumptions of all $H_{0i}^{(1)}$ (their intersection), the test statistics $T_i^{(1)}$ approximately follow a joint $k$-variate t-distribution with $n_1 + n_2 - 2$ degrees of freedom and a correlation matrix depending on that of the data, $R$. But for the reasons given above, one cannot derive a joint $k$-variate t-distribution under $H_0$. So we have to take quantiles $c = t_{\nu, 1-\alpha}$ of a univariate t-distribution. The decision rule is to reject $H_0^{(1)}$ if
\[ T^{(1)}(x, y) > t_{\nu, 1-\alpha}. \]
In the same manner, the decision rule is to reject $H_0^{(2)}$ if
\[ \tilde T^{(2)}(x, y) > t_{\nu, 1-\alpha}. \]
Safety can only be concluded if both
\[ T^{(1)}(x, y) > t_{\nu, 1-\alpha} \quad \text{and} \quad \tilde T^{(2)}(x, y) > t_{\nu, 1-\alpha}. \tag{16} \]

3.3 Test for ratios of means

Most of the results of the test for differences in means hold for the case of ratios, too. So, (1) changes into
\[ H_{0i}: \frac{\mu_{Xi}}{\mu_{Yi}} \le \psi_i \quad \text{vs.} \quad H_{1i}: \frac{\mu_{Xi}}{\mu_{Yi}} > \psi_i \tag{17} \]
with a relevant threshold $\psi_i$. Figure 4 shows the parameter space of a test for the case of $k = 2$ endpoints. The local ratio-test statistics are
\[ T_i = \frac{\bar X_i - \psi_i \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^2}{n_2}}}. \tag{18} \]
The test statistic for testing $H_0$ is
\[ T(x, y) = \min_{i=1,\dots,k} \{T_i(x_i, y_i)\}. \tag{19} \]
The decision rule is to reject $H_0$ and to conclude global safety if
\[ T(x, y) > t_{\nu, 1-\alpha}. \tag{20} \]
Correspondingly, (5) changes into
\[ H_{0i}: \frac{\mu_{Xi}}{\mu_{Yi}} \ge \psi_i \quad \text{vs.} \quad H_{1i}: \frac{\mu_{Xi}}{\mu_{Yi}} < \psi_i. \tag{21} \]
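The ratio-based statistic differs from the difference-based one only in the numerator and in the variance factor of the denominator. The following sketch shows the local statistic of Equation (18) and the non-inferiority decision of Equations (19) and (20). The function names and the toy data are hypothetical, and the quantile `crit` is supplied by the caller rather than computed.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_sd(x: Sequence[float], y: Sequence[float]) -> float:
    """Pooled standard deviation of one endpoint across both groups."""
    n1, n2 = len(x), len(y)
    return sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def t_ratio(x: Sequence[float], y: Sequence[float], psi: float) -> float:
    """Local ratio-test statistic: (mean(x) - psi * mean(y)) over its estimated standard error."""
    n1, n2 = len(x), len(y)
    return (mean(x) - psi * mean(y)) / (pooled_sd(x, y) * sqrt(1 / n1 + psi ** 2 / n2))


def iut_ratio_noninferiority(xs, ys, psis, crit: float) -> bool:
    """Declare global safety only if the smallest local ratio statistic exceeds crit."""
    return min(t_ratio(x, y, p) for x, y, p in zip(xs, ys, psis)) > crit


# Hypothetical toy data: one endpoint, threshold psi = 0.8 (80% of the reference).
x = [9.0, 10.0, 11.0]
y = [9.0, 10.0, 11.0]
print(t_ratio(x, y, 0.8))
print(iut_ratio_noninferiority([x], [y], [0.8], crit=2.0))
```

Because the threshold is relative, the same value of `psi` can be reused across endpoints measured on very different scales, which is the motivation given in the introduction for considering ratios.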

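Figure 4: Parameter space of the test by ratios for non-inferiority with $k = 2$ endpoints.

Figure 5 shows the parameter space of a test for the case of $k = 2$ endpoints. The local ratio-test statistics are the same as in Equation (18). The test statistic for testing $H_0$ is now
\[ T(x, y) = \max_{i=1,\dots,k} \{T_i(x_i, y_i)\}. \tag{22} \]
The decision rule is to reject $H_0$ if
\[ T(x, y) < t_{\nu, \alpha}, \tag{23} \]
which corresponds with $T(x, y) < -t_{\nu, 1-\alpha}$.

When the new experimental treatment is declared safe if and only if each endpoint neither undershoots a given relative lower limit of the reference nor overshoots a given relative upper limit of the reference, (8) changes into
\[ H_{0i}: \frac{\mu_{Xi}}{\mu_{Yi}} \le \psi_i^{(1)} \ \text{or} \ \frac{\mu_{Xi}}{\mu_{Yi}} \ge \psi_i^{(2)} \quad \text{vs.} \quad H_{1i}: \frac{\mu_{Xi}}{\mu_{Yi}} > \psi_i^{(1)} \ \text{and} \ \frac{\mu_{Xi}}{\mu_{Yi}} < \psi_i^{(2)} \tag{24} \]
with relevant thresholds $\psi_i^{(1)} < \psi_i^{(2)}$. Figure 6 shows the parameter space of a test for the case of $k = 2$ endpoints. The local ratio-test statistics are
\[ T_i^{(1)} = \frac{\bar X_i - \psi_i^{(1)} \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}}, \qquad T_i^{(2)} = \frac{\bar X_i - \psi_i^{(2)} \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}}. \tag{25} \]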

Figure 5: Parameter space of the test by ratios for non-superiority with $k = 2$ endpoints.

Figure 6: Parameter space of a test by ratios on equivalence for $k = 2$ endpoints; the alternative hypothesis $H_1$ is an intersection of two one-sided alternative hypotheses $H_1^{(1)}$ and $H_1^{(2)}$.

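They will be rewritten into
\[ T_i^{(1)} = \frac{\bar X_i - \psi_i^{(1)} \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}}, \qquad \tilde T_i^{(2)} = \frac{\psi_i^{(2)} \bar Y_i - \bar X_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}}, \tag{26} \]
having the same test directions now. The following relations can be shown,
\[ E(T_i^{(1)} \mid H_{0i}^{(2)}) = \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \mu_{Yi}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}}, \tag{27} \]
\[ E(\tilde T_i^{(2)} \mid H_{0i}^{(1)}) = \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \mu_{Yi}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}}. \tag{28} \]
These relations are again easy to see in writing the test statistics $T_i^{(1)}$ and $\tilde T_i^{(2)}$ in terms of each other,
\[ T_i^{(1)} = \frac{\bar X_i - \psi_i^{(1)} \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}} = -\frac{\psi_i^{(2)} \bar Y_i - \bar X_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}} + \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}} = -\tilde T_i^{(2)} \sqrt{\frac{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}} + \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}}, \]
\[ \tilde T_i^{(2)} = \frac{\psi_i^{(2)} \bar Y_i - \bar X_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}} = -\frac{\bar X_i - \psi_i^{(1)} \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}} + \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}} = -T_i^{(1)} \sqrt{\frac{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}} + \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \bar Y_i}{\hat\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}}. \]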

Therefore, under the marginal assumption of $H_{0i}^{(1)}$, the test statistic $\tilde T_i^{(2)}$ follows a non-central univariate t-distribution with $n_1 + n_2 - 2$ degrees of freedom and non-centrality parameter
\[ \theta_i^{(2)} = \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \mu_{Yi}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(2)2}}{n_2}}}. \tag{29} \]
Under the marginal assumption of $H_{0i}^{(2)}$, the test statistic $T_i^{(1)}$ follows the same distribution but with non-centrality parameter
\[ \theta_i^{(1)} = \frac{(\psi_i^{(2)} - \psi_i^{(1)})\, \mu_{Yi}}{\sigma_i \sqrt{\frac{1}{n_1} + \frac{\psi_i^{(1)2}}{n_2}}}. \tag{30} \]
For this reason, we now have two test statistics for testing $H_0$ as follows,
\[ T^{(1)}(x, y) = \min_{i=1,\dots,k} \{T_i^{(1)}(x_i, y_i)\}, \qquad \tilde T^{(2)}(x, y) = \min_{i=1,\dots,k} \{\tilde T_i^{(2)}(x_i, y_i)\}. \tag{31} \]
The decision rule is to reject $H_0$ and to conclude safety if both
\[ T^{(1)}(x, y) > t_{\nu, 1-\alpha} \quad \text{and} \quad \tilde T^{(2)}(x, y) > t_{\nu, 1-\alpha}. \tag{32} \]

3.4 α-simulations

Simulation studies were performed for 2, 4, 8 and 20, 40, 80 endpoints with several means and variances. For each fixed number of endpoints $k \in \{2, 4, 8, 20, 40, 80\}$, different grades of correlation were considered: maximal negative correlation, correlation 0, correlation 0.5 and maximal correlation. For each fixed $k$ and grade of correlation, the endpoints were equicorrelated, that is, $\rho_{ij} = \rho$ for all $1 \le i < j \le k$. Note that the negative correlations in the left column are bounded below by $\rho_{\min} = -\frac{1}{k-1}$. 100000 simulation runs were taken for 2, 4, 8 endpoints, 10000 for 20, 40, 80 endpoints. Each simulation result was obtained using program code in the statistical software R [7], applying the package mvtnorm by Genz and Bretz [4].

The aim of showing that quantiles of a $k$-variate t-distribution yield a size-$\alpha$ test was not achieved. Indeed, this approach leads to an exact size $\alpha$, but only for the intersection of all margins of the local null hypotheses, $\bigcap_{i=1}^{k} H_{0i}$. Quantiles of a univariate t-distribution result in very conservative decisions for that situation. But for the case of interest (the union $H_0 = \bigcup_{i=1}^{k} H_{0i}$), the univariate method keeps the $\alpha$-level conservatively, while the multivariate method clearly fails. An example is: 4 endpoints with correlation 0, one-sided testing (non-inferiority), balanced sample size 100, coefficient of variation 0.25, $\mu_Y = (0.1, 1, 10, 100)$, $\mu_X = (0.079, 1.0, 10, 100)$, $\delta = (-0.02, -0.20, -2.00, -20.00)$.
That means that $\mu_{X1}$ is inferior (unsafe), while the others are non-inferior (safe). Because not every endpoint is safe, global safety must not be declared. The related type I errors are 0.37 (multivariate method) and 0.03 (traditional univariate IUT).
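The example can be mimicked with a small Monte Carlo sketch. It is not the R/mvtnorm code used for the report and is not guaranteed to reproduce the reported values exactly. It rests on several assumptions that are not stated in the source: the standard deviation of each endpoint is taken as 0.25 times the reference mean, the endpoints are simulated independently (correlation 0), and normal quantiles stand in for the t-quantiles, which is a large-sample approximation for 198 degrees of freedom. The "joint" critical value is the quantile of the minimum of four independent standard normal variables, used here as a stand-in for the multivariate-t quantile. The run count and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

MU_Y = [0.1, 1.0, 10.0, 100.0]           # reference means from the example
MU_X = [0.079, 1.0, 10.0, 100.0]         # treatment means from the example
DELTA = [-0.02, -0.20, -2.00, -20.00]    # non-inferiority margins from the example
CV = 0.25                                # coefficient of variation
N = 100                                  # balanced sample size per group
ALPHA = 0.05                             # assumed nominal level


def local_stat(mu_x: float, mu_y: float, delta: float, rng: random.Random) -> float:
    """Simulate one endpoint and return its shifted pooled t-statistic."""
    sd = CV * mu_y  # assumption: common sd defined relative to the reference mean
    x = [rng.gauss(mu_x, sd) for _ in range(N)]
    y = [rng.gauss(mu_y, sd) for _ in range(N)]
    mx, my = sum(x) / N, sum(y) / N
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_sd = sqrt(ss / (2 * N - 2))
    return (mx - my - delta) / (pooled_sd * sqrt(2 / N))


def rejection_rate(crit: float, runs: int, seed: int = 1) -> float:
    """Share of runs in which the minimum local statistic exceeds crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        t_min = min(local_stat(mx, my, d, rng) for mx, my, d in zip(MU_X, MU_Y, DELTA))
        if t_min > crit:
            hits += 1
    return hits / runs


z = NormalDist()
crit_univariate = z.inv_cdf(1 - ALPHA)                      # stand-in for t_{nu, 1-alpha}
crit_joint = z.inv_cdf(1 - ALPHA ** (1 / len(MU_Y)))        # P(min of 4 independent Z > c) = alpha

rate_univariate = rejection_rate(crit_univariate, runs=1000)
rate_joint = rejection_rate(crit_joint, runs=1000)
print(rate_univariate, rate_joint)
```

Since the first endpoint lies inside the null hypothesis, every rejection counted here is a false declaration of global safety. The sketch therefore estimates the type I error of each rule at this particular point of the null space.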

4 Discussion

One possibility to show bioequivalence or therapeutic equivalence between two treatments on multiple endpoints is to use an IUT for either differences in means or ratios of them. This yields a global test which rejects if and only if each local test rejects. For example, a new experimental treatment is declared safe if each endpoint is safe, where safety is defined as not undershooting or overshooting a given fixed limit of a reference. The IUT is known to be very conservative in many situations. One reason is that it does not take any correlations into account: each endpoint is tested separately using quantiles or p-values from univariate t-distributions. Another reason is the nature of the margin of the null hypothesis $H_0$. So, the aim was to extend the IUT to a multivariate approach like the UIT, using a multivariate t-distribution instead of, say, a Bonferroni adjustment. The conclusion so far is that there is no such easy equivalent multivariate-t approach for the IUT. The approach studied here does not keep the $\alpha$-level for the complete space of the null hypotheses.

Another noteworthy fact is that an IUT for showing equivalence between two treatments on multiple endpoints always comes to a global decision: all endpoints together are equivalent or not. If not, omitting the hazardous endpoints does not automatically imply the equivalence of the remaining ones. To demonstrate equivalence on a subset of endpoints (at least 1, ..., k of k) and to identify those, the procedure of Quan et al. [6] is an appropriate solution, for example.

References

[1] R.L. Berger and J.C. Hsu. Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statistical Science, 11(4):283-319, 1996.
[2] D.A. Bloch, T.L. Lai, and P. Tubert-Bitter. One-sided tests in clinical trials with multiple endpoints. Biometrics, 57:1039-1047, 2001.
[3] G. Casella and R.L. Berger. Statistical Inference. Duxbury, Thomson Learning, 2002.
[4] A. Genz, F. Bretz, and R port by T. Hothorn. mvtnorm: Multivariate Normal and t Distribution, 2006. R package version 0.7-5.
[5] Y. Hochberg and A.C. Tamhane. Multiple Comparison Procedures. John Wiley and Sons, Inc., New York, 1987.
[6] Hui Quan, Jim Bolognese, and Weiying Yuan. Assessment of equivalence on multiple endpoints. Statistics in Medicine, 20:3159-3173, 2001.
[7] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2007. ISBN 3-900051-07-0.
[8] Y. Wu, M.G. Genton, and L.A. Stefanski. A multivariate two-sample mean test for small sample size and missing data. Biometrics, 62:877-885, 2006.