Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally distributed with mea ad variace. A hypothesis test havig oe-sided type I error might be based o a critical fuctio which rejects H : i favor of alterative hypothesis H : whe ( ) ( ) Φ Φ u du e Z where ; π. The power fuctio for this hypothesis test is the ( ) Φ Pwr Pr Pr This power fuctio ca be used to compute the power β with which the hypothesis test rejects a specific alterative whe the sample sie is at some give value of ; compute the sample sie for which a hypothesis test would have prescribed power β to detect a specific desig alterative ; or compute the alterative which is rejected with prescribed power β whe performig the hypothesis test with some give sample sie. For istace, whe desirig to compute a sample sie such that the hypothesis test has power β, we merely wat ( ), Pr β Φ Pwr which the suggests ( ) ( ) β β β +. Aother approach to sample sie estimatio is based o the precisio with which some parameter ca be estimated. For istace, a (-)% cofidece iterval for might be computed as., + If we wat the width of the cofidece iterval to be - (so the CI will discrimiate betwee the ull ad alterative hypotheses), the we use sample sie formula ( ) ( ), + which correspods to the same sample sie formula as derived from the hypothesis test, providig we choose power β (my religio). 5.4.

Modified ample ie Formula i the Presece of a Mea-ariace Relatioship uppose a statistic is kow to be ormally distributed with mea ad variace (). I this case, the variace of the distributio of depeds upo the mea of that distributio a mea-variace relatioship. The above formulas eed to be modified whe the variace of the ormally distributed statistic depeds o the mea. As a geeral rule, most statisticias igore this issue because either ) the sample sie will be such that the variace will ot differ by very much over the rage of alteratives for which the power is, say, betwee % ad 99%, or ) the sample sie computatio is based o such crude estimates of the variability of the data that ay error due to igorig the meavariace relatioship is egligible, or 3) both. Nevertheless, for completeess I preset the modified formulas here for mea-variace relatioships. I these formulas, I presume that the power fuctio is higher at tha at ay <. I ote that there are sample sie requiremets i order to guaratee that the power curve achieves its maximum over the ull hypothesis regio at this boudary betwee the ull ad alterative hypotheses. This requiremet is that for all <, we must have ( ) ( ). This is clearly satisfied whe () is a icreasig fuctio of, because i that case, the umerator is egative. uppose a statistic is kow to be ormally distributed with mea ad variace (). A hypothesis test havig oe-sided type I error might be based o a critical fuctio which rejects H : i favor of alterative hypothesis H : whe Z u Φ e du. ( ) ; where Φ( ) ( ) π The power fuctio for this hypothesis test is the Pwr ( ) Pr Φ ( ) ( ) ( ) Pr. ( ) ( ) ( ) ( ) ( ) This power fuctio ca be used to compute the power β with which the hypothesis test rejects a specific alterative whe the sample sie is at some give value of ; compute the sample sie for which a hypothesis test would have prescribed power β to detect a specific desig alterative ; or compute the alterative which is rejected with prescribed power β whe performig the hypothesis test with some give sample sie. For istace, whe desirig to compute a sample sie such that the hypothesis test has power β, we merely wat 5.4.

Pwr ( ) Pr Φ β, ( ) ( ) ( ) ( ) which the suggests ( ) ( ( ) + ( ) ) β β β. ( ) ( ( ) ) As before, the choice of power β (my religio) correspods exactly to choosig sample sie accordig to the precisio with which some parameter ca be estimated as judged by a (-)% cofidece iterval for. Whe ivertig the above power ador sample sie formulas to fid the alterative for which a desig has prescribed power, it may be the case that a iterative search is ecessary. Geeral ample ie Formula for -sample, -sample, ad Regressio ettigs The +eqtrial Techical Overview describes a geeral sample sie formula which ca be used i the data aalysis models most commoly used i the aalysis of cliical trial data. (The otatio i this documet differs slightly from the otatio used i the techical overview.) I these models, we let θ represet the measure of treatmet effect, which is most ofte a cotrast (differece or ratio) of some withi group summary measure µ computed idepedetly for each treatmet arm. tatistical aalysis ca usually be based o a estimate of the treatmet effect θ. Most ofte, either the estimate of θ or the logarithmic trasformatio of θ are approximately ormally distributed i a fixed sample study (i.e., oe without iterim aalyses). We thus let g(θ) be the trasformed treatmet effect measure which is commoly estimated with a approximately ormally distributed estimate. The lik fuctio g( ) is typically the idetity fuctio (so θ) or the logarithmic trasformatio (so log (θ)). We thus assume that the estimate of is approximately ormally distributed with ˆ ~ N, where is the (average) variability cotributed to the estimate by a sigle observatio, ad is the sample sie. I geeral, ca be a fuctio of the withi group summary measures µ, as well as other uisace parameters that are idepedet of µ. I the rest of this documet, we igore ay mea-variace relatioship. Whe implemetig these formulas, it will geerally be ecessary to decide whether to make calculatios usig the value of uder the ull, alterative, or some itermediate hypothesis. uppose we are iterested i discrimiatig betwee a ull hypothesis H : ad a alterative hypothesis H : i a hypothesis test havig oe-sided type I error ad statistical power β. Whe the above approximate distributio holds, sample sie computatios are most ofte effected usig δ β 5.4.

where ad δ β is a stadardied alterative, which i a fixed sample study (i.e., oe without iterim aalyses) is δ β - + β. The same geeral formula ca be used i a group sequetial test, providig the estimate of treatmet effect ca be viewed as a weighted sum of ucorrelated, approximately ormally distributed statistics computed o the groups accrued betwee aalyses. This is ofte referred to as idepedet icremet structure, ad this holds i a wide variety of commo cliical trial settigs. I these group sequetial settigs, the stadardied alterative must be computed usig recursive umerical itegratio of covolutios of desities. (+eqtrial will do this for us.) Use of the Geeral Formula i Commo -sample Aalysis Models. Testig meas of cotiuous distributios: Y i ~ (µ, σ ), i,, θ µ θ σ. Testig geometric meas of cotiuous distributios: log Y i ~ (µ, σ ), i,, θ e µ log ( θ ) σ 3. Testig proportios of Beroulli distributios: Y i ~B (,µ), i,, θ µ θ p(-p) Use of the Geeral Formula i Commo Idepedet ample Aalysis Models. Testig meas of cotiuous distributios: Y ki ~ (µ k, σ k ), i,, m k ; k, m + m Radomiatio ratio r m m θ µ µ (differece of meas) θ (r+) [ σ r + σ ]. Testig geometric meas of cotiuous distributios: log Y ki ~ (µ k, σ k ), i,, m k ; k, m + m Radomiatio ratio r m m θ exp ( µ ) exp (µ ) exp ( µ µ ) (ratio of geometric meas) log ( θ ) (r+) [ σ r + σ ] 5.4.

3. Testig proportios of Beroulli distributios: Y ki ~B (, µ k ), i,, m k ; k, a. m + m b. Radomiatio ratio r m m c. θ µ µ (differece of proportios) d. θ e. (r+) [ p ( - p ) r + p ( p ) ] (uder the alterative) 4. Testig odds of Beroulli distributios: Y ki ~B (, p k ), i,, m k ; k, a. m + m b. Radomiatio ratio r m m c. Odds µ k p k ( p k ) d. θ µ µ (odds ratio) e. log ( θ ) f. (r+) [ ( r p ( - p ) ) + ( p ( p ) ) ] (uder alterative) 5. Testig haard ratios i survival distributios: Y ki ~ k (t ), i,, m k ; k, umber of observed evets i both groups combied Radomiatio ratio r m m Haard fuctio h k (t ) - d ( log k (t ) ) θ h (t ) h (t ) (costat ratio of haard fuctios) log ( θ ) (r+) [ r + ] (uder the ull) Use of the Geeral Formula i Whe Comparig Meas with Correlated Observatios. ( Repeated Measures ): uppose the kth treatmet group (k,) has m k idepedet subjects, each of whom have J measuremets, ad subjects i differet groups are idepedet Y kij ~ (µ k, σ k ), k,, i,, m k ; j,, J corr(y kij, Y k i j )ρ if kk, ii, j j corr(y kij, Y k i j ) if kk, ii, jj corr(y kij, Y k i j )ρ if k k or i i Radomiatio ratio r m m θ µ µ (differece of meas) θ (r+) { σ [+(J-)ρ](J r) + σ [+(J-)ρ]J }. ( Crossover ): uppose m idepedet pairs of subjects are radomied such that oe member of each pair is i treatmet group ad oe is i treatmet group. Y ki ~ (µ k, σ k ), k,, i,, m corr(y ki, Y k i )ρ if k k, ii corr(y ki, Y k i ) if kk, ii corr(y ki, Y k i )ρ if i i θ µ µ (differece of meas) 5.4.

θ { σ + σ - ρσ σ } Use of the Geeral Formula i Commo Regressio Aalysis Models. Liear regressio (meas): ( Y i X i x i ) ~ (β + β x i, σ ), i,, θ E( Y X x+ ) - E ( Y X x ) β (liear cotrast of meas) θ σ ar ( x ). Liear regressio o log trasformed data (geometric meas): ( log Y i X i x i ) ~ (β + β x i, σ ), i,, θ GM( Y X x+ ) GM ( Y X x ) exp ( β ) log ( θ ) σ ar ( x ) 3. Logistic regressio (odds): Y ki ~B (, p k ), i,, m k ; k, a. m + m b. Radomiatio ratio r m m c. Odds µ k p k ( p k ) d. θ µ µ (odds ratio) e. log ( θ ) f. [ p ( p) ar ( x ) ] (usig a average value for p) 4. Proportioal haards regressio (haard ratios): Y i ~ i (t ), i,, umber of observed evets i both groups combied Haard h i (t X i x i ) - d ( log i (t X i x i ) ) h (t ) exp (β x i ) θ h(t X x +) h i (t X x ) exp (β ) log ( θ ) ar ( x ) (uder the ull) ample ie Formula for K-sample ettig The +eqtrial Techical Overview also provides a sample sie formula appropriate whe comparig meas or geometric meas across K idepedet samples i a fixed sample (o iterim aalyses) settig. I this settig, we agai use some cosider some withi group summary measure µ k computed idepedetly for the kth treatmet arm, k,,k. The ull ad alterative hypotheses are classically stated as H : µ µ µ K ad H : µ i µ j for some i,j. Testig of the hypotheses is geerally based o the variace of the withi group summary measures. That is, the parameter measurig treatmet effect is θ ar((µ, µ,, µ K )). Whe all groups have equal summary measures, this variace is. Whe the alterative hypothesis is true, the variace across the group summary measures is oero. 5.4.

The exact formula ad code used to compute sample sies i the K-sample settig is give i the techical overview. Usig +eqtrial to Compute Number of Evets for Proportioal Haards Models +eqtrial provides explicit fuctios for the computatio of sample sies i the two sample settig for both fixed sample ad group sequetial trials usig the proportioal haards model. Although o explicit facility is provided for proportioal haards regressio with a cotiuous predictor, examiatio of the results give above for the geometric mea ad haard ratio regressios reveals a similarity of the formulas. I fact, we merely eed to use the geometric mea model with a stadard deviatio of i order to estimate the umber of observed evets eeded for the proportioal haards model. This also suggests that whe plaig to use the K-sample lograk statistic, we ca merely use the geometric mea model i order to fid the umber of evets eeded to provide desired power. I this case, we ca use the commad lie fuctios (there is a bug i the dialog) to provide a vector of haard ratios across the K groups. All haard ratios should be specified relative to the cotrol group, ad it will be ecessary to iclude a haard ratio of reflectig the compariso of the cotrol group to itself. Computig the Number of ubjects to Accrue to a urvival tudy The above sample sie formulas for proportioal haards models provide the umber of evets eeded, rather tha the umber of subjects. everal approximate approaches are used to determie the umber of subjects to accrue:. Assume that subjects are accrued uiformly over, say, (,a), ad that data aalysis will occur at time τ+a. Further assume expoetial survival distributio (a costat haard) i each group. We ca the derive the probability of a subject havig a evet by the time of aalysis, ad by dividig the umber of evets by that probability, derive the umber of subjects to accrue. (see +eqtrial Techical Overview).. Uder the same assumptios, use the rate of observed evets ad the average time of follow-up i a Poisso type model. 5.4.