SIMULTANEOUS CONFIDENCE BOUNDS WITH APPLICATIONS TO DRUG STABILITY STUDIES. Xiaojing Lu. A Dissertation

Size: px
Start display at page:

Download "SIMULTANEOUS CONFIDENCE BOUNDS WITH APPLICATIONS TO DRUG STABILITY STUDIES. Xiaojing Lu. A Dissertation"

Transcription

1 SIMULTANEOUS CONFIDENCE BOUNDS WITH APPLICATIONS TO DRUG STABILITY STUDIES Xiaojing Lu A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY August 26 Committee: John Chen Sub Ramakrishnan Graduate Faculty Representative Gabor Szekely Truc Nguyen

2 ii ABSTRACT John T. Chen, Advisor The focus of this research was to develop simultaneous confidence bounds for all contrasts of several regression lines with a constrained explanatory variable. The pioneering work of Spurrier provided a set of simultaneous confidence bounds for exact inference on all contrasts of several simple linear regression lines over the entire range (, ) using the same n design points. However, in many applications, the explanatory variables are constrained to smaller intervals than the entire range (, ). Spurrier clearly stated in the article (JASA, 999) that the inference problem becomes much more complicated when the explanatory variable is bounded to a given interval. In fact, Wei Liu et al. (JASA, 24) have investigated this issue, but were unable to solve the problem. Instead, they were obliged to rely on simulation based methods which produced approximate probability points for simultaneous comparisons. A noted criticism of their method is that the results are not exact and the simulations must be repeated for each application. In this research, a set of simultaneous confidence bounds for all contrasts of several linear regression lines was constructed for when the explanatory variable is restricted to a fixed interval, [ x, x ], where x is a predetermined constant. These results greatly improve those of Spurrier since restricting the explanatory variable to a smaller interval results in narrower confidence bounds. Further, since the methods of this research are exact, they are superior to the earlier work of Wei Liu et al. A significant area of this research concerned a certain statistic that plays a crucial role in constructing confidence bounds with a constrained explanatory variable, and a pivotal

3 iii quantity that aids in the discovery of critical values for determining the confidence bounds. The pivotal quantity is the maximum value of an associated function, and the statistic is the cut-off point at which the function is optimized. It is of primary importance to find a closed-form expression for the pivotal quantity and to derive its exact distribution. In this research, both of these problems were solved. In addition, the exact distribution of the statistic was found to be a standard Cauchy distribution; in fact, amazingly, it has also been shown that the statistic is independent of the pivotal quantity. These research results shed surprising new light on long standing knotty problems in biostatistics. Applications of this method to drug stability studies were examined. In situations where multiple batches of a drug product are manufactured, it is desired to pool data from different batches to obtain a single shelf-life for all batches. This research provided a new pooling method that was demonstrated to be more versatile and efficient than the existing pooling procedures.

4 iv ACKNOWLEDGMENTS I want to express my sincere appreciation to my advisor, Dr. John T. Chen, for bringing me to this fertile area of Statistics, for his patience and encouragement during the manuscript preparation, and for his excellent guidance and support through the completion of this research. I would like to thank all of the members of my committee, Dr. Gabor Szekely, Dr. Truc Nguyen, and Dr. Sub Ranakrishnan for their advice and helpful comments. I have additionally benefitted from insightful suggestions by Dr. Hangfeng Chen. I would also like to extend my gratitude to the staff in the Department of Mathematics and Statistics at Bowling Green State University for their assistance; they include Cyndi Patterson, Marcia Seubert, and Mary Busdeker. Finally, I owe very special thanks to my husband, G. Jay Kerns, who patiently proofread my manuscript, gave many useful comments, and supported me in my educational pursuit.

5 v TABLE OF CONTENTS CHAPTER : INTRODUCTION TO MULTIPLE COMPARISONS. Introduction Simultaneous Statistical Inference Multiple Comparison Inference Multiple Comparison With a Control Multiple Comparison With the Best All-pairwise Comparisons All-contrast Comparisons An Introduction to Multiple Comparisons in General Linear Model Stepwise Hypothesis Tests Introduction to Drug Stability Studies CHAPTER 2: SPURRIER S EXACT CONFIDENCE BOUNDS 7 2. Introduction Setting of the Problem A Closed-form Expression for the Pivotal Quantity The Exact Distribution of the Pivotal Quantity Comparison of Scheffé s Bounds and Spurrier s Bounds Example CHAPTER 3: CONFIDENCE BOUNDS OVER A FIXED INTERVAL Introduction

6 vi 3.2 A Closed-form Expression for the Pivotal Quantity The Exact Distribution of the Variable V Distribution Theory The Exact Distribution of the Pivotal Quantity CHAPTER 4: DRUG STABILITY STUDIES WITHOUT TIME EFFECTS Introduction FDA s Pooling Procedures Pooling Procedures Using MCB Examples CHAPTER 5: DRUG STABILITY STUDIES WITH TIME EFFECTS 8 5. Introduction Evaluating Drug Stability Using an Arrhenius Model Liu et al. s Procedure for Pooling Batches Pooling Batches Based on Confidence Bounds for All-Contrast Comparisons 9 REFERENCES 97 Appendix A: MATLAB PROGRAMS FOR FIGURES 4. AND A. Matlab Program for Figure A.2 Matlab Program for Figure Appendix B: MATLAB PROGRAM FOR SIMULATING Q AND V 5

7 vii LIST OF TABLES 3. Simulation Results for M 3 iterations Listing of Data Set ANCOVA Results for Testing Batch Slopes (Data Set ) Individual Batch Regressions (Data Set ) MCW 75% and 95% CI s (Data Set ) Listing of Data Set ANCOVA Results for Testing Batch Slopes (Data Set 2) Individual Batch Regressions (Data Set 2) MCW 75% and 95% CI s (Data Set 2) Calculated Shelf-lives for Each Decision Rule Potency Assay Results (% of claim) Confidence Bounds and Maximum Distances for Different Contrasts... 96

8 viii LIST OF FIGURES 3. Plots of T against a Scatterplot and Chiplot of Q and V Scatterplot and Chiplot of Q and V Stability Data and Individual Regression Lines (Data Set ) Stability Data and Individual Regression Lines (Data Set 2) Assay Results and Individual Regression Lines Confidence Bounds for Expected Percent of Label Claim for Batch Minus Expected Percent of Label Claim for Batch

9 CHAPTER INTRODUCTION TO MULTIPLE COMPARISONS. Introduction This chapter gives an introduction to different types of multiple comparison methods and multiple comparisons in the general linear model. We will begin this chapter by discussing (in Section.2) simultaneous statistical inference. Simultaneous statistical inference is statistical inference on treatment means themselves, and is inference on several treatment means simultaneously. In Section.3, we will introduce the multiple comparisons inference, which is simultaneous inference on certain functions of the differences of the treatment means. This section describes four types of multiple comparison inference: multiple comparison with the control (MCC), multiple comparisons with the best (MCB), all-pairwise comparisons (MCA), and all-contrast comparisons (ACC). For each multiple comparison inference, a few well-established methods will be stated. Section.4 gives an introduction to multiple comparisons in the general linear model, which is the focus of this research. Multiple comparison inference, covered in Section.3, is appropriate in experiments where the response measured depends on the treatments only. However, in many situations, there exist one or more covariates that have some impact on the response. In such cases, multiple comparisons of treatments in terms of a parametric function is more meaningful than in terms of their means. Section.5 presents stepwise procedures for multiple hypothesis testing. Bonferroni s procedure, Holm s step-down procedure (979), and Hochberg s stepup procedure (988) will be described in this section. In Section.6, we will give a brief introduction to applications of multiple comparison procedures in drug stability studies.

10 2 Often times, the purpose of studies is to compare several treatment effects by estimating differences or by testing a family of hypotheses. The control of the Type I Error rate when testing simultaneously a family of hypotheses is a central issue in the area of multiple comparisons. The traditional familywise error rate (FWER) criterion (Hochberg and Tamhane (987)) which states that the probability of one or more Type I Errors should be kept at or less than a pre-specified level α is often adopted to control the multiplicity effect. A criticism of the FWER criterion is that it lowers the power of detecting real effects due to the stringent requirement on Type I Errors. Benjamini and Hochberg (995) suggested the false discovery rate (FDR) criterion, which is a different point of view for how the errors in multiple testing could be considered. The FDR criterion is designed to control the expected proportion of errors among the rejected hypotheses, and it has the following two properties in comparison with the FWER: if all null hypotheses are true, the FDR is equivalent to the FWER; otherwise, the FDR is less than or equal to the FWER. As a result, any procedure that controls the FWER also controls the FDR, and if a procedure is designed to control the FDR only, a potential gain in power may be expected. Multiple comparisons procedures are one of the most frequently applied statistical methods in practice. For example, consider a pharmaceutical company that compares several batches of a drug for the purpose of estimating the shelf-life of the drug (see Example in Chapter 4). In this example, six batches of the drug are manufactured, and it is desired to pool data from different batches to estimate a single shelf-life for the drug. The use of a multiple comparisons procedure is called for in this situation to make correct inferences, in particular, to determine an appropriate expiration date of the drug. In Chapter 4, we will talk about this example in detail.

11 3.2 Simultaneous Statistical Inference For simultaneous statistical inference, one useful statistical model is the one-way model, which is given by Y ij µ i + ɛ ij, i,..., k, j,..., n i, (.2.) where Y ij is j th response for the ith treatment, µ, µ 2,..., µ k are the treatment means under k treatments, and ɛ,..., ɛ knk are independent and identically distributed normal errors with mean and variance σ 2 unknown. Let n i ˆµ i Y ij /n i, j and ˆσ 2 k n i (Y ij ˆµ i ) 2 /ν with ν j k (n i ) denote the sample mean and the pooled sample variance, respectively. One type of simultaneous statistical inference is inference on the treatment means µ, µ 2,..., µ k themselves. For example, we might be interested in constructing k simultaneous confidence intervals, { µ i (L i, U i ), i,..., k }, at the ( α)% confidence level. It is understood that the fact that each individual two-sided confidence interval for µ i has coverage probability ( α)% does not guarantee the overall coverage probability to be ( α)%. In order for the inferences to be correct simultaneously with a probability of ( α)%, we must adjust for multiplicity to ensure that each individual inference on µ, µ 2,..., µ k is correct with a probability somewhat higher than ( α)%. There are several methods of adjusting for multiplicity, which we will describe briefly here.. The Studentized maximum modulus method (SMM). This method is based on the

12 4 Studentized maximum modulus statistic ˆµ i µ i max i k ˆσ/, (.2.2) n i and provides exact ( α)% simultaneous confidence intervals for µ i : µ i ˆµ i ± m α,k,ν ˆσ/ n i for i,..., k, (.2.3) where m α,k,ν is the α quantile of the Studentized maximum modulus statistic (.2.2). 2. The Bonferroni inequality method. This method uses the Bonferroni inequality that for any events E,..., E p, p p P ( Em) c P (Em), c m m and provides a set of conservative ( α)% simultaneous confidence intervals for µ,..., µ k : µ i ˆµ i ± t α/2k,ν ˆσ/ n i for i,..., k, (.2.4) where t α/2k,ν is the α/2k quantile of the t distribution with ν degrees of freedom. 3. Scheffé s method. This method is based on the pivotal statistic k ( n iˆµ i n i µ i ) 2 /k ˆσ 2, (.2.5) which has an F distribution with k and ν degrees of freedom. It gives exact ( α)% simultaneous confidence intervals for all linear combinations of µ,..., µ k : k l i µ i k l iˆµ i ± k kf α,k,ν ˆσ( li 2 /n i ) /2, for all l (l,..., l k ) R k, (.2.6) where F α,k,ν is the α quantile of an F distribution with k and ν degrees of freedom.

13 5.3 Multiple Comparison Inference Multiple comparison inference is simultaneous inference on a comparison of the treatment means, and is a setting different from simultaneous statistical inference, which is inference on the treatment means themselves, as described in Section.2. The parameters of interest in multiple comparison methods are functions of contrasts of µ,..., µ k. According to the parameters of interest primarily, and to the strength of the inference secondly, multiple comparison methods can be classified into four types: MCC, MCB, MCA, and ACC. These four types of multiple comparison methods will be described in the following four subsections..3. Multiple Comparison With a Control Dunnett (955) pioneered the concept of MCC. He suggested that at times when a control is present, the primary interest of comparison may be the comparison of each new treatment with the control. Suppose treatment k is the control. Then the parameters of interest in MCC are µ i µ k, for i,..., k, (.3.) the difference between the mean of each new treatment and the mean of the control. There are two types of MCC: one-sided MCC and two-sided MCC. The proper choice of an MCC method depends on the type of inference desired in different situations. If it is desired to infer whether any new treatments is better or worse than the control, one-sided MCC is a better choice. If it is of interest to detect whether the effects of new treatments and of the control are practically equivalent or different, two-sided MCC is preferred. For one-sided MCC, Dunnett s method (955) gives the simultaneous confidence lower

14 bounds and upper bounds for the difference between each new treatment µ i and the control mean µ k, as shown in the following theorem. 6 Theorem.3.. P { µ i µ k > ˆµ i ˆµ k dˆσ 2/n for i,..., k } P { µ i µ k < ˆµ i ˆµ k + dˆσ 2/n for i,..., k } α, (.3.2) where d is the solution to the equation [Φ(z + 2ds)] k dφ(z)γ(s)ds α. (.3.3) Here φ is the standard normal distribution function and γ is the density of ˆσ/σ. In addition to Dunnett s method, there are other methods that have been developed for one-sided MCC inference, which include the stepdown method of Naik (975), Marcus, Peritz and Gabriel (976) and the stepup method of Dunnett and Tamhane (99). For two-sided MCC, Dunnett s method (955) provides the simultaneous confidence intervals for the difference between each new treatment µ i and the control mean µ k, as given in the following theorem. Theorem.3.2. P { µ i µ k ˆµ i ˆµ k ± d ˆσ 2/n for i,..., k } α, (.3.4) where d is the solution to the equation [Φ(z + 2 d s) Φ(z 2 d s)] k dφ(z)γ(s)ds α. (.3.5)

15 7.3.2 Multiple Comparison With the Best MCB was first proposed by Hsu (98, 982), and it was designed to compare each treatment with the best of the other treatments. Suppose a larger treatment effect implies a better treatment. Then the parameters of primary interest are µ i max j i µ j, for i,..., k. (.3.6) If µ i µ j <, j i then treatment i is not the best since there is another treatment better than it; on the other hand, if µ i µ j >, j i then treatment i is the best treatment since it is better than all other treatments. In case a smaller treatment effect implies a better treatment, the parameters of primary interest are µ i min j i and different conclusions follow depending on whether µ i min j i µ j, for i,..., k, (.3.7) µ j is positive or negative. MCB inference can be categorized into two types: constrained MCB and unconstrained MCB. The difference between these two types is for constrained MCB inference, the simultaneous confidence intervals on µ i min j i µ j are constrained to contain, while for unconstrained MCB inference, those intervals are not constrained to contain. For situations where the magnitude of the difference between the best and those identified to be not the best is not of concern, constrained MCB is preferred since it achieves sharper inference than unconstrained MCB inference. For constrained MCB, if a confidence

16 interval for µ i min j i if a confidence interval for µ i min j i µ j has a lower limit, then the ith treatment is identified as the best; µ j has an upper limit, then the ith treatment is not the best. The following theorem (Hsu, 984b) gives a set of ( α)% simultaneous confidence intervals for µ i min j i Theorem.3.3. Let µ j for constrained MCB inference. 8 D i (ˆµ i min j i ˆµ j dˆσ 2/n), D + i (ˆµ i min j i ˆµ j + dˆσ 2/n) +. Then for all µ (µ,..., µ k ) and σ 2, P µ,σ 2{ µ i min j i µ j [D i, D+ i ] for i,..., k } α. (.3.8) Here d is the solution to Equation (.3.3) as given in Section.3., x min{, x } and x + max{, x }. In cases where one desires lower bounds on how much treatments identified not to be the best are worse than the true best, unconstrained MCB is a proper choice for making desired inference. The following theorem provides a set of confidence intervals that achieve confidence level of α. Theorem.3.4. For all µ (µ,..., µ k ) and σ 2, P µ,σ 2{ µ i min j i µ j ˆµ i min j i ˆµ j ± q ˆσ 2/n for i,..., k } α, (.3.9) with equality when µ... µ k. Here q is the solution to the equation { Zi Z j P ˆσ } q for all i > j α, (.3.) where Z,..., Z k are iid standard normal random variables.

17 9.3.3 All-pairwise Comparisons For MCA, the parameters of primary interest are µ i µ j, for all i j. (.3.) There are several methods available to provide simultaneous confidence intervals for allpairwise differences, µ i µ j, i j. Some of these methods are listed here.. Tukey s (953) Method. Tukey s method provides the following ( α)% simultaneous confidence intervals for all-pairwise differences: µ i µ j ˆµ i ˆµ j ± q ˆσ 2/n for all i j, (.3.2) where q is the solution to the equation { (ˆµ i µ i ) (ˆµ j µ j ) P ˆσ 2/n q for all i > j } α. (.3.3) 2. Bonfinger s (985) Confident Directions Method. This method provides the following constrained ( α)% simultaneous confidence intervals for all-pairwise differences: µ i µ j [ (ˆµ i ˆµ j q ˆσ 2/n), (ˆµ i ˆµ j + q ˆσ 2/n) + ] for all i j, (.3.4) where q is the solution to the equation { (ˆµ i µ i ) (ˆµ j µ j ) P ˆσ 2/n q for all i > j } α, (.3.5) x min{, x } and x + max{, x }. In situations where the equalities among the µ i s are impossible, Bonfinger s confident direction method gives sharper inference than deduction from Tukey s simultaneous confidence intervals.

18 3. Hayter s (99) One-sided Comparisons. Hayter derived the following ( α)% simultaneous lower confidence bounds on µ i µ j for all i > j: µ i µ j > ˆµ i ˆµ j q ˆσ 2/n for all i > j, (.3.6) where q is the same critical value as in the simultaneous confidence intervals (.3.4) of Bonfiger (985), i.e., q is the solution to Equation (.3.5). These simultaneous confidence intervals provide sharper inference in situations where it is suspected that µ µ 2... µ k and one might be primarily interested in lower confidence bounds on µ i µ j for all i > j..3.4 All-contrast Comparisons For ACC, the parameters of primary interest are k c i µ i, with c + c c k. (.3.7) Scheffé derived the following ( α)% simultaneous confidence intervals for all-contrast comparisons: k c i µ i k c iˆµ i ± (k )F α,k,ν ˆσ( k c 2 i /n i ) /2. (.3.8) In fact, all other three types of multiple comparisons, MCC, MCB and MCA are special cases of ACC, and hence can be deduced from ACC. The reason that we consider specific types of multiple comparisons is because direct inference is sharper than deduced inference. For example, inference on all-pairwise differences deduced from Scheffé s method, which is designed for inference on all linear combinations of the means, is weaker than inference given by Tukey s method, which is designed specifically for inferences on all-pairwise differences of the means. In applications, the proper choice of a multiple comparison method depends

19 primarily on the type of inference desired, and secondarily on the strength of inference intended to achieve..4 An Introduction to Multiple Comparisons in General Linear Model Multiple comparison methods described in Section.3 are based on the assumption that independent simple random samples are taken under the treatments and the response measured depends on the treatment only. However, in many real-life situations, the response of an experiment are affected by the treatment, as well as some covariates. In such cases, comparing the treatments may not be meaningful without adjusting for the effects of covariates. In experiments where covariates are present, the appropriate statistical model is the general linear model (GLM) Y Xβ + ɛ, (.4.) where Y N is the vector of responses, X N p is the design matrix, β p is the vector of parameters, and ɛ N is a vector of iid normally distributed errors with mean and unknown variance σ 2. The one-way model, as discussed in Section.2 is a GLM, and the analysis of covariance (ANCOVA) model, which will be discussed later and takes the form Y ij τ i + β i X ij + ɛ ij, is also a GLM. In a GLM, it is not reasonable to compare the treatment effects alone without considering the covariate effects, because the average response under the ith treatment may depend on the value of a covariate. For example, in the ANCOVA model, if β i β j, then whether or not the average response under the ith treatment is larger or smaller than the average response under the jth treatment depends on the value of the covariate, X.

20 2 Consequently, the parameters of interest in the ANCOVA model are (τ i + β i X) (τ j + β j X), (.4.2) a linear function of the covariate X. The desired inference in such cases would be multiple comparison inference in terms of a parametric function of treatment means rather than in terms of treatment means, and as a result, a set of simultaneous confidence bounds is more appropriate to make the desired inference than a set of simultaneous confidence intervals. In this research, we will focus attention on the ANCOVA model, and construct a set of simultaneous confidence bounds for comparisons of several regression lines. There are also situations where comparisons are to be made of parameters that do not correspond to long-run average treatment effects. For example, in drug stability studies, the parameters β,..., β k in the ANCOVA model correspond to the degradation rates of batches of drug products, and the parameter of concern is the comparisons of these rates, that is, β i β j. In Chapter 4, we will discuss this situation in detail..5 Stepwise Hypothesis Tests In the area of testing multiple hypotheses, the Bonferroni inequality is often used to set an upper bound for the familywise error rate. If T,..., T n is a set of test statistics with corresponding p-values P,..., P n for testing hypotheses H,..., H n, the classical Bonferroni multiple test procedure rejects H {H,..., H n } if any p-value is less than α/n. Furthermore, for each P i α/n, i,..., n, the specific hypothesis H i is rejected. The Bonferroni inequality, { n } P (P i α/n) α, ( α ), (.5.)

21 3 ensures that the probability of rejecting at least one hypothesis when all are true is no greater than α. The Bonferroni procedure is simple to apply and requires no distributional assumptions. A disadvantage of this procedure is that it is conservative, especially when the test statistics are highly correlated. Holm (979) improved Bonferroni s procedure and presented a sequentially rejective Bonferroni procedure. This procedure is a step-down multiple test procedure that is much less conservative but still maintains the FWE at α. Let P (),..., P (n) be the ordered p-values and H (),..., H (n) be the corresponding hypotheses. Let (p) α/(n p + ), p n. (.5.2) Then (p), p n, is a strictly increasing sequence of constants. Holm s step-down procedure begins by testing if P () (). If so, one rejects H () and continues to check whether P (2) (2). If not, all hypotheses are accepted. In general, Holm s procedure rejects H (i) when, for all j,..., i, P (j ) (j). (.5.3) Holm (979) proved that with the cut-off constants defined in (.5.2), the step-down procedure controls in general the FWE in the strong sense. Instead of testing sequentially starting with the smallest p-value, Hochberg (988) suggested to start with the largest p-value. The procedure proposed by Hochberg (988) is a step-up multiple test procedure. It begins by testing if P (n) (n). If so, one rejects all hypotheses. If not, one accepts H (n) and goes on to check whether P (n ) (n ). In general, H (i) is rejected if P (j) (j), for any j i. (.5.4)

22 4 Hochberg s procedure is more powerful than Holm s procedure and still keeps the FWE at α. In addition to Holm s procedure and Hochberg s procedure, there are many alternative procedures in the area of stepwise hypothesis testing. These include Simes procedure (986), Hommel s procedure (988), Wright s procedure (992), among others..6 Introduction to Drug Stability Studies The Food and Drug Administration (FDA) requires that for every drug product in market, its shelf-life (or expiration date) must be indicated on the container label. The expiration date of a drug product is defined as the time interval that the average drug characteristic (e.g., strength and purity) of the drug is expected to remain within approved specifications after manufacture. It is important to provide consumers with certain assurances that the drug product will retain its identity, strength, potency, dissolution, and purity during the claimed shelf-life period. The manufacturers (drug companies) usually conduct a stability study to ensure that a drug product can meet the approved specifications prior to its expiration date being printed on the package. Drug stability studies are normally designed to characterize the degradation of the drug product over time and to estimate the shelf-life based on the degradation curves. Generally, drug stability studies consist of a random sample of dosage units (e.g., tablets, capsules, vials) from a given batch or several batches placed in a storage room with controlled temperature and humidity conditions. The class of stability studies includes accelerated studies and long-term studies. Accelerated studies are usually conducted under a high level of special temperature and relative humidity conditions to increase the level of stress. The Arrhenius model is one of the most commonly used statistical models for estimation of drug stability parameters in accelerated

23 5 drug stability studies. Applications of this model in drug stability studies will be detailed in Section 5.. For long-term studies, the drug product is stored under room temperature and humidity conditions, and stability testing is performed under regular environmental conditions. Often times, multiple batches of a drug product are manufactured, and there may be more or less variation among batches. Consequently, the estimated shelf-life may vary from batch to batch. In practice, it is desired to combine data from different batches to estimate a single shelf-life of the drug. Several approaches for pooling data have been studied, including the FDA Guideline (987), multiple comparisons with the worst (Ruberg and Hsu, 992), MCA of regression lines (Liu et al.,24), and ACC of regression lines proposed in this research. The FDA Guideline (987) suggested to test the equality of the slopes of the regression line fitted for each batch, assuming the drug characteristic is expected to decrease linearly as time increases. Two batches are claimed as practically equivalent, and hence can be pooled, if they have similar slopes. In the case of shelf-life estimation, the most negative degradation rate is of interest, and this has led Ruberg and Hsu (992) to propose the approach of multiple comparisons with the worst (MCW). This pooling method compares all batches with the (unknown) worst batch and provides simultaneous confidence interval estimates of slope differences. These simultaneous confidence intervals are then used to make decisions on pooling as many batches as possible with the worst batch. Liu et al. (24) proposed the approach of MCA of the regression lines. They developed a set of simultaneous confidence bands and suggested a decision rule to pool batches. The proposed set of simultaneous confidence bands is claimed to be an improvement over a set of simultaneous confidence intervals since it allows one to assess the difference between two regression lines over a given range of time, rather than at one particular time point. The pooling method that we have

24 6 proposed in this research provides a set of simultaneous confidence bands based on ACC of regression lines. This method is more efficient in applications to drug stability studies than Liu et al. s method since the number of comparisons required to make conclusions can be greatly decreased if we choose appropriate contrasts. In Chapters 4 and 5, we will discuss these pooling methods in detail.

25 7 CHAPTER 2 SPURRIER S EXACT CONFIDENCE BOUNDS 2. Introduction Most research in the multiple comparison literature has been devoted to comparing the means of k( 3) groups under the assumption of iid normal errors. However, it is important in some circumstances to compare k( 3) groups based on some parametric function other than the mean. The pioneer work of Spurrier (999) provides a set of simultaneous bounds for exact inference on all contrasts of several simple linear regression lines over the entire range (-, ) using the same n design points. In this chapter we discuss how to develop exact simultaneous confidence bounds for all contrasts of three or more regression lines when the explanatory variable takes values on the whole real line. We begin this chapter by describing (in Section 2.2) the setting of the problem. In this section, we will introduce a pivotal quantity which is essential for developing the exact confidence bounds in the later sections. Section 2.3 shows how to find a closedform expression for the pivotal quantity. In Section 2.4, we show how to derive the exact distribution of the pivotal quantity. Section 2.5 compares the exact bounds developed by Spurrier (999) with the ones developed by Scheffé. An example to illustrate the exact confidence bounds is given in Section Setting of the Problem The simple linear regression model for the n observations from the ith group is Y ij α i + β i x j + ɛ ij, (2.2.)

26 8 for i,..., k and j,..., n. All error terms are assumed to be iid N (, σ 2 ). Without loss of generality, assume that the predictor variable values have been centered and scaled such that x and x x, where the n-dimensional vectors x and are defined as x (x,..., x n ) and (,..., ). These assumptions are well-known in multiple comparison literature, and it is certainly useful to extend to any functions theoretically. However, none of such non-linear assumptions has been investigated in the literature, and we will explore this direction in the future. Let ˆα i and ˆβ i denote the least squares estimators of α i and β i, i,..., k, respectively. Let ˆσ 2 denote the pooled error mean square with d.f., ν k(n 2), and let C denote the set of vectors c (c,..., c k ) such that k c i. Define Z i n /2 (ˆα i α i )/σ and Z 2i ( ˆβ i β i )/σ, for i,..., k. (2.2.2) Then,under the assumptions that x and x x, Z i and Z 2i are iid standard normal for all i,..., k. Let Z and Z 2 denote the sample means of the Z i s and the Z 2i s, respectively. A (-α)% simultaneous confidence bound for all contrasts of several regression lines can be obtained based on the traditional form of the point estimate plus or minus a probability point times the estimated standard error: [ ] /2 k k c i (α i + β i x) c i (ˆα i + ˆβ k i x) ± bˆσ (/n + x 2 ) c 2 i (2.2.3)

27 9 for all c C and all x (, ), where b is the probability point that depends on k, ν, and α. To determine the probability point b in Equation (2.2.3), Spurrier (999) defined a random variable Tc,x, 2 which is given by T c,x k c i[(ˆα i α i ) + ( ˆβ i β i )x] ˆσ[(/n + x 2 ), (2.2.4) k (c2 i )]/2 and he stated that by the union-intersection principle, the probability point b is the solution to the equation P ( T c,x b, real x and c C) α, (2.2.5) or, equivalently, the positive solution to the equation P [ ] sup (Tc,x) 2 b 2 α. (2.2.6) c C, x (, ) Therefore, an appropriate pivotal quantity sup(t 2 c,x) is determined by the unionintersection method. In the following two sections, a closed-form expression for this pivotal quantity will be found and then its exact distribution will be derived. 2.3 A Closed-form Expression for the Pivotal Quantity The pivotal quantity introduced in Section 2.2 will help us to compute the constant b in Equation (2.2.6) for the exact simultaneous confidence bounds in Equation (2.2.3). To compute b, an exact distribution of the pivotal quantity is required. It will be helpful to find a closed-form expression for the quantity before deriving the exact distribution. The following theorem gives a closed-form expression for sup(t 2 c,x).

28 Theorem The pivotal quantity sup(t 2 c,x) Q/(ˆσ 2 /σ 2 ), where Q {Q + Q 22 + [4R 2 Q Q 22 + (Q 22 Q ) 2 ] /2 }/2, with Q jj k (Z ji Z j )(Z j i Z j ), j and j, 2, and R Q 2 /[Q Q 22 ] /2. 2 Proof of Theorem With the definition of the random variable T c,x, write T 2 c,x { k c i[(ˆα i α i ) + ( ˆβ i β i )x]} 2 ˆσ 2 [(/n + x 2 ) k (c2 i )] { k c i[(ˆα i α i )/σ + ( ˆβ i β i )x/σ]} 2 (ˆσ/σ) 2 [(/n + x 2 ) k (c2 i )] { k c i[z i / n + Z 2i x]} 2 (ˆσ/σ) 2 [(/n + x 2 ) by Equation (2.2.2) k (c2 i )], { k c i[(z i Z )/ n + (Z 2i Z 2 )x]} 2 (ˆσ/σ) 2 [(/n + x 2 ) k (c2 i )], since k c i. Holding x fixed, then T 2 c,x is maximized when c i (Z i Z )/ n + (Z 2i Z 2 )x by the Cauchy-Schwarz inequality. The proof is as follows: The Cauchy-Schwarz inequality states that: For any two random variables X and Y, EXY (E X 2 ) /2 (E Y 2 ) /2, which can be re-written as (EXY ) 2 EX 2 EY 2. (2.3.) This result also applies to numerical sums when there is no explicit reference to an expectation, and hence Equation (2.3.) can also be written in the form: ( k c ia i ) 2 k c2 i k a 2 i. (2.3.2) Notice that when x is fixed, the maximization T 2 c,x is equivalent to the maximization of the quantity T { k c i[(z i Z )/ n+(z 2i Z 2 )x]} 2 / k (c2 i ). Let a i (Z i Z )/ n+ (Z 2i Z 2 )x. Then, based on Equation (2.3.2), the quantity T attains its maximum k a2 i when c i a i (Z i Z )/ n + (Z 2i Z 2 )x.

29 2 T 2 x Now denote this maximum value of T 2 c,x for fixed x by T 2 x, which is given by k [(Z i Z )/ n + (Z 2i Z 2 )x] 2 /n + x 2 n + nx 2 + nx 2 (ˆσ/σ) 2 [(/n + x 2 )] k [(Z i Z ) 2 /n + 2x(Z i Z )(Z 2i Z 2 )/ n + (Z 2i Z 2 ) 2 x 2 ] (ˆσ/σ) 2 k (Z i Z ) 2 /n + 2x/ n k (Z i Z )(Z 2i Z 2 ) + x 2 k (Z 2i Z 2 ) 2 ] (ˆσ/σ) 2 k (Z i Z ) 2 + 2x n k (Z i Z )(Z 2i Z 2 ) + nx 2 k (Z 2i Z 2 ) 2 ] (ˆσ/σ) 2. Let a /( + nx 2 ). Then a [, ], a (nx 2 )/( + nx 2 ), and [a( a)] /2 ( nx 2 )/( + nx 2 ). Therefore, with the quantities Q jj, j and j, 2 defined earlier, we have T 2 x aq + 2[a( a)] /2 Q 2 + ( a)q 22 (ˆσ/σ) 2. (2.3.3) Now, we need to maximize T 2 x with respect to x, or, equivalently, with respect to a [, ]. This can be done by maximizing the numerator of T 2 x, aq +2[a( a)] /2 Q 2 + ( a)q 22, denoted by T, with respect to a [, ]. Taking the derivative of T with respect to a and setting it equal to, the following equation is obtained: Q + Rearranging of Equation (2.3.4) gives the equation 2a [a( a)] /2 Q 2 Q 22. (2.3.4) Q 22 Q Q 2 2a, (2.3.5) [a( a)] /2 squaring both sides of Equation (2.3.5) yields Now define [ ] 2 Q22 Q 2 Q 2 ( 2a)2 4a( a). (2.3.6) V Q 22 Q, provided Q 2. (2.3.7) 2 Q 2

30 22 Substitution of Equation (2.3.7) into Equation (2.3.6) yields V 2 ( 2a)2 4a( a). (2.3.8) Standard arguments show that there are two possible solutions to Equation (2.3.8), given by a ± [V 2 /( + V 2 )] /2. (2.3.9) 2 Notice that < { [V 2 /( + V 2 )] /2 }/2 < /2 and /2 < { + [V 2 /( + V 2 )] /2 }/2 <. To decide which solution is the valid solution to Equation (2.3.4), the following two cases are considered:. If Q 22 > Q, then V > and from Equation (2.3.5), 2a >, that is, a < /2. In this case, the solution { [V 2 /( + V 2 )] /2 }/2 should be chosen and hence the solution to Equation (2.3.4) is a [ V/( + V 2 ) /2 ]/2 since V >. 2. If Q 22 < Q, then V < and from Equation (2.3.5), 2a <, that is, a > /2. In this case, the solution { + [V 2 /( + V 2 )] /2 }/2 is chosen and hence the solution to Equation (2.3.4) is also a [ V/( + V 2 ) /2 ]/2 since V <. With the arguments above, and after checking the second derivative with respect to a, it is concluded that T (or Tx 2 ) achieves its maximum at a [ V/( + V 2 ) /2 ]/2, denoted by A. Now, write sup(tc,x) 2 aq + 2[a( a)] /2 Q 2 + ( a)q 22 sup (2.3.) a [,] (ˆσ/σ) 2 {Q + Q 22 + ( 2a)(Q 22 Q ) + 4 Q 2 [a( a)] /2 }/2 sup a [,] (ˆσ/σ) 2 {Q + Q 22 + V/( + V 2 ) /2 (Q 22 Q ) + 2 Q 2 /( + V 2 ) /2 }/2 (ˆσ/σ) 2,

31 since when a [ V/( + V 2 ) /2 ]/2, 2a V/( + V 2 ) /2, and a( a) /[4( + V 2 )], 23 {Q + Q 22 + [(Q 22 Q ) 2 + 4Q 2 2]/[2 Q 2 ( + V 2 ) /2 ]}/2 (ˆσ/σ) 2 {Q + Q 22 + [(Q 22 Q ) 2 + 4Q 2 2] /2 }/2 (ˆσ/σ) 2, this is because ( + V 2 ) /2 [(Q 22 Q ) 2 + 4Q 2 2] /2 /(2 Q 2 ) by the definition of V {Q + Q 22 + [(Q 22 Q ) 2 + 4R 2 Q Q 22 ] /2 }/2 (ˆσ/σ) 2 Q ˆσ 2 /σ 2, (2.3.) where Q {Q + Q 22 + [4R 2 Q Q 22 + (Q 22 Q ) 2 ] /2 }/2 and R Q 2 /[Q Q 22 ] /2. Notice that with the definition of the random variable V, Q 2 is assumed to be not equal to. However, even when Q 2, this result still holds. This is because if Q 2, Q [Q + Q 22 + Q 22 Q ]/2 max(q, Q 22 ); and if Q 2, the numerator of the right side of Equation (2.3.) is equal to max [(Q Q 22 )a + Q 22 ], or, equivalently, equal a [,] to max(q, Q 22 ), which is the same as Q. The proof is complete. 2.4 The Exact Distribution of the Pivotal Quantity In Section 2.3, a closed-form expression for the pivotal quantity sup(t 2 c,x) was found. The next step is to derive the exact distribution of this quantity. Before the derivation of the exact distribution, some remarks are collected that will be needed throughout the derivation. First, as mentioned in Section 2.2, (Z,..., Z k ) and (Z 2,..., Z 2k ) are independent sets of k iid standard normal variables under the design constraints. The variable Q jj defined earlier is the numerator of the sample variance of the jth set, j, 2. The variable

32 24 R is the sample correlation coefficient computed on (Z i, Z 2i ), i,..., k. Moreover, Z ji s depend on the original regression data only through the ˆα i s and ˆβ i s. Second, the variables Q, Q 22, ν(ˆσ 2 /σ 2 ), and R are mutually independent (Anderson (958) and Graybill (976)). Furthermore, the first three variables have χ 2 distributions with k, k, and ν degrees of freedom, and R has the density f R (r) Γ[(k )/2]( r2 ) (k 4)/2 Γ[(k 2)/2](π) /2 for r. (2.4.) The following theorem gives the exact distribution of the quantity sup(t 2 c,x). Here is some notation that is used in the theorem and in the proof. Let F ν, ν 2 denote the distribution function of the F distribution with ν and ν 2 degrees of freedom and let p i i j (2j ) for positive integer i. Theorem For odd k 3, P [sup(t 2 c,x) b 2 ] k 2 F k,ν[2b 2 /(k )] (k )/2 π /2 b k 2 Γ[(ν + k 2)/2] Γ[(k )/2]Γ(ν/2)ν (k 2)/2 [ + (b 2 /ν)] [(ν+k 2)/2] F,ν+k 2 [b 2 (ν + k 2)/(b 2 + ν)] + Γ[(k )/2]2 (k )/2 (k 5)/2 (k 3 2i)Γ[(k + 2i)/2] p i+ where the summation is defined to be for k 3 or 5. F k +2i,ν [2b 2 /(k + 2i)],

33 25 For even k 4, P [sup(t 2 c,x) b 2 ] p (k 2)/2 Γ(k/2)2 (k 2)/2 [F k,ν (b 2 /k) F k 2,ν (b 2 /(k 2))] + p (k 2)/2 (k 2)/2 iγ(k 2 i) 2 (k 2)/2 i Γ(k/2 i) F 2(k 2 i),ν(b 2 /(k i 2)). The following lemma will be used in the proof of Theorem 2.4., and the proof of this lemma will be given after the proof of Theorem 2.4. is completed. Lemma Let a be an odd integer. Then z q a/2 exp( q/2)erf [(q/2) /2 ] dq Γ[(a + 2)/2]2 (a+2)/2 (a+)/2 [G (z)] 2 /2 + (/π) [Γ(i)G 2i (2z)/p i ] (a+)/2 (2/π) /2 G (z) [z i (/2) exp( z/2)/p i ], where the summations are defined to be if a. Proof of Theorem First we derive the density of Q, f(q). Define W min(q, Q 22 ), W 2 max(q, Q 22 ), and W 3 R 2. It follows from Equation (2.4.) that W 3 has the density f W3 (w 3 ) Γ[(k )/2]( w 3) (k 4)/2 Γ[(k 2)/2][πw 3 ] /2, w 3. Also, W and W 2 have the joint density f W,W 2 (w, w 2 ) 2f W (w )f W2 (w 2 ) [ ] 2 2 exp( (w Γ[(k )/2]2 (k )/2 + w 2 )/2)(w w 2 ) (k 3)/2.

34 26 This is because of the results of order statistics and of the fact that Q and Q 22 are independent and identically distributed random variables, each following a χ 2 distribution with k degrees of freedom. As remarked earlier, Q, Q 22, and R are mutually independent. Then Q, Q 22, and W 3 are also mutually independent, and hence W 3 is independent of W and W 2. Therefore, the following joint density of (W,W 2,W 3 ) is obtained: f(w, w 2, w 3 ) 2 exp( (w + w 2 )/2)(w w 2 ) (k 3)/2 Γ[(k )/2]( w 3 ) (k 4)/2 [Γ((k )/2)2 (k )/2 ] 2 Γ[(k 2)/2][πw 3 ] /2, 2 exp( (w + w 2 )/2)(w w 2 ) (k 3)/2 ( w 3 ) (k 4)/2 Γ[(k )/2]Γ[(k 2)/2]2 (k 2) [πw 3 ] /2, (2.4.2) for w w 2 <, w 3. Now the joint density of (W,W 2,Q) can be found. Note that W 3 (Q W )(Q W 2 ) W W 2. This is because Q {Q + Q 22 + [4R 2 Q Q 22 + (Q 22 Q ) 2 ] /2 }/2, and this can be rearranged into the equation (2Q Q Q 22 ) 2 4R 2 Q Q 22 + (Q 22 Q ) 2. Then, R 2 (2Q Q Q 22 ) 2 (Q 22 Q ) 2 4Q Q 22 4Q2 4Q(Q + Q 22 ) + 4Q Q 22 4Q Q 22 (Q Q )(Q Q 22 ) Q Q 22 (Q W )(Q W 2 ) W W 2.

35 27 Consider the following change of variables, W W, W 2 W 2, The Jacobian can be computed to be w J w 2 w 3 W 3 (Q W )(Q W 2 ) W W 2. w w 2 q w w w w 2 q w 2 w 2 w w 2 q w 3 w 3 It follows that the joint density of (W,W 2,Q) is 2q w w 2 w w 2. f(w, w 2, q) 2q w w 2 w w 2 exp( (w + w 2 )/2)(w w 2 ) (k 3)/2 ( [(q w )(q w 2 )/(w w 2 )]) (k 4)/2 2 (k 2) Γ[(k )/2]Γ[(k 2)/2]{π[(q w )(q w 2 )/(w w 2 )]} /2, (2q w w 2 ) exp( (w + w 2 )/2)[q(w + w 2 q)] (k 4)/2 2 (k 2) Γ[(k )/2]Γ[(k 2)/2][π(q w )(q w 2 )] /2, (2.4.3) for w w 2 < q w + w 2 <. Now consider the further change of variables, This leads to the inverse functions X 2Q W W 2, Q X 2 Q W 2Q W W 2. W Q( X X 2 ), W 2 Q[ X ( X 2 )], Q Q,

36 28 and hence the Jacobian can be computed to be w w w x x 2 q J w 2 x 2 w 2 x w 2 q q q q x x 2 q qx 2 qx x x 2 q( x 2 ) qx x ( x 2 ) q 2 x x 2 q 2 x ( x 2 ) q 2 x. To obtain the joint density of (X, X 2, Q), notice the following equalities: w + w 2 2q qx q w qx x 2 q w 2 qx ( x 2 ) And also, note that x 2 w +w 2 q, and w w 2 < q w +w 2, so w +w 2 q < 2, and hence < x. x 2 q w 2q w w 2 q w 2q w w 2, since w w 2, and x 2 q w 2q w w 2 < q w 2q w q, since q > w 2, so 2 x 2 <. < q <.

37 29 Then the joint density of (X, X 2, Q) is f(x, x 2, q) q2 x exp( (2q qx )/2)[q(q qx )] (k 4)/2 (2q 2q + qx ) 2 (k 2) Γ[(k )/2]Γ[(k 2)/2][πqx x 2 qx ( x 2 )] /2, for < x, /2 x 2 <, < q <. q k 2 exp( q + qx /2)x ( x ) (k 4)/2, (2.4.4) 2 (k 2) Γ[(k )/2]Γ[(k 2)/2][πx 2 ( x 2 )] /2 At this point, the density of Q can be found by integrating the joint density of (X, X 2, Q) with respect to x and x 2. This yields f(q) this is because 2 q k 2 exp( q + qx /2)x ( x ) (k 4)/2 2 (k 2) Γ[(k )/2]Γ[(k 2)/2][πx 2 ( x 2 )] /2 dx 2 dx q k 2 exp( q + qx /2)x ( x ) (k 4)/2 2 (k 2) Γ[(k )/2]Γ[(k 2)/2](π) /2 π /2 q k 2 exp( q) 2 k Γ[(k )/2]Γ[(k 2)/2] 2 [x 2 ( x 2 )] /2 dx 2 dx x ( x ) (k 4)/2 exp(qx /2) dx, (2.4.5) 2 [x 2 ( x 2 )] dx / [ 2 2 2y y( y 2 ) /2 dy, let ( x 2) /2 y dy ( y 2 ) /2 ] arcsin(y) / 2 2 ( π 4 ) π 2. In order to derive the density of Q, the following two cases will be considered: even k 4 and odd k 3. Case : k 4 and k is even. Consider the integration x ( x ) (k 4)/2 exp(qx /2) dx. Let x y, and for

38 3 convenience of notation also let (k 4)/2 n. Then, x ( x ) (k 4)/2 exp(qx /2) dx ( y)y n exp(q( y))/2 d( y) (y n y n+ ) exp(q( y)/2) dy [ exp(q/2) y n exp( qy/2) dy ] y n+ exp( qy/2) dy. (2.4.6) Note that y n+ exp( qy/2) dy (2/q)y n+ d exp( qy/2) (2/q) [ y n+ exp( qy/2) ] + (2/q) (n + ) exp( qy/2)y n dy (2/q) exp( q/2) + 2(n + ) q y n exp( qy/2) dy (2.4.7) Substituting Equation (2.4.7) into Equation (2.4.6) yields x ( x ) (k 4)/2 exp(qx /2) dx (2/q) + [ 2(n + )/q] exp(q/2) y n exp( qy/2) dy (2.4.8) For the term yn exp( qy/2) dy it follows after repeated integration by parts and collecting terms that n+ y n exp( qy/2) dy 2 i n! exp( q/2) (n i + )!q i + 2 n+ n!. (2.4.9) (n + (k 2)/2)!qn+

39 3 Substituting Equation (2.4.9) into Equation (2.4.8) yields 2 q x ( x ) (k 4)/2 exp(qx /2) dx + 2(n + ) q q 2 q + k 2 q q n+ (k 2)/2 2 i n! 2(n + ) q (n i + )!q i q The last step uses the fact that (k 4)/2 n. exp(q/2)2 n+ n! (n + (k 2)/2)!q n+ 2 i [(k 4)/2]! + (q k + 2) exp(q/2)2(k 2)/2 [(k 4)/2]!. [(k 2)/2 i]! q i q k/2 Combining Equations (2.4.5) and (2.4.), the density of Q is obtained as (2.4.) f(q) π /2 q k 2 exp( q) 2 k Γ[(k )/2]Γ[(k 2)/2] 2 q + k 2 q (k 2)/2 2 i [(k 4)/2]! + (q k + 2) exp(q/2)2(k 2)/2 [(k 4)/2]!. q [(k 2)/2 i]! q i q k/2 To proceed, notice that when k is even, the following equalities hold: (2.4.) Γ[(k 2)/2] [(k 4)/2]! (2.4.2a) Γ[(k )/2] π/2 p (k 2)/2 2 (k 2)/2 (2.4.2b) Substituting Equations (2.4.2a) and (2.4.2b) into Equation (2.4.) yields the following equation π /2 q k 2 exp( q) (q k + 2) exp(q/2)2 (k 2)/2 [(k 4)/2]! 2 k Γ[(k )/2]Γ[(k 2)/2] q k/2 exp( q/2)q(k 4)/2 (q k + 2) 2p (k 2)/2. (2.4.3)

40 32 Now consider Equation (2.4.) again. Write the first two terms in the bracket as (2/q) + k 2 q q (k 2)/2 (k 2)/2 (k 2)/2 (k 2)/2 k 2 q (k 2)/2 2 i Γ[(k 2)/2] Γ(k/2 i)q i (k 2)2 i Γ[(k 2)/2] Γ(k/2 i)q i+ (k 2)2 i Γ[(k 2)/2] Γ(k/2 i)q i+ 2 i [(k 4)/2]! [(k 2)/2 i]! q i + (2/q) (k 2)/2 i2 (k 2)/2 (k 2)/2 2 i Γ[(k 2)/2] Γ(k/2 i)q i 2 i Γ[(k 2)/2] Γ(k/2 i)q i 2 i+ Γ[(k 2)/2] Γ(k/2 i )q i+, i2 i+ Γ[(k 2)/2] Γ(k/2 i)q i+ + (k 2)2(k 2)/2 Γ[(k 2)/2] Γ[k/2 (k 2)/2]q (k 2)/2+ (change of dummy variable) by the fact that Γ(k/2 i) (k/2 i )Γ(k/2 i ), (k 2)/2 i2 i+ Γ[(k 2)/2] Γ(k/2 i)q i+ + (k 2)2(k 2)/2 Γ[(k 2)/2] q k/2. Then, π /2 q k 2 exp( q) 2 k Γ[(k )/2]Γ[(k 2)/2] (k 2)/2 (k 2)/2 (k 2)/2 i2 i+ Γ[(k 2)/2] + (k 2)2(k 2)/2 Γ[(k 2)/2] Γ(k/2 i)q i+ q k/2 i exp( q)q k 3 i 2 (k 2)/2 i Γ(k/2 i)p (k 2)/2 + qk/2 2 exp( q)(k 2)/2 p (k 2)/2, by Equation (2.4.2b) i exp( q)q k 3 i 2 (k 2)/2 i Γ(k/2 i)p (k 2)/2. (2.4.4) Therefore, the density of Q is obtained after combining Equations (2.4.3) and (2.4.4): f(q) exp( q/2)q(k 4)/2 (q k + 2)/2 + < q <. (k 2)/2 i exp( q)q k 3 i / p 2 (k 2)/2 i Γ(k/2 i) (k 2)/2, (2.4.5)

41 33 It follows that the distribution function of Q is P (Q q) q exp( x/2)x(k 4)/2 (x k + 2)/2 + p (k 2)/2 + p (k 2)/2 q q [ exp( x/2)x (k 2)/2 (k 2)/2 2 (k 2)/2 i exp( x)x k 3 i 2 (k 2)/2 i Γ(k/2 i) dx [ Γ(k/2)2 (k 2)/2 (G k (q) G k 2 (q)) ] p (k 2)/2 + p (k 2)/2 (k 2)/2 exp( x/2)x(k 4)/2 (k 2) 2 i q x k 3 i exp( x) dx 2 (k 2)/2 i Γ(k/2 i) i exp( x)x k 3 i / p 2 (k 2)/2 i Γ(k/2 i) (k 2)/2 dx ] dx where G k denotes the distribution funtion of the chi-squared distribution with k df, and now let y 2x, [ Γ(k/2)2 (k 2)/2 (G k (q) G k 2 (q)) ] p (k 2)/2 + p (k 2)/2 (k 2)/2 i 2 (k 2)/2 i Γ(k/2 i) 2q [ Γ(k/2)2 (k 2)/2 (G k (q) G k 2 (q)) ] p (k 2)/2 + p (k 2)/2 (k 2)/2 (/2) k 2 i y k 3 i exp( y/2) dy i 2 (k 2)/2 i Γ(k/2 i) (/2)k 2 i Γ(k 2 i)2 k 2 i G 2(k 2 i) (2q) [ Γ(k/2)2 (k 2)/2 (G k (q) G k 2 (q)) ] p (k 2)/2 + p (k 2)/2 (k 2)/2 iγ(k 2 i) 2 (k 2)/2 i Γ(k/2 i) G 2(k 2 i)(2q). (2.4.6)

42 34 Let h(u) denote the density of U ˆσ 2 /σ 2. Then P [sup(t 2 c,x) b 2 ] P [Q b 2 U] Notice that P (Q b 2 u)h(u) du, by conditioning on u. (2.4.7) G k (b 2 u)h(u) du G k (b 2 u) dh(u), where H(u) is the distribution function of U, P (Y/U < b 2 ), where Y has a chi-squared distribution with k df, U has distribution function H(u), and Y and U are independent, P ( Y/k Uν/ν < b2 k ) F k,ν (b 2 /k), because Uν has a chi-squared distribution with ν df. Finally, after substituting Equation (2.4.6) into Equation (2.4.7) and using the result obtained above that G k (b 2 u)h(u) du F k,ν (b 2 /k), the following result follows in the case where k 4 and k is even: P [sup(t 2 c,x) b 2 ] p (k 2)/2 Γ(k/2)2 (k 2)/2 + p (k 2)/2 (k 2)/2 [(G k (b 2 u) G k 2 (b 2 u))]h(u) du iγ(k 2 i) 2 (k 2)/2 i Γ(k/2 i) G 2(k 2 i) (2b 2 u)h(u) du p (k 2)/2 Γ(k/2)2 (k 2)/2 [F k,ν (b 2 /k) F k 2,ν (b 2 /(k 2))] + p (k 2)/2 (k 2)/2 iγ(k 2 i) 2 (k 2)/2 i Γ(k/2 i) F 2(k 2 i),ν(b 2 /(k i 2)). (2.4.8)

AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY FOR STABILITY STUDIES

AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY FOR STABILITY STUDIES Journal of Biopharmaceutical Statistics, 16: 1 14, 2006 Copyright Taylor & Francis, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500406421 AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY

More information

STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons (Ch. 4-5)

STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons (Ch. 4-5) STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons Ch. 4-5) Recall CRD means and effects models: Y ij = µ i + ϵ ij = µ + α i + ϵ ij i = 1,..., g ; j = 1,..., n ; ϵ ij s iid N0, σ 2 ) If we reject

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 21: 726 747, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.551333 MULTISTAGE AND MIXTURE PARALLEL

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE By Wenge Guo and M. Bhaskara Rao National Institute of Environmental Health Sciences and University of Cincinnati A classical approach for dealing

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Multiple Comparison Methods for Means
