Simultaneous critical values for t-tests in very high dimensions


Bernoulli 17(1), 2011, DOI: /10-BEJ272

HONGYUAN CAO (1) and MICHAEL R. KOSOROK (2)

(1) Department of Health Studies, 5841 South Maryland Avenue MC 2007, University of Chicago, Chicago, IL, 60637, USA. E-mail: hycao@uchicago.edu
(2) Department of Biostatistics and Department of Statistics and Operations Research, 3101 McGavran-Greenberg Hall, CB 7420, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA. E-mail: kosorok@unc.edu

This article considers the problem of multiple hypothesis testing using t-tests. The observed data are assumed to be independently generated conditional on an underlying and unknown two-state hidden model. We propose an asymptotically valid data-driven procedure to find critical values for rejection regions controlling the k-familywise error rate (k-FWER), false discovery rate (FDR) and the tail probability of false discovery proportion (FDTP) by using one-sample and two-sample t-statistics. We only require a finite fourth moment plus some very general conditions on the mean and variance of the population by virtue of the moderate deviations properties of t-statistics. A new consistent estimator for the proportion of alternative hypotheses is developed. Simulation studies support our theoretical results and demonstrate that the power of a multiple testing procedure can be substantially improved by using critical values directly, as opposed to the conventional p-value approach. Our method is applied in an analysis of the microarray data from a leukemia cancer study that involves testing a large number of hypotheses simultaneously.

Keywords: empirical processes; FDR; high dimension; microarrays; multiple hypothesis testing; one-sample t-statistics; self-normalized moderate deviation; two-sample t-statistics

1. Introduction

Among the many challenges raised by the analysis of large data sets is the problem of multiple testing.
Examples include functional magnetic resonance imaging, source detection in astronomy and microarray analysis in genetics and molecular biology. It is now common practice to simultaneously measure thousands of variables or features in a variety of biological studies. Many of these high-dimensional biological studies are aimed at identifying features showing a biological signal of interest, usually through the application of large-scale significance testing. The possible outcomes are summarized in Table 1. Traditional methods that provide strong control of the familywise error rate (FWER $= P(V \ge 1)$) often have low power and can be unduly conservative in many applications. One way around this is to increase the number $k$ of false rejections one is willing to tolerate. This results in a relaxed version of FWER, $k$-FWER $= P(V \ge k)$. Benjamini and Hochberg [1] (hereafter referred to as BH) pioneered an alternative. Define the false discovery proportion (FDP) to be the number of false rejections divided by the number of rejections (FDP $= V/(R \vee 1)$). The only effect of the $R \vee 1$ in the denominator is that the

Table 1. Outcomes when testing $m$ hypotheses

Hypothesis          Accept   Reject   Total
Null true           U        V        m_0
Alternative true    F        S        m_1
Total               W        R        m

ratio $V/R$ is set to zero when $R = 0$. Without loss of generality, we treat FDP $= V/R$ and define the false discovery tail probability FDTP $= P(V \ge \alpha R)$, where $\alpha$ is pre-specified, based on the application. Several papers have developed procedures for FDTP control. We shall not attempt a complete review here, but mention the following: van der Laan, Dudoit and Pollard [26] proposed an augmentation-based procedure, Lehmann and Romano [18] derived a step-down procedure and Genovese and Wasserman [13] suggested an inversion-based procedure, which is equivalent to the procedure of [26] under mild conditions [13]. The false discovery rate (FDR) is the expected FDP. BH provided a distribution-free, finite-sample method for choosing a p-value threshold that guarantees that the FDR is less than a target level $\gamma$. Since this publication, there has been a considerable amount of research on both the theory and application of FDR control. Benjamini and Hochberg [2] and Benjamini and Yekutieli [3] extended the BH method to a class of dependent tests. A Bayesian mixture model approach to obtain multiple testing procedures controlling the FDR is considered in [11,21-24]. Wu [29] considered the conditional dependence model under the assumption of Donsker properties of the indicator function of the true state for each hypothesis and derived asymptotic properties of false discovery proportions and numbers of rejected hypotheses. A systematic study of multiple testing procedures is given in the book [9]. Other related work can be found in [6,7]. One challenge in multiple hypothesis testing is that many procedures depend on the proportion of null hypotheses, which is not known in reality. Estimating this proportion has long been known as a difficult problem. There have been some interesting developments recently, for example, the approach of [20] (see also [11,13,17,19]).
Roughly speaking, these approaches are only successful under a condition which [13] calls the purity condition. Unfortunately, the purity condition depends on p-values and is hard to check in practice. The general framework for k-FWER, FDTP and FDR control and for the estimation of the proportion of alternative hypotheses is based on p-values which are assumed to be known in advance or can be accurately approximated. However, the assumption that p-values are always available is not realistic. In some special settings, approximate p-values have been shown to be asymptotically equivalent to exact p-values for controlling FDR [12,16]. However, these approximations are only helpful in certain simultaneous error control settings and are not universally applicable. Moreover, if the p-values are not reliable, any procedures derived later are problematic. This motivates us to propose a method to find critical values directly for rejection regions to control k-FWER, FDTP and FDR by using one-sample and two-sample t-statistics. The advantage of using t-tests is that they require minimum conditions on the population, only existence of the fourth moment, which is relatively easily satisfied by most statistical distributions, rather than other stringent conditions such as the existence of the moment generating function. In addition, we approximate tail probabilities of both null and alternative hypotheses accurately, rather than

p-value approaches that only consider the case under null hypotheses. Thus, a better ranking of hypotheses is obtained. Furthermore, we propose a consistent estimate of the proportion of alternative hypotheses which only depends on test statistics. As long as the asymptotic distribution of the test statistic is known under the null hypothesis, we can apply our method to estimate this proportion, resulting in more precise cut-offs. The BH procedure controls the FDR conservatively at $\pi_0 \gamma$, where $\pi_0$ is the proportion of null hypotheses and $\gamma$ is the targeted significance level. If $\pi_0$ is much smaller than 1, then the statistical power is greatly compromised. The power we use in this paper is NDR $= E[S]/m_1$, as defined in [8]. In the situation that t-statistics can be used, our procedure gives a better approximation and more accurate critical values can be obtained by plugging in the estimate of $\pi_0$. The validity of our approach is guaranteed by empirical process methods and recent theoretical advances on self-normalized moderate deviations, in combination with Berry-Esseen-type bounds for central and non-central t-statistics. To illustrate, we simulate a Markov chain, as in [25], of Bernoulli variables $(H_i)$, $i = 1, \ldots, 5000$, to indicate the true state of each hypothesis test ($H_i = 1$ if the alternative is true; $H_i = 0$ if the null is true). Conditional on the indicator, observations $x_{ij}$, $i = 1, \ldots, 5000$, $j = 1, \ldots, 80$, are generated according to the model $x_{ij} = \mu_i + \epsilon_{ij}$. The one-sample t-statistic is used to perform simultaneous hypothesis testing. Figure 1 shows the plot of MCMC results of the realized and nominal FDR control based on the BH method for different control levels. From this plot, we can see that as the control level increases, the BH procedure becomes more and more conservative. For instance, the FDR actually obtained falls below the nominal level when the nominal level is set at 0.2, reflecting a significant loss in power.
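The illustration described above can be mimicked with a short simulation. The following is a minimal sketch, not the authors' code: the transition probabilities (0.2 and 0.8), the effect sizes drawn from Unif(0.5, 1), the standard normal errors and the normal approximation to the p-values are all assumptions made here for illustration. Only $m = 5000$, $n = 80$, the model $x_{ij} = \mu_i + \epsilon_{ij}$ and the BH step-up rule come from the text.

```python
import math
import random


def simulate_states(m, p01, p10, rng):
    """Two-state Markov chain H_1..H_m with P(0->1) = p01 and P(1->0) = p10."""
    pi1 = p01 / (p01 + p10)  # stationary probability of state 1
    states = [1 if rng.random() < pi1 else 0]
    for _ in range(m - 1):
        if states[-1] == 0:
            states.append(1 if rng.random() < p01 else 0)
        else:
            states.append(0 if rng.random() < p10 else 1)
    return states


def one_sample_t(x):
    """One-sample t-statistic sqrt(n) * mean / sd."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def bh_rejections(pvalues, gamma):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level gamma."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= gamma * rank / m:
            k = rank  # largest rank whose p-value passes its threshold
    return set(order[:k])


def realized_fdp(gamma, m=5000, n=80, seed=1):
    """Realized false discovery proportion of BH on one simulated data set."""
    rng = random.Random(seed)
    states = simulate_states(m, p01=0.2, p10=0.8, rng=rng)  # assumed values
    pvalues = []
    for h in states:
        mu = rng.uniform(0.5, 1.0) if h == 1 else 0.0  # assumed effect sizes
        x = [mu + rng.gauss(0.0, 1.0) for _ in range(n)]
        t = one_sample_t(x)
        # Two-sided p-value from the normal approximation, 2 * (1 - Phi(|t|)).
        pvalues.append(math.erfc(abs(t) / math.sqrt(2.0)))
    rejected = bh_rejections(pvalues, gamma)
    false_rejections = sum(1 for i in rejected if states[i] == 0)
    return false_rejections / max(len(rejected), 1), len(rejected)


fdp, n_rejected = realized_fdp(gamma=0.2)
print(f"nominal level 0.2, realized FDP {fdp:.3f}, rejections {n_rejected}")
```

Averaging the realized FDP over many seeds would approximate the FDR; a single run only shows one realization.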
The three methods of multiple testing control we utilize are k-FWER, FDTP and FDR. The criterion for using k-FWER is, asymptotically,
$$P(V \ge k) \le \gamma. \tag{1.1}$$
Since we only apply our method when there are discoveries ($R > 0$), we need the FDTP, with a given proportion $0 < \alpha < 1$ and significance level $0 < \gamma < 1$, to satisfy, asymptotically,
$$P(V \ge \alpha R) \le \gamma. \tag{1.2}$$
Similarly, the criterion for using FDR is, asymptotically,
$$\mathrm{FDR} \le \gamma \quad \text{or} \quad \int_0^1 P(V \ge \alpha R)\, d\alpha \le \gamma. \tag{1.3}$$
The main contributions of this paper are as follows: (1) Moderate deviation results from probability theory, which require only a finite fourth moment of the population from which the statistic is computed, are applied in multiple testing. Thus, the applicability of this procedure is dramatically expanded: it can deal with non-normal populations and even highly skewed populations. (2) The critical values for rejection regions are computed directly, which circumvents the intermediate p-value step. (3) An asymptotically consistent estimation of the proportion of alternative hypotheses is developed for multiple testing procedures under very general conditions. The remainder of the paper is organized as follows. In Section 2, we present the basic data structure, our goals, the procedures and theoretical results for the one-sample t-test. Two-sample

Figure 1. Claimed and obtained FDR control using the BH procedure.

t-test results are discussed in Section 3. Section 4 is devoted to numerical investigations using simulation and Section 5 applies our procedure to detect significantly expressed genes in a microarray study of leukemia cancer. Some concluding remarks and a discussion are given in Section 6. Proofs of results from Sections 2 and 3 are given in the Appendix.

2. One-sample t-test

In this section, we first introduce the basic framework for simultaneous hypothesis testing, followed by our main results. Estimation of the unknown proportion of alternative hypotheses $\pi_1$ is presented next. We conclude the section by presenting theoretical results for the special case of completely independent observations. This special setting is the basis for the more general main results and is also of independent interest since fairly precise rates of convergence can be obtained.

2.1. Basic framework

As a specific application of multiple hypothesis testing in very high dimensions, we use gene expression microarray data. At the level of single genes, researchers seek to establish whether each

gene in isolation behaves differently in a control versus a treatment situation. If the transcripts are paired under two conditions, then we can use a one-sample t-statistic to test for differential expression. The mathematical model is
$$X_{ij} = \mu_i + \epsilon_{ij}, \qquad 1 \le j \le n,\ 1 \le i \le m. \tag{2.1}$$
It should be noted that the following discussion is under this model and does not hold in general. Here, $X_{ij}$ represents the expression level in the $i$th gene and $j$th array. Since the subjects are independent, for each $i$, $\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{in}$ are independent random variables with mean zero and variance $\sigma_i^2$. The null hypothesis is $\mu_i = 0$ and the alternative hypothesis is $\mu_i \ne 0$. For the relationship between different genes, we propose the conditional independence model, as follows. Let $(H_i)$ be a $\{0, 1\}$-valued stationary process and, given $(H_i)$, let $X_{ij}$, $i = 1, \ldots, m$, be independently generated. The dependence is imposed on the hypothesis $(H_i)$, where $H_i = 0$ if the null hypothesis is true and $H_i = 1$ if the alternative is true. From Table 1, we can see that $\sum_{i=1}^m H_i = m_1$ and $\sum_{i=1}^m (1 - H_i) = m_0$. It is assumed that $(H_i)$ satisfy a strong law of large numbers:
$$\frac{1}{m} \sum_{i=1}^m H_i \to \pi_1 \in (0, 1) \quad \text{a.s.} \tag{2.2}$$
This condition is satisfied in a variety of scenarios, for example, the independent case, Markov models and stationary models. Consider the one-sample t-statistic
$$T_i = \sqrt{n}\, \bar X_i / S_i,$$
where
$$\bar X_i = \frac{1}{n} \sum_{j=1}^n X_{ij}, \qquad S_i^2 = \frac{1}{n-1} \sum_{j=1}^n (X_{ij} - \bar X_i)^2.$$
If we use $t$ as a cut-off, then the number of rejected hypotheses and the number of false discoveries are, respectively,
$$R = \sum_{i=1}^m 1_{\{|T_i| \ge t\}}, \qquad V = \sum_{i=1}^m (1 - H_i) 1_{\{|T_i| \ge t\}}. \tag{2.3}$$
Under the null hypothesis, it is well known that $T_i$ follows a Student t-distribution with $n - 1$ degrees of freedom if the sample is from a normal distribution. Asymptotic convergence to a standard normal distribution holds when the population is completely unknown, provided that it has a finite fourth moment under the null hypothesis.
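The counts $R$ and $V$ in (2.3), and the normal approximation to the null tail of $|T_i|$, can be checked numerically. The sketch below is illustrative only: the sizes $m = 2000$ and $n = 50$, the signal $\mu_i = 0.5$, the cut-off $t = 2$ and the Laplace errors are assumptions chosen here, not values taken from the paper.

```python
import math
import random


def one_sample_t(x):
    """One-sample t-statistic sqrt(n) * mean / sd."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def count_rejections(t_stats, states, cutoff):
    """Return (R, V) as in (2.3) for the two-sided rule |T_i| >= cutoff."""
    R = sum(1 for t in t_stats if abs(t) >= cutoff)
    V = sum(1 for t, h in zip(t_stats, states) if h == 0 and abs(t) >= cutoff)
    return R, V


rng = random.Random(0)
m, n, cutoff = 2000, 50, 2.0                          # illustrative sizes
states = [1 if i % 5 == 0 else 0 for i in range(m)]   # 20% alternatives

t_stats = []
for h in states:
    mu = 0.5 if h == 1 else 0.0                       # assumed signal
    # Laplace errors: difference of two unit exponentials (non-normal, mean zero).
    x = [mu + rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    t_stats.append(one_sample_t(x))

R, V = count_rejections(t_stats, states, cutoff)
m0 = states.count(0)
normal_tail = math.erfc(cutoff / math.sqrt(2.0))      # P(|Z| >= cutoff)
print(R, V, round(V / m0, 4), round(normal_tail, 4))
```

The proportion $V/m_0$ of null statistics exceeding the cut-off should be close to $P(|Z| \ge t)$ even though the errors are not normal.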
Moreover, under the alternative hypothesis, $T_i$ can also be approximated by a normal distribution, but with a shift in location. We will show that
$$F_0(t) := P(|T_i| \ge t \mid H_i = 0) = P(|Z| \ge t)\big(1 + o(1)\big) = 2\bar\Phi(t)\big(1 + o(1)\big), \tag{2.4}$$
$$F_1(t) := P(|T_i| \ge t \mid H_i = 1) = E\big[P\big(|Z + \sqrt{n}\mu_i/\sigma_i| \ge t \mid \mu_i, \sigma_i\big)\big]\big(1 + o(1)\big), \tag{2.5}$$

uniformly for $t = o(n^{1/6})$ under some regularity conditions, where $Z$ denotes the standard normal random variable, $\bar\Phi$ is the tail probability of the standard normal distribution and the critical values $t_{n,m}$ that control the FDTP and FDR asymptotically at prescribed level $\gamma$ are bounded. These assumptions are fairly realistic in practice. We do not require the critical value for k-FWER to be bounded. Although we do not typically know $m_1$, $F_0(t)$ or $F_1(t)$ in practice, we need the following theorem, the proof of which is given in the Appendix, as the first step. We will shortly extend this result, in Theorem 2.2 below, to permit estimation of the unknown quantities.

Theorem 2.1. Assume that $E(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = 0$, $\mathrm{Var}(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = \sigma_i^2$, $\limsup E\epsilon_{ij}^4 < \infty$, $0 < \pi_1 < 1 - \alpha$ and (2.2) is satisfied. Also, assume that there exist $\epsilon_0 > 0$ and $c_0 > 0$ such that
$$P\big(|\sqrt{n}\mu_i/\sigma_i| \ge \epsilon_0 \mid H_i = 1\big) \ge c_0 \quad \text{for all } n \ge 1. \tag{2.6}$$
Let
$$\mu_m(t) = \alpha m_1 F_1(t) - (1 - \alpha) m_0 F_0(t) \tag{2.7}$$
and
$$\sigma_m^2(t) = \alpha^2 m_1 F_1(t)\big(1 - F_1(t)\big) + (1 - \alpha)^2 m_0 F_0(t)\big(1 - F_0(t)\big). \tag{2.8}$$
(i) If $t^{\mathrm{fdtp}}_{n,m}$ is chosen such that
$$t^{\mathrm{fdtp}}_{n,m} = \inf\{t : \mu_m(t)/\sigma_m(t) \ge z_\gamma\}, \tag{2.9}$$
where $z_\gamma$ is the $\gamma$th quantile of the standard normal distribution, then
$$\lim_{m \to \infty} P(\mathrm{FDP} \ge \alpha) = \lim_{m \to \infty} P(V \ge \alpha R) \le \gamma \tag{2.10}$$
holds.
(ii) If $t^{\mathrm{fdr}}_{n,m}$ is chosen such that
$$t^{\mathrm{fdr}}_{n,m} = \inf\left\{t : \frac{m_0 F_0(t)}{m_0 F_0(t) + m_1 F_1(t)} \le \gamma\right\}, \tag{2.11}$$
then
$$\lim_{m \to \infty} \mathrm{FDR} = \lim_{m \to \infty} E(V/R) \le \gamma \tag{2.12}$$
holds.
(iii) If $t^{k\text{-FWER}}_{n,m}$ is chosen such that
$$t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\eta(t) \ge k) \le \gamma\}, \tag{2.13}$$
where $\eta(t) \sim \mathrm{Poisson}(\theta(t))$ and
$$\theta(t) = m_0 F_0(t),$$

then
$$\lim_{m \to \infty} k\text{-FWER} = \lim_{m \to \infty} P(V \ge k) \le \gamma \tag{2.14}$$
holds.

Remark 2.1. In the next section, we use a Gaussian approximation for $F_0(t)$ and $F_1(t)$ for both FDTP and FDR, for which the critical values are shown to be bounded. In this case, $m$ can be arbitrarily large, while the critical value remains bounded. Due to sparsity, we use a Poisson approximation for k-FWER, for which the critical value is no longer bounded as $m \to \infty$, and we require $\log m = o(n^{1/3})$.

2.2. Main results

Note that in Theorem 2.1, there are an unknown parameter $m_1$ and unknown functions $F_0(t)$ and $F_1(t)$ involved in $\mu_m(t)$ and $\sigma_m(t)$. For practical settings, we need to estimate these quantities. We will begin by assuming that we have a strongly consistent estimate of $\pi_1$ and will then provide one such estimate in the next section. Given $H$, note that
$$p(t) = P(|T_i| \ge t) = (1 - H_i) P(|T_i| \ge t \mid H_i = 0) + H_i P(|T_i| \ge t \mid H_i = 1)$$
can be estimated from the empirical distribution $\hat p_m(t)$ of $\{|T_i|\}$, where
$$\hat p_m(t) = \frac{1}{m} \sum_{i=1}^m I_{\{|T_i| \ge t\}}, \tag{2.15}$$
and that $P(|T_i| \ge t \mid H_i = 0)$ is close to $P(|Z| \ge t)$ when $n$ is large, by (2.4). The next theorem, proved in the Appendix, provides a consistent estimate of the critical value $t_{n,m}$.

Theorem 2.2. Let
$$\nu_m(t) = \alpha \hat p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t) \tag{2.16}$$
and
$$\tau_m^2(t) = \alpha^2\big(\hat p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t)\big)\Big(1 - \frac{1}{\hat\pi_1}\big(\hat p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t)\big)\Big) + 2(1 - \alpha)^2 (1 - \hat\pi_1)\bar\Phi(t)\big(1 - 2\bar\Phi(t)\big), \tag{2.17}$$
where $\hat\pi_1$ is a strongly consistent estimate of $\pi_1$. Assume that the conditions of Theorem 2.1 are satisfied.
(i) If $\hat t^{\mathrm{fdtp}}_{n,m}$ is chosen such that
$$\hat t^{\mathrm{fdtp}}_{n,m} = \inf\left\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} \ge z_\gamma\right\}, \tag{2.18}$$
then
$$|\hat t^{\mathrm{fdtp}}_{n,m} - t^{\mathrm{fdtp}}_{n,m}| = o(1) \quad \text{a.s.} \tag{2.19}$$

(ii) If $\hat t^{\mathrm{fdr}}_{n,m}$ is chosen such that
$$\hat t^{\mathrm{fdr}}_{n,m} = \inf\left\{t : \frac{2(1 - \hat\pi_1)\bar\Phi(t)}{\hat p_m(t)} \le \gamma\right\}, \tag{2.20}$$
then
$$|\hat t^{\mathrm{fdr}}_{n,m} - t^{\mathrm{fdr}}_{n,m}| = o(1) \quad \text{a.s.} \tag{2.21}$$
(iii) If $\hat t^{k\text{-FWER}}_{n,m}$ is chosen such that
$$\hat t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\zeta(t) \ge k) \le \gamma\}, \tag{2.22}$$
where $\zeta(t) \sim \mathrm{Poisson}(\theta(t))$ and
$$\theta(t) = 2m(1 - \hat\pi_1)\bar\Phi(t),$$
then, as long as $\log m = o(n^{1/3})$, we have
$$|\hat t^{k\text{-FWER}}_{n,m} - t^{k\text{-FWER}}_{n,m}| = o(1) \quad \text{a.s.} \tag{2.23}$$

Remark 2.2. This theorem deals with the general dependence case, where $(H_i)_{i \ge 1}$ is assumed to follow a two-state hidden model and the data are generated independently conditional on $(H_i)_{i \ge 1}$. The proof is mainly based on the independence case, which we present in Section 2.4 below, plus a conditioning argument.

2.3. Estimating $\pi_1$

In the previous section, we assumed that $\hat\pi_1$ was a consistent estimator of $\pi_1$. We now develop one such estimator. By the two-group nature of multiple testing, the test statistic is essentially a mixture of null and alternative hypotheses with proportion as a parameter. By virtue of moderate deviations, the distribution of t-statistics can be accurately approximated under both null and alternative hypotheses. However, for the alternative approximation, an unknown mean and variance are involved. So, we think of a functional transformation of the t-statistics which has a ceiling at 1 to first get a conservative estimate of $\pi_1$ which is consistent under certain conditions. Let $c > 0$ and define $g_c(x) = \min(|x|, c)/c$. It is easy to see that $g_c$ is a decreasing function of $c$, bounded by 1, and that the derivative $dg_c/dc$ is bounded by $1/c$. Hence, the function class $\{g_c\}$ indexed by $c$ is a Donsker class and thus also Glivenko-Cantelli. Let
$$\hat g_c = \frac{1}{m} \sum_{i=1}^m g_c(T_i). \tag{2.24}$$

Theorem 2.3. We have
$$\pi_1 \ge \lim_{m,n \to \infty} \sup_{c > 0} \frac{\hat g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad \text{a.s.}$$

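If, in addition, we assume that
$$|\sqrt{n}\mu_i/\sigma_i| \to \infty \quad \text{for all } i \text{ with } H_i = 1,\ i = 1, \ldots, m, \quad \text{a.s. as } n \to \infty, \tag{2.25}$$
then
$$\pi_1 = \lim_{m,n \to \infty} \sup_{c > 0} \frac{\hat g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad \text{a.s.},$$
where
$$E(g_c(Z)) = \frac{2}{c\sqrt{2\pi}}\big(1 - e^{-c^2/2}\big) + 2\bar\Phi(c).$$

Proof. We can write
$$\hat g_c = \frac{m_0}{m} \cdot \frac{\sum_{i=1}^m g_c(T_i) 1_{\{H_i = 0\}}}{\sum_{i=1}^m 1_{\{H_i = 0\}}} + \frac{m_1}{m} \cdot \frac{\sum_{i=1}^m g_c(T_i) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}} := \frac{m_0}{m}\, \mathrm{I} + \frac{m_1}{m}\, \mathrm{II}.$$
Let $H = \{H_i, 1 \le i \le m\}$. Conditional on $H$, $T_i$, $1 \le i \le m$, are independent random variables. We consider I first. Let
$$A_m(c) = \left| \frac{\sum_{i=1}^m g_c(T_i) 1_{\{H_i = 0\}}}{\sum_{i=1}^m 1_{\{H_i = 0\}}} - \frac{\sum_{i=1}^m E(g_c(T_i) \mid H) 1_{\{H_i = 0\}}}{\sum_{i=1}^m 1_{\{H_i = 0\}}} \right|,$$
let $E$ be the infinite sequence $1_{\{H_1 = 0\}}, 1_{\{H_2 = 0\}}, \ldots$ and let $F$ be the event that $\sum_{i=1}^m 1_{\{H_i = 0\}} \to \infty$ as $m \to \infty$. By the assumption (2.2), we know that $P(F) = 1$. Thus,
$$P\Big(\lim_{m \to \infty} \sup_{c > 0} A_m(c) = 0\Big) = E\Big[P\Big(\lim_{m \to \infty} \sup_{c > 0} A_m(c) = 0 \,\Big|\, E\Big)\Big] = 1,$$
where the second equality follows from the fact that, conditional on $E$, the terms in the sum are i.i.d. and thus the standard Glivenko-Cantelli theorem applies. Arguing similarly, based on conditioning on the sequence $1_{\{H_1 = 1\}}, 1_{\{H_2 = 1\}}, \ldots$, we can also establish that
$$\sup_{c > 0} \left| \frac{\sum_{i=1}^m g_c(T_i) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}} - \frac{\sum_{i=1}^m E(g_c(T_i) \mid H) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}} \right| \to 0 \quad \text{a.s.}$$
Now, note that $\mathrm{II} \le 1$. Thus, since $m_0/m \to (1 - \pi_1)$ a.s. and $m_1/m \to \pi_1$ a.s., we have that when $m, n \to \infty$,
$$\hat g_c \le (1 - \pi_1) E(g_c(Z)) + \pi_1 = E(g_c(Z)) + \big(1 - E(g_c(Z))\big)\pi_1 \quad \text{a.s.}$$
We now have the following lower bound for $\pi_1$:
$$\pi_1 \ge \lim_{m,n \to \infty} \sup_{c > 0} \frac{\hat g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad \text{a.s.} \tag{2.26}$$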

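Define
$$\Delta_1 := (1 - \pi_1) E(g_c(Z)) + \pi_1 \frac{\sum_{i=1}^m E(g_c(T_i) \mid H) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}},$$
$$\Delta_2 := (1 - \pi_1) E(g_c(Z)) + \pi_1 \frac{\sum_{i=1}^m E\big(g_c(Z + \sqrt{n}\mu_i/\sigma_i)\big) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}}.$$
Letting $n \to \infty$, we have $\sup_{c > 0} |\Delta_1 - \Delta_2| \to 0$ a.s. Also,
$$\Delta_2 = (1 - \pi_1) E(g_c(Z)) + \pi_1 \frac{\sum_{i=1}^m E\Big(g_c\big(Z + \tfrac{\sqrt{n}\mu_i}{\sigma_i}\big)\big(I_{\{|Z + \sqrt{n}\mu_i/\sigma_i| \ge c\}} + I_{\{|Z + \sqrt{n}\mu_i/\sigma_i| < c\}}\big) \,\Big|\, H_i\Big) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}}$$
$$\ge (1 - \pi_1) E(g_c(Z)) + \pi_1 \frac{\sum_{i=1}^m P\big(|Z + \sqrt{n}\mu_i/\sigma_i| \ge c \mid H_i\big) 1_{\{H_i = 1\}}}{\sum_{i=1}^m 1_{\{H_i = 1\}}}$$
$$\to (1 - \pi_1) E(g_c(Z)) + \pi_1 = E(g_c(Z)) + \pi_1\big(1 - E(g_c(Z))\big).$$
Note that
$$\sup_c |\hat g_c - \Delta_1| \to 0 \quad \text{a.s. as } m, n \to \infty.$$
Therefore,
$$\hat g_c \ge E(g_c(Z)) + \pi_1\big(1 - E(g_c(Z))\big) \quad \text{a.s. as } m, n \to \infty.$$
Thus, we obtain
$$\pi_1 \le \lim_{m,n \to \infty} \sup_{c > 0} \frac{\hat g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad \text{a.s.} \tag{2.27}$$
As a consequence of this theorem, we propose the following estimate of $\pi_1$:
$$\hat\pi_1 := \sup_{c > 0} \frac{\hat g_c - E(g_c(Z))}{1 - E(g_c(Z))}, \tag{2.28}$$
where $E(g_c(Z)) = \frac{2}{c\sqrt{2\pi}}\big(1 - e^{-c^2/2}\big) + 2\bar\Phi(c)$.

Remark 2.3. If we use $\hat\pi_1$, as given in (2.28), then Theorem 2.2 yields a fully automated procedure to carry out multiple hypothesis testing in very high dimensions in practical data settings.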
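The estimator (2.28) is straightforward to compute once the supremum over $c$ is approximated. The sketch below is one possible reading rather than the authors' implementation: it replaces the supremum by a maximum over a finite grid of $c$ values (the grid 0.5, 0.75, ..., 5 is an assumption, chosen to stay away from small $c$ where the denominator $1 - E(g_c(Z))$ is close to zero), and it draws the test statistics directly from their normal approximations with a hypothetical shift of 6 under the alternative.

```python
import math
import random


def normal_tail(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def expected_g(c):
    """E g_c(Z) for g_c(x) = min(|x|, c) / c and Z standard normal."""
    bulk = (2.0 / (c * math.sqrt(2.0 * math.pi))) * (1.0 - math.exp(-c * c / 2.0))
    return bulk + 2.0 * normal_tail(c)


def estimate_pi1(t_stats, c_grid):
    """Grid version of (2.28): maximise (g_hat_c - E g_c(Z)) / (1 - E g_c(Z))."""
    best = -math.inf
    for c in c_grid:
        g_hat = sum(min(abs(t), c) / c for t in t_stats) / len(t_stats)
        e_g = expected_g(c)
        best = max(best, (g_hat - e_g) / (1.0 - e_g))
    return best


rng = random.Random(2)
m, pi1, shift = 20000, 0.2, 6.0            # illustrative values only
m1 = int(pi1 * m)
# N(shift, 1) under the alternative, N(0, 1) under the null (assumed approximation).
t_stats = [rng.gauss(shift, 1.0) for _ in range(m1)]
t_stats += [rng.gauss(0.0, 1.0) for _ in range(m - m1)]

c_grid = [0.5 + 0.25 * j for j in range(19)]   # c = 0.5, 0.75, ..., 5.0
pi1_hat = estimate_pi1(t_stats, c_grid)
print(round(pi1_hat, 3))
```

Because the maximum is taken over noisy ratios, the grid estimate tends to sit slightly above the true proportion in finite samples; a coarser or narrower grid trades that bias against conservativeness.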

2.4. Consistency and rate of convergence under independence

In order to prove the main results in the general, possibly dependent, t-test setting, we need results under the assumption of independence between t-tests. Specifically, we assume in this section that $(T_i, H_i)$, $i = 1, \ldots, m$, are independent, identically distributed random variables with $\pi_1 = P(H_i = 1)$. This independence assumption can also yield stronger results than the more general setting and is of independent interest. The next theorem, proved in the Appendix, provides a strongly consistent estimate of the critical value $t_{n,m}$, as well as its rate of convergence.

Theorem 2.4. Let
$$\nu_m(t) = \alpha \hat p_m(t) - 2(1 - \pi_1)\bar\Phi(t) \tag{2.29}$$
and
$$\tau_m^2(t) = \alpha^2 \hat p_m(t)\big(1 - \hat p_m(t)\big) + 4\alpha(1 - \pi_1)\hat p_m(t)\bar\Phi(t) + 2(1 - \pi_1)\bar\Phi(t)\big(1 - 2\alpha - 2(1 - \pi_1)\bar\Phi(t)\big).$$
Assume the conditions of Theorem 2.1 with (2.2) replaced by the assumption that $(T_i, H_i)$, $i = 1, \ldots, m$, are i.i.d. and $\pi_1 = P(H_i = 1)$. Let $J = \{i : H_i = 1\}$ be the set that contains the indices of alternative hypotheses. Also, assume that $\mu_i, \sigma_i$ are i.i.d. for $i \in J$.
(i) If $\hat t^{\mathrm{fdtp}}_{n,m}$ is chosen such that
$$\hat t^{\mathrm{fdtp}}_{n,m} = \inf\left\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} \ge z_\gamma\right\}, \tag{2.30}$$
then
$$|\hat t^{\mathrm{fdtp}}_{n,m} - t^{\mathrm{fdtp}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}(\log\log m)^{1/2}\big) \quad \text{a.s.} \tag{2.31}$$
and
$$|\hat t^{\mathrm{fdtp}}_{n,m} - t^{\mathrm{fdtp}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}\big) \quad \text{in probability.} \tag{2.32}$$
Here, $t^{\mathrm{fdtp}}_{n,m}$ is the critical value defined in (A.26).
(ii) If $\hat t^{\mathrm{fdr}}_{n,m}$ is chosen such that
$$\hat t^{\mathrm{fdr}}_{n,m} = \inf\left\{t : \frac{2(1 - \pi_1)\bar\Phi(t)}{\hat p_m(t)} \le \gamma\right\}, \tag{2.33}$$
then
$$|\hat t^{\mathrm{fdr}}_{n,m} - t^{\mathrm{fdr}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}(\log\log m)^{1/2}\big) \quad \text{a.s.} \tag{2.34}$$
and
$$|\hat t^{\mathrm{fdr}}_{n,m} - t^{\mathrm{fdr}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}\big) \quad \text{in probability.} \tag{2.35}$$

Here, $t^{\mathrm{fdr}}_{n,m}$ is the critical value defined in (A.28).
(iii) If $\hat t^{k\text{-FWER}}_{n,m}$ is chosen such that
$$\hat t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\zeta(t) \ge k) \le \gamma\}, \tag{2.36}$$
where $\zeta(t) \sim \mathrm{Poisson}(\theta(t))$ and
$$\theta(t) = 2m(1 - \hat\pi_1)\bar\Phi(t),$$
then
$$|\hat t^{k\text{-FWER}}_{n,m} - t^{k\text{-FWER}}_{n,m}| = O\big((\log m)^{-1/2}\big) \quad \text{a.s.} \tag{2.37}$$
Here, $t^{k\text{-FWER}}_{n,m}$ is the critical value defined in (A.30).

Remark 2.4. If $\alpha = \gamma$ in Theorem 2.4, then it is not difficult to see that $|\hat t^{\mathrm{fdtp}}_{n,m} - \hat t^{\mathrm{fdr}}_{n,m}| = O(m^{-1/2})$ a.s. Therefore, (2.31) and (2.32) remain valid with $\hat t^{\mathrm{fdtp}}_{n,m}$ replaced by $\hat t^{\mathrm{fdr}}_{n,m}$. This shows that controlling FDTP is asymptotically equivalent to controlling FDR. This is also true in the more general dependence case. Thus, we will focus primarily on FDR in our numerical studies.

Remark 2.5. Note that $\pi_1$ is assumed to be known in order to get a precise rate of convergence for FDTP and FDR. If $\hat\pi_1$ is estimated with rate of convergence $r_n$, then the correct convergence rate for the in-probability result for FDR and FDTP would involve an additional term $O(r_n)$ added in (2.32) and (2.35). It is unclear what the correction would be for the almost sure rate in (2.31) and (2.34). These corrections are beyond the scope of this paper and will not be pursued further here. Note that the rate of $\hat\pi_1$ is not needed in the main results sections.

3. Two-sample t-test

In this section, the results of the previous section are extended to the two-sample t-test setting. The estimator of the unknown parameter $\pi_1$ remains the same as in the one-sample case, but with $T_i$ in (2.24) being the two-sample, rather than one-sample, t-statistic. Theoretical results for the rates of convergence under independence are also presented, as in the previous section.

3.1. Basic set-up and results

When two groups, such as a control and an experimental group, are independent, which we assume here, a natural statistic to use is the two-sample t-statistic. As far as possible, we adopt the same notation as used in the one-sample case, and we assume that (2.2) holds.
We observe the random variables
$$X_{ij} = \mu_i + \epsilon_{ij}, \qquad 1 \le j \le n_1,\ 1 \le i \le m,$$
$$Y_{ij} = \nu_i + \omega_{ij}, \qquad 1 \le j \le n_2,\ 1 \le i \le m,$$

with the index $i$ denoting the $i$th gene, $j$ indicating the $j$th array, $\mu_i$ representing the mean effect for the $i$th gene from the first group and $\nu_i$ representing the mean effect for the $i$th gene from the second group. The sampling processes for the two groups are assumed to be independent of each other. The sample sizes $n_1$ and $n_2$ are assumed to be of the same order, that is, $0 < b_1 \le n_1/n_2 \le b_2 < \infty$. We will also assume that for each $i$, $\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{in_1}$ are independent random variables with mean zero and variance $\sigma_i^2$; $\omega_{i1}, \omega_{i2}, \ldots, \omega_{in_2}$ are independent random variables with mean zero and variance $\tau_i^2$. The null hypothesis is $\mu_i = \nu_i$, the alternative hypothesis is $\mu_i \ne \nu_i$ and the dependence is assumed to be generated in the same manner as the dependence in the one-sample setting. Consider the two-sample t-statistic
$$T_i^* = \frac{\bar X_i - \bar Y_i}{\sqrt{S_{1i}^2/n_1 + S_{2i}^2/n_2}},$$
where
$$\bar X_i = \frac{1}{n_1} \sum_{j=1}^{n_1} X_{ij}, \qquad \bar Y_i = \frac{1}{n_2} \sum_{j=1}^{n_2} Y_{ij},$$
$$S_{1i}^2 = \frac{1}{n_1 - 1} \sum_{j=1}^{n_1} (X_{ij} - \bar X_i)^2, \qquad S_{2i}^2 = \frac{1}{n_2 - 1} \sum_{j=1}^{n_2} (Y_{ij} - \bar Y_i)^2.$$
Then
$$R = \sum_{i=1}^m 1_{\{|T_i^*| \ge t\}}, \qquad V = \sum_{i=1}^m (1 - H_i) 1_{\{|T_i^*| \ge t\}}. \tag{3.1}$$
The two-sample t-statistic is one of the most commonly used statistics to construct confidence intervals and carry out hypothesis testing for the difference between two means. There are several premises underlying the use of two-sample t-tests. It is assumed that the data have been derived from populations with normal distributions. Based on the fact that $S_{1i} \to \sigma_i$, $S_{2i} \to \tau_i$ a.s., with moderate violation of the assumption, statisticians quite often recommend using the two-sample t-test, provided the samples are not too small and the samples are of equal or nearly equal size. When the populations are not normally distributed, it is a consequence of the central limit theorem that two-sample t-tests remain valid. A more refined confirmation of this validity under non-normality based on moderate deviations is shown in [4].
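For concreteness, the two-sample statistic $T_i^*$ defined above (with separate variance estimates for the two groups) can be computed as in the following small sketch. The function name and the toy data are hypothetical and serve only to make the formula tangible.

```python
import math


def two_sample_t(x, y):
    """Two-sample t-statistic (x_bar - y_bar) / sqrt(S1^2 / n1 + S2^2 / n2)."""
    n1, n2 = len(x), len(y)
    x_bar = sum(x) / n1
    y_bar = sum(y) / n2
    s1_sq = sum((v - x_bar) ** 2 for v in x) / (n1 - 1)   # sample variance, group 1
    s2_sq = sum((v - y_bar) ** 2 for v in y) / (n2 - 1)   # sample variance, group 2
    return (x_bar - y_bar) / math.sqrt(s1_sq / n1 + s2_sq / n2)


# Toy expression values for a single gene (hypothetical numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0]
t_star = two_sample_t(x, y)
print(t_star)
```

Applying `two_sample_t` gene by gene gives the statistics $T_1^*, \ldots, T_m^*$ that enter $R$ and $V$ in (3.1).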
Furthermore, under the alternative hypothesis, the asymptotic results still hold, but with a shift in location similar to the one-sample case under certain conditions, that is,
$$P(|T_i^*| \ge t \mid H_i = 0) = P(|Z| \ge t)\big(1 + o(1)\big),$$
$$P(|T_i^*| \ge t \mid H_i = 1) = P\Big(\Big|Z + \frac{\mu_i - \nu_i}{B_{n_1,n_2}}\Big| \ge t\Big)\big(1 + o(1)\big),$$
uniformly in $t = o(n^{1/6})$, where $B_{n_1,n_2}^2 = \sigma_i^2/n_1 + \tau_i^2/n_2$. Under the assumption of (2.2), asymptotic critical values to control FDTP, FDR and k-FWER are very similar to the one-sample t-test

case with the one-sample t-statistic $T_i$ replaced by the two-sample t-statistic $T_i^*$. The following theorem, proved in the Appendix, is analogous to Theorem 2.1 and is a necessary first step.

Theorem 3.1. Assume that $E(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = 0$, $E(\omega_{ij} \mid \nu_i, \tau_i^2) = 0$, $\mathrm{Var}(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = \sigma_i^2$, $\mathrm{Var}(\omega_{ij} \mid \nu_i, \tau_i^2) = \tau_i^2$, $\limsup E\epsilon_{ij}^4 < \infty$, $\limsup E\omega_{ij}^4 < \infty$, $0 < \pi_1 < 1 - \alpha$ and that (2.2) is satisfied. Assume that there exist $\epsilon_0$ and $c_0$ such that
$$P\Big(\Big|\frac{\mu_i - \nu_i}{B_{n_1,n_2}}\Big| \ge \epsilon_0 \,\Big|\, H_i = 1\Big) \ge c_0 \quad \text{for all } n_1, n_2. \tag{3.2}$$
The conclusions of Theorem 2.1 then hold with the one-sample t-statistic $T_i$ replaced by the two-sample t-statistic $T_i^*$.

3.2. Main results

The unknown parameter $m_1$ and functions $F_0(t)$ and $F_1(t)$ in Theorem 3.1 are estimated similarly as in the one-sample case with the one-sample t-statistic replaced by its two-sample counterpart. The following theorem, the proof of which is given in the Appendix, gives our main results for two-sample t-tests.

Theorem 3.2. Assume that the conditions in Theorem 3.1 are satisfied. Replace the one-sample t-statistic $T_i$ by the two-sample t-statistic $T_i^*$ in Theorem 2.2. Let $\hat\pi_1$ be a strongly consistent estimate of $\pi_1$, as in (2.28), using the two-sample t-statistic $T_i^*$.
(i) If $\hat t^{\mathrm{fdtp}}_{n,m}$ is chosen such that
$$\hat t^{\mathrm{fdtp}}_{n,m} = \inf\left\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} \ge z_\gamma\right\}, \tag{3.3}$$
then
$$|\hat t^{\mathrm{fdtp}}_{n,m} - t^{\mathrm{fdtp}}_{n,m}| = o(1) \quad \text{a.s.} \tag{3.4}$$
(ii) If $\hat t^{\mathrm{fdr}}_{n,m}$ is chosen such that
$$\hat t^{\mathrm{fdr}}_{n,m} = \inf\left\{t : \frac{2(1 - \hat\pi_1)\bar\Phi(t)}{\hat p_m(t)} \le \gamma\right\}, \tag{3.5}$$
then
$$|\hat t^{\mathrm{fdr}}_{n,m} - t^{\mathrm{fdr}}_{n,m}| = o(1) \quad \text{a.s.} \tag{3.6}$$
(iii) If $\hat t^{k\text{-FWER}}_{n,m}$ is chosen such that
$$\hat t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\zeta(t) \ge k) \le \gamma\}, \tag{3.7}$$

where $\zeta(t) \sim \mathrm{Poisson}(\theta(t))$ and
$$\theta(t) = 2m(1 - \hat\pi_1)\bar\Phi(t),$$
then, provided $\log m = o(n^{1/3})$, we have
$$|\hat t^{k\text{-FWER}}_{n,m} - t^{k\text{-FWER}}_{n,m}| = o(1) \quad \text{a.s.} \tag{3.8}$$

Remark 3.1. $\hat\pi_1$ can be estimated via (2.28) by using two-sample t-statistics. Theorem 2.3 is applicable in the two-sample setting, as well as in the one-sample case, and consistency follows. Thus, Theorem 3.2 gives a fully automated procedure to conduct multiple hypothesis testing using two-sample t-statistics after we plug in the $\hat\pi_1$ given in (2.28).

3.3. Consistency and rate of convergence under independence

Results for the independence setting are needed for the proofs of the main results, as was the case for one-sample t-tests. We can, once again, obtain more precise estimation compared with the general dependence case. The following theorem, proved in the Appendix, gives us conditions and conclusions using two-sample t-statistics for controlling FDTP and FDR asymptotically, as well as rates of convergence under the assumption that $(T_i, H_i)$ are independent of each other for $1 \le i \le m$. Assume that $\pi_1$ is the proportion of the alternative hypotheses among $m$ hypothesis tests, that is, $\pi_1 = P(H_i = 1)$. Let $J = \{i : H_i = 1\}$.

Theorem 3.3. Assume the conditions of Theorem 3.1 are satisfied. Rather than (2.2), we assume that $(T_i, H_i)$ are independent and identically distributed. In addition, $\pi_1 = P(H_1 = 1)$ and $\mu_i, \sigma_i$ are i.i.d. for $i \in J$. Let
$$p(t) = P(|T_1^*| \ge t), \tag{3.9}$$
$$a_1(t) = \alpha p(t) - (1 - \pi_1) P(|T_1^*| \ge t \mid H_1 = 0), \tag{3.10}$$
$$b_1^2(t) = \alpha^2 p(t)\big(1 - p(t)\big) + 2\alpha(1 - \pi_1) p(t) P(|T_1^*| \ge t \mid H_1 = 0) + (1 - \pi_1) P(|T_1^*| \ge t \mid H_1 = 0)\big(1 - 2\alpha - (1 - \pi_1) P(|T_1^*| \ge t \mid H_1 = 0)\big),$$
$$\hat p_m(t) = \frac{1}{m} \sum_{i=1}^m I_{\{|T_i^*| \ge t\}}, \tag{3.11}$$
$$\nu_m(t) = \alpha \hat p_m(t) - 2(1 - \pi_1)\bar\Phi(t), \tag{3.12}$$
and
$$\tau_m^2(t) = \alpha^2 \hat p_m(t)\big(1 - \hat p_m(t)\big) + 4\alpha(1 - \pi_1)\hat p_m(t)\bar\Phi(t) + 2(1 - \pi_1)\bar\Phi(t)\big(1 - 2\alpha - 2(1 - \pi_1)\bar\Phi(t)\big).$$
The conclusions of Theorem 2.4 then hold with the one-sample t-statistics $T_i$ replaced by the two-sample t-statistics $T_i^*$.
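The k-FWER rule that appears in Theorems 2.2 and 3.2, $\inf\{t : P(\zeta(t) \ge k) \le \gamma\}$ with $\zeta(t) \sim \mathrm{Poisson}(2m(1 - \hat\pi_1)\bar\Phi(t))$, can be solved numerically because the Poisson tail is non-increasing in $t$. The sketch below uses bisection; the search interval $[0, 40]$, the tolerance and the example inputs ($m = 10000$, $\hat\pi_1 = 0.2$) are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def normal_tail(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def poisson_tail(theta, k):
    """P(N >= k) for N ~ Poisson(theta) and integer k >= 1."""
    term = math.exp(-theta)          # P(N = 0)
    cdf = 0.0
    for j in range(k):               # accumulate P(N = 0), ..., P(N = k - 1)
        cdf += term
        term *= theta / (j + 1)
    return max(0.0, 1.0 - cdf)


def kfwer_tail(t, m, pi1_hat, k):
    """Poisson approximation to P(V >= k) at cut-off t."""
    theta = 2.0 * m * (1.0 - pi1_hat) * normal_tail(t)
    return poisson_tail(theta, k)


def kfwer_cutoff(m, pi1_hat, k, gamma, tol=1e-8):
    """Smallest t (up to tol) with kfwer_tail(t) <= gamma, found by bisection."""
    lo, hi = 0.0, 40.0               # assumes the tail exceeds gamma at t = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kfwer_tail(mid, m, pi1_hat, k) <= gamma:
            hi = mid
        else:
            lo = mid
    return hi


t10 = kfwer_cutoff(m=10000, pi1_hat=0.2, k=10, gamma=0.05)
t1 = kfwer_cutoff(m=10000, pi1_hat=0.2, k=1, gamma=0.05)
print(round(t10, 3), round(t1, 3))
```

Tolerating $k = 10$ false rejections gives a smaller cut-off than $k = 1$, which is the gain in power that motivates the k-FWER criterion.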

Remark 3.2. In the above sections, we developed our theorems based on two-sided tests. The results for the case of one-sided tests are very similar, but with the rejection region $\{T_i \ge t\}$ for each test. We omit the details.

4. Numerical studies

In this section, we present numerical studies based on simulated data and compare the power of our approach with the [1] (BH) and [23] (ST) approaches using one-sample t-statistics. The results for using two-sample t-statistics are very similar and so we omit the details here.

4.1. Simulation study 1

We investigate the results for the i.i.d. case first. Recall the model
$$X_{ij} = \mu_i + \epsilon_{ij}, \qquad 1 \le i \le m,\ 1 \le j \le n.$$
We set the signal using $\mu_i \sim \mathrm{Unif}(0.5, 1)$ or $\mu_i \sim \mathrm{Unif}(-1, -0.5)$, which is of the correct order for the standardized error term. Here, the number of hypothesis tests, $m$, is the same for all following simulation studies, unless otherwise noted. The proportion of alternatives $\pi_1 = 0.2$ and the error term $t(4)$ are used just to illustrate the asymptotic results. We vary the number of arrays $n$ from 20 to 50 to 300 to evaluate our asymptotic approximation. Empirical distributions of FDTP, FDR and k-FWER based on repetitions are treated as the gold standard since they have almost negligible Monte Carlo error. The samples are generated to evaluate our proposed method based on asymptotic theory. Specifically, for each sample, we calculate the sample paths of the following quantities indexed by $t$: $\sqrt{m}\,\nu_m(t)/\tau_m(t)$ for studying FDTP, $2(1 - \hat\pi_1)\bar\Phi(t)/\hat p_m(t)$ for studying FDR and $P(\mathrm{Poisson}(2m(1 - \hat\pi_1)\bar\Phi(t)) \ge 10)$ for studying 10-FWER (here, we choose $k = 10$ just for the purposes of illustration). $\hat\pi_1$ is defined as in (2.28). Figure 2 shows the overlay of the true path and 100 random estimated paths for FDTP, FDR and k-FWER, respectively. As $n$ increases, we see that the true path and estimated paths are fairly close to each other, which, in turn, validates our asymptotic theory.
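The FDR sample path $2(1 - \hat\pi_1)\bar\Phi(t)/\hat p_m(t)$ described above, and the cut-off it implies, can be traced with a few lines of code. This is a minimal sketch under stated assumptions, not the simulation code of the paper: the statistics are drawn directly from normal approximations with a hypothetical shift of 4 under the alternative, the true $\pi_1 = 0.2$ is plugged in instead of $\hat\pi_1$, and the infimum is taken over a grid with step 0.01.

```python
import math
import random
from bisect import bisect_left


def normal_tail(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fdr_path(t, sorted_abs, pi1_hat):
    """Estimated FDR at cut-off t; sorted_abs holds |T_i| in increasing order."""
    m = len(sorted_abs)
    p_hat = (m - bisect_left(sorted_abs, t)) / m   # fraction with |T_i| >= t
    if p_hat == 0.0:
        return 0.0                                  # no rejections at this cut-off
    return 2.0 * (1.0 - pi1_hat) * normal_tail(t) / p_hat


def fdr_cutoff(sorted_abs, pi1_hat, gamma, step=0.01, t_max=10.0):
    """Smallest grid point t with estimated FDR <= gamma (t_max if none)."""
    for j in range(int(round(t_max / step)) + 1):
        t = j * step
        if fdr_path(t, sorted_abs, pi1_hat) <= gamma:
            return t
    return t_max


rng = random.Random(3)
m, pi1, shift, gamma = 5000, 0.2, 4.0, 0.1         # illustrative values only
states = [1 if rng.random() < pi1 else 0 for _ in range(m)]
t_stats = [rng.gauss(shift * h, 1.0) for h in states]
sorted_abs = sorted(abs(s) for s in t_stats)

t_hat = fdr_cutoff(sorted_abs, pi1, gamma)
R = sum(1 for s in t_stats if abs(s) >= t_hat)
V = sum(1 for s, h in zip(t_stats, states) if h == 0 and abs(s) >= t_hat)
fdp = V / max(R, 1)
print(round(t_hat, 2), R, V, round(fdp, 3))
```

Evaluating `fdr_path` on a grid of $t$ values for many simulated data sets would give estimated paths of the kind overlaid in Figure 2.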
We can see that the slopes of FDTP and 10-FWER are very steep, which means a small change in the critical value results in a large change in the level of control, while the FDR has a flatter trend.

4.2. Simulation study 2

Under the same set-up as in the previous section, we simulate data with different error terms: standard normal ($N(0, 1)$), Student t with one degree of freedom (Cauchy), Student t with four degrees of freedom ($t(4)$), Student t with ten degrees of freedom ($t(10)$), Laplace and exponential. Note that, except for the Cauchy error term, all of the error terms satisfy the condition

Figure 2. Overlay of true and 100 random estimated sample paths with respect to cut-off t for the three procedures under differing sample sizes.

of finite fourth moment. Empirical distributions of FDTP, FDR and k-FWER based on repetitions are treated as the gold standard for obtaining true critical values. Each scenario is repeated 1000 times to evaluate our proposed method for estimating the critical value based on asymptotic theory. We control FDR at different levels (from 0.01 to 0.2) to get true and estimated critical values. Asymptotically, the estimated critical value $\hat t$ based on our theory should be very close to the true critical value $t$ and lie on the diagonal line of the square. From Figure 3, the estimated critical values $\hat t$ do not match the true critical value $t$ under the Cauchy error since the Cauchy distribution does not have finite fourth moment. For the Cauchy distribution, even the central limit theorem does not hold since it does not have finite mean. As the number of arrays $n$ increases, the estimated critical values $\hat t$ match the true critical values $t$ better under symmetric error terms ($N(0, 1)$, $t(4)$, $t(10)$ and Laplace), but not quite so well under asymmetric errors (e.g., exponential errors). The difficulty with the exponential error terms suggests the value of conducting research to derive higher order approximations. We plan to undertake this in the near future.

Figure 3. Comparison of true and estimated critical values using FDR for different error terms and numbers of arrays n.

4.3. Simulation study 3

The above results are from the independent test setting. We carried out similar simulation studies for the dependent setting and found that the corresponding plots are quite similar to the above results and the same conclusions can be drawn. To see whether our proposed method obtains the claimed level of control, we use a hidden Markov chain to generate dependent indicators $H_i$, $i = 1, \ldots, m$. Conditional on $H_i$, $i = 1, \ldots, m$, the data are generated independently. The transition probability matrix of the hidden Markov chain is set to
\[ \begin{pmatrix} 1 - p_1 & p_1 \\ p_0 & 1 - p_0 \end{pmatrix}, \]
where $p_1$ is the transition probability from 0 to 1 and $p_0$ is the transition probability from 1 to 0. In the simulation, $p_0 = 0.8$ and $p_1 = 0.2$. Based on the limiting stationary distribution, the alternative proportion should be $\pi_1 = p_1/(p_0 + p_1)$. Under the null hypothesis, we simulate data from four error terms (N(0, 1), t(4), Laplace and exponential) and, under the alternative hypothesis, we simulate data with mean effects half from Unif(0.1, 0.5) and half from Unif(−0.5, −0.1), plus the same four error terms. Figure 4 uses FDR as the control criterion. For different control levels $\gamma$, we compare the claimed level of control and the actually obtained level of control based on our method for different numbers of arrays: small (n = 20), medium (n = 50) and large (n = 300). From Figure 4, we can see that when the number of arrays n is small (n = 20), we do not, in general, achieve the claimed level of control. If we have a medium sample size (n = 50), the obtained level of control is very close to the nominal level of control and the results are almost perfect if we have a large number of arrays (n = 300), even for the asymmetric exponential error term. This strongly supports our theoretical predictions but suggests that higher order approximations would be useful in some settings.

Figure 4. Comparison of nominal and obtained control level for different error terms and numbers of arrays n.

To see the performance of our method using 10-FWER, Table 2 summarizes the control level actually obtained for different error terms and numbers of arrays n when the nominal control level is 0.05.

Table 2. Obtained control level using 10-FWER with nominal control level 0.05
n N(0, 1) t(4) Laplace Exponential
(9.0e (7.0e (1.1e 02 1 ( (1.2e (9.1e (1.2e 02 1 ( (3.8e (2.8e (2.7e (4.6e 03

The obtained control level is incorrect when the number of arrays n is small, which can be deduced from the sample paths of 10-FWER given in Figure 1. It has a very steep slope, so when n is small, the approximation is crude and there is a noticeable difference between the estimated critical value and the true critical value, yielding a big difference in the control level. For large sample sizes, the obtained control level is reasonably good because our asymptotic theory begins to take effect. The exponential error setting appears not to perform as well as the other error settings.

4.4. Simulation study 4

All previous numerical studies involve the alternative proportion estimate $\hat\pi_1$ defined in (2.28). In this section, we investigate numerically how this estimate is affected by the number of arrays n and compare it with the alternative estimate proposed by [23]. The first simulation set-up is similar to the one in the previous section. We drew N = 1000 sets of data as follows. Dependent indicators $H_i$, $i = 1, \ldots, m$, are generated from a hidden Markov chain with the limiting alternative proportion $\pi_1 = 0.2$. Conditional on these, a vector of expected values, $\mu = (\mu_1, \ldots, \mu_m)$, was constructed. The expected values for the true null hypotheses were set to 0 with standard normal noise, whereas the expected values for the alternative hypotheses were drawn from Unif(0.1, 0.5) plus standard normal noise. Correspondingly, 1000 replications of the proportion estimate $\hat\pi_1$ were calculated using (2.28). The root mean square error (RMSE) is given as
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\bigl(\hat\pi_1^{(n)} - \pi_1^{(n)}\bigr)^2}, \]
where $\hat\pi_1^{(n)}$ is the estimate of $\pi_1$ for the nth simulated data set and $\pi_1^{(n)}$ is the truth. Table 3 summarizes the effect of n. As the number of arrays n increases, the RMSE gets smaller, which validates our asymptotic prediction. In the second simulation, we compare our proportion estimate with the one using spline smoothing proposed by [23]. Recall the proportion estimate
\[ \hat\pi_0(\lambda) = \frac{\#\{p_i > \lambda;\ i = 1, \ldots, m\}}{m(1 - \lambda)}. \]
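The two formulas just stated, the RMSE over N replications and the p-value based estimate $\hat\pi_0(\lambda)$, can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of this paper or of [23]: the estimator (2.28) and the spline smoothing step are not implemented, a single fixed $\lambda = 0.5$ is used, normal z-scores with a hypothetical mean shift of 3 stand in for t-statistics, the indicators are drawn independently, and all function names are assumptions.

```python
import math
import random

def storey_pi0(pvals, lam):
    """pi0_hat(lambda) = #{p_i > lambda} / (m (1 - lambda))."""
    m = len(pvals)
    return sum(p > lam for p in pvals) / (m * (1.0 - lam))

def rmse(estimates, truths):
    """Root mean square error over the replications."""
    n_rep = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n_rep)

def two_sided_p(z):
    """Two-sided normal p-value P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

rng = random.Random(23)
m, pi1, lam = 2000, 0.2, 0.5          # hypothetical sizes and tuning value
estimates, truths = [], []
for _ in range(50):                   # 50 replications keep the sketch fast
    is_alt = [rng.random() < pi1 for _ in range(m)]
    # z-scores stand in for t-statistics; the shift of 3 is hypothetical
    pvals = [two_sided_p(rng.gauss(3.0 if a else 0.0, 1.0)) for a in is_alt]
    estimates.append(1.0 - storey_pi0(pvals, lam))   # estimate of pi1
    truths.append(sum(is_alt) / m)                   # realized proportion
error = rmse(estimates, truths)
```

As in the RMSE definition above, each estimate is compared with the realized alternative proportion of its own data set rather than with the limiting value 0.2.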
The smoothing approach proceeds as follows: first, $\hat\pi_0(\lambda)$ are calculated over a (fine) grid of $\lambda$; then, a natural cubic spline $y$ with three degrees of freedom is fitted to $(\lambda, \hat\pi_0(\lambda))$; finally, $\pi_0$ is estimated by $\hat\pi_0 = y(1)$. The simulation set-up is similar to the previous one, except that we have two groups here with $n_1 = 70$ and $n_2 = 80$. We change the alternative proportion to compare the performances of our approach ($\hat\pi_1^{ck}$) with the spline smoothing approach ($\hat\pi_1^{st}$) in Table 4. They produce very similar results; both are conservative, with less bias using our approach and less variance using the spline smoothing approach. The advantage of our approach is that it is computationally very fast, while the spline smoothing approach requires that p-values are first obtained using permutation, which is computationally much more intensive than our approach (which can be computed directly from the t-statistics).

Table 3. RMSE for N = 1000 estimated values of $\pi_1$. Columns: n, RMSE.

Table 4. Proportion estimate comparison. Columns: $\pi_1$, $\hat\pi_1^{ck}$, $\hat\pi_1^{st}$, sd($\hat\pi_1^{ck}$), sd($\hat\pi_1^{st}$).

4.5. Comparison with BH and ST procedures

In this section, we compare our approach with the BH and ST procedures under the dependence structure described in [29]. We also use a hidden Markov model to simulate the indicator function $H_i$, $i = 1, \ldots, m$. Conditional on $H_i$, $i = 1, \ldots, m$, the data are generated independently. The number of hypotheses tested is m = 5000 and the number of arrays is n = 80. The data generating mechanism is otherwise the same as in the independence case. First, we construct a one-sample t-statistic and apply our procedure to obtain the critical value for the rejection region. We then obtain p-values and q-values, and apply the BH and ST procedures to decide which genes are significantly expressed. We now briefly describe the BH procedure. Let $p_i$ be the marginal p-value of the ith test, $1 \le i \le m$, and let $p_{(1)} \le \cdots \le p_{(m)}$ be the order statistics of $p_1, \ldots, p_m$. Given a control level $\gamma \in (0, 1)$, let
\[ r = \max\bigl\{ i \in \{0, 1, \ldots, m + 1\} : p_{(i)} \le \gamma i/m \bigr\}, \]
where $p_{(0)} = 0$ and $p_{(m+1)} = 1$. The BH procedure rejects all hypotheses for which $p_{(i)} \le p_{(r)}$. If r = 0, then all hypotheses are accepted. The q-value in [23] is similar to the well-known p-value, except that it is a measure of significance in terms of FDR, rather than type I error, and an estimate of the alternative proportion is plugged in, based on available p-values, as described in the previous section. We revisit the motivating example and give a plot of the claimed FDR and actually obtained FDR by using the proposed critical value method.
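The hidden Markov indicators and the BH step-up rule described above can be sketched as follows. This is an illustration under assumptions, not the simulation code behind Figure 5: normal z-scores replace the one-sample t-statistics, the effect sizes are borrowed from simulation study 3, the chain is started from its stationary law, and the names `simulate_indicators` and `benjamini_hochberg` are hypothetical.

```python
import math
import random

def simulate_indicators(m, p0, p1, rng):
    """Two-state Markov chain H_1..H_m with P(0 -> 1) = p1 and P(1 -> 0) = p0."""
    h = [1 if rng.random() < p1 / (p0 + p1) else 0]  # stationary start (assumption)
    for _ in range(m - 1):
        if h[-1] == 0:
            h.append(1 if rng.random() < p1 else 0)
        else:
            h.append(0 if rng.random() < p0 else 1)
    return h

def benjamini_hochberg(pvals, gamma):
    """Indices rejected by BH: r = max{i : p_(i) <= gamma * i / m},
    then reject the r smallest p-values (none if r = 0)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    r = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= gamma * rank / m:
            r = rank
    return sorted(order[:r])

rng = random.Random(5)
m, n, gamma = 5000, 80, 0.05
h = simulate_indicators(m, p0=0.8, p1=0.2, rng=rng)
pvals = []
for h_i in h:
    # effect sizes as in simulation study 3 (an assumption for this sketch)
    mu = rng.choice((-1.0, 1.0)) * rng.uniform(0.1, 0.5) if h_i else 0.0
    z = rng.gauss(math.sqrt(n) * mu, 1.0)  # normal stand-in for the t-statistic
    pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
rejected = benjamini_hochberg(pvals, gamma)
# realized false discovery proportion among the rejections
fdp = sum(1 for i in rejected if h[i] == 0) / max(len(rejected), 1)
```

Because the true indicators are known in a simulation, the realized false discovery proportion `fdp` can be compared directly with the nominal level `gamma`.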
From Figure 5, we can see that our procedure controls the FDR at the claimed level asymptotically, although somewhat liberally for finite samples, and has better power at the same target FDR level compared with the BH and ST procedures.

5. Applications to microarray analysis

We now apply the proposed procedure to the analysis of a leukemia cancer data set [14] in order to identify differentially expressed genes between AML and ALL. For the original data, see

Figure 5. FDR control and power comparison.

In this analysis, we use the methodology developed for the dependence case. The raw data consist of m = 7129 genes and 72 samples coming from two classes: 47 in class ALL (acute lymphoblastic leukemia) and 25 in class AML (acute myeloid leukemia). Our simulation results showed reasonable performance of the procedure for a moderate sample size in this range. For each gene location, the two-sample t-statistic comparing the 47 ALL responses with the 25 AML responses was computed. Using our proposed approach for the dependent case, we find the critical value for controlling FDR at level $\gamma$,
\[ \hat t_{n,m}^{\,\mathrm{fdr}} = \inf\Bigl\{ t : \frac{2(1 - \hat\pi_1)\bar\Phi(t)}{\hat p_m(t)} \le \gamma \Bigr\}, \]
where $\hat p_m(t) = \sum_{i=1}^{m} 1\{|T_i| \ge t\}/m$, $\bar\Phi(t) = 1 - \Phi(t)$ and $\hat\pi_1$ is estimated by (2.28). In Figure 6, we plot the FDR level and the number of significantly expressed genes by our (CK) procedure, the BH procedure and the q-value based Storey-Tibshirani (ST) procedure. From the plot, we can see that our procedure detects the largest number of significant genes, followed by the ST procedure and then the BH procedure, which is the most conservative one. At FDR level 0.01, we detected 870 genes, the ST procedure detected 778 genes and the BH procedure detected 614 genes. Consistent with the higher power of our approach in the simulation studies, our procedure based on the two-sample t-test detected all of the genes that the other two approaches detected. The BH procedure is very conservative at the expense of power loss. The ST procedure requires permutation to obtain p-values, while our procedure gets the critical value directly and is thus faster in terms of computation. The estimate of $\pi_1$ is [value not recoverable] by our procedure and [value not recoverable] by the ST procedure. These results can serve as a first exploratory step for more refined analyses concerning these significant genes. Another issue may be that the critical value approach based on asymptotic FDR control may not be conservative enough in some settings.

Figure 6. Comparison between our (CK) procedure, the ST procedure and the BH procedure using real data.

6. Concluding remarks and discussion

We have presented a new approach for the significance analysis of thousands of features in high-dimensional biological studies. The approach is based on estimating the critical values of the rejection regions for high-dimensional multiple hypothesis testing, rather than the conventional p-value approaches in the literature. We developed a detailed method that can be used to identify differentially expressed genes in microarray experiments. The proposed procedure performs well for large samples, reasonably well for intermediate samples and not quite as well for small samples, and appears to perform better than existing alternatives under realistic sample sizes. Our method is also computationally faster than the competing approaches. The potential for improvement in small-sample performance motivates the need for a second-order expansion of our theoretical work. In addition, we have proposed a new consistent estimate of the proportion of alternative hypotheses under certain conditions. Numerical studies demonstrate that our methodology fits the truth well and improves the statistical power in multiple testing.

Extensions of the current work can be pursued in several directions. First, as stated above, the precision of the asymptotic approximations has room for improvement in small-to-moderately-small sample sizes, suggesting that a second-order expansion would be valuable.
Second, in the dependence case, it would be of interest to see how the rate of convergence could be derived under various assumptions on the form of the dependence. Thirdly, the plug-in estimator $\hat\pi_1$ is consistent, but somewhat ad hoc. Complete theoretical properties of this estimator remain to be explored. Last, but not least, we only considered a fixed proportion $\pi_1$ of alternative hypotheses. It is of great interest also to consider the sparsity setting, in which $\pi_1 \to 0$ as $m \to \infty$, and to see what patterns emerge.

Appendix: Proofs of main results

Our main tools are limit theorems of empirical processes, Berry-Esseen bounds and self-normalized moderate deviations for one- and two-sample t-statistics.

A.1. Preliminary lemmas

We first state a non-uniform Berry-Esseen inequality for nonlinear statistics.

Lemma A.1 ([5]). Let $\xi_1, \xi_2, \ldots, \xi_n$ be independent random variables with $E\xi_i = 0$, $\sum_{i=1}^{n} E\xi_i^2 = 1$ and $E|\xi_i|^3 < \infty$. Let $W_n = \sum_{i=1}^{n} \xi_i$ and $\Delta = \Delta(\xi_1, \ldots, \xi_n)$ be a measurable function of $\{\xi_i\}$. Then
\[ |P(W_n + \Delta \le z) - \Phi(z)| \le P\bigl(|\Delta| > (|z| + 1)/3\bigr) + C(|z| + 1)^{-3}\Bigl( \|\Delta\|_2 + \sum_{i=1}^{n} (E\xi_i^2)^{1/2}\bigl(E(\Delta - \Delta_i)^2\bigr)^{1/2} + \sum_{i=1}^{n} E|\xi_i|^3 \Bigr). \tag{A.1} \]

This is [5], Theorem 2.2, and the proof can be found there. The next lemma provides a Berry-Esseen bound for non-central t-statistics.

Lemma A.2. Let $X, X_1, \ldots, X_n$ be i.i.d. random variables with $E(X) = 0$, $\sigma^2 = EX^2$ and $EX^4 < \infty$. Let
\[ \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2. \]
Then
\[ \Bigl| P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \le x \Bigr) - \Phi(x - \sqrt{n}c/\sigma) \Bigr| \le \frac{K(1 + |x|)}{(1 + |x - \sqrt{n}c/\sigma|)\sqrt{n}} \tag{A.2} \]
for any $c$ and $x$, where $K$ is a finite constant that may depend on $\sigma$ and $EX^4$.

Proof. Without loss of generality, assume that $x \ge 0$ and $\sigma = 1$. Using the fact that
\[ 1 - |t| \le (1 + t)^{1/2} \le 1 + |t| \quad \text{for } t \ge -1, \tag{A.3} \]
we have
\[ x s_n = x(1 + s_n^2 - 1)^{1/2} \le x(1 + |s_n^2 - 1|) \tag{A.4} \]
and
\[ x s_n \ge x(1 - |s_n^2 - 1|). \tag{A.5} \]
Therefore,
\[ P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \le x \Bigr) = P\bigl( \sqrt{n}(\bar X + c) \le x s_n \bigr) \le P\bigl( \sqrt{n}\bar X \le x - \sqrt{n}c + x|s_n^2 - 1| \bigr). \tag{A.6} \]
We now apply (A.1) with $\xi_i = X_i/\sqrt{n}$, $W_n = \sqrt{n}\bar X$ and $z = x - \sqrt{n}c$, $\Delta = -x|s_n^2 - 1|$, $\Delta_i = -x|s_{n,i}^2 - 1|$, where $s_{n,i}^2$ is defined as $s_n^2$ with 0 replacing $X_i$.

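Noting that
\[ s_n^2 - 1 = \frac{1}{n-1}\Bigl( \sum_{j=1}^{n} (X_j^2 - 1) - n\bar X^2 + 1 \Bigr), \qquad s_{n,i}^2 - 1 = \frac{1}{n-1}\Bigl( \sum_{j \ne i} (X_j^2 - 1) - n(\bar X - X_i/n)^2 \Bigr), \]
we have
\[ E|s_n^2 - 1|^2 \le K\,EX^4/n \tag{A.7} \]
and
\[ \begin{aligned} E(s_n^2 - s_{n,i}^2)^2 &= (n-1)^{-2} E\bigl( X_i^2 - n\bar X^2 + n(\bar X - X_i/n)^2 \bigr)^2 \\ &= (n-1)^{-2} E\bigl( X_i^2 - X_i\bigl(2(\bar X - X_i/n) + X_i/n\bigr) \bigr)^2 \\ &\le (n-1)^{-2} E\bigl( 2X_i^4 + 2X_i^2\bigl(2(\bar X - X_i/n) + X_i/n\bigr)^2 \bigr) \\ &\le (n-1)^{-2} \bigl( 2EX^4 + 16\,EX_i^2\,E(\bar X - X_i/n)^2 + 4EX^4/n^2 \bigr) \\ &\le K\,EX^4/n^2, \end{aligned} \tag{A.8} \]
where the fourth line uses the independence of $X_i$ and $\bar X - X_i/n$. It follows from (A.7) and (A.8) that
\[ \|\Delta\|_2 \le \frac{Kx(EX^4)^{1/2}}{\sqrt{n}}, \qquad P\bigl( |\Delta| > (|z| + 1)/3 \bigr) \le \frac{Kx(EX^4)^{1/2}}{\sqrt{n}(1 + |z|)}, \]
\[ \sum_{i=1}^{n} (E\xi_i^2)^{1/2}\bigl(E(\Delta - \Delta_i)^2\bigr)^{1/2} \le \frac{Kx(EX^4)^{1/2}}{\sqrt{n}} \]
and
\[ \sum_{i=1}^{n} E|\xi_i|^3 \le \frac{E|X|^3}{\sqrt{n}}. \]
Therefore, by (A.1),
\[ \bigl| P\bigl( \sqrt{n}\bar X \le x - \sqrt{n}c + x|s_n^2 - 1| \bigr) - \Phi(x - \sqrt{n}c) \bigr| \le \frac{K(1 + x)}{(1 + |x - \sqrt{n}c|)\sqrt{n}}. \tag{A.9} \]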

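Similarly,
\[ P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \le x \Bigr) \ge P\bigl( \sqrt{n}\bar X \le x - \sqrt{n}c - x|s_n^2 - 1| \bigr) \]
and
\[ \bigl| P\bigl( \sqrt{n}\bar X \le x - \sqrt{n}c - x|s_n^2 - 1| \bigr) - \Phi(x - \sqrt{n}c) \bigr| \le \frac{K(1 + x)}{(1 + |x - \sqrt{n}c|)\sqrt{n}}. \tag{A.10} \]
This proves (A.2).

We also need a moderate deviation for the non-central t-statistics, as given in the following lemma.

Lemma A.3. Suppose that $X, X_i$, $i = 1, \ldots, n$, are independent identically distributed random variables. Let
\[ \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2. \]
If $X$ satisfies $E|X|^4 < \infty$, $E(X^2) = \sigma^2 > 0$ and $E(X) = 0$, then
\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| \ge t \Bigr) = P\bigl( |Z + c\sqrt{n}/\sigma| \ge t \bigr)\bigl(1 + o(1)\bigr) \tag{A.11} \]
uniformly in $c$ and $t = o(n^{1/6})$. Here, and in the sequel, $Z$ denotes a standard normal random variable.

Proof. When $t$ is bounded, (A.11) follows from Lemma A.2. Consider large $t$ with $t = o(n^{1/6})$. We need the following result of [27,28]:
\[ P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \ge t \Bigr) = \bigl( 1 - \Phi(t - c\sqrt{n}/\sigma) \bigr)\bigl(1 + o(1)\bigr) \tag{A.12} \]
uniformly in $c\sqrt{n}/\sigma \le t/5$ and $t = o(n^{1/6})$. We note that, following the same lines as their proof, we can see that (A.12) remains valid for $t/5 \le c\sqrt{n}/\sigma \le t$. We write
\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| \ge t \Bigr) = P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \ge t \Bigr) + P\Bigl( \frac{\sqrt{n}(-\bar X - c)}{s_n} \ge t \Bigr). \]
By (A.12), the remark above and the fact that
\[ 1 - \Phi(t + x) = o\bigl( 1 - \Phi(t - x) \bigr) \]
for $x \ge 1$ (recall here that we assume $t$ is large), (A.11) holds for $-t \le c\sqrt{n}/\sigma \le t$. Now, assume $|c|\sqrt{n}/\sigma > t$. Then, by (A.2),
\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| \ge t \Bigr) - P\bigl( |Z + c\sqrt{n}/\sigma| \ge t \bigr) = o(1). \]
Since $|c|\sqrt{n}/\sigma > t$, we have $P(|Z + c\sqrt{n}/\sigma| \ge t) \ge 1/2$ and hence
\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| \ge t \Bigr) = P\bigl( |Z + c\sqrt{n}/\sigma| \ge t \bigr)\bigl(1 + o(1)\bigr). \]
This completes the proof of (A.11).

The lemma below shows that $t_{n,m}$, defined in (A.26) under independence, is bounded.

Lemma A.4. Assume that there exist $\varepsilon_0 > 0$ and $c_0 > 0$ such that
\[ P\bigl( \sqrt{n}|\mu_1|/\sigma_1 \ge \varepsilon_0 \bigr) \ge c_0. \tag{A.13} \]
Let $t_{n,m}$ satisfy (A.37). Then
\[ t_{n,m} \le t_0, \tag{A.14} \]
where $t_0$ is the solution of
\[ \alpha\pi_1 c_0 \exp\bigl( (t_0 - \varepsilon_0)\varepsilon_0 \bigr) = 12(1 + t_0 - \varepsilon_0). \tag{A.15} \]

Proof. It suffices to show that
\[ \frac{\sqrt{m}\,E\xi_1(t_0)}{(\mathrm{var}(\xi_1(t_0)))^{1/2}} \ge z_\gamma. \tag{A.16} \]
It is easy to see that $P(|Z + a| \ge t_0)$ is a monotone increasing function of $a > 0$. Hence,
\[ \begin{aligned} P\bigl( |Z + \sqrt{n}\mu_1/\sigma_1| \ge t_0 \bigr) &\ge P\bigl( |Z + \sqrt{n}\mu_1/\sigma_1| \ge t_0,\ \sqrt{n}|\mu_1|/\sigma_1 \ge \varepsilon_0 \bigr) \\ &\ge P(|Z + \varepsilon_0| \ge t_0)\,P\bigl( \sqrt{n}|\mu_1|/\sigma_1 \ge \varepsilon_0 \bigr) \\ &\ge c_0 P(|Z + \varepsilon_0| \ge t_0) \\ &\ge c_0\bigl( 1 - \Phi(t_0 - \varepsilon_0) \bigr) \\ &\ge \frac{c_0}{3(1 + t_0 - \varepsilon_0)} \exp\bigl( -(t_0 - \varepsilon_0)^2/2 \bigr) \\ &\ge \frac{c_0}{3(1 + t_0 - \varepsilon_0)} \exp\bigl( -t_0^2/2 + (t_0 - \varepsilon_0)\varepsilon_0 \bigr). \end{aligned} \tag{A.17} \]
Here, we use the fact that
\[ \frac{1}{2} e^{-x^2/2} \ge 1 - \Phi(x) \ge \frac{e^{-x^2/2}}{\sqrt{2\pi}(1 + x)} \quad \text{for } x \ge 0. \]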
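The two-sided bound on the normal tail used in this step, namely that $1 - \Phi(x)$ lies between $e^{-x^2/2}/(\sqrt{2\pi}(1 + x))$ and $e^{-x^2/2}/2$ for $x \ge 0$, can be checked numerically. The short Python sketch below evaluates both sides on a grid; it is only a sanity check on a finite grid (the grid range and the function names are arbitrary choices), not a proof.

```python
import math

def normal_tail(x):
    """Upper tail 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def upper_bound(x):
    """Right-hand bound exp(-x^2 / 2) / 2."""
    return 0.5 * math.exp(-x * x / 2.0)

def lower_bound(x):
    """Left-hand bound exp(-x^2 / 2) / (sqrt(2 pi) (1 + x))."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * (1.0 + x))

# Check the sandwich on an evenly spaced grid over [0, 8].
grid = [i / 100 for i in range(0, 801)]
holds = all(lower_bound(x) <= normal_tail(x) <= upper_bound(x) for x in grid)
```

The upper bound is attained at x = 0, where both sides equal 1/2, so the comparison there is an equality rather than a strict inequality.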

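Under the null hypothesis $H_1 = 0$, which corresponds to $\mu_i = 0$, we apply Lemma A.3 and obtain
\[ P(|T_1| \ge t \mid H_1 = 0) = P(|Z| \ge t)\bigl(1 + o(1)\bigr) \tag{A.18} \]
uniformly in $t = o(n^{1/6})$. Under the alternative hypothesis $H_1 = 1$, we apply Lemma A.3 to $X_{ij} - \mu_i$ and obtain
\[ \begin{aligned} P(|T_1| \ge t \mid H_1 = 1) &= P\bigl( |\sqrt{n}(\bar X_1 - \mu_1 + \mu_1)/s_1| \ge t \mid H_1 = 1 \bigr) \\ &= E\bigl[ P\bigl( |Z + \sqrt{n}\mu_1/\sigma_1| \ge t \mid \mu_1, \sigma_1 \bigr) \bigr]\bigl(1 + o(1)\bigr) \\ &= P\bigl( |Z + \sqrt{n}\mu_1/\sigma_1| \ge t \bigr)\bigl(1 + o(1)\bigr) \end{aligned} \tag{A.19} \]
uniformly in $t = o(n^{1/6})$. Also, note that
\[ \begin{aligned} P(|T_1| \ge t) &= P(|T_1| \ge t, H_1 = 0) + P(|T_1| \ge t, H_1 = 1) \\ &= (1 - \pi_1)P(|T_1| \ge t \mid H_1 = 0) + \pi_1 P(|T_1| \ge t \mid H_1 = 1) \\ &= (1 - \pi_1)P(|Z| \ge t)\bigl(1 + o(1)\bigr) + \pi_1 P\bigl( |Z + \sqrt{n}\mu_1/\sigma_1| \ge t \bigr)\bigl(1 + o(1)\bigr). \end{aligned} \tag{A.20} \]
By (A.34), (A.18), (A.20) and (A.17),
\[ \begin{aligned} E\xi_1(t_0) &= \alpha(1 - \pi_1)P(|Z| \ge t_0)\bigl(1 + o(1)\bigr) + \alpha\pi_1 P\bigl( |Z + \sqrt{n}\mu_1/\sigma_1| \ge t_0 \bigr)\bigl(1 + o(1)\bigr) - (1 - \pi_1)P(|Z| \ge t_0)\bigl(1 + o(1)\bigr) \\ &\ge \frac{\alpha\pi_1 c_0}{6(1 + t_0 - \varepsilon_0)} \exp\bigl( -t_0^2/2 + (t_0 - \varepsilon_0)\varepsilon_0 \bigr) - 2P(Z \ge t_0) \\ &\ge \frac{\alpha\pi_1 c_0}{6(1 + t_0 - \varepsilon_0)} \exp\bigl( -t_0^2/2 + (t_0 - \varepsilon_0)\varepsilon_0 \bigr) - e^{-t_0^2/2} \\ &= e^{-t_0^2/2}\Bigl( \frac{\alpha\pi_1 c_0}{6(1 + t_0 - \varepsilon_0)} \exp\bigl( (t_0 - \varepsilon_0)\varepsilon_0 \bigr) - 1 \Bigr) \\ &= e^{-t_0^2/2}, \end{aligned} \tag{A.21} \]
by (A.15) and the definition of $t_0$. It is easy to see that $E\xi_1^2 \le 1$ and $\mathrm{var}(\xi_1(t_0)) \le 1$ in particular. Thus, by (A.21),
\[ \frac{\sqrt{m}\,E\xi_1(t_0)}{(\mathrm{var}(\xi_1(t_0)))^{1/2}} \ge \sqrt{m}\,e^{-t_0^2/2} \ge z_\gamma, \tag{A.22} \]
provided that $m$ is large enough. This proves (A.16).

The following i.i.d. results are essential for the general results.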


More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

An Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period

An Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period An Approxiate Model for the Theoretical Prediction of the Velocity... 77 Central European Journal of Energetic Materials, 205, 2(), 77-88 ISSN 2353-843 An Approxiate Model for the Theoretical Prediction

More information

RAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5

RAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5 Course STA0: Statistics and Probability Lecture No to 5 Multiple Choice Questions:. Statistics deals with: a) Observations b) Aggregates of facts*** c) Individuals d) Isolated ites. A nuber of students

More information

Bayesian Approach for Fatigue Life Prediction from Field Inspection

Bayesian Approach for Fatigue Life Prediction from Field Inspection Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

Asymptotics of weighted random sums

Asymptotics of weighted random sums Asyptotics of weighted rando sus José Manuel Corcuera, David Nualart, Mark Podolskij arxiv:402.44v [ath.pr] 6 Feb 204 February 7, 204 Abstract In this paper we study the asyptotic behaviour of weighted

More information

Meta-Analytic Interval Estimation for Bivariate Correlations

Meta-Analytic Interval Estimation for Bivariate Correlations Psychological Methods 2008, Vol. 13, No. 3, 173 181 Copyright 2008 by the Aerican Psychological Association 1082-989X/08/$12.00 DOI: 10.1037/a0012868 Meta-Analytic Interval Estiation for Bivariate Correlations

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING. RESEARCH REPORT. Christophe A.N. Biscio and Jesper Møller

CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING.   RESEARCH REPORT. Christophe A.N. Biscio and Jesper Møller CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING www.csgb.dk RESEARCH REPORT 2016 Christophe A.N. Biscio and Jesper Møller The accuulated persistence function, a new useful functional suary statistic

More information

Nonlinear Log-Periodogram Regression for Perturbed Fractional Processes

Nonlinear Log-Periodogram Regression for Perturbed Fractional Processes Nonlinear Log-Periodogra Regression for Perturbed Fractional Processes Yixiao Sun Departent of Econoics Yale University Peter C. B. Phillips Cowles Foundation for Research in Econoics Yale University First

More information

Sampling How Big a Sample?

Sampling How Big a Sample? C. G. G. Aitken, 1 Ph.D. Sapling How Big a Saple? REFERENCE: Aitken CGG. Sapling how big a saple? J Forensic Sci 1999;44(4):750 760. ABSTRACT: It is thought that, in a consignent of discrete units, a certain

More information

Tail Estimation of the Spectral Density under Fixed-Domain Asymptotics

Tail Estimation of the Spectral Density under Fixed-Domain Asymptotics Tail Estiation of the Spectral Density under Fixed-Doain Asyptotics Wei-Ying Wu, Chae Young Li and Yiin Xiao Wei-Ying Wu, Departent of Statistics & Probability Michigan State University, East Lansing,

More information

arxiv: v2 [math.st] 11 Dec 2018

arxiv: v2 [math.st] 11 Dec 2018 esting for high-diensional network paraeters in auto-regressive odels arxiv:803659v [aths] Dec 08 Lili Zheng and Garvesh Raskutti Abstract High-diensional auto-regressive odels provide a natural way to

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I Contents 1. Preliinaries 2. The ain result 3. The Rieann integral 4. The integral of a nonnegative

More information

arxiv: v1 [math.pr] 17 May 2009

arxiv: v1 [math.pr] 17 May 2009 A strong law of large nubers for artingale arrays Yves F. Atchadé arxiv:0905.2761v1 [ath.pr] 17 May 2009 March 2009 Abstract: We prove a artingale triangular array generalization of the Chow-Birnbau- Marshall

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Optical Properties of Plasmas of High-Z Elements

Optical Properties of Plasmas of High-Z Elements Forschungszentru Karlsruhe Techni und Uwelt Wissenschaftlishe Berichte FZK Optical Properties of Plasas of High-Z Eleents V.Tolach 1, G.Miloshevsy 1, H.Würz Project Kernfusion 1 Heat and Mass Transfer

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

Multiple Testing Issues & K-Means Clustering. Definitions related to the significance level (or type I error) of multiple tests

Multiple Testing Issues & K-Means Clustering. Definitions related to the significance level (or type I error) of multiple tests StatsM254 Statistical Methods in Coputational Biology Lecture 3-04/08/204 Multiple Testing Issues & K-Means Clustering Lecturer: Jingyi Jessica Li Scribe: Arturo Rairez Multiple Testing Issues When trying

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Optimal Jackknife for Discrete Time and Continuous Time Unit Root Models

Optimal Jackknife for Discrete Time and Continuous Time Unit Root Models Optial Jackknife for Discrete Tie and Continuous Tie Unit Root Models Ye Chen and Jun Yu Singapore Manageent University January 6, Abstract Maxiu likelihood estiation of the persistence paraeter in the

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Statistical Logic Cell Delay Analysis Using a Current-based Model

Statistical Logic Cell Delay Analysis Using a Current-based Model Statistical Logic Cell Delay Analysis Using a Current-based Model Hanif Fatei Shahin Nazarian Massoud Pedra Dept. of EE-Systes, University of Southern California, Los Angeles, CA 90089 {fatei, shahin,

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

A New Approach to Sequential Stopping for Stochastic Simulation

A New Approach to Sequential Stopping for Stochastic Simulation A New Approach to Sequential Stopping for Stochastic Siulation Jing Dong Northwestern University, Evanston, IL, 60208, jing.dong@northwestern.edu Peter W. Glynn Stanford University, Stanford, CA, 94305

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS

A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS Marta Martinez-Caara 1, Michael Mua 2, Abdelhak M. Zoubir 2, Martin Vetterli 1 1 School of Coputer and Counication

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 17th February 2010 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Weighted Hypothesis Testing. April 7, 2006

Weighted Hypothesis Testing. April 7, 2006 Weighted Hypothesis Testing Larry Wasseran and Kathryn Roeder 1 Carnegie Mellon University April 7, 2006 The power of ultiple testing procedures can be increased by using weighted p-values Genovese, Roeder

More information

arxiv: v3 [quant-ph] 18 Oct 2017

arxiv: v3 [quant-ph] 18 Oct 2017 Self-guaranteed easureent-based quantu coputation Masahito Hayashi 1,, and Michal Hajdušek, 1 Graduate School of Matheatics, Nagoya University, Furocho, Chikusa-ku, Nagoya 464-860, Japan Centre for Quantu

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

Decentralized Adaptive Control of Nonlinear Systems Using Radial Basis Neural Networks

Decentralized Adaptive Control of Nonlinear Systems Using Radial Basis Neural Networks 050 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 44, NO., NOVEMBER 999 Decentralized Adaptive Control of Nonlinear Systes Using Radial Basis Neural Networks Jeffrey T. Spooner and Kevin M. Passino Abstract

More information

Statistics and Probability Letters

Statistics and Probability Letters Statistics and Probability Letters 79 2009 223 233 Contents lists available at ScienceDirect Statistics and Probability Letters journal hoepage: www.elsevier.co/locate/stapro A CLT for a one-diensional

More information

Best Procedures For Sample-Free Item Analysis

Best Procedures For Sample-Free Item Analysis Best Procedures For Saple-Free Ite Analysis Benjain D. Wright University of Chicago Graha A. Douglas University of Western Australia Wright s (1969) widely used "unconditional" procedure for Rasch saple-free

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

ON LEAST FAVORABLE CONFIGURATIONS FOR STEP-UP-DOWN TESTS

ON LEAST FAVORABLE CONFIGURATIONS FOR STEP-UP-DOWN TESTS Statistica Sinica 24 (2014), 1-23 doi:http://dx.doi.org/10.5705/ss.2011.205 ON LEAST FAVORABLE CONFIGURATIONS FOR STEP-UP-DOWN TESTS Gilles Blanchard 1, Thorsten Dickhaus 2, Étienne Roquain3 and Fanny

More information