Weighted Hypothesis Testing. April 7, 2006


Weighted Hypothesis Testing

Larry Wasserman and Kathryn Roeder (1)
Carnegie Mellon University

April 7, 2006

The power of multiple testing procedures can be increased by using weighted p-values (Genovese, Roeder and Wasserman 2006). We derive the optimal weights and we show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.

1 Introduction

The power of multiple testing procedures can be increased by using weighted p-values (Genovese, Roeder and Wasserman 2006). Dividing each p-value P_j by a weight w_j increases the probability of rejecting some hypotheses. Provided the weights have mean one, familywise error control methods and false discovery control methods maintain their frequentist error control guarantees. The first such weighting scheme appears to be Holm (1979). Related ideas are in Benjamini and Hochberg (1997) and Chen et al. (2000).

There are, of course, other ways to improve power aside from weighting. Some notable recent approaches include Rubin, van der Laan and Dudoit (2005), Storey (2005), Donoho and Jin (2004) and Signoravitch (2006). Of these, our approach is closest to Rubin, van der Laan and Dudoit (2005), hereafter RVD. In fact, the optimal weights derived here, if re-expressed as cutoffs for test statistics, turn out to be identical to the cutoffs derived in RVD. Our main contributions beyond RVD are (i) a careful study of potential power losses due to departures from the optimal weights, (ii) robustness properties of weighted methods, and (iii) recovering power after using data splitting to estimate the weights. An important distinction between this paper and RVD versus Storey (2005) is that Storey uses a slightly different loss function and he requires a common cutoff for all test statistics. This allows him to make an elegant connection with the Neyman-Pearson lemma. In particular, his method automatically adapts from one-sided testing to two-sided testing depending on the configuration of means. Signoravitch (2006) uses invariance arguments to find powerful test statistics for multiple testing when the underlying tests are multivariate.

In this paper we show that the optimal weights form a one-parameter family. We also show that the power is very robust to misspecification of the weights. In particular, we show that (i) sparse weights (a few large weights and minimum weight close to 1) lead to huge power gains for well specified weights but minute power loss for poorly specified weights; and (ii) in the non-sparse case, under weak conditions, the worst case power under poorly specified weights is typically no worse than the power using equal weights.

(1) Research supported by National Institute of Mental Health grants MH057881, MH066278, MH06329 and NSF Grant AT. The authors thank Jamie Robins for helping us to clarify several issues.

Figure 1: Power gain/loss for weighting a single hypothesis. In this example, an unweighted hypothesis has power 1/2. The weights are w_0 < 1 < w_1 with w_1/w_0 = B. The top line shows the power when the alternative is given the correct weight w_1. The bottom line, which is nearly indistinguishable from 1/2, shows the power when the alternative is given the incorrect weight w_0. As B increases, the power gain increases sharply while the power loss remains nearly constant.

In fact, the power is degraded at most by a factor of about Φ̄(√(2 log((1 − a)/γ))), where a is the fraction of nonnulls and γ is the fraction of nulls that are mistaken for alternatives. Figure 1 shows the sparse case. The top line shows power from correct weighting while the bottom line shows power from incorrect weighting. We see that the power gains overwhelm the potential power loss. Figure 2 shows the non-sparse case. The plots on the left show the power as a function of the alternative mean. The dark solid line shows the lowest possible power assuming the weights were estimated as poorly as possible. The lighter solid line is the power of the unweighted Bonferroni method. The dotted line shows the power under theoretically optimal weights. The worst case weighted power is typically close to or larger than the Bonferroni power, except for large ξ, when they are both large.

We consider two methods for choosing the weights: (i) external weights, where prior information based on scientific knowledge or prior data singles out specific hypotheses, and (ii) estimated weights, where the data are used to construct weights. External weights are prone to bias while estimated weights are prone to variability. The two robustness properties reduce concerns about bias and variance.

An example of external weighting is the following. We have test statistics {T_j : j = 1,...,m} associated with spatial locations {s_j : j = 1,...,m} where s_j ∈ [0, L], say. These could be association tests for markers on a genome. The number of tests m is large, on the order of 100,000

Figure 2: Power as a function of the alternative mean ξ. In these plots, a = .01, m = 1000 and α = 0.05. There are m(1 − a) nulls and ma alternatives with mean ξ. The left plots show what happens when the weights are incorrectly computed assuming that a fraction γ of the nulls are actually alternatives with mean u. In the top plot, we restrict 0 < u < ξ. In the second and third plots, no restriction is placed on u. The top and middle plots have γ = .1 while the third plot has γ = 1 − a (all nulls misspecified as alternatives). The dark solid line shows the lowest possible power assuming the weights were estimated as poorly as possible. The lighter solid line is the power of the unweighted Bonferroni method. The dotted line is the power under the optimal weights. The vertical line in the top plot is at ξ*. The weighted method beats the unweighted method for all ξ < ξ*. The right plot shows the least favorable u as a function of ξ; that is, mistaking mγ nulls for alternatives with mean u leads to the worst power. Also shown is the line u = ξ.

for example. Each T_j is used to test the null hypothesis that θ_j = E(T_j) = 0. Prior data are in the form of a smooth stochastic process {Z(s) : s ∈ [0, L]}. This might be from a whole genome linkage scan. At alternatives, the mean µ(s) = E(Z(s)) is a large positive value; however, due to correlation, at nulls close to alternatives, µ(s) is also nonzero. Peaks in the process Z(s) provide approximate information about the location of alternatives. We want to use the process Z to generate reasonable weights for the test statistics.

When external weights are not available, the optimal weights can be estimated from the data. One approach is to use data splitting (RVD): a fraction of the data is used to estimate the weights and the remainder to test. For example, consider the two-stage genome-wide association study (e.g., Thomas et al. 2005) for which a sample of n subjects is split into two subsets. Using the first subset, we obtain test statistics {T_j : j = 1,...,m} associated with locations {s_j : j = 1,...,m}. Typically only the second subset of data is used in the final analysis. Building on the ideas of Skol et al. (2006), we take the two-stage study design further, exploring how the first set of data can be utilized to formulate weights while the full data set is used for testing.

2 Weighted Multiple Testing

We are given hypotheses H = (H_1,...,H_m) and standardized test statistics T = (T_1,...,T_m), where T_j ~ N(ξ_j, 1). The methods can be extended to nonnormal test statistics but we do not consider that case here. For a two-sided hypothesis, H_j = 1 if ξ_j ≠ 0 and H_j = 0 otherwise. For the sake of parsimony, unless otherwise noted, results will be stated for a one-sided test where H_j = 1 if ξ_j > 0, although the results extend easily to the two-sided case. Let θ = (ξ_1,...,ξ_m) denote the vector of means. The original data are often of the form

        X_{11}  X_{12}  ...  X_{1m}
    X = X_{21}  X_{22}  ...  X_{2m}                                    (1)
          ...
        X_{n1}  X_{n2}  ...  X_{nm}

where the jth test statistic T_j is based on the jth column of X. Usually, T_j is of the form T_j = √n X̄_j/σ_j, where X̄_j is approximately or exactly N(γ_j, σ_j²/n) and the noncentrality parameter is ξ_j = √n γ_j/σ_j. The p-values associated with the tests are P = (P_1,...,P_m), where P_j = Φ̄(T_j), Φ̄ = 1 − Φ and Φ denotes the standard Normal CDF. Let P_(1) ≤ ... ≤ P_(m) denote the sorted p-values and T_(1) ≥ ... ≥ T_(m) the sorted test statistics.
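As an illustration, here is a minimal sketch (ours, not part of the paper) of how the statistics and one-sided p-values just described can be computed; the sample size, number of tests, and planted effects are arbitrary assumptions.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, m = 100, 1000
    gamma = np.zeros(m)
    gamma[:50] = 0.3                  # hypothetical column means; xi_j = sqrt(n)*gamma_j/sigma_j
    X = rng.normal(loc=gamma, scale=1.0, size=(n, m))

    T = np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1)   # T_j from column j
    P = norm.sf(T)                    # P_j = Phibar(T_j), one-sided p-values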

A rejection set R is a subset of {1,...,m}. Say that R controls familywise error at level α if P(R ∩ H_0 ≠ ∅) ≤ α, where H_0 = {j : H_j = 0} is the set of true nulls. The Bonferroni rejection set is

    R = {j : P_j < α/m} = {j : T_j > z_{α/m}}                          (2)

where we use the notation z_β = Φ̄^{-1}(β).

The weighted Bonferroni procedure of Genovese, Roeder and Wasserman (2005) is as follows. Specify nonnegative weights w = (w_1,...,w_m) and reject hypothesis H_j if j belongs to

    R = {j : P_j ≤ α w_j / m}.                                         (3)

As long as m^{-1} Σ_j w_j = 1, the rejection set R controls familywise error at level α. For completeness, we provide the proof. All further proofs are in the appendix.

Lemma 2.1 If m^{-1} Σ_{j=1}^m w_j = 1, then the rejection set R controls familywise error at level α.

Proof. The familywise error is

    P(R ∩ H_0 ≠ ∅) = P(P_j ≤ α w_j/m for some j ∈ H_0)
                   ≤ Σ_{j∈H_0} P(P_j ≤ α w_j/m)
                   ≤ Σ_{j∈H_0} α w_j/m ≤ α m^{-1} Σ_j w_j = α.  □

Genovese, Roeder and Wasserman (2005) also showed that false discovery methods benefit from weighting. Recall that the false discovery proportion (FDP) is

    FDP = (number of false rejections)/(number of rejections) = |R ∩ H_0| / |R|     (4)

where the ratio is defined to be 0 if the denominator is 0. The false discovery rate (FDR) is FDR = E(FDP). Benjamini and Hochberg (1995) proved that FDR ≤ α if R = {j : P_j ≤ P_(ĵ)}, where ĵ = max{j : P_(j) ≤ jα/m}. Genovese, Roeder and Wasserman (2004) showed that FDR ≤ α still holds if the P_j's are replaced by Q_j = P_j/w_j, as long as m^{-1} Σ_j w_j = 1 as before. This paper will focus only on familywise error. Similar results hold for FDR and will appear in a followup paper.
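Returning to the weighted Bonferroni rule (3): a minimal sketch (ours; names are illustrative) that applies it and checks the mean-one condition of Lemma 2.1.

    import numpy as np

    def weighted_bonferroni(P, w, alpha=0.05):
        P, w = np.asarray(P), np.asarray(w)
        m = len(P)
        assert np.isclose(w.mean(), 1.0), "weights must average to one (Lemma 2.1)"
        return P <= alpha * w / m     # boolean rejection set R of (3)

With w = np.ones(m) this reduces to the ordinary Bonferroni rule (2), up to the non-strict inequality.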

3 Power and Optimality

3.1 Power

Before weighting, that is, using weight 1, the power of a single, one-sided alternative ξ is

    π(ξ, 1) = P(T > z_{α/m}) = Φ̄(z_{α/m} − ξ).                         (5)

The power (2) in the weighted case is

    π(ξ, w) = P(P < αw/m) = P(T > Φ̄^{-1}(αw/m)) = Φ̄(Φ̄^{-1}(αw/m) − ξ).   (6)

Weighting increases the power when w > 1 and decreases the power when w < 1. Given θ = (ξ_1,...,ξ_m) and w = (w_1,...,w_m), we define the average power

    (1/m) Σ_{j=1}^m π(ξ_j, w_j) I(ξ_j > 0).                             (7)

More generally, if ξ is drawn from a distribution Q and w = w(ξ) is a weight function, we define the average power

    ∫ π(ξ, w(ξ)) I(ξ > 0) dQ(ξ).                                        (8)

If we take Q to be the empirical distribution of ξ_1,...,ξ_m, then this reduces to the previous expression. In this case we require w ≥ 0 and ∫ w dQ = 1.

3.2 Optimality and Robustness

In the following theorem we see that the optimal weight functions form a one-parameter family indexed by a constant c.

Theorem 3.1 Given θ = (ξ_1,...,ξ_m), the optimal weight vector w = (w_1,...,w_m) that maximizes the average power subject to w_j ≥ 0 and m^{-1} Σ_{j=1}^m w_j = 1 is w = (ρ_c(ξ_1),...,ρ_c(ξ_m)), where

    ρ_c(ξ) = (m/α) Φ̄(ξ/2 + c/ξ) I(ξ > 0)                               (9)

and c ≡ c(θ) is defined by the condition

    m^{-1} Σ_{j=1}^m ρ_c(ξ_j) = 1.                                      (10)

(2) For a two-sided alternative the power is π(ξ, w) = Φ̄(Φ̄^{-1}(αw/(2m)) − ξ) + Φ̄(Φ̄^{-1}(αw/(2m)) + ξ).
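Condition (10) pins down c because the average weight is decreasing in c, so a bisection suffices. The following sketch (ours; the bracket for c is an arbitrary assumption) computes the optimal weights of Theorem 3.1 from a vector of means.

    import numpy as np
    from scipy.stats import norm

    def optimal_weights(xi, alpha=0.05):
        xi = np.asarray(xi, dtype=float)
        m, pos = len(xi), xi > 0

        def avg_weight(c):            # (1/m) sum_j rho_c(xi_j), decreasing in c
            w = np.zeros(m)
            w[pos] = (m / alpha) * norm.sf(xi[pos] / 2 + c / xi[pos])
            return w.mean(), w

        lo, hi = -50.0, 50.0          # assumed bracket for c; widen if needed
        for _ in range(100):          # bisection on condition (10)
            mid = (lo + hi) / 2
            avg, w = avg_weight(mid)
            lo, hi = (mid, hi) if avg > 1 else (lo, mid)
        return w, mid                 # w from the final midpoint

For example, with 50 alternatives at ξ = 3 among m = 1000 tests, the returned weights are 0 on the nulls and 1/a = 20 on the alternatives, in line with the two-point example below.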

Figure 3: Optimal weight function ρ_c for various values of c (including c = 1 and c = 10). In each case m = 1000 and α = 0.05. The functions are normalized to have maximum 1.

The proof is in the appendix. Some plots of the function ρ_c for various values of c are shown in Figure 3. In these plots, the function is normalized to have maximum 1 for easier visualization. The result generalizes to the case where the alternative means are random variables with distribution Q, in which case c is defined by

    ∫ ρ_c(ξ) dQ(ξ) = 1.                                                 (11)

Remark. Rejecting when P_j/w_j ≤ α/m is the same as rejecting when Z_j ≥ ξ_j/2 + c/ξ_j. This is identical to the result of Rubin, van der Laan and Dudoit (2005), obtained independently. The remainder of the paper, which shows some good properties of the weighted method, can thus also be considered as providing support for their method. In particular, they noted in their simulations that even poorly specified estimates of the cutoffs ξ_j/2 + c/ξ_j can still perform well. This paper provides insight into why that is true.

From (6) and (9) we have immediately:

Lemma 3.2 The power at an alternative with mean ξ under optimal weights is Φ̄(c/ξ − ξ/2). The average power under optimal weights, which we call the oracle power, is

    (1/m) Σ_{j=1}^m Φ̄(c/ξ_j − ξ_j/2) I(ξ_j > 0).                        (12)

The oracle power is not attainable since the optimal weights depend on θ = (ξ_1,...,ξ_m) or, equivalently, on Q. In practice, the weights will either be chosen by prior information or by estimating the ξ_j's. This raises the following question: how sensitive is the power to correct specification of the weights? We show below that the power is very robust to weight misspecification, even though the weights themselves can be very sensitive to changes in θ.

Consider the following example. Suppose that θ = (ξ_1,...,ξ_m) where each ξ_j is equal to either 0 or some fixed number ξ. The empirical distribution of the ξ_j's is thus Q = (1 − a)δ_0 + aδ_ξ, where δ_ξ denotes a point mass at ξ and a is the fraction of nonzero means. The optimal weights are 0 for ξ_j = 0 and 1/a for ξ_j = ξ. Let Q̃ = (1 − a − γ)δ_0 + γδ_u + aδ_ξ where u is a small positive number. Since we have only moved the mass at 0 to u, and u is small, we would hope that w will not change much. But this is not the case. Setting

    ξ = A + √(A² − 2c),   u = B − √(B² − 2c)                            (13)

where

    A = Φ̄^{-1}( α/(m(γK + a)) ),   B = Φ̄^{-1}( Kα/(m(γK + a)) ),       (14)

yields weights w_0 and w_1 on u and ξ such that w_0/w_1 = K. For example, take m = 1000, α = 0.05, a = .1, γ = .1, K = 1000, and c = .1. Then u = .03 and ξ = 9.8. The optimal weight on ξ under Q is 10, but under Q̃ it is 1/(γK + a) ≈ .01, and so is reduced by a factor of about 1000. More generally we have the following result, which shows that the weights are, in a certain sense, a discontinuous function of θ.

Lemma 3.3 Fix α and m. For any δ > 0 and ɛ > 0 there exist Q = (1 − a)δ_0 + aδ_ξ and Q̃ = (1 − a − γ)δ_0 + γδ_u + aδ_ξ such that

    d(Q, Q̃) < δ   and   ρ(ξ) − ρ̃(ξ) > 1/a − ɛ                           (15)

where a = α/(4m), d(Q, Q̃) = sup_ξ |Q(−∞, ξ] − Q̃(−∞, ξ]| is the Kolmogorov-Smirnov distance, ρ is the optimal weight function for Q and ρ̃ is the optimal weight function for Q̃.

Fortunately, this problem is not serious since it is possible to have high power even with poor weights. In fact, the power of the weighted method has the following two robustness properties:

Property I: Sparse weights (minimum weight close to 1) are highly robust. If most weights are less than 1 and the minimum weight is close to 1, then correct specification (large weights on alternatives) leads to large power gains but incorrect specification (large weights on nulls) leads to little power loss.
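Returning to the numerical example above, a quick check (our sketch, using the formulas (13)-(14) as reconstructed; treat it as illustrative rather than definitive):

    import numpy as np
    from scipy.stats import norm

    m, alpha, a, gamma, K, c = 1000, 0.05, 0.1, 0.1, 1000, 0.1
    A = norm.isf(alpha / (m * (gamma * K + a)))        # Phibar^{-1}
    B = norm.isf(K * alpha / (m * (gamma * K + a)))
    xi = A + np.sqrt(A**2 - 2 * c)                     # about 9.8
    u = B - np.sqrt(B**2 - 2 * c)                      # about 0.03
    w_xi = (m / alpha) * norm.sf(xi / 2 + c / xi)      # weight on xi under Q-tilde
    print(xi, u, w_xi)                                 # w_xi ~ 1/(gamma*K + a) ~ 0.01, versus 1/a = 10 under Q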

Property II: Worst case analysis. Weighted hypothesis testing, even with poorly chosen weights, typically does as well as or better than Bonferroni, except when the alternative means are large, in which case both methods have high power.

Let us now make these statements precise. See also Genovese, Roeder and Wasserman (2006) and Roeder, Bacanu, Wasserman and Devlin (2006) for other results on the effect of weight misspecification.

Property I. Consider first the case where the weights take two distinct values and the alternatives have a common mean ξ. Let ɛ denote the fraction of hypotheses given the larger of the two values of the weights. Then the weight vector w is proportional to (B,...,B, 1,...,1), with k = mɛ terms equal to B > 1 and m − k terms equal to 1, and hence the normalized weights are w = (w_1,...,w_1, w_0,...,w_0), with k terms equal to w_1 and m − k terms equal to w_0, where

    w_1 = B/(Bɛ + 1 − ɛ),   w_0 = 1/(Bɛ + 1 − ɛ).

We say that the weights are sparse if ɛ is small, that is, if most weights are near 1. Consider an alternative with mean ξ. The power gain from correct weighting is the power under weight w_1 minus the unweighted power, π(ξ, w_1) − π(ξ, 1). Similarly, the power loss from incorrect weighting is π(ξ, 1) − π(ξ, w_0). The gain minus the loss, which we call the robustness function, is

    R(ξ, ɛ) ≡ [π(ξ, w_1) − π(ξ, 1)] − [π(ξ, 1) − π(ξ, w_0)]             (16)
            = Φ̄(z_{αw_1/m} − ξ) + Φ̄(z_{αw_0/m} − ξ) − 2Φ̄(z_{α/m} − ξ).  (17)

The gain outweighs the loss if and only if R(ξ, ɛ) > 0. This is illustrated in Figures 1 and 4.

Theorem 3.4 Fix B > 1. Then lim_{ɛ→0} R(ξ, ɛ) > 0. Moreover, there exists ɛ* > 0 such that R(ξ, ɛ) > 0 for all ɛ < ɛ*.

We can generalize this beyond the two-valued case as follows. Let w be any weight vector such that m^{-1} Σ_j w_j = 1. Now define the worst case robustness function

    R ≡ min_{j: w_j > 1, H_j = 1} {π(ξ_j, w_j) − π(ξ_j, 1)} − max_{j: w_j < 1, H_j = 1} {π(ξ_j, 1) − π(ξ_j, w_j)}.   (18)
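The two-valued robustness function (17) is a direct transcription into code (our sketch; parameter values are arbitrary) and makes it easy to reproduce the qualitative behavior in Figures 1 and 4:

    import numpy as np
    from scipy.stats import norm

    def robustness(xi, eps, B, m=1000, alpha=0.05):
        w1 = B / (B * eps + 1 - eps)          # normalized large weight
        w0 = 1 / (B * eps + 1 - eps)          # normalized small weight
        power = lambda w: norm.sf(norm.isf(alpha * w / m) - xi)
        return power(w1) + power(w0) - 2 * power(1.0)

    xi = norm.isf(0.05 / 1000)                # marginal effect: unweighted power 1/2
    print(robustness(xi, eps=0.01, B=10))     # positive: gain outweighs loss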

Figure 4: Robustness function R as a function of B for several values of ɛ (including ɛ = 0.01). In this example, ξ = z_{α/m}, which has power 1/2 without weighting. The gain from correct weighting far outweighs the loss from incorrect weighting as long as the fraction of large weights ɛ is small.

We will see that R > 0 under weak conditions and that the maximal robustness is obtained for ξ near the Bonferroni cutoff z_{α/m}.

Theorem 3.5 A necessary and sufficient condition for R > 0 is

    R_{B,b}(ξ) ≡ Φ̄(z_{αB/m} − ξ) + Φ̄(z_{αb/m} − ξ) − 2Φ̄(z_{α/m} − ξ) ≥ 0    (19)

where B = min{w_j : w_j > 1} and b = min_j w_j. Moreover,

    R_{B,b}(ξ) = Δ(ξ) + O(1 − b)                                         (20)

where

    Δ(ξ) = Φ̄(z_{αB/m} − ξ) − Φ̄(z_{α/m} − ξ) > 0                         (21)

and, as b → 1, µ{ξ : R_{B,b}(ξ) < 0} → 0 and inf_ξ R_{B,b}(ξ) → 0.

The theorem is illustrated in Figure 5. We see that there is overwhelming robustness as long as the minimum weight is near 1. Even in the extreme case b = 0, there is still a safe zone, an interval of values of ξ over which R > 0.

Lemma 3.6 Suppose that B ≥ 2. Then there exists ξ_0 > 0 such that R_{B,b}(ξ) > 0 for all ξ ≤ ξ_0 and all b. An upper bound on ξ_0 is z_{α/m} − 1/(z_{α/m} − z_{αB/m}).

Figure 5: The robustness function R_{B,b}(ξ) for several values of b = min_j w_j (the panels include b = 0.67 and b = 0). In each case, m = 1000, α = 0.05, B = 10. Whenever R > 0, power gain outweighs power loss. When b is near 1, R > 0 for most ξ. Even when b = 0 there is a safe zone including ξ = 0, as long as B ≥ 2.

Property II. Even if the weights are not sparse, the power of the weighted test cannot be too bad, as we now show. To begin, assume that each mean is either equal to 0 or to ξ for some fixed ξ > 0. Thus, the empirical distribution is

    Q = (1 − a)δ_0 + aδ_ξ                                                (22)

where δ denotes a point mass and a is the fraction of nonzero ξ_j's. The optimal weights are 1/a for hypotheses whose mean is ξ, and 0 otherwise. To study the effect of misspecification error, consider the case where mγ nulls are mistaken for alternatives with mean u > 0. This corresponds to misspecifying Q as

    Q̃ = (1 − a − γ)δ_0 + γδ_u + aδ_ξ.                                   (23)

We will study the effect of varying u, so let π(u) denote the power at the true alternative ξ as a function of u. Also, let π_Bonf denote the power using equal weights (Bonferroni). Note that changing Q = (1 − a)δ_0 + aδ_ξ to Q̃ = (1 − a)δ_0 + aδ_u for u ≠ ξ does not change the weights. As the weights are a function of c, we first need to find c as a function of u. The normalization condition (10) reduces to

    γ Φ̄(u/2 + c/u) + a Φ̄(ξ/2 + c/ξ) = α/m

which implicitly defines the function c(u). First we consider what happens when u is restricted to be less than ξ.

Theorem 3.7 Assume that α/m ≤ γ + a ≤ 1. Let Q = (1 − a)δ_0 + aδ_ξ and Q̃ = (1 − a − γ)δ_0 + γδ_u + aδ_ξ with 0 ≤ u ≤ ξ. Let C = sup_{0≤u≤ξ} c(u) and define

    ξ_0 = z_{α/(m(γ+a))}.                                                (24)

For ξ ≤ ξ_0,

    C = ξ_0 ξ − ξ²/2.                                                    (25)

For ξ > ξ_0, C is the solution of

    γ Φ̄(√(2C)) + a Φ̄(ξ/2 + C/ξ) = α/m.                                  (26)

In this case, C = z²_{α/(mγ)}/2 + O(a). Let

    ξ* = z_{α/m} + √(z²_{α/m} − z²_q),   where q = α(1 − a)/(mγ).        (27)

For ξ < ξ*,

    Δ ≡ inf_{0<u<ξ} π(u) − π_Bonf ≥ 0.                                   (28)

For ξ ≥ ξ* we have

    inf_{0<u<ξ} π(u) ≥ Φ̄( z²_{α/(mγ)}/(2ξ) − ξ/2 ) − O(a)                (29)
                     ≥ 1 − Φ̄( ξ*/2 − z²_{α/(mγ)}/(2ξ*) ) − O(a)          (30)
                     ≥ 1 − Φ̄( √(2 log((1 − a)/γ)) ) − O(a).              (31)

The factor Φ̄(√(2 log((1 − a)/γ))) is the worst case power deficit due to misspecification.

Now we drop the assumption that u ≤ ξ.

Theorem 3.8 Let Q = (1 − a)δ_0 + aδ_ξ and let Q̃_u = (1 − a − γ)δ_0 + γδ_u + aδ_ξ. Let π_u(ξ) denote the power at ξ using the weights computed under Q̃_u.

1. The least favorable u is

    u* ≡ argmin_{u≥0} π_u(ξ) = √(2c*) = z_{α/(mγ)} + O(a)                (32)

where c* solves

    γ Φ̄(√(2c)) + a Φ̄(ξ/2 + c/ξ) = α/m                                   (33)

and c* = z²_{α/(mγ)}/2 + O(a).

2. The minimal power is

    inf_u π_u(ξ) = Φ̄(c*/ξ − ξ/2) = Φ̄( z²_{α/(mγ)}/(2ξ) − ξ/2 ) + O(a).   (34)

3. A sufficient condition for inf_u π_u(ξ) to be larger than the power of the Bonferroni method is

    ξ ≥ z_{α/m} + √(z²_{α/m} − z²_{α/(mγ)}) + O(a).                       (35)
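The quantities in Theorems 3.7 and 3.8 are easy to evaluate numerically. The sketch below (ours; the bracketing constants and the grid are assumptions) solves the normalization condition for c(u) by bisection, scans u for the least favorable value, and compares the worst-case power at ξ with Bonferroni.

    import numpy as np
    from scipy.stats import norm

    m, alpha, a, gamma, xi = 1000, 0.05, 0.01, 0.1, 4.0

    def c_of_u(u):      # solve gamma*Phibar(u/2+c/u) + a*Phibar(xi/2+c/xi) = alpha/m for c
        f = lambda c: gamma * norm.sf(u / 2 + c / u) + a * norm.sf(xi / 2 + c / xi) - alpha / m
        lo, hi = 0.0, 200.0               # f is decreasing in c on this assumed bracket
        for _ in range(100):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
        return mid

    powers = [norm.sf(c_of_u(u) / xi - xi / 2) for u in np.linspace(0.05, 8, 200)]
    bonf = norm.sf(norm.isf(alpha / m) - xi)
    print(min(powers), bonf)              # least favorable u sits near z_{alpha/(m*gamma)}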

4 Choosing External Weights

In choosing external weights, we will focus here on the two-valued case. Thus,

    w = (w_1,...,w_1, w_0,...,w_0)                                       (36)

with k = mɛ terms equal to w_1 = B/(Bɛ + 1 − ɛ) and m − k terms equal to w_0 = 1/(Bɛ + 1 − ɛ). In practice, we would typically have a fixed fraction ɛ of hypotheses that we want to give more weight to. The question is how to choose B. We will focus on choosing B to produce weights with good properties at interesting values of ξ. Large values of ξ already have high power; very small values of ξ have extremely low power and benefit little from weighting. This leads us to focus on constructing weights that are useful for a marginal effect, defined as the alternative that has power 1/2 when given weight 1. Thus, the marginal effect is ξ = z_{α/m}. In the rest of this section we assume that all nonzero ξ_j's are equal to z_{α/m}. Of course, the validity of the procedure does not depend on this assumption being true.

Figure 6: Top plot: B_0(ɛ) versus ɛ. Bottom plot: R(B, .1) versus B. The turnaround point B_0(.1) is shown with a vertical dotted line.

Fix 0 < ɛ < 1 and vary B. As we increase B, we will eventually reach a point B_0(ɛ) beyond which R(B, ɛ) < 0; we call this the turnaround point. Formally,

    B_0(ɛ) = sup{B : R(B, ɛ) > 0}.                                       (37)

The top panel in Figure 6 shows B_0(ɛ) versus ɛ; it shows that for small ɛ we can choose B large without loss of power. The bottom panel shows R(B, ɛ) for ɛ = 0.1. We suggest using B = B*(ɛ), the value of B that maximizes R(B, ɛ).

Theorem 4.1 Fix 0 < ɛ < 1. As a function of B, R(B, ɛ) is unimodal and satisfies R(1, ɛ) = 0, R′(1, ɛ) > 0 and lim_{B→∞} R(B, ɛ) < 0. Hence, B_0(ɛ) exists and is unique. Also, R(B, ɛ) has a unique maximum at some point B*(ɛ), and R(B*(ɛ), ɛ) > 0.
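Since R(B, ɛ) is unimodal in B (Theorem 4.1), both B*(ɛ) and the turnaround point B_0(ɛ) can be located by a simple grid scan; a sketch (ours, with an assumed grid):

    import numpy as np
    from scipy.stats import norm

    m, alpha, eps = 1000, 0.05, 0.1
    xi = norm.isf(alpha / m)                  # marginal effect

    def R(B):
        w1 = B / (B * eps + 1 - eps)
        w0 = 1 / (B * eps + 1 - eps)
        p = lambda w: norm.sf(norm.isf(alpha * w / m) - xi)
        return p(w1) + p(w0) - 2 * p(1.0)

    Bs = np.linspace(1.001, 2000, 20000)
    vals = np.array([R(B) for B in Bs])
    B_star = Bs[vals.argmax()]                # maximizer B*(eps)
    B_0 = Bs[np.where(vals > 0)[0][-1]]       # turnaround point, if inside the grid
    print(B_star, B_0)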

When ɛ is very small, we can essentially choose B as large as we like. For example, suppose we want to increase the chance of rejecting one particular hypothesis, so that ɛ = 1/m. Then

    w_1 = Bm/(B + m − 1),   w_0 = m/(B + m − 1),

and, for large B and m, π(ξ, w_1) ≈ 1 while π(ξ, w_0) ≈ 1/2. See Figure 1.

The next results show that binary weighting schemes are optimal in a certain sense. Suppose we want at least a fraction ɛ of the hypotheses to have high power (1 − β) and otherwise we want to maximize the minimum power.

Theorem 4.2 Consider the following optimization problem: Given 0 < ɛ < 1 and 0 < β < 1/2, find a vector w = (w_1,...,w_m) that maximizes

    min_j π(ξ, w_j)

subject to m^{-1} Σ_j w_j = 1 and #{j : π(ξ, w_j) ≥ 1 − β} ≥ mɛ. The solution is given by

    w = (w_1,...,w_1, w_0,...,w_0)                                       (38)

with k = mɛ terms equal to w_1 = B/(Bɛ + 1 − ɛ) and m − k terms equal to w_0 = 1/(Bɛ + 1 − ɛ), where B = c(1 − ɛ)/(α/m − ɛc) and c = Φ̄(z_{α/m} + z_{1−β}).

If our goal is to maximize the number of alternatives with high power while maintaining a minimum power, the solution is given as follows.

Theorem 4.3 Consider the following optimization problem: Given 0 < β < 1/2 and δ > 0, find a vector w = (w_1,...,w_m) that maximizes

    #{j : π(ξ, w_j) ≥ 1 − β}                                             (39)

subject to

    m^{-1} Σ_j w_j = 1   and   min_j π(ξ, w_j) ≥ δ.                      (40)

The solution is

    w = (w_1,...,w_1, w_0,...,w_0)                                       (41)

with k = mɛ terms equal to w_1, where

    w_1 = (m/α) Φ̄(z_{α/m} + z_{1−β}),   w_0 = (m/α) Φ̄(z_{α/m} + z_δ),   ɛ = (1 − w_0)/(w_1 − w_0).   (42)

A special case that falls under this theorem permits the minimum power to be 0. In this case w_0 = 0 and ɛ = 1/w_1.

5 Estimated Weights

In this section we explain how to use the data to estimate the weights. There are two issues: we must ensure that the error is still controlled, and we must avoid incurring large losses of power due to replacing θ with an estimator θ̂.

5.1 Validity With Estimated Weights

Data Splitting. The approach taken by RVD for ensuring that error control is preserved relies on data splitting. It is based on normalized test statistics T_j^(1), T_j^(2) computed from a partition of the data into subsets X^(1), X^(2), which include fractions b and 1 − b of X, respectively. Note that

    T_j = b^{1/2} T_j^(1) + (1 − b)^{1/2} T_j^(2),   j = 1,...,m.

The training data X^(1) are used to estimate the noncentrality parameter of the standardized statistic T_j^(1), where E[T_j^(1)] = b^{1/2} ξ_j. Testing is conducted using the remaining fraction of the data, X^(2). Consequently the estimate ξ̂_j^(1) must be rescaled by r_s = ((1 − b)/b)^{1/2} to estimate the noncentrality parameter of the standardized statistic T_j^(2), i.e., ξ̃_j = r_s ξ̂_j^(1). The estimated weights are ŵ_j(T^(1)) = ρ_ĉ(ξ̃_j). Because of the independence between the two portions of the data, familywise error is controlled at the nominal level.

Lemma 5.1 The procedure that rejects when the p-value P(T_j^(2)) = Φ̄(T_j^(2)) satisfies P(T_j^(2)) < ŵ_j(T^(1)) α/m controls the familywise error at level α.

Recovering Power. As noted by Skol et al. (2006), data splitting incurs a loss of power because the p-values are computed using only a fraction of the data. To recover this lost power, we need to use all the data to compute the p-values. When using this approach, ξ̂_j^(1) must be rescaled by r_f = b^{-1/2} to estimate the noncentrality parameter of the standardized statistic T_j, i.e., ξ̃_j = r_f ξ̂_j^(1). As in the data splitting procedure, the estimated weights ŵ_j(T^(1)) = ρ_ĉ(ξ̃_j) depend only on X^(1).
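A sketch of the data-splitting procedure of Lemma 5.1 (ours; it reuses the optimal_weights bisection sketched in Section 3, and the plug-in estimator here is simply the normalized statistic of Section 5.2):

    import numpy as np
    from scipy.stats import norm

    def split_weighted_test(X, b=0.5, alpha=0.05):
        n, m = X.shape
        n1 = int(b * n)
        X1, X2 = X[:n1], X[n1:]                       # independent subsets
        T1 = np.sqrt(n1) * X1.mean(0) / X1.std(0, ddof=1)
        T2 = np.sqrt(n - n1) * X2.mean(0) / X2.std(0, ddof=1)
        r_s = np.sqrt((1 - b) / b)                    # rescale to the scale of T^(2)
        xi_tilde = np.maximum(r_s * T1, 0.0)          # crude nonnegative plug-in estimate
        w, _ = optimal_weights(xi_tilde, alpha)       # rho_c weights with mean one
        return norm.sf(T2) <= alpha * w / m           # test on X^(2) only (Lemma 5.1)

Because ŵ depends on X^(1) only and the p-values on X^(2) only, Lemma 2.1 applies conditionally on X^(1); that is the content of Lemma 5.1.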

To preserve error control with full-data p-values, we proceed as follows.

Theorem 5.2 Assume T_j^(k) ~ N(0, 1) independently for k = 1, 2. Suppose that the weight ŵ_j(T^(1)) depends only on X^(1), but the p-value P_j(T_j) is allowed to depend on the full data X. Define ĉ(T^(1)) to solve

    Σ_{j=1}^m Φ̄( ( ξ̃_j/2 + ĉ(T^(1))/ξ̃_j − b^{1/2} T_j^(1) ) / (1 − b)^{1/2} ) = α.   (43)

Then the procedure that rejects when P_j(T_j) < ŵ_j(T^(1)) α/m, where

    ŵ_j(T^(1)) = (m/α) Φ̄( ξ̃_j/2 + ĉ(T^(1))/ξ̃_j ),                       (44)

controls the familywise error at level α.

5.2 Simulations

We simulate a study with m = 1000 tests, yielding data of the form given in (1). A test of the hypothesis H_0: ξ_j ≤ 0 is performed for each j using T_j, which we assume is approximately normally distributed (or, equivalently, T_j² is approximately χ²_1). In our simulations we generate 50 of the 1000 tests under the alternative hypothesis with shift parameter ξ = 2, 3, 4 or 5. We compare the power for various levels of a threshold parameter λ ∈ {0, .5, 1, 1.5, 2, 2.5}. We use a fraction b = 0.5 of the data to construct the weights and we compare four methods for estimating the noncentrality parameter:

1. The normalized statistic: ξ̂_j^(1) = T_j^(1).

2. Hard thresholding: ξ̂_j^(1) = T_j^(1) I(|T_j^(1)| > λ).                (45)

3. Soft thresholding: ξ̂_j^(1) = sign(T_j^(1)) (|T_j^(1)| − λ)_+.         (46)

4. The James-Stein estimator: ξ̂_j^(1) = (1 − (m − 2)/Σ_i (T_i^(1))²)_+ T_j^(1).   (47)

To compute the weights we rescale ξ̂_j^(1) by r_s or r_f, as appropriate to the followup testing strategy.
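The four estimators, and the rescaling step, in a single sketch (ours; names are illustrative):

    import numpy as np

    def estimate_xi(T1, lam, method="hard"):
        T1 = np.asarray(T1, dtype=float)
        if method == "raw":                            # normalized statistic
            return T1
        if method == "hard":                           # (45): hard thresholding
            return T1 * (np.abs(T1) > lam)
        if method == "soft":                           # (46): soft thresholding
            return np.sign(T1) * np.maximum(np.abs(T1) - lam, 0.0)
        if method == "js":                             # (47): positive-part James-Stein
            shrink = max(1.0 - (len(T1) - 2) / np.sum(T1**2), 0.0)
            return shrink * T1
        raise ValueError(method)

    # Rescale by r_s = ((1-b)/b)**0.5 for data splitting, or by r_f = b**-0.5 for
    # full-data testing, before plugging into the weight function rho_c.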

Power results are displayed in Fig. 7. We first consider the power of the RVD procedure, which uses the data splitting strategy and λ = 0 (Fig. 7, the curve labeled P, at the origin). Although RVD suggest using λ = 0, we also examine the power of this procedure for a range of values of λ; this extended RVD procedure applies hard thresholding to estimate θ. Next we consider the power of four testing strategies that use the full data T for testing rather than data splitting. The first approach (B) uses binary weights equal to m/M, where M = Σ_{i=1}^m I{|T_i^(1)| > λ}. In this setting, when λ = 0 the method reduces to the simple one-stage Bonferroni approach. For λ > 0 it is the method of Skol et al. (2006). The remaining three approaches rely on weights estimated using hard thresholding (H), soft thresholding (S), or James-Stein (J). For λ = 0, methods H and S reduce to the normed sample mean, which is the RVD approach adapted to incorporate the full data in the p-value. Clearly, this adaptation of the RVD method leads to a valuable increase in power. For λ > 0, method H imposes a hard threshold shrinkage effect on the parameter estimates. Notice that for any fixed value of λ, method H gives the best power. In particular, the difference in power between methods B and H illustrates the advantage of using variable weights estimated from a fraction of the data. Method H, and to a lesser extent method B, are nearly invariant to λ for moderate values of the threshold parameter. In contrast, method S, relying on soft thresholding, experiences a sharp decline in power as λ increases. Finally, the James-Stein approach clearly fails in this setting, presumably because most tests follow the null hypothesis and hence the true signals are shrunk toward 0, which diminishes the power of the procedure. For each condition investigated the tests had size less than 0.05, as expected from the theory. The James-Stein method was the most conservative.

From this experiment it appears that shrinkage only enhances power when the signal is very weak. A more careful analysis reveals that the effect of shrinkage for stronger signals is more subtle. As ξ → 0, ρ_c(ξ) → 0, and Figure 8 shows that ρ_c(ξ) is close to zero for a broad range of ξ values. Consequently, for λ ≤ 1.5, the weight function performs almost the same role as the threshold parameter. Using hard thresholding with λ < 1.5 is essentially equivalent to using no threshold, because a moderate level of shrinkage is automatically imposed by the weight function.

Figure 8 also illustrates how the optimal weights vary with the signal strength (the top panel has greater signal than the bottom panel). Both panels indicate that the largest weights are placed in the midrange of signal strength. Essentially no weight is wasted on tests with small signals (below about 1.5) because these tests are not likely to yield significant results. The bottom panel shows that large weights are also not wasted on signals so strong that the tests can easily be rejected even without up-weighting (above about 6). The top panel places its largest weights between 2.5 and 4. The bottom panel has fewer signals in this range and hence stronger weights can be applied to signals between 2 and 2.5. Both panels indicate that near-0 weights would be applied to tests with signals near 0.

Figure 7: Power of weighted tests, plotted against the threshold parameter λ. From top left clockwise: ξ = 2, 3, 4, 5. Methods compared use weights based on hard thresholding (H), soft thresholding (S), binary weights (B), and James-Stein (J); P denotes the data-splitting procedure.

Figure 8: Distribution of weights for two sets of data (weight plotted against θ̂; the top panel has greater signal than the bottom panel).

6 Discussion

An interesting connection can be made between weights based on threshold estimators and two-stage experimental designs that perform only a subset of the tests in stage two, based on the results obtained from stage one. The simplest example of this type of two-stage testing is the two-stage Bonferroni procedure, for which the training data X^(1) are used to determine the M elements of Λ = {j : |T_j^(1)| > λ}; X^(2) is only measured for these columns. A Bonferroni correction with α/(2M) controls FWER at level α for two-sided testing in this setting. In essence this approach is a weighted test with weights equal to m/M for the elements of Λ and zero elsewhere.

While the classic two-stage approach uses X^(1) for training and X^(2) for testing, an alternative is to use the training data to determine the weights and then use all of the data to conduct the tests. This strategy was recently investigated by Skol et al. (2006), using constant weights. These authors use the training data to determine Λ and then apply weights equal to m/M to the M tests determined in stage one. This full data approach proved to be considerably more powerful than the two-stage Bonferroni approach in simulations.

For hard and soft thresholding, ξ̂_j = 0 for any |T_j^(1)| < λ. From (9) it follows that the weights for any test with ξ̂_j = 0 are 0, and the rejection region is empty. Hence, a procedure using w_j = 0 for columns with ξ̂_j = 0 is equivalent to a truncation procedure that tests only those columns in Λ.
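For concreteness, a stylized sketch (ours, not Skol et al.'s exact calibration, which sets the joint-analysis threshold more carefully) of the constant-weight, full-data strategy described above: stage one selects Λ, and each selected test receives weight m/M.

    import numpy as np
    from scipy.stats import norm

    def binary_weight_test(T1, T, lam=1.0, alpha=0.05):
        T1, T = np.asarray(T1), np.asarray(T)     # stage-one and full-data statistics
        m = len(T)
        selected = np.abs(T1) > lam               # Lambda = {j : |T_j^(1)| > lam}
        M = max(int(selected.sum()), 1)
        w = np.where(selected, m / M, 0.0)        # weights average to one
        return (norm.sf(T) <= alpha * w / m) & selected   # i.e. P_j <= alpha/M on Lambda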

In practice, λ can be chosen to optimize power or to constrain the experimental budget. It is worth noting that in some experimental settings, such as those described by Skol et al., this experimental design can lead to considerable savings of effort and resources. Our results suggest that these savings can be gleaned without losing measurable power.

The same ideas used here can be applied to other testing methods to improve power. In particular, weights can be added to the FDR method, Holm's stepdown test, and the Donoho-Jin (2004) method. Weighting ideas can also be used for confidence intervals. We plan to present the details for the other methods in a followup paper.

Another item to be addressed in future work is the connection with Bayesian methods. As we noted, using weights is equivalent to using a separate rejection cutoff for each statistic. The methods of Storey (2005) and Signoravitch (2006) find optimal cutoffs when the cutoffs are constrained. There is undoubtedly a bias-variance tradeoff: these constrained methods can estimate optimal cutoffs well (low variance), but they will not achieve the oracle power obtained here, since they are by design biased away from these separate cutoffs. Future work should be directed at comparing these approaches and developing methods that lie in between these extremes.

7 Appendix

Proof of Theorem 3.1. Let A denote the set of hypotheses with ξ_j > 0. Power is optimized if w_j = 0 for j not in A. The average power is

    π = (1/m) Σ_{j∈A} Φ̄( Φ̄^{-1}(α w_j/m) − ξ_j )

with constraint Σ_{j∈A} w_j = m. Choose w to maximize

    (1/m) Σ_{j∈A} Φ̄( Φ̄^{-1}(α w_j/m) − ξ_j ) − λ Σ_{j∈A} w_j

by setting the derivatives to zero:

    ∂π/∂w_i = (α/m²) φ( Φ̄^{-1}(α w_i/m) − ξ_i ) / φ( Φ̄^{-1}(α w_i/m) ) = λ,   i ∈ A.

Writing t_i = Φ̄^{-1}(α w_i/m), the ratio is φ(t_i − ξ_i)/φ(t_i) = exp(ξ_i t_i − ξ_i²/2); the equations thus require ξ_i t_i − ξ_i²/2 to equal some constant c for all i, that is, t_i = ξ_i/2 + c/ξ_i. The w that solves these equations is given in (9). Finally, solve for c such that Σ_i w_i = m. □

Proof of Lemma 3.3. Choose γ < min(δ, 1 − a) and K > 1 so large that 1/(γK + a) < ɛ. Choose a small c > 0. Let ξ = A + √(A² − 2c) and u = B − √(B² − 2c), where

    A = Φ̄^{-1}( α/(m(γK + a)) ),   B = Φ̄^{-1}( Kα/(m(γK + a)) ).        (48)

Then ρ(ξ) = 1/a while ρ̃(ξ) = 1/(γK + a) < ɛ, so ρ(ξ) − ρ̃(ξ) > 1/a − ɛ. Finally, d(Q, Q̃) = γ < δ. □

Proof of Theorem 3.5. The first statement follows by noting that the worst case corresponds to choosing weight B in the first term of R and weight b in the second term. The rest follows by Taylor expanding R_{B,b}(ξ) around b = 1. □

Proof of Lemma 3.6. With b = 0, R_{B,b}(ξ) ≥ 0 when

    Φ̄(z_{αB/m} − ξ) ≥ 2 Φ̄(z_{α/m} − ξ).                                 (49)

With B ≥ 2, (49) holds at ξ = 0. The left hand side is increasing in ξ for ξ near 0, but (49) does not hold at ξ = z_{α/m}. So (49) must hold on an interval [0, ξ_0]. Rewrite (49) as

    Φ̄(z_{αB/m} − ξ) − Φ̄(z_{α/m} − ξ) ≥ Φ̄(z_{α/m} − ξ).

We lower bound the left hand side and upper bound the right hand side. The left hand side is

    Φ̄(z_{αB/m} − ξ) − Φ̄(z_{α/m} − ξ) = ∫_{z_{αB/m}−ξ}^{z_{α/m}−ξ} φ(u) du ≥ (z_{α/m} − z_{αB/m}) φ(z_{α/m} − ξ).

The right hand side can be bounded using Mills' ratio: Φ̄(z_{α/m} − ξ) ≤ φ(z_{α/m} − ξ)/(z_{α/m} − ξ). Set the lower bound greater than the upper bound to obtain the stated result. □

It is convenient to prove Theorem 3.8 before proving Theorem 3.7.

Proof of Theorem 3.8. Let c* solve

    γ Φ̄(√(2c)) + a Φ̄(ξ/2 + c/ξ) = α/m.                                  (50)

We claim first that for any c > c*, there is no u such that the weights average to 1. Fix c > c*. The weights average to 1 if and only if

    γ Φ̄(u/2 + c/u) + a Φ̄(ξ/2 + c/ξ) = α/m.                              (51)

Since c > c* and since the second term is decreasing in c, we must have

    Φ̄(u/2 + c/u) > Φ̄(√(2c*)).                                           (52)

The function r(u) = Φ̄(u/2 + c/u) is maximized at u = √(2c). So r(√(2c)) ≥ r(u). But r(√(2c)) = Φ̄(√(2c)). Hence Φ̄(√(2c*)) < r(u) ≤ Φ̄(√(2c)). This implies c < c*, which is a contradiction. This establishes that sup_u c(u) ≤ c*. On the other hand, taking c = c* and u = √(2c*) solves equation (51). Thus c* is indeed the largest c that solves the equation, which establishes the first claim. The second claim follows by noting that

    γ Φ̄(√(2c)) + a Φ̄(ξ/2 + c/ξ) = γ Φ̄(√(2c)) + O(a).                    (53)

Now set this expression equal to α/m and solve. □

Proof of Theorem 3.7. Define c* as in (50). If √(2c*) ≤ ξ, the proof proceeds as in the previous proof. So we first need to establish for which values of ξ this is true. Let r(c) = γ Φ̄(√(2c)) + a Φ̄(ξ/2 + c/ξ). We want to find out when the solution of r(c) = α/m satisfies √(2c) ≤ ξ or, equivalently, c ≤ ξ²/2. Now r is decreasing in c. Since γ + a ≥ α/m, r(0) ≥ α/m. Hence there is a solution with c ≤ ξ²/2 if and only if r(ξ²/2) ≤ α/m. But r(ξ²/2) = (γ + a) Φ̄(ξ), so we conclude that there is such a solution if and only if (γ + a) Φ̄(ξ) ≤ α/m, that is, ξ ≥ z_{α/(m(γ+a))} = ξ_0.

Now suppose that ξ < ξ_0. We need to find u ≤ ξ making c as large as possible in the equation

    v(u, c) ≡ γ Φ̄(u/2 + c/u) + a Φ̄(ξ/2 + c/ξ) = α/m.

Let ū = ξ and c̄ = z_{α/(m(γ+a))} ξ − ξ²/2. By direct substitution, v(ū, c̄) = α/m for this choice of ū and c̄, and clearly ū ≤ ξ as required. We claim that c̄ is the largest possible c. Suppose instead that v(u, c) = α/m for some u ≤ ξ and c > c̄. Then v(u, c) < v(u, c̄). For u ≤ ξ ≤ √(2c̄), v(u, c̄) is an increasing function of u. Hence, v(u, c) < v(u, c̄) ≤ v(ū, c̄) = α/m. This contradicts the fact that v(u, c) = α/m.

For the second claim, note that the power of the weighted test beats the power of Bonferroni if and only if the weight w = (m/α) Φ̄(ξ/2 + C/ξ) ≥ 1, which is equivalent to

    C ≤ z_{α/m} ξ − ξ²/2.                                                (54)

When ξ ≤ ξ_0, C = ξ_0 ξ − ξ²/2. By assumption, γ + a ≤ 1, so that z_{α/(m(γ+a))} ≤ z_{α/m} and (54) holds.

Now suppose that ξ_0 < ξ ≤ ξ*. Then C is the solution to r(C) = γ Φ̄(√(2C)) + a Φ̄(ξ/2 + C/ξ) = α/m. We claim that (54) still holds. Suppose not. Then, since r(c) is decreasing in c, r(z_{α/m} ξ − ξ²/2) > r(C) = α/m. But, by direct calculation, r(z_{α/m} ξ − ξ²/2) > α/m implies that ξ > ξ*, which is a contradiction. Thus (28) holds.

Finally, we turn to (29). In this case, C = z²_{α/(mγ)}/2 + O(a). The worst case power is

    Φ̄(C/ξ − ξ/2) = Φ̄( z²_{α/(mγ)}/(2ξ) − ξ/2 ) + O(a).

The latter is increasing in ξ and so, for ξ ≥ ξ*, is at least Φ̄( z²_{α/(mγ)}/(2ξ*) − ξ*/2 ) + O(a), as claimed. The next two inequalities follow from standard tail approximations for Gaussians. Specifically, a Gaussian quantile z_{β/m} can be written as z_{β/m} = √(2 log(mL/β)), where L = c'(log m)^{a'} for constants a' and c' (Donoho and Jin 2004). Inserting this into the previous expression yields the final expression. □

Proof of Theorem 4.2. Setting π(w, ξ) = Φ̄( Φ̄^{-1}(wα/m) − ξ ) equal to 1 − β implies w = (m/α) Φ̄(z_{α/m} + z_{1−β}), which is equal to w_1 as stated in the theorem. The stated form of w_0 implies that the weights average to 1. The stated solution thus satisfies the restriction that a fraction ɛ have power at least 1 − β. Increasing the weight of any hypothesis whose weight is w_0 necessitates reducing the weight of another hypothesis. This either reduces the minimum power or forces a hypothesis with power 1 − β to fall below 1 − β. Hence, the stated solution does in fact maximize the minimum power. □

The proof of Theorem 4.3 is similar to the previous proof and is omitted.

Proof of Theorem 5.2. The familywise error is

    P(R ∩ H_0 ≠ ∅) ≤ Σ_{j∈H_0} P( P_j(T_j^(1), T_j^(2)) ≤ ŵ_j(T^(1)) α/m )
        = Σ_{j∈H_0} E[ P( P_j(t^(1), T_j^(2)) ≤ ŵ_j(t^(1)) α/m | T^(1) = t^(1) ) ]
        = Σ_{j∈H_0} E[ P( Φ̄( b^{1/2} t_j^(1) + (1 − b)^{1/2} T_j^(2) ) ≤ ŵ_j(t^(1)) α/m | T^(1) = t^(1) ) ]
        = Σ_{j∈H_0} E[ P( T_j^(2) ≥ ( Φ̄^{-1}( ŵ_j(t^(1)) α/m ) − b^{1/2} t_j^(1) ) / (1 − b)^{1/2} | T^(1) = t^(1) ) ]
        = Σ_{j∈H_0} E[ Φ̄( ( Φ̄^{-1}( ŵ_j(T^(1)) α/m ) − b^{1/2} T_j^(1) ) / (1 − b)^{1/2} ) ]
        = Σ_{j∈H_0} E[ Φ̄( ( ξ̃_j/2 + ĉ(T^(1))/ξ̃_j − b^{1/2} T_j^(1) ) / (1 − b)^{1/2} ) ]
        ≤ Σ_{j=1}^m E[ Φ̄( ( ξ̃_j/2 + ĉ(T^(1))/ξ̃_j − b^{1/2} T_j^(1) ) / (1 − b)^{1/2} ) ]
        = α

where the second-to-last line uses (44) and the final equality follows from the definition (43) of ĉ. □

References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57.

Benjamini, Y. and Hochberg, Y. (1997). Multiple hypothesis testing with weights. Scandinavian Journal of Statistics, 24.

Chen, J. J., Lin, K. K., Huque, M. and Arani, R. (2000). Weighted p-value adjustments for animal carcinogenicity trend test. Biometrics, 56.

Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics, 32.

Genovese, C. R., Roeder, K. and Wasserman, L. (2006). False discovery control with p-value weighting. To appear, Biometrika.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6.

Roeder, K., Bacanu, S.-A., Wasserman, L. and Devlin, B. (2006). Using linkage genome scans to improve power of association in genome scans. The American Journal of Human Genetics, 78.

Rubin, D., van der Laan, M. and Dudoit, S. (2005). Multiple testing procedures which are optimal at a simple alternative. Technical report 171, Division of Biostatistics, School of Public Health, University of California, Berkeley.

Signoravitch, J. (2006). Optimal multiple testing under the general linear model. Technical report, Harvard Biostatistics.

Skol, A. D., Scott, L. J., Abecasis, G. R. and Boehnke, M. (2006). Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature Genetics, 38.

Storey, J. D. (2005). The optimal discovery procedure: A new approach to simultaneous significance testing. UW Biostatistics Working Paper Series, Working Paper 259.

Thomas, D. C., Haile, R. W. and Duggan, D. (2005). Recent developments in genomewide association scans: A workshop summary and review. American Journal of Human Genetics, 77.


Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

Measures of average are called measures of central tendency and include the mean, median, mode, and midrange.

Measures of average are called measures of central tendency and include the mean, median, mode, and midrange. CHAPTER 3 Data Description Objectives Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance,

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe PROPERTIES OF MULTIVARIATE HOMOGENEOUS ORTHOGONAL POLYNOMIALS Brahi Benouahane y Annie Cuyt? Keywords Abstract It is well-known that the denoinators of Pade approxiants can be considered as orthogonal

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

New upper bound for the B-spline basis condition number II. K. Scherer. Institut fur Angewandte Mathematik, Universitat Bonn, Bonn, Germany.

New upper bound for the B-spline basis condition number II. K. Scherer. Institut fur Angewandte Mathematik, Universitat Bonn, Bonn, Germany. New upper bound for the B-spline basis condition nuber II. A proof of de Boor's 2 -conjecture K. Scherer Institut fur Angewandte Matheati, Universitat Bonn, 535 Bonn, Gerany and A. Yu. Shadrin Coputing

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information

arxiv: v3 [cs.lg] 7 Jan 2016

arxiv: v3 [cs.lg] 7 Jan 2016 Efficient and Parsionious Agnostic Active Learning Tzu-Kuo Huang Alekh Agarwal Daniel J. Hsu tkhuang@icrosoft.co alekha@icrosoft.co djhsu@cs.colubia.edu John Langford Robert E. Schapire jcl@icrosoft.co

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

Multiple Testing Issues & K-Means Clustering. Definitions related to the significance level (or type I error) of multiple tests

Multiple Testing Issues & K-Means Clustering. Definitions related to the significance level (or type I error) of multiple tests StatsM254 Statistical Methods in Coputational Biology Lecture 3-04/08/204 Multiple Testing Issues & K-Means Clustering Lecturer: Jingyi Jessica Li Scribe: Arturo Rairez Multiple Testing Issues When trying

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Some Proofs: This section provides proofs of some theoretical results in section 3.

Some Proofs: This section provides proofs of some theoretical results in section 3. Testing Jups via False Discovery Rate Control Yu-Min Yen. Institute of Econoics, Acadeia Sinica, Taipei, Taiwan. E-ail: YMYEN@econ.sinica.edu.tw. SUPPLEMENTARY MATERIALS Suppleentary Materials contain

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single

More information

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint. 59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Curious Bounds for Floor Function Sums

Curious Bounds for Floor Function Sums 1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International

More information