FDR- and FWE-controlling methods using data-driven weights


LIVIO FINOS
Center for Modelling Computing and Statistics, University of Ferrara, via N. Machiavelli 35, 44100 FERRARA - Italy, livio.finos@unife.it

LUIGI SALMASO
Department of Management and Engineering, University of Padova, Str. S. Nicola 3, 36100 VICENZA - Italy

Abstract

Weighted methods are an important feature of multiplicity control methods. The weights must usually be chosen a priori, on the basis of experimental hypotheses. Under some conditions, however, they can be chosen making use of information from the data (therefore a posteriori) while maintaining multiplicity control. In this paper we provide: 1) a review of weighted methods for FWE (both parametric and nonparametric) and FDR control; 2) a review of data-driven weighted methods for FWE control; 3) a new proposal for weighted FDR control (data-driven weights) under independence among variables; 4) under any type of dependence; 5) a simulation study that assesses the performance of the procedure of point 4 under various conditions.

Key words: A priori ordered hypotheses; FDR; FWE; Multiplicity control; Weighted procedures.

1 Introduction

In problems dealing with thousands (and sometimes hundreds of thousands) of variables, the standard Type I error rate criterion used to evaluate tests becomes less important, as hundreds of significances might easily be, as a matter of fact, Type I errors. On the other hand, attempts to rigorously control the familywise type I error rate (FWE) are typically excessively conservative. For example, if the Bonferroni method is used to control the FWE with $m$ tests, then a test will have to be significant at the $\alpha/m$ level for it to be declared real. The two complaints most often made about this method are that (i) the procedure's dependence on $m$ seems arbitrary, and related to that, (ii) the method is exceedingly conservative for large values of $m$.
Holm's (1979) step-down approach is a slight improvement on the simple Bonferroni method, but gains little in terms of power when $m$ is large. Holm's method rejects the hypothesis corresponding to the most significant (smallest p-value) test if the p-value is less than $\alpha/m$; if this hypothesis is rejected, then the second smallest p-value is compared to $\alpha/(m-1)$, and so on. It is usually called the Bonferroni-Holm Procedure (BHP). Benjamini and Hochberg's false discovery rate (FDR) controlling method (1995) has been proposed as an alternative to FWE-controlling methods. To a certain extent, FDR solves problems (i) and (ii), though it may become exceedingly conservative for large $m$. FDR-controlling procedures do not generally control the FWE, thus they allow some fraction of detected significances to be in error.

Weighted methods are useful when some hypotheses $H_i$ are deemed more important than others - e.g. in clinical trials the various patient end points might be ranked a priori, and the testing procedure designed to give more power to the more important hypotheses. The simplest weighted multiple testing procedure, discussed for example in Rosenthal and Rubin (1983), is to reject $H_i$ if $p_i \le w_i \alpha$, where the weights $w_i$ lie in the simplex $w_i \ge 0$, $\sum_i w_i = 1$, and $p_i$ is the p-value of the $i$-th test, $i = 1, \dots, m$. The choice of $w_i$ may be based purely on the a priori importance of the hypotheses or, to optimise power, based on prior information (Spjøtvoll, 1972; Westfall et al., 2001). Giving greater weight to the more significant hypotheses is generally considered data snooping, and such methods inflate (multivariate) type I error rates. However, when properly chosen, weights can be taken from the concurrent data set so as to improve power without compromising significance levels.

In this paper we consider a class of weighted FWE-controlling procedures and a class of weighted FDR-controlling procedures which exploit information from the data and, under some conditions, improve performance in terms of power. All proofs of the proposed new theorems are given in the Appendix.
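As an illustration of the two unweighted procedures recalled in the Introduction, the following is a minimal sketch in Python. It assumes NumPy is available; the function names `holm_reject` and `bh_reject` and the example p-values are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def holm_reject(pvals, alpha=0.05):
    """Bonferroni-Holm step-down procedure (FWE control).

    Returns a boolean mask of rejected hypotheses.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                 # most significant test first
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):    # rank = 0, 1, ..., m - 1
        # compare with alpha/m, alpha/(m-1), ..., alpha/1
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                         # stop at the first non-rejection
    return reject

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure (FDR control)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m      # j * q / m for j = 1..m
    below = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        # reject all hypotheses up to the largest j meeting its threshold
        reject[order[: below[-1] + 1]] = True
    return reject

# Hypothetical p-values, for illustration only
example_p = [0.001, 0.012, 0.021, 0.03, 0.2]
print(holm_reject(example_p))   # Holm rejects fewer hypotheses here ...
print(bh_reject(example_p))     # ... than the FDR-controlling procedure
```

On these example values the step-down procedure stops at the third ordered p-value, while the step-up FDR procedure rejects the four smallest, which illustrates the gain in power discussed above.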
2 Weighted FWE and FDR methods

In order to discuss the error rates we would like to control, we set $R_i = 1$ if $H_i$ ($H_i$ being a true or false null hypothesis) is rejected and $R_i = 0$ otherwise, and we set $V_i = 1$ if a true $H_i$ is (erroneously) rejected, and $V_i = 0$ otherwise. The familywise error rate can be defined as $FWE = P(\sum_i V_i > 0)$, which is the probability of making at least one erroneous rejection. The false discovery rate, which is the expected proportion of erroneously rejected hypotheses among the rejected ones, can be defined as $FDR = E(\sum_i V_i / \sum_i R_i)$, where $\sum_i V_i / \sum_i R_i$ is defined as 0 when $\sum_i R_i = 0$. It is easy to see that $FDR \le FWE$. Note also that when all hypotheses are true $FDR = FWE$ (Benjamini and Hochberg, 1997; Benjamini and Yekutieli, 2001).

2.1 Weighted FWE methods

When we require FWE control, there are at least two weighted procedures based on Bonferroni's inequality to be considered. The first is Holm's (1979) weighted procedure (known as the Weighted Holm's Procedure - WHP); the second, proposed in Benjamini and Hochberg (1997), might be called Weighted Benjamini-Hochberg's Procedure (WBHP). Both of the above procedures control the unweighted FWE. The natural weighted extension of the FWE ($P[\sum_i V_i > 0]$), which we call WFWE, is $P[\sum_i w_i V_i > 0]$. These procedures control the WFWE as well, since the latter is equal to the former for any set of strictly positive weights. We shall now briefly describe the two procedures.

2.1.1 Holm's weighted procedure (1979)

Let $P^*_i = P_i / w_i$ and order $P^*_{(1)} \le \dots \le P^*_{(m)}$. Let $H_{(i)}$, $w_{(i)}$ correspond to $P^*_{(i)}$. Reject $H_{(i)}$ for $i = 1, 2, 3, \dots$ as long as

$$P^*_{(i)} \le \frac{\alpha}{\sum_{k=i}^{m} w_{(k)}}.$$

Holm's unweighted procedure is the above procedure with equal weights, which all have to be equal to one. Since the weighted procedure calls for the rejection of $H_{(i)}$ when $P^*_{(i)} \sum_{k=i}^{m} w_{(k)} = P_{(i)} \left(1 + \sum_{k=i+1}^{m} w_{(k)} / w_{(i)}\right)$ is smaller than the constant $\alpha$, a larger $w_{(i)}$ implies greater power for rejecting that hypothesis. The weights $w_{(i)}$ also come into play in the definition of the ordered vector $P^*_{(i)}$; therefore $w_{(i)}$ affects the probability of rejecting $H_{(i)}$ given that $H_{(j)}$, $j < i$, has been rejected, and it also affects the order of the tested hypotheses.

2.1.2 Benjamini-Hochberg's weighted procedure (1997)

Benjamini and Hochberg (1997) discuss a procedure which controls the WFWE by means of the ordered $P_i$, $P_{(1)} \le \dots \le P_{(m)}$. Let $H_{(i)}$, $w_{(i)}$ (with $\sum_{i=1}^{m} w_{(i)} = m$) correspond to $P_{(i)}$. Reject $H_{(i)}$ for $i = 1, 2, 3, \dots$ as long as

$$P_{(i)} \le \frac{w_{(i)}}{\sum_{k=i}^{m} w_{(k)}} \, \alpha.$$

Proof of the FWE control is found in Benjamini and Hochberg (1997, theorem 2). Again, Holm's unweighted procedure is the same as the above procedure with equal weights, which all have to be equal to one. Note that this procedure respects the natural ordering of the p-values (not corrected for multiplicity). Therefore, $w_i$ affects the probability of rejecting $H_i$ given that $H_j$, $j < i$, has been rejected, but it does not affect the order of the tested hypotheses.

2.2 Weighted FDR methods

For ease of notation assume that the first $m_0$ hypotheses tested are in fact true and $m_1 = m - m_0$ are false. The false discovery rate is

$$E(Q) = E\left(\frac{\sum_{i=1}^{m_0} V_i}{\sum_{i=1}^{m} R_i}\right) \qquad (1)$$

which is the expected proportion of the falsely rejected hypotheses among the rejected ones. When weighting is desired, the FDR can be generalized as follows. Let $Q(w)$ be

$$Q(w) = \begin{cases} \dfrac{\sum_{i=1}^{m_0} w_i V_i}{\sum_{i=1}^{m} w_i R_i} & \text{if } \sum_{i=1}^{m} w_i R_i > 0 \\ 0 & \text{otherwise,} \end{cases} \qquad (2)$$

then the weighted false discovery rate (WFDR) is defined as $E(Q(w))$. Note that if some of the weights are 0, and the others are all equal, then the WFDR is identical to the FDR for the problem restricted to testing hypotheses with positive weights. Under the intersection null hypothesis that all tested hypotheses are true, WFDR = WFWE. It is again easy to show that $WFDR \le WFWE$, so a WFDR-controlling procedure is potentially more powerful than a WFWE-controlling procedure.

2.2.1 Weighted FDR methods (Benjamini and Hochberg, 1997) for independent variables

Consider the following procedure: let $k$ be the largest $j$ satisfying

$$P_{(j)} \le \frac{\sum_{i=1}^{j} w_{(i)}}{m} \, q, \qquad (3)$$

then reject $H_{(1)}, \dots, H_{(k)}$ ($q$ being the desired proportion of false rejections).

Theorem 2.1 (Benjamini and Hochberg, 1997; theorem 4) Procedure (3) always controls the FDR at level $\frac{\sum_{i \in M} w_i}{m} q$ ($M$ being the set of indexes that correspond to true null hypotheses) for independent test statistics.

In this procedure, the weights incorporated into the error rate are suitably accumulated to form the procedural weights. It is important to reject a hypothesis with a high weight, as it considerably increases the weight of the total discoveries. However, it also increases the weight of the errors. Essentially we are incorporating the same weights into the loss from errors $\sum_i w_i V_i$ and the gain from rejections $\sum_i w_i R_i$.

2.2.2 Weighted FDR methods for dependent variables

Benjamini and Yekutieli (2001) provide a correction constant that guarantees FDR control under every type of dependence between variables. In this section we extend previous results from the literature to the weighted case for dependent variables. Note that this situation is not the main task of the paper, hence this paragraph is self-contained with respect to the rest of the paper.
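The step-up procedure (3) of Section 2.2.1 can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: NumPy is assumed to be available, the function name `weighted_bh_reject` is hypothetical, and the weights are rescaled inside the function so that they sum to $m$, which is the normalization used in the text.

```python
import numpy as np

def weighted_bh_reject(pvals, weights, q=0.05):
    """Weighted step-up procedure (3): reject H_(1..k), where k is the
    largest j with P_(j) <= (sum_{i<=j} w_(i) / m) * q."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = p.size
    w = w * m / w.sum()                    # normalization: weights sum to m
    order = np.argsort(p)                  # natural ordering of the p-values
    cum_w = np.cumsum(w[order])            # accumulated procedural weights
    ok = np.nonzero(p[order] <= cum_w / m * q)[0]
    reject = np.zeros(m, dtype=bool)
    if ok.size:
        reject[order[: ok[-1] + 1]] = True
    return reject

# Hypothetical example: a large weight on the first hypothesis
p_example = [0.02, 0.5, 0.6, 0.7]
print(weighted_bh_reject(p_example, [1, 1, 1, 1]))          # unweighted case
print(weighted_bh_reject(p_example, [2.5, 0.5, 0.5, 0.5]))  # weighted case
```

With unit weights the function reduces to the unweighted FDR procedure; in the example the first hypothesis is rejected only when it carries the larger weight.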
Theorem 2.2 When procedure (3) is performed with $q \Big/ \sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}$ taking the place of $q$, it always controls the FDR at level $\frac{\sum_{i \in M} w_i}{m} q$ ($M$ being the set of indexes that correspond to true null hypotheses).

This result may increase the range of problems for which a powerful procedure with proven FDR control can be offered. Obviously the adjustment by $\sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}$ is very often unnecessary, and yields a procedure that is too conservative. Indeed, without the correction the control is given at level

$$\left(\frac{1}{m} \sum_{h \in M} w_h\right) \left(\sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}\right) q,$$

and no correction whatsoever is necessary as long as

$$\left(\frac{1}{m} \sum_{h \in M} w_h\right) \left(\sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}\right) \le 1.$$

Given that in many real experimental cases the proportion of active variables is low, this condition is usually respected and the correction is therefore not necessary.

In the main theorem of the work of Benjamini and Yekutieli (2001, theorem 1.2), the authors prove that a procedure without this correction still controls the (unweighted) FDR in families with Positive Regression Dependency on each element from a Subset $M$, or PRDS on $M$. Recall that a set $D$ is called increasing if $x \in D$ and $y \ge x$ implies that $y \in D$ as well.

Property PRDS: For any increasing set $D$, and for each $i \in M$, $P(X \in D \mid X_i = x)$ is non-decreasing in $x$.

Hence, the structure of the dependency assumed may be different for the set of the true hypotheses with respect to the set of the false hypotheses. Background on these concepts is clearly presented in Eaton (1986), supplemented by Holland and Rosenbaum (1986). The PRDS property is a relaxed form of the positive regression dependency property (Sarkar, 1969) and of $MTP_2$ (Sarkar, 1998) as well. Hence PRDS is a very weak condition of dependence between variables, which often holds for real data. The control of the FDR over PRDS variables can be easily extended to the case of weighted FDR procedures as follows.

Theorem 2.3 If the joint distribution of the test statistics is PRDS on the subset $M$ of test statistics corresponding to true null hypotheses, the procedure given by (3) controls the FDR at level $\frac{\sum_{i \in M} w_i}{m} q$.

Proof is straightforward if one adapts the proof given in Benjamini and Yekutieli (2001) to the weighted procedure.
In the same way we adapt theorem 2.2 from theorem 1.3 of the same authors. Hint: define $v$ as the sum of the weights of the true null hypotheses and $s$ as the sum of the weights of the false ones. In the same way, it is a trivial matter to prove the control for discrete and one-sided test statistics (see theorems 5.2 and 5.3 from Benjamini and Yekutieli, 2001).

3 Data-driven weighted procedures

We show a way to define data-driven weights that gathers information on the distribution of the hypotheses under the null or under the alternative. Afterwards, we show how these weights allow for FWE or FDR control for parametric and nonparametric tests.

3.1 Data-driven weights

We consider parametric and nonparametric versions of the one-sample and two-sample tests. Extension to $C$ independent samples is straightforward. The parametric one-sample case is characterized by a sample of $n$ i.i.d. multivariate normal observation vectors of dimension $m$,

$$x_j = (x_{1j}, \dots, x_{mj})' \sim N_m(\mu, \Sigma), \quad j = 1, \dots, n, \qquad (4)$$

$$\mu = (\mu_1, \dots, \mu_m)', \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \dots & \sigma_{1m} \\ \vdots & & \vdots \\ \sigma_{m1} & \dots & \sigma_{mm} \end{pmatrix},$$

and we wish to test the partial hypotheses $H_i: \mu_i = 0$ ($i = 1, \dots, m$) under strong control of the familywise type I error. The covariance matrix $\Sigma$ is arbitrary and positive semidefinite. In a nonparametric setting, we assume that the $n$ i.i.d. $m$-dimensional sample vectors $x_j$ ($j = 1, \dots, n$) have a density $f(x)$ which is symmetrical about the location vector $\mu = (\mu_1, \dots, \mu_m)$, i.e. $f(\mu + x) = f(\mu - x)$. Dependences among variables are not specified but are generally assumed to exist.

In the case of two independent samples from two $m$-dimensional normal populations with the same covariance matrix,

$$x_{jc} = (x_{1jc}, \dots, x_{mjc})' \sim N_m(\mu_c, \Sigma), \quad c = 1, 2; \; j = 1, \dots, n_c, \; n = n_1 + n_2, \qquad (5)$$

we consider the usual two-sample t statistic in order to test the partial null hypotheses $H_i: \mu_{i1} = \mu_{i2}$ ($i = 1, \dots, m$). In the nonparametric approach, we simply assume that under the null hypothesis the two samples have the same distribution: $f(x_{i1}) = f(x_{i2})$, $i = 1, \dots, m$.

One way to obtain information about the presence of possible alternative hypotheses is given by the use of the total variance calculated on the entire sample without considering the classification into groups. Indeed the total sum of squares $SST_i$ ($i = 1, \dots, m$) is given by the sum of the between-groups sum of squares

$SSB_i$ and the within-groups sum of squares $SSW_i$ as follows:

$$SST_i = \sum_{c=1,2} \sum_{j=1}^{n_c} (x_{ijc} - \bar{x}_i)^2 = \sum_{c=1,2} \sum_{j=1}^{n_c} (x_{ijc} - \bar{x}_{ic})^2 + \sum_{c=1,2} n_c (\bar{x}_{ic} - \bar{x}_i)^2 = SSW_i + SSB_i,$$

where $\bar{x}_{ic} = \sum_{j=1}^{n_c} x_{ijc} / n_c$, $c = 1, 2$, and $\bar{x}_i$ indicates the overall mean for variable $i$. The mean of the total variance estimated by $MST_i = SST_i / (n - 1)$ ($i = 1, \dots, m$) is equal to:

$$E(MST_i) = E\left(\frac{SSW_i}{n-1}\right) + E\left(\frac{SSB_i}{n-1}\right) = \sigma_{ii} \frac{n-2}{n-1} + \frac{\sigma_{ii}}{n-1} + \sum_{c=1,2} \frac{n_c (\mu_{ic} - \bar{\mu}_i)^2}{n-1} = \sigma_{ii} + \frac{n}{n-1} \left(\frac{\mu_{i1} - \mu_{i2}}{2}\right)^2 = \sigma_{ii} + \frac{n}{n-1} \left(\frac{\delta}{2}\right)^2,$$

where $\delta = \mu_{i1} - \mu_{i2}$ and the last two equalities hold for balanced groups ($n_1 = n_2 = n/2$). In this way we highlight that $SST_i$ tends to be larger when the variable $i$ is under $H_1$. This is especially true when $n$ is small.

In order to understand how the total variance is related to the power of tests with different sample sizes, we keep the value of the t-statistic fixed for increasing $n$ (i.e. the same p-value when $\sigma_{ii}$ is known or when the asymptotic approximation to the Gaussian distribution holds). Hence $\delta_n = \delta / \sqrt{n}$, the t-statistic being equal to $t = \delta_n \sqrt{n} / \sqrt{\sigma_{ii}} = \delta / \sqrt{\sigma_{ii}}$. Now, writing the expected value of MST as a function of $n$ we get:

$$E(MST_i) = \sigma_{ii} + \left(\frac{\delta_n}{2}\right)^2 \frac{n}{n-1} = \sigma_{ii} + \left(\frac{\delta}{2\sqrt{n}}\right)^2 \frac{n}{n-1} = \sigma_{ii} + \frac{(\delta/2)^2}{n-1},$$

which is a decreasing function of $n$.

Similar considerations can be made in the case of more than two independent samples, defining $S_i$ as the total variance of variable $i$. If we are considering a one-sample test, $S_i$ becomes $S_i = \sum_{j=1}^{n} x_{ij}^2 / n$. Note that $S_i = \sum_{j=1}^{n} x_{ij}^2 / n = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 / n + \bar{x}_i^2$. Therefore, with this definition we are considering both the shift from zero and the error variability. If we used the total variability as in the two-sample case, it would not be sensitive to non-null effects.

Hence, to define combining functions which give greater weight to the p-values with large $S$ it is sufficient to consider

$$w_i = f(S_i); \quad i = 1, \dots, m, \qquad (6)$$

with $f$ a non-decreasing monotone function of $S_i$. One particularly interesting case is obtained when the weights are zero if $S_i$ is below a threshold $\gamma$:

$$w_i = \begin{cases} S_i & \text{if } S_i \ge \gamma \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \dots, m). \qquad (7)$$

This is equivalent to excluding the corresponding variables from the analysis. Also note that for $\gamma = 0$ the weights are equal to the total variance, i.e. $w_i = S_i$. A choice less sensitive to the magnitude of $S_i$ is the following:

$$w_i = \begin{cases} 1 & \text{if } S_i \ge \gamma \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \dots, m). \qquad (8)$$

We shall see its usefulness in the simulation study of section 4.

3.2 FWE-controlling procedures with data-driven weights

3.2.1 Parametric approach

In the parametric setting, Läuter, Glimm and Kropf (1996) noted that linear combinations of weight vectors may depend on the data through certain quadratic forms, and the resulting (ordinary) t-tests retain their significance levels. Thus, by choosing the vector of weights suitably, the same procedures may be used to give more weight to particular hypotheses selected by the data. Westfall, Kropf and Finos (2004) propose using WHP with $w_i = S_i^r$ ($r \in \mathbb{R}$, $r \ge 0$); $i = 1, \dots, m$. This class of functions has the peculiarity of including two well-known procedures as particular cases:

- if $r = 0$, it is equivalent to Bonferroni-Holm's procedure, and all weights are equal to one;
- if $r \to \infty$, it corresponds to Kropf and Läuter's procedure (2002), which orders the tests based on their variance and sequentially tests the hypotheses of interest without corrections until the first accepted hypothesis is found.

In the two extreme cases, therefore, the influence of $S_i$ is different: in the first case $S_i$ is not considered, in the second it is very important since it establishes the order of admission of the hypotheses independently of their significance.

By using the theory of Spherically Distributed Matrices (Fang and Zhang, 1990), it is possible to state that under assumptions (4) and (5) the subset of variables corresponding to the true null hypotheses (denoted by $X_0$) is left-spherically distributed and the conditional distribution of $X_0$ for fixed $X_0' X_0$ is also left-spherical. Hence, each of the columns of $X_0$ is also conditionally left-spherically distributed and the t tests (one-sample or two-sample) exactly maintain their type I error (i.e. their p-values are uniformly distributed).

Kropf and Hommel (2004) propose the use of the same kind of weight but with WBHP. Again with $r = 0$ we have the usual unweighted Bonferroni-Holm method, but the procedure does not converge to Kropf and Läuter's parametric procedure (Kropf and Läuter, 2002) when $r \to \infty$. The FWE control again makes use of the theory of Spherically Distributed Matrices.
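The data-driven weights (6)-(8) and the two weighted FWE procedures of Section 2.1 can be sketched together as follows, for the one-sample case. This is a hedged illustration, not the authors' Matlab software: NumPy is assumed, all function names (`total_variance`, `power_weights`, `threshold_weights`, `weighted_holm`, `wbhp`) are hypothetical, and the p-values are taken as given (for instance from one-sample t tests). Hypotheses with zero weight are dropped before testing, following the remark that a zero weight is equivalent to excluding the variable.

```python
import numpy as np

def total_variance(X):
    """One-sample total variance S_i = sum_j x_ij^2 / n (X is n x m)."""
    X = np.asarray(X, dtype=float)
    return (X ** 2).sum(axis=0) / X.shape[0]

def power_weights(S, r=1.0):
    """Weights w_i = S_i^r; r = 0 gives equal (unit) weights."""
    return np.asarray(S, dtype=float) ** r

def threshold_weights(S, gamma, keep_magnitude=True):
    """Definition (7) if keep_magnitude is True, definition (8) otherwise."""
    S = np.asarray(S, dtype=float)
    above = S >= gamma
    return np.where(above, S, 0.0) if keep_magnitude else above.astype(float)

def _step_down(p, w, alpha, order):
    """Shared step-down loop: test in the given order, stop at first failure."""
    reject = np.zeros(p.size, dtype=bool)
    tail_w = np.cumsum(w[order][::-1])[::-1]     # sum_{k >= i} w_(k)
    for pos, idx in enumerate(order):
        if p[idx] <= alpha * w[idx] / tail_w[pos]:
            reject[idx] = True
        else:
            break
    return reject

def weighted_holm(pvals, weights, alpha=0.05):
    """WHP: order by P_i / w_i, reject while P*_(i) <= alpha / sum_{k>=i} w_(k)."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    keep = np.nonzero(w > 0)[0]                  # zero weight = excluded
    order = keep[np.argsort(p[keep] / w[keep])]
    return _step_down(p, w, alpha, order)

def wbhp(pvals, weights, alpha=0.05):
    """WBHP: order by P_i, reject while P_(i) <= w_(i) alpha / sum_{k>=i} w_(k)."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    keep = np.nonzero(w > 0)[0]
    order = keep[np.argsort(p[keep])]
    return _step_down(p, w, alpha, order)
```

Both procedures are invariant to a common rescaling of the weights, so no explicit normalization to $\sum_i w_i = m$ is needed in this sketch. A possible usage, with a data matrix `X` and a vector of p-values `p` supplied by the user, would be `wbhp(p, threshold_weights(total_variance(X), gamma=2.0, keep_magnitude=False))`.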

3.2.2 Nonparametric approach

For the one-sample problem the (nonparametric) rank-based counterpart of WHP and WBHP uses the medians of the absolute values of the original observations instead of the total variances, and the p-values from the one-sample Wilcoxon tests instead of those from the t tests. For the two-sample problem they make use of the empirical interquartile range of the pooled sample of $n = n_1 + n_2$ observations and the p-value from the two-sample Wilcoxon test (Wilcoxon-Mann-Whitney U test). Kropf et al. (2004) proved that WHP based on rank tests controls the FWE. Kropf and Hommel (2004) do the same for the WBHP. The proofs for the familywise type I error control are very similar to those for the parametric counterpart.

A second nonparametric approach is offered by Finos and Salmaso (2005) and is based on permutation tests. The proof of the FWE control is based on the invariance principle: if $w_i$ is defined on the basis of a generic element of the orbit $\mathcal{X}/X$ (i.e. the permutation sample space), the procedure controls the FWE because the distribution of the test statistic is independent of the values of $w_i$. There is an interesting connection between the principle of invariance and the theory of left-spherical distributions. In both cases we obtain tests that, conditional on the weights based on the variance, have a uniform distribution under the null hypothesis.

Finos and Salmaso also show that, in the conditional framework, any statistic that evaluates the dispersion of the data depending on a generic element of the permutation space yields an unbiased multiple test. Possible choices for defining the weights include using functions of the coefficient of variability ($w_i = f(CV_i) = f(s_i / \bar{x}_i)$) or functions of the first-third quartile range ($w_i = f(IQR_i) = f(Q_{3i} - Q_{1i})$, where $Q_{3i}$ and $Q_{1i}$ are the third and first quartile for the $i$-th variable). This last choice seems to be particularly useful when there are heavy-tailed variables, giving little weight to the large values produced by the random errors and more weight to the fixed effects. Finally, these results can be extended from the class of Bonferroni-like combining functions to the broader class of nonparametric combining functions defined by Pesarin (2001).

3.3 FDR controlling procedures with data-driven weights

In order to prove that the weights chosen as in (6) guarantee FDR control, we shall make use of the theory of spherically distributed matrices and follow the ideas provided in Benjamini and Hochberg (1997). We point out the steps of the proof where it is necessary to refer to the theory of left-spherically distributed matrices to guarantee that the p-values corresponding to the true hypotheses $H_i$ follow a uniform distribution conditional on $w_i$.

Lemma For any $m_0$ ($0 \le m_0 \le m$) independent p-values corresponding to the true null hypotheses, any set of $m_1 = m - m_0$ p-values corresponding to the false null hypotheses, any set of weights $w_i$, $\sum_{i=1}^{m} w_i = m$, defined as in (6) and any constant $q$, the multiple testing procedure defined by (3) satisfies the

inequality

$$E(Q(w) \mid P_{m_0+1} = p_1, \dots, P_m = p_{m_1}) \le \frac{\sum_{i=1}^{m_0} w_i}{m} \, q. \qquad (9)$$

Theorem 3.1 If the test statistics are independent, then the procedure based on (3) and weights defined as in (6) controls the WFDR at level $q$.

The proof of the theorem is straightforward from the previous lemma. For dependent variables, WFDR control is proven by the following theorems.

Theorem 3.2 When procedure (3) with weights defined as in (6) is performed with $q \Big/ \sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}$ replacing $q$, it always controls the FDR at a significance level less than or equal to $\frac{\sum_{i \in M} w_i}{m} q$ ($M$ being the set of indexes that correspond to true null hypotheses).

Theorem 3.3 If the joint distribution of the test statistics is PRDS in $M$, the procedure given by (3) controls the FDR at level $\frac{\sum_{i \in M} w_i}{m} q$ also when the weights are defined as in (6).

4 A simulation study

It has been noted that the classic procedures lose power when there is a high number of variables. For this reason this simulation study focuses on cases with a high number of variables. We consider the following model for multivariate one-sample data. At first, the variances of each of the $m$ measurements are assumed to be independently generated as $\sigma_i^2 \sim \tau \lambda / \chi^2_\lambda$, $i = 1, \dots, m$, where $\chi^2_\lambda$ denotes a chi-square distributed random variable with $\lambda$ degrees of freedom. The parameter $\tau$ is a nuisance parameter reflecting overall scale; for convenience it is taken to be equal to 1 in the simulations. The parameter $\lambda$ is specified in each simulation; small $\lambda$ corresponds to large variance heterogeneity across variables, while $\lambda = \infty$ corresponds to variance homogeneity across variables; we take $\lambda = 5$, $\lambda = $ […] and $\lambda = $ […] in the simulations. We also consider the average empirical ratio between the variance of the variances and the variance that would result in the case of homogeneity across all variables (i.e. $\lambda = \infty$), indicated by $\zeta$. This can be considered an indication of heterogeneity across the variables.

Next, conditional on $\sigma_i^2$, the effect sizes $\delta_{ji} = \mu_i / \sigma_i$; $j = 1, \dots, n$ (see (4)) are assumed to be drawn independently from a mixture of $N(0, \sigma_\delta^2)$ and single point (0) distributions. The parameter $\sigma_\delta^2$ is specified in the simulations; larger $\sigma_\delta^2$ denotes generally larger alternatives. We take $\sigma_\delta^2 = $ […] or 2 in our simulations. The mixing parameter is denoted by $\pi$, with $\pi = P(\delta_i \ne 0)$, and represents the proportion of variables under $H_1$. It varies from […] to .5. Finally, conditional on the means and variances, the observed data vector is assumed to come from an $m$-dimensional multivariate normal distribution, with i.i.d. components.
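The data-generating model just described can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: NumPy's `Generator` API is assumed, the function name `simulate_one_sample` and its argument names are hypothetical, and the components of each observation vector are generated independently, which is how the sketch interprets the final sentence of the model description.

```python
import numpy as np

def simulate_one_sample(n, m, lam, sigma2_delta, pi, tau=1.0, seed=None):
    """Draw an n x m one-sample data matrix from the simulation model.

    lam          : degrees of freedom lambda (np.inf gives equal variances)
    sigma2_delta : variance of the non-null effect sizes
    pi           : proportion of variables under H1
    Returns the data matrix and a boolean mask of the active variables.
    """
    rng = np.random.default_rng(seed)
    # variances: sigma_i^2 ~ tau * lambda / chi^2_lambda
    if np.isinf(lam):
        var = np.full(m, tau)
    else:
        var = tau * lam / rng.chisquare(lam, size=m)
    # effect sizes: mixture of N(0, sigma2_delta) and a point mass at 0
    active = rng.random(m) < pi
    delta = np.where(active, rng.normal(0.0, np.sqrt(sigma2_delta), size=m), 0.0)
    mu = delta * np.sqrt(var)                    # delta_i = mu_i / sigma_i
    # observations: independent normal components, given means and variances
    X = rng.normal(loc=mu, scale=np.sqrt(var), size=(n, m))
    return X, active

# Illustrative call with arbitrary settings (not those of the paper)
X, active = simulate_one_sample(n=5, m=200, lam=5, sigma2_delta=2.0, pi=0.2, seed=1)
print(X.shape, int(active.sum()))
```

The returned mask of active variables can then be compared with the rejections of any of the procedures discussed above in order to estimate power over repeated draws.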

Power is taken to be the average fraction of correctly rejected hypotheses in each simulation. Specifically, defining $K_1 = \{i \mid H_i \text{ is rejected}\}$ and $K_2 = \{i \mid \delta_i \ne 0\}$, we define $\text{Power} = E\left(|K_1 \cap K_2| / |K_2| \;\big|\; |K_2| > 0\right)$. Figures 1 and 2 show the power estimates with sample size ($n$) equal to 5 and […] respectively, for different proportions of active variables ($\pi$), variances of effects $\sigma_\delta^2$, variance heterogeneity ($\lambda$) and number of variables ($m$), using weights as defined in (6), (7) and (8) (with $\gamma = 2$), using […] simulated data sets. Among competing procedures that control the FWE, we show only the WBHP, which generally proves to be more powerful than the WHP. A detailed report of the power estimates for all the presented procedures and various thresholds is available upon request to the corresponding author.

[Figure 1 about here.]

[Figure 2 about here.]

A discussion of the results is given in the Conclusions section.

5 Conclusions

In this paper we have discussed data-driven weighted procedures controlling the FWE. We also propose new data-driven weighted procedures controlling the FDR in the case of independence between variables and under Positive Regression Dependency on the subset of variables corresponding to null hypotheses (PRDS). These procedures provide good results as long as the variances of the variables are approximately uniform under $H_0$. As heterogeneity grows, the procedures (though remaining correct) become less powerful. However, the simulation study suggests they can also be applied in cases of high incidence of non-homogeneity. In particular, the solution proposed in (8), based on a threshold, seems to provide better results in the FWE-controlling WBHP procedure even when there is very high heterogeneity. This is particularly true for simulations with a high number of variables ($m = $ […]). This is due to the fact that the threshold value reduces the number of tested variables, and this compensates for the loss of power characterizing the Bonferroni-like procedures in high dimensional situations.

In contrast, the proposal made in (6) seems to behave better in terms of power in the WFDR-controlling procedure even when the $\zeta$ ratio is large. As an example, consider the setting $n = 5$, $m = $ […] and $\sigma_\delta = 2$ (figure 1, plots at the bottom) and consider the case with […]% of active variables. The power of the FDR-controlling procedure is about 33%. When there is complete homogeneity, the power of the WFDR-controlling procedure proposed in (6) ($w = S^2$) is very high (6[…]%). Power does not change when the variance heterogeneity across variables is large. On the contrary, when the lack of homogeneity is huge then power decreases very rapidly. As a general guideline, the gain in power is higher for smaller sample sizes (see also the formal discussion of section 3.1), whereas power seems to be independent of the number of variables involved. Although the combination of low sample size and high number of variables used in the

simulation could seem negligible in real applications, this setting is the most common one in brain imaging studies, where the sample size rarely exceeds 8-[…] subjects and the number of variables involved is rarely less than […]. Also in microarray studies the sample size and number of variables often have similar settings.

The software has been developed in Matlab 7 (MathWorks Inc.) and is available upon request to the corresponding author.

REFERENCES

Benjamini, Y. and Hochberg, Y. (1995), Controlling the false discovery rate: A new and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57.

Benjamini, Y., Hochberg, Y. (1997), Multiple Hypotheses Testing with Weights. Scandinavian Journal of Statistics 24.

Benjamini, Y., Yekutieli, D. (2001), The Control of the False Discovery Rate in Multiple Testing under Dependency. The Annals of Statistics 29, n. 4.

Fang, K.T., Zhang, Y.T. (1990), Generalized Multivariate Analysis. Science Press, Beijing.

Finos, L., Pesarin, F., Salmaso, L. (2003), Combined tests for controlling multiplicity Closed Testing procedures. Italian Journal of Applied Statistics 5.

Finos, L., Salmaso, L. (2006), Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics 8, n. 2.

Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H. (1999), Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286.

Holm, S. (1979), A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6.

Läuter, J. (1996), Exact t and F tests for analysing studies with multiple endpoints. Biometrics 52.

Läuter, J., Glimm, E., Kropf, S. (1996), New multivariate tests for data with an inherent structure. Biometrical Journal 38. Erratum: Biometrical Journal 4, 5.

Kropf, S. and Läuter, J. (2002), Multiple tests for different sets of variables using a data-driven ordering of hypotheses, with an application to gene expression data. Biometrical Journal 44.

Kropf, S., Hommel, G. (2004), 7th Tartu Conference on Multivariate Statistics in 2003, appeared in Acta et Commentationes Universitatis Tartuensis de Mathematica 8 (2004), spec. vol.

Kropf, S., Läuter, J., Eszlinger, M., Krohn, K., Paschke, R. (2004), Nonparametric Multiple Test Procedures with Data-driven Order of Hypotheses and with Weighted Hypotheses. Journal of Statistical Planning and Inference 25.

Marcus, R., Peritz, E. and Gabriel, K.R. (1976), On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63.

Pesarin, F. (2001), Multivariate permutation tests with application to biostatistics. Wiley, Chichester.

Rosenthal, R. and Rubin, D.B. (1983), Ensemble-adjusted p-values. Psychological Bulletin 94.

Sarkar, T.K. (1969), Some lower bounds of reliability. Tech Report, No. 24, Dept. of Operation Research and Statistics, Stanford University.

Sarkar, S.K. (1998), Some probability inequalities for ordered $MTP_2$ random variables: A proof of Simes conjecture. The Annals of Statistics 26(2).

Spjøtvoll, E. (1972), On the optimality of some multiple comparison procedures. Ann. Math. Statist. 43.

Westfall, P.H., Krishen, A. (2001), Optimally weighted, fixed sequence and gatekeeper multiple testing procedures. Journal of Statistical Planning and Inference 99.

Westfall, P.H., Kropf, S., Finos, L. (2004), Weighted FWE-controlling methods in high-dimensional situations. Recent Developments in Multiple Comparison Procedures, Institute of Mathematical Statistics Lecture Notes-Monograph Series, Vol. 47, Y. Benjamini, F. Bretz, and S. Sarkar, eds.

ACKNOWLEDGEMENT

The authors wish to thank two Referees for helpful comments and suggestions.

APPENDIX

Proof of theorem 2.2. Let $C_k^{(i)}$ denote the event in which the true null hypothesis $i$ is rejected along with $k - 1$ other (true and/or false) hypotheses. Now let

$$p_{ijk} = P\left(\left\{P_i \in \left[\frac{\sum_{h=1}^{j-1} w_h}{m} q, \; \frac{\sum_{h=1}^{j} w_h}{m} q\right]\right\} \cap C_k^{(i)}\right).$$

Note that for any given set of ordered weights $w_i$,

$$\sum_{k=1}^{m} p_{ijk} = P\left(\left\{P_i \in \left[\frac{\sum_{h=1}^{j-1} w_h}{m} q, \; \frac{\sum_{h=1}^{j} w_h}{m} q\right]\right\} \cap \left(\bigcup_k C_k^{(i)}\right)\right) = \frac{q \, w_j}{m}. \qquad (10)$$

Note that, for each $i$, the $C_k^{(i)}$ are disjoint, so the FDR can be expressed as:

$$E(Q(w)) = \sum_{i \in M} \sum_{k=1}^{m} \frac{w_i}{\sum_{h \le k} w_h} P\left(\left\{P_i \le \frac{\sum_{h \le k} w_h}{m} q\right\} \cap C_k^{(i)}\right) = \sum_{i \in M} \sum_{k=1}^{m} \frac{w_i}{\sum_{h \le k} w_h} \sum_{j=1}^{k} p_{ijk} \le \text{[…]} \le \left(\sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}\right) \frac{\sum_{i \in M} w_i}{m} \, q. \qquad (11)$$

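Q.E.D.

Proof of the Lemma. When $m = 1$ the result is straightforward. Now, we assume that (9) holds for any $m' \le m$ and show that it holds for $m + 1$.

If $m_0 = 0$: all null hypotheses are false, $Q = 0$, and

$$E(Q(w) \mid P_1 = p_1, \dots, P_{m+1} = p_{m+1}) = 0 \le \frac{\sum_{i=1}^{m_0} w_i}{m+1} \, q. \qquad (12)$$

If $m_0 > 0$: let $P_{(m_0)}$ denote the largest p-value among $P_1, \dots, P_{m_0}$. Since these correspond to the true null hypotheses, $P_{(m_0)}$ is distributed as the largest of $m_0$ independent $U(0,1)$ random variables (this is due to the fact that the conditional distribution of $X_0$ for fixed $X_0' X_0$ is left-spherical), and $f_{P_{(m_0)}}(p) = m_0 \, p^{m_0 - 1}$ for $0 \le p \le 1$. Let us also define $j_0$ as the largest $j$, $m_0 + 1 \le j \le m + 1$, satisfying

$$p_j \le \frac{\sum_{i=1}^{j} w_i}{m+1} \, q. \qquad (13)$$

Let $p''$ denote the value of the right side of (13) at $j_0$. Conditioning on $P_{(m_0)} = p$,

$$E(Q(w) \mid P_{m_0+1} = p_{m_0+1}, \dots, P_{m+1} = p_{m+1}) = \int_0^{p''} E(Q(w) \mid P_{(m_0)} = p, P_{m_0+1} = p_{m_0+1}, \dots, P_{m+1} = p_{m+1}) \, f_{P_{(m_0)}}(p) \, dp \qquad (14)$$

$$+ \int_{p''}^{1} E(Q(w) \mid P_{(m_0)} = p, P_{m_0+1} = p_{m_0+1}, \dots, P_{m+1} = p_{m+1}) \, f_{P_{(m_0)}}(p) \, dp. \qquad (15)$$

For $p \le p''$ all $j_0$ hypotheses are rejected, and

$$Q(w) = \frac{\sum_{i=1}^{m_0} w_i}{\sum_{i=1}^{j_0} w_i}. \qquad (16)$$

From (14) we get

$$\frac{\sum_{i=1}^{m_0} w_i}{\sum_{i=1}^{j_0} w_i} (p'')^{m_0} = \frac{\sum_{i=1}^{m_0} w_i}{\sum_{i=1}^{j_0} w_i} \cdot \frac{\sum_{i=1}^{j_0} w_i}{m+1} \, q \, (p'')^{m_0 - 1} = \frac{\sum_{i=1}^{m_0} w_i}{m+1} \, q \, (p'')^{m_0 - 1}. \qquad (17)$$

In order to evaluate (15) we further condition on the index at which $P_{(m_0)}$ is achieved, denoted by $i_0$, $i_0 = 1, \dots, m_0$: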

$$\int_{p''}^{1} E(Q(w) \mid P_{(m_0)} = p, P_{m_0+1} = p_{m_0+1}, \dots, P_{m+1} = p_{m+1}) \, f_{P_{(m_0)}}(p) \, dp = \sum_{i_0=1}^{m_0} \frac{1}{m_0} \int_{p''}^{1} E(Q(w) \mid P_{(m_0)} = p, P_{(m_0)} = P_{i_0}, P_{m_0+1} = p_{m_0+1}, \dots, P_{m+1} = p_{m+1}) \, f_{P_{(m_0)}}(p) \, dp. \qquad (18)$$

Let us consider both cases: when $p_j \le P_{(m_0)} = P_{i_0} = p < p_{j+1}$ for $j > j_0$, or when $p'' < P_{(m_0)} = P_{i_0} = p < p_{j_0+1}$. From the definition of $j_0$ and $p''$, no hypothesis can be rejected because of the values of $p, p_{j+1}, \dots, p_{m+1}$. Therefore, when all hypotheses, true and false, are considered together and their p-values are ordered, a hypothesis $H_{(i)}$ can be rejected only if there exists a constant $k$, $i \le k \le j - 1$, for which

$$P_{(k)} \le \frac{\sum_{i=1}^{k} w_{(i)}}{m+1} \, q. \qquad (19)$$

Equivalently, $H_{(i)}$ will be rejected if $k$ satisfies

$$\frac{P_{(k)}}{p} \le \frac{\sum_{i=1}^{k} w_{(i)}}{(m+1) \, p} \, q = \frac{\sum_{i=1}^{k} w'_{(i)}}{j-1} \cdot \frac{\sum_{i=1}^{j-1} w_{(i)}}{(m+1) \, p} \, q \qquad (20)$$

with

$$w'_{(i)} = w_{(i)} \frac{j-1}{\sum_{i=1}^{j-1} w_{(i)}}. \qquad (21)$$

These weights satisfy $w'_i \ge 0$ and $\sum_{i=1}^{j-1} w'_i = j - 1$. Since we conditioned on $P_{i_0} = P_{(m_0)} = p$, the remaining $m_0 - 1$ values $P_i / p$ are distributed as independent $U(0,1)$ random variables (again, this is due to the left-spherical property of $X_0$ for fixed $X_0' X_0$); $p_i / p$ for $i = m_0 + 1, \dots, j$ are values in the interval $(0,1)$ corresponding to false null hypotheses. Hence, using (3) to test the $j - 1 = m' \le m$ hypotheses, of which $m_0 - 1$ are true, is equivalent to using the procedure with the constant $\frac{\sum_{i=1}^{j-1} w_{(i)}}{(m+1) \, p} q$ taking the role of $q$. Applying now the induction hypothesis that (9) holds for any $m' \le m$, we have that

$$E(Q(w) \mid P_{(m_0)} = p, P_{(m_0)} = P_{i_0}, P_{m_0+1} = p_{m_0+1}, \dots, P_{m+1} = p_{m+1}) \le \frac{\sum_{i=1; \, i \ne i_0}^{m_0} w'_i}{j-1} \cdot \frac{\sum_{i=1}^{j-1} w_i}{(m+1) \, p} \, q \qquad (22)$$

$$= \frac{\sum_{i=1}^{m_0} w_i - w_{i_0}}{(m+1) \, p} \, q, \qquad (23)$$

where (23) is derived after replacing $w'$ with its definition in (21).

16 The upper bound in (23) depends on p, but not on a particular j p j < p < p j+. Let us recall that for j the range for p is p < p P j+ and it does depend on i. Therefore, by integrating (23) over (p,], while still conditional to i we get p i= w i w i q p dp = ( + )p Averaging now over i, fro (8) and (24) we get = i = i= w i w i q( (p ) ). (24) ( + ) E(Q(w)) P + = p,...,p + = p )f P (p)dp = p ( ) i= w i w i q( (p ) i= ) = w i + + q( (p ) ) (25) Finally by adding (25) and (7) we get the desired inequality (2) for +. Q.E.D. Proof of theore 3.2. In order to prove that the procedure for dependent variables controls the WFDR, we siply need to note that P i, i =,..., in () and () are under H, hence uniforly distributed conditional to w i. Indeed, X is left-spherically distributed, its conditional distribution for fixed X X is also left-spherical and each of the coluns of X is conditionally leftspherically distributed as well and the result follows. Q.E.D. Proof of theore 3.3. It directly follows fro the fact that only statistics under the null hypothesis have to be PRDS, therefore the sae consideration of proof of theore 3.2 over the left-spherically distributed variables holds. Q.E.D. 6
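The step-up rule analysed in the proof of the Lemma can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes fixed a priori weights (not the data-driven weights of the paper), independent U(0,1) p-values for the true nulls, and fixed p-values for the false nulls. The function names `weighted_step_up`, `weighted_fdp` and `estimate_wfdr` are hypothetical and introduced only for this illustration.

```python
import random


def weighted_step_up(pvalues, weights, q):
    """Indices rejected by a weighted step-up rule.

    Assumption: weights are non-negative; they are rescaled here to sum to m.
    Rejects the k smallest p-values, where k is the largest rank such that
    p_(k) <= (sum of the k ordered weights / m) * q.
    """
    m = len(pvalues)
    total = sum(weights)
    if m == 0 or total <= 0:
        return []
    scaled = [w * m / total for w in weights]
    order = sorted(range(m), key=lambda i: pvalues[i])
    cum_w = 0.0
    last = 0  # number of hypotheses rejected so far
    for rank, i in enumerate(order, start=1):
        cum_w += scaled[i]
        if pvalues[i] <= cum_w / m * q:
            last = rank
    return sorted(order[:last])


def weighted_fdp(rejected, weights, true_nulls):
    """Weighted false discovery proportion Q(w); taken as 0 with no rejections."""
    denom = sum(weights[i] for i in rejected)
    if denom == 0:
        return 0.0
    return sum(weights[i] for i in rejected if i in true_nulls) / denom


def estimate_wfdr(weights, m0, alt_pvalues, q, reps=5000, seed=0):
    """Monte Carlo estimate of E[Q(w)]; the first m0 hypotheses are true nulls."""
    rng = random.Random(seed)
    true_nulls = set(range(m0))
    acc = 0.0
    for _ in range(reps):
        pvals = [rng.random() for _ in range(m0)] + list(alt_pvalues)
        rejected = weighted_step_up(pvals, weights, q)
        acc += weighted_fdp(rejected, weights, true_nulls)
    return acc / reps


# Small deterministic example with m = 4 and q = 0.05.
print(weighted_step_up([0.01, 0.04, 0.03, 0.5], [1, 1, 1, 1], 0.05))      # [0]
print(weighted_step_up([0.01, 0.04, 0.03, 0.5], [2, 1, 0.5, 0.5], 0.05))  # [0, 1, 2]
```

Under these assumptions, the value returned by `estimate_wfdr` can be compared with the bound $\sum_{i=1}^{m_0} w_i\, q / m$ stated by the Lemma, up to Monte Carlo error.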

[Figure 1: grid of power curves; panel annotations (λ, σ and eff ζ values) are not recoverable from the transcription. Legend: FDR, WFDR th=2, WFDR w=S², BHP, WBHP th=2, WBHP w=S².]

Figure 1: Simulation results for n = 5: power estimates for different values of π (abscissa axis), σ_δ² (along the rows) and λ (along the columns). FDR = FDR-controlling procedure, WFDR = weighted FDR-controlling procedure, BHP = Bonferroni-Holm FWE-controlling procedure, WBHP = Benjamini-Hochberg weighted FWE-controlling procedure. Moreover, th=2 means threshold γ = 2 (see definition (8)), and w = S² means weight w equal to the total variance (see definition (7) with γ = [value missing]).

[Figure 2: grid of power curves; panel annotations (λ, σ and eff ζ values) are not recoverable from the transcription. Legend: FDR, WFDR th=2, WFDR w=S², BHP, WBHP th=2, WBHP w=S².]

Figure 2: Simulation results for n = [value missing]: power estimates for different values of π (abscissa axis), σ_δ² (along the rows) and λ (along the columns). FDR = FDR-controlling procedure, WFDR = weighted FDR-controlling procedure, BHP = Bonferroni-Holm FWE-controlling procedure, WBHP = Benjamini-Hochberg weighted FWE-controlling procedure. Moreover, th=2 means threshold γ = 2 (see definition (8)), and w = S² means weight w equal to the total variance (see definition (7) with γ = [value missing]).


More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

An Introduction to Meta-Analysis

An Introduction to Meta-Analysis An Introduction to Meta-Analysis Douglas G. Bonett University of California, Santa Cruz How to cite this work: Bonett, D.G. (2016) An Introduction to Meta-analysis. Retrieved fro http://people.ucsc.edu/~dgbonett/eta.htl

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

Physics 139B Solutions to Homework Set 3 Fall 2009

Physics 139B Solutions to Homework Set 3 Fall 2009 Physics 139B Solutions to Hoework Set 3 Fall 009 1. Consider a particle of ass attached to a rigid assless rod of fixed length R whose other end is fixed at the origin. The rod is free to rotate about

More information

A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax:

A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax: A general forulation of the cross-nested logit odel Michel Bierlaire, EPFL Conference paper STRC 2001 Session: Choices A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics,

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information