Supplementary Materials: A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies

Anjana Grandhi, Wenge Guo, Shyamal D. Peddada

S1 Notations and Definitions

Let $m$ denote the number of genes in the data, which has gene expressions for each gene on $p$ categories (tumor sizes and normal tissue). Let $\mu_{ij}$ denote the mean response corresponding to the $i$-th category in the $j$-th gene, $i = 1, \ldots, p$, $j = 1, \ldots, m$. The problem of biological interest we discuss in the context of the uterine fibroid data is to detect genes that are differentially expressed in a tumor size category compared to the normal sample. Thus, if the last category, $p$, corresponds to the normal sample, we need to test whether the pairwise differences $\theta_{ij} = \mu_{ij} - \mu_{pj}$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, m$, are equal to zero, where $q = p - 1$. For each gene $j$, the pairwise null and alternative hypotheses are

$$H^j_{0i}: \theta_{ij} = 0 \quad \text{against} \quad H^j_{1i}: \theta_{ij} \neq 0, \quad i = 1, \ldots, q. \tag{S1}$$

For each gene $j$, we have a vector of parameters $\theta_j = (\theta_{1j}, \theta_{2j}, \ldots, \theta_{qj})$. We first need to find the genes that are differentially expressed in at least one tumor sample compared to normal samples. Thus, we define the null and alternative screening hypotheses to test the significance of each gene as

$$H^j_{0\text{screen}}: \theta_j = 0 \quad \text{against} \quad H^j_{1\text{screen}}: \theta_j \neq 0, \quad j = 1, \ldots, m, \tag{S2}$$

for testing whether all parameters $\theta_{ij}$, $i = 1, \ldots, q$, are simultaneously 0 or not, equivalently, whether all $\mu_{ij}$, $i = 1, \ldots, p$, are equal or not. These hypotheses give rise to $m$ families of hypotheses corresponding to $m$ genes, with each family having a screening hypothesis and $q$ pairwise hypotheses. Figure S1 shows a simple graphical representation of the structure of hypotheses in our formulation.

Let $x^k_{ij}$ denote the $k$-th observed gene expression of the $j$-th gene in the $i$-th group, $k = 1, \ldots, n_i$, with $n_i$ being the sample size for the $i$-th group, $j = 1, \ldots, m$, $i = 1, \ldots, p$. Let $T_{ij}$ and $P_{ij}$, $i = 1, \ldots, q$, $j = 1, \ldots, m$, denote the test statistics and the p-values respectively for testing $H^j_{0i}$.
The test statistics for testing the screening hypotheses $H^j_{0\text{screen}}$ are obtained as a function of $T_{ij}$, $i = 1, \ldots, q$, for instance, the highest order statistic of $T_{ij}$, $i = 1, \ldots, q$. We denote the p-values for testing the screening hypotheses as $P^j_{\text{screen}}$. For each family $j$ we denote the vector of p-values $P_j = (P_{1j}, P_{2j}, \ldots, P_{qj})$, based on the test statistics $T_j = (T_{1j}, T_{2j}, \ldots, T_{qj})$, for testing the pairwise hypotheses in (S1). If $H^j_{0i}$ is rejected we conclude on direction, i.e., declare $\theta_{ij} > 0$ if $T_{ij} > 0$ or declare $\theta_{ij} < 0$ if $T_{ij} < 0$.

Given the screening p-values $P^j_{\text{screen}}$, $j = 1, 2, \ldots, m$, to carry out the simultaneous testing of the screening hypotheses in (S2), we use the BH procedure [1], which controls the FDR at a given level $\alpha$. This is a step-up procedure as follows: given the ordered screening p-values $P^{\text{screen}}_{(1)} \le P^{\text{screen}}_{(2)} \le \cdots \le P^{\text{screen}}_{(m)}$ and the corresponding screening null hypotheses $H^{(1)}_{0\text{screen}}, H^{(2)}_{0\text{screen}}, \ldots, H^{(m)}_{0\text{screen}}$, find

$$R = \max\left\{ j : P^{\text{screen}}_{(j)} \le j\alpha/m \right\}$$

and reject $H^{(1)}_{0\text{screen}}, \ldots, H^{(R)}_{0\text{screen}}$, provided the maximum exists; otherwise, accept all the screening hypotheses.

When an $H^j_{0\text{screen}}: \theta_j = 0$ is rejected, further decisions are made on the pairwise hypotheses in (S1) and, on rejection, directional decisions are made on the signs of the components $\theta_{ij}$. A Type I error might occur due to wrongly rejecting $H^j_{0\text{screen}}$, or correctly rejecting $H^j_{0\text{screen}}$ but wrongly rejecting $H^j_{0i}$ for some $i = 1, \ldots, q$. A Type II error might occur due to failing to reject a false null hypothesis $H^j_{0\text{screen}}$, or correctly rejecting a false $H^j_{0\text{screen}}$ but failing to reject a false null pairwise hypothesis $H^j_{0i}$ for some $i = 1, \ldots, q$. A directional error (Type III error) might occur due to correctly rejecting $H^j_{0\text{screen}}$ but wrongly assigning the sign of $\theta_{ij}$ while correctly rejecting $H^j_{0i}: \theta_{ij} = 0$.

The general practice in any multiple testing problem is to find a procedure that controls the Type I errors and minimizes the Type II errors. Here, we need to control Type I as well as Type III errors. A practical way of doing that is to use an error rate combining both Type I and Type III errors in the FDR framework and make sure that it is controlled. An error rate that combines Type I errors and Type III errors in the FWER setup is the mdFWER [2, 3], which is the probability of making at least one Type I error or directional error. Heller et al. [4] used the Overall False Discovery Rate (OFDR), which is the expected proportion of falsely discovered gene sets out of all discovered gene sets, as an appropriate error measure to control in their two-stage procedure for identifying differentially expressed genes and gene sets. The concept of OFDR was introduced by Benjamini and Heller [5] in the context of testing partial conjunction hypotheses. Inspired by Heller et al. [4] and Shaffer [2], Guo et al. [6] defined the mixed directional False Discovery Rate (mdFDR), which is formally defined in this section.
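The BH step-up rule used for the screening hypotheses can be sketched in a few lines of Python. This is a minimal illustration only: the function name `bh_step_up` and the example p-values are made up for this sketch and do not come from the paper.

```python
import numpy as np

def bh_step_up(p_screen, alpha=0.05):
    """BH step-up rule: R = max{j : P_(j) <= j*alpha/m}, reject the R smallest.

    Returns (R, rejected), where rejected is a boolean mask in the
    original (unsorted) gene order. Hypothetical helper for illustration.
    """
    p = np.asarray(p_screen, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(p[order] <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if below.size == 0:                          # the maximum does not exist
        return 0, rejected
    R = int(below[-1]) + 1                       # largest j meeting the bound
    rejected[order[:R]] = True
    return R, rejected

# Illustrative screening p-values for m = 5 genes (not real data).
p_example = [0.001, 0.039, 0.021, 0.6, 0.025]
R, rejected = bh_step_up(p_example, alpha=0.05)
```

In this example the second smallest p-value (0.021) exceeds its own threshold 2α/m = 0.02, yet its hypothesis is still rejected because a larger ordered p-value meets its threshold; this is what distinguishes a step-up rule from a step-down rule.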
Let $V_j$ denote the indicator function of at least one Type I error or directional error committed while testing family $j$ and the pairwise hypotheses in it, i.e., $V_j$ is 1 if either $H^j_{0\text{screen}}$ is falsely rejected, or $H^j_{0\text{screen}}$ is correctly rejected but at least one Type I error or directional error occurs while testing the pairwise hypotheses in family $j$; $V_j$ is 0 otherwise. Let $R$ denote the number of screening hypotheses rejected by a multiple testing procedure, that is, $R = \max\{ j : P^{\text{screen}}_{(j)} \le j\alpha/m \}$. Then, the mdFDR is formally defined as follows.

Definition 1: mdFDR - mixed directional False Discovery Rate. The expected proportion of Type I and directional errors among all discovered families,

$$\text{mdFDR} = E\left[ \frac{\sum_{j=1}^{m} V_j}{\max(R, 1)} \right]. \tag{S3}$$

S2 Proof of Theorem 1

Proof. Let $I_0$ denote the set of true null screening hypotheses $H^j_{0\text{screen}}$ and $I_1$ denote the set of false $H^j_{0\text{screen}}$, with $|I_0| = m_0$ and $|I_1| = m_1$, $m_0 + m_1 = m$. From the definition of mdFDR,

$$\text{mdFDR} = E(Q) = E\left[ \frac{\sum_{j=1}^{m} V_j}{\max(R, 1)} \right]. \tag{S4}$$

Let $P^{\text{screen}}_{(1)} \le \cdots \le P^{\text{screen}}_{(m)}$ denote the ordered screening p-values. In the event that $R = r$, $P^{\text{screen}}_{(k)} \le r\alpha/m$ for $k = 1, 2, \ldots, r$ and $P^{\text{screen}}_{(k)} > (r+1)\alpha/m$ for $k = r+1, \ldots, m$. Consequently, exactly $r$ of the $P^j_{\text{screen}}$ are $\le r\alpha/m$. Then, (S4) can be written as

$$
\begin{aligned}
E(Q) &= \sum_{r=1}^{m} \sum_{j=1}^{m} \frac{1}{r} \Pr\left( V_j = 1, R = r \right) \\
&= \sum_{r=1}^{m} \sum_{j \in I_0} \frac{1}{r} \Pr\left( P^j_{\text{screen}} \le \frac{r\alpha}{m}, \; R^{(-j)} = r - 1 \right) \\
&\quad + \sum_{r=1}^{m} \sum_{j \in I_1} \frac{1}{r} \Pr\left( P^j_{\text{screen}} \le \frac{r\alpha}{m}, \; \text{Type I or Type III error at } j, \; R^{(-j)} = r - 1 \right),
\end{aligned}
\tag{S5}
$$

where $R^{(-j)}$ denotes the number of screening hypotheses rejected from the set of screening hypotheses $\{H^1_{0\text{screen}}, H^2_{0\text{screen}}, \ldots, H^{j-1}_{0\text{screen}}, H^{j+1}_{0\text{screen}}, \ldots, H^m_{0\text{screen}}\}$ by using the step-up procedure with the critical values $(i+1)\alpha/m$, $i = 1, \ldots, m-1$.

First consider the second term in (S5). By the assumption of independence of the p-value vectors we can write it as follows:

\begin{align}
& \sum_{r=1}^{m} \sum_{j \in I_1} \frac{1}{r} \Pr\left( P^j_{\text{screen}} \le \frac{r\alpha}{m}, \; \text{Type I or Type III error at } j \right) \Pr\left( R^{(-j)} = r - 1 \right) \notag \\
& \quad \le \sum_{r=1}^{m} \sum_{j \in I_1} \frac{1}{r} \cdot \frac{r\alpha}{m} \Pr\left( R^{(-j)} = r - 1 \right) \tag{S6} \\
& \quad = \frac{m_1}{m} \alpha. \tag{S7}
\end{align}

The inequality in (S6) follows because we use an mdFWER controlling procedure at level $r\alpha/m$ for each significant family and, as $j \in I_1$, the probability of making at least one Type I error or directional error in family $j$ is $\le r\alpha/m$. Summing over all values of $r$, the equality in (S7) follows by noting that $\sum_{r=1}^{m} \Pr(R^{(-j)} = r - 1) = 1$.

Next consider the first term in (S5). By the assumption of independence of the p-value vectors we can write it as follows:

\begin{align}
& \sum_{r=1}^{m} \sum_{j \in I_0} \frac{1}{r} \Pr\left( P^j_{\text{screen}} \le \frac{r\alpha}{m} \right) \Pr\left( R^{(-j)} = r - 1 \right) \notag \\
& \quad \le \sum_{r=1}^{m} \sum_{j \in I_0} \frac{1}{r} \cdot \frac{r\alpha}{m} \Pr\left( R^{(-j)} = r - 1 \right) \tag{S8} \\
& \quad = \frac{m_0}{m} \alpha. \tag{S9}
\end{align}

The inequality in (S8) follows due to the fact that the true null p-values are stochastically larger than or equal to $U(0, 1)$. Summing over all values of $r$, the equality in (S9) follows by noting that $\sum_{r=1}^{m} \Pr(R^{(-j)} = r - 1) = 1$. The result follows by combining (S7) and (S9), since $(m_0 + m_1)\alpha/m = \alpha$.

In the theorem we only assume that the p-value vectors are independent and we make no assumption about the component pairwise p-values. This implies that the p-values within a gene across tumor samples may have any dependence structure. The mdFWER controlling procedure used determines the dependence structures of p-values within genes under which the procedure is valid. For example, if we use Holm's mdFWER controlling procedure, which is proved to control the mdFWER when the p-values are independent, then this theorem is valid under the additional assumption that the pairwise p-values $P_{ij}$, $i = 1, 2, \ldots, q$, of a vector $P_j$ are independent.

The generality of this algorithm makes it a flexible procedure to apply to several practical situations where multidimensional directional decisions are required. Although, in the paper, we discuss testing of differential gene expression in each tumor size against the normal sample for each gene, this procedure can be applied to any type of pairwise comparison desired to be tested for each gene. For example, if it is of interest to group genes by the inequalities among the mean responses, we would want to detect the pattern of mean responses in the $p$ categories, known as the directional pattern, and see how the mean responses vary across the categories. Some common inequalities are $\mu_{1j} \le \mu_{2j} \le \cdots \le \mu_{pj}$ (monotone pattern) and $\mu_{1j} \le \cdots \le \mu_{ij} \ge \mu_{(i+1)j} \ge \cdots \ge \mu_{pj}$ (umbrella pattern with peak $\mu_{ij}$). To test for the pattern we need to test the differences of mean responses of successive categories, $\theta_{ij} = \mu_{(i+1)j} - \mu_{ij}$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, m$, with $q = p - 1$. If the problem of interest is testing all pairwise differences of the $p$ categories, possibly unordered, then $q = p(p-1)/2$. Based on the question we want to answer from the data, appropriate methodology can be developed from this general procedure.
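The three kinds of pairwise comparison mentioned above (each category against the last one, successive categories, and all pairs) differ only in which contrasts of the category means are tested. The following sketch builds the corresponding contrast matrices; the function name `contrast_matrix`, the `kind` labels, and the sign convention for the all-pairs case are assumptions made for illustration.

```python
import numpy as np

def contrast_matrix(p, kind="control"):
    """Rows c such that theta = C @ mu for the p category means of one gene.

    kind="control":    theta_i = mu_i - mu_p,      q = p - 1
    kind="successive": theta_i = mu_{i+1} - mu_i,  q = p - 1
    kind="all_pairs":  mu_a - mu_b for a < b,      q = p(p-1)/2
                       (sign convention is an assumption of this sketch)
    """
    if kind == "control":
        return np.hstack([np.eye(p - 1), -np.ones((p - 1, 1))])
    if kind == "successive":
        C = np.zeros((p - 1, p))
        for i in range(p - 1):
            C[i, i], C[i, i + 1] = -1.0, 1.0
        return C
    if kind == "all_pairs":
        rows = []
        for a in range(p):
            for b in range(a + 1, p):
                row = np.zeros(p)
                row[a], row[b] = 1.0, -1.0
                rows.append(row)
        return np.array(rows)
    raise ValueError(f"unknown kind: {kind}")

# Illustrative means for p = 4 categories of a single gene (made-up values).
mu = np.array([1.0, 2.0, 4.0, 3.0])
theta_control = contrast_matrix(4, "control") @ mu        # vs. last category
theta_successive = contrast_matrix(4, "successive") @ mu  # pattern detection
q_all_pairs = contrast_matrix(4, "all_pairs").shape[0]
```

With these made-up means the successive differences are positive, positive, negative, which corresponds to an umbrella pattern peaking at the third category.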
S3 Details of Statistical Methodology for FGS Gene Expression Data

Dunnett $P_{\text{screen}}$ for Step 1: The scenario is one of comparing multiple groups with a common control group, and the standard method used in this situation is the Dunnett test [7]. The Dunnett test is a powerful method designed specifically for this kind of comparison. The test assumes that the underlying distributions of the data from the different groups have the same variance, and the test statistics are obtained by using a pooled estimate of the variance. This assumption is valid for the uterine fibroid data as the gene expressions are normalized to have similar means and variances for comparability. The test statistic for testing (S1) is given by

$$T^{\text{Dunn}}_{ij} = \frac{\bar{x}_{ij} - \bar{x}_{pj}}{s_j \sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_p}}}, \tag{S10}$$

where $\bar{x}_{ij} = (1/n_i) \sum_{k=1}^{n_i} x^k_{ij}$, $i = 1, \ldots, p$, $j = 1, \ldots, m$, are the sample means and $s_j^2 = \sum_{i=1}^{p} \sum_{k=1}^{n_i} \left( x^k_{ij} - \bar{x}_{ij} \right)^2 / \nu$, $j = 1, \ldots, m$, are the pooled sample variances, with $\nu = \sum_{i=1}^{p} n_i - p$. The null distribution of each $T^{\text{Dunn}}_{ij}$ is a univariate t-distribution with $\nu$ degrees of freedom. The vector of Dunnett test statistics $T^{\text{Dunn}}_j = (T^{\text{Dunn}}_{1j}, \ldots, T^{\text{Dunn}}_{qj})$ has a $q$-variate t-distribution with $\nu$ degrees of freedom and correlation matrix $\mathbf{R} = (\rho_{ik})_{q \times q}$, where the $\rho_{ik}$ are defined in Dunnett [7]. The Dunnett-adjusted critical value for the two-sided test for $\{ T^{\text{Dunn}}_{ij}, i = 1, \ldots, q \}$, denoted by $u_\alpha(q, \nu)$, is the quantile of the above $q$-variate t-distribution such that

$$\Pr\left( T^{\text{Dunn}}_{1j} \le u_\alpha, \ldots, T^{\text{Dunn}}_{qj} \le u_\alpha \right) = 1 - \frac{\alpha}{2},$$

or equivalently,

$$\Pr\left( \max_{i = 1, \ldots, q} T^{\text{Dunn}}_{ij} \ge u_\alpha \right) = \frac{\alpha}{2}.$$

The observed values of $T^{\text{Dunn}}_{ij}$, say $t^{\text{Dunn}}_{ij}$, are compared to $u_\alpha(q, \nu)$ and we reject $H^j_{0i}$ if $|t^{\text{Dunn}}_{ij}| > u_\alpha(q, \nu)$. For each gene $j$ we have a vector of observed Dunnett test statistics, $t^{\text{Dunn}}_j = (t^{\text{Dunn}}_{1j}, \ldots, t^{\text{Dunn}}_{qj})$. Let $t^{\max}_j = \max_{i = 1, \ldots, q} |t^{\text{Dunn}}_{ij}|$. We obtain the screening p-value for testing the screening hypotheses (S2) as follows:

$$P^j_{\text{screen}} = \Pr\left( \max_{i = 1, \ldots, q} |T_{ij}| > t^{\max}_j \right) = 1 - \Pr\left( -t^{\max}_j \le T_{ij} \le t^{\max}_j, \; i = 1, \ldots, q \right), \tag{S11}$$

where the probability is obtained from the CDF of the $q$-variate t-distribution with $\nu$ degrees of freedom and the correlation structure defined in Dunnett [7]. Let $R$ denote the number of rejected screening hypotheses when applying the BH procedure to these screening p-values.

Dunnett mdFWER controlling procedure for Steps 2-3: We use the Dunnett procedure to obtain the Dunnett-adjusted p-values, $P^{\text{Dunnett}}_{ij}$, for testing the pairwise hypotheses as follows:

$$P^{\text{Dunnett}}_{ij} = 2 \Pr\left( \max_{l = 1, \ldots, q} T^{\text{Dunn}}_{lj} \ge |t^{\text{Dunn}}_{ij}| \right). \tag{S12}$$

We reject $H^j_{0i}$ if the corresponding adjusted p-value satisfies $P^{\text{Dunnett}}_{ij} \le R\alpha/m$, and conclude $\theta_{ij} > 0$ if $T^{\text{Dunn}}_{ij} > 0$ and $\theta_{ij} < 0$ otherwise.

S4 Supplementary Results for the Simulation Study

Methods Used in Simulation Study

In this section we describe the different mdFDR controlling methodologies used in the simulation study. We develop methodologies by combining the Dunnett screening procedure with four different mdFWER controlling procedures for Steps 2 and 3, and compare the performance of the resulting four methodologies with the Guo et al. [6] methodology in terms of mdFDR control and power.
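To make the Dunnett computations of this section concrete, the sketch below evaluates the pooled-variance statistics for one gene and approximates the screening p-value and the adjusted pairwise p-values. Two points are assumptions of this sketch rather than statements from the text: the probabilities are approximated by Monte Carlo simulation of the multivariate t-distribution instead of an exact CDF evaluation, and the correlation is taken as $\rho_{ik} = \sqrt{n_i/(n_i+n_p)}\sqrt{n_k/(n_k+n_p)}$, the usual form for comparisons with a common control, which the text only cites from Dunnett. Function names and data are hypothetical.

```python
import numpy as np

def dunnett_statistics(groups):
    """Pooled-variance statistics for one gene; the LAST group is the control.

    Returns (t, nu, corr): the q statistics, the degrees of freedom, and
    the q x q correlation matrix (formula is an assumption, see lead-in).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = np.array([g.size for g in groups], dtype=float)
    means = np.array([g.mean() for g in groups])
    nu = int(n.sum()) - len(groups)
    pooled_ss = sum(((g - g.mean()) ** 2).sum() for g in groups)
    s = np.sqrt(pooled_ss / nu)
    t = (means[:-1] - means[-1]) / (s * np.sqrt(1.0 / n[:-1] + 1.0 / n[-1]))
    lam = np.sqrt(n[:-1] / (n[:-1] + n[-1]))
    corr = np.outer(lam, lam)
    np.fill_diagonal(corr, 1.0)
    return t, nu, corr

def dunnett_p_values(t, nu, corr, n_sim=100_000, seed=0):
    """Monte Carlo approximation (not an exact CDF) of the screening
    p-value Pr(max |T_i| > max |t_i|) and of the adjusted pairwise
    p-values 2 * Pr(max T_l >= |t_i|), capped at 1 in this sketch."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(t)), corr, size=n_sim)
    scale = np.sqrt(rng.chisquare(nu, size=n_sim) / nu)  # shared pooled s
    t_sim = z / scale[:, None]                           # q-variate t draws
    abs_t = np.abs(t)
    p_screen = float(np.mean(np.abs(t_sim).max(axis=1) > abs_t.max()))
    one_sided = np.mean(t_sim.max(axis=1)[:, None] >= abs_t[None, :], axis=0)
    return p_screen, np.minimum(1.0, 2.0 * one_sided)

# Made-up expressions for one gene: two tumor groups and a control group.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]
t, nu, corr = dunnett_statistics(groups)
p_screen, p_adj = dunnett_p_values(t, nu, corr)
```

Dividing all components of a simulated normal vector by the same chi-square-based scale reflects the fact that every statistic for a gene shares the pooled standard deviation $s_j$.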
Screening Procedure for Step 1: Dunnett $P_{\text{screen}}$: The Dunnett method [7, 8, 9] is a powerful method specifically designed to test hypotheses where several treatments are compared with a common control in an unbalanced one-way layout. The multiple pairwise comparisons we make with the FGS gene expression data fit into the framework of the Dunnett test [7]. The test statistics, $T_{ij}$, for testing the pairwise hypotheses are obtained as described in [7], with details given in Section S3.

Procedures for Steps 2 and 3:

Dunnett Procedure: We obtain the Dunnett-adjusted pairwise p-values [8, 9], to be used in Step 2 of the algorithm, and call them $P_{ij}$. The details are given in Section S3. The procedure rejects $H^j_{0i}$ if $P_{ij} \le R\alpha/m$ and concludes on the sign of $\theta_{ij}$ based on the sign of the test statistic $T_{ij}$.

Holm Procedure: We use Holm's step-down procedure at level $R\alpha/m$ within each significant gene. Order the pairwise p-values for each significant gene $j$ as $P^j_{(1)} \le \cdots \le P^j_{(q)}$, with corresponding hypotheses denoted as $H^j_{0(1)}, \ldots, H^j_{0(q)}$. Let $k$ ($k = 1, \ldots, q$) be the maximum index such that $P^j_{(i)} \le \dfrac{R\alpha}{m(q - i + 1)}$ for all $i \le k$; then reject $H^j_{0(1)}, \ldots, H^j_{0(k)}$, conclude on direction based on the sign of the test statistic, and accept the rest of the hypotheses.

Hochberg Procedure: We use Hochberg's step-up procedure at level $R\alpha/m$ within each significant gene to identify significant categories. Let $k$ ($k = 1, \ldots, q$) be the maximum index such that $P^j_{(k)} \le \dfrac{R\alpha}{m(q - k + 1)}$; then reject $H^j_{0(1)}, \ldots, H^j_{0(k)}$, conclude on direction based on the sign of the test statistic, and accept the rest of the hypotheses.

Bonferroni Procedure: The Bonferroni procedure is a commonly used single-step multiple testing procedure that strongly controls the FWER. Reject $H^j_{0i}$ if $P_{ij} \le \dfrac{R\alpha}{mq}$ and conclude on direction based on the sign of the test statistic $T_{ij}$.

Guo et al. [6] Procedure: The procedure of Guo et al. [6] is a special case of the general testing procedure. They first obtain the p-values, $\{P_{1j}, \ldots, P_{qj}\}$, for testing the pairwise hypotheses in (S1) and use Bonferroni pooling to obtain the screening p-values as follows: $P^j_{\text{screen}} = q \min\{P_{1j}, \ldots, P_{qj}\}$. Record $R$, the number of significant genes.
For each significant gene, use the Bonferroni procedure discussed above to identify significant pairwise differences and conclude on direction.

Results for the Simulation Study

In this section we present the results from the simulation study that considers different kinds of dependencies of the gene expressions. The dependence within genes across groups means that the gene expressions from different tumor samples are dependent but, given any two genes for a sample, the expressions are independent between the two genes; as several tumor samples belong to the same subject, this kind of dependence structure is valid. The dependence among genes means that the gene expressions from different genes are dependent but, given any two samples for a gene, the expressions are independent; as the genes belonging to the same gene set have similar activity, this kind of dependence structure is also valid in the FGS microarray data.
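The within-gene procedures for Steps 2 and 3 described in the methods above (Bonferroni, Holm, Hochberg) all compare the q pairwise p-values of a significant gene against thresholds derived from the level Rα/m. A minimal sketch follows; the function names, the `level` argument, and the example numbers are illustrative assumptions, not the authors' code.

```python
import numpy as np

def bonferroni(p, level):
    """Single step: reject H_i if p_i <= level / q."""
    p = np.asarray(p, dtype=float)
    return p <= level / p.size

def _ordered_checks(p, level):
    """Sort p and compare P_(i) with level / (q - i + 1), i = 1..q."""
    p = np.asarray(p, dtype=float)
    q = p.size
    order = np.argsort(p)
    crit = level / (q - np.arange(q))
    return order, p[order] <= crit

def holm(p, level):
    """Step-down: reject while every ordered p-value so far passes."""
    order, ok = _ordered_checks(p, level)
    k = ok.size if ok.all() else int(np.argmin(ok))  # index of first failure
    reject = np.zeros(ok.size, dtype=bool)
    reject[order[:k]] = True
    return reject

def hochberg(p, level):
    """Step-up: find the largest k with P_(k) passing, reject the k smallest."""
    order, ok = _ordered_checks(p, level)
    passing = np.nonzero(ok)[0]
    k = 0 if passing.size == 0 else int(passing[-1]) + 1
    reject = np.zeros(ok.size, dtype=bool)
    reject[order[:k]] = True
    return reject

def directional_decisions(p, t, level, procedure):
    """+1 / -1 for rejected hypotheses by the sign of t, 0 otherwise."""
    reject = procedure(p, level)
    return np.where(reject, np.sign(t), 0).astype(int)

# Illustrative numbers: R = 40 discovered genes out of m = 100, alpha = 0.05.
level = 40 * 0.05 / 100
p_gene = [0.012, 0.004, 0.018]   # q = 3 pairwise p-values of one gene
t_gene = [2.9, -3.4, 2.6]        # matching test statistics
signs_hochberg = directional_decisions(p_gene, t_gene, level, hochberg)
signs_holm = directional_decisions(p_gene, t_gene, level, holm)
```

On these numbers Bonferroni and Holm reject only the hypothesis with p-value 0.004, while Hochberg rejects all three because the largest p-value already meets the last threshold.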
We define the concept of average power for the three step procedure for comparing different methodologies that control the mdFDR at the same level. Let $\mathcal{R}$ denote the set of indices of rejected screening hypotheses $H^j_{0\text{screen}}$, with $|\mathcal{R}| = R$. Let $R_j$, $j \in \mathcal{R}$, denote the number of pairwise hypotheses $H^j_{0i}$ rejected for each significant gene. Let $S$ denote the number of false null screening hypotheses rejected and let $S_j$, $j \in \mathcal{R}$, denote the number of false null pairwise hypotheses rejected for each discovered gene. Then, the average power for the three step procedure is defined as follows:

Definition 2: Average Power in Three Step Procedure: The average power is defined as the expected proportion of false null hypotheses rejected in Step 1 and Step 2 among all rejections in Step 1 and Step 2,

$$\text{Average Power} = E\left[ \frac{S + \sum_{j \in \mathcal{R}} S_j}{\max\left\{ R + \sum_{j \in \mathcal{R}} R_j, \; 1 \right\}} \right]. \tag{S13}$$

The different mdFDR controlling procedures that are shown in Figures 4-6 and Figures S2-S5 are: (i) the proposed methodology (Dunnett), (ii) Dunnett screening and Holm procedure (Holm), (iii) Dunnett screening and Hochberg procedure (Hochberg), (iv) Dunnett screening and Bonferroni procedure (Bonferroni) and (v) the Guo et al. [6] procedure.

Dependence within genes across groups. In this case the components $Z^s_{ij}$, $i = 1, \ldots, p$, are dependent with $Z^s_{ij} \sim N(\mu_{ij}, 1)$ and have a common correlation $\rho = 0.2, 0.5, 0.8$. The results are summarized in Figure 5 and Figures S2-S3. All five procedures control the mdFDR at less than $\alpha = 0.05$. Once again, as in the case of independence, the proposed method gains in power compared to the other methods.

Dependence among genes. We next considered the situation where gene expressions are dependent among genes. For this simulation, the components $Z^s_{ij}$, $j = 1, \ldots, m$, are dependent with $Z^s_{ij} \sim N(\mu_{ij}, 1)$ and have a common correlation $\rho = 0.2, 0.5, 0.8$. The results are summarized in Figure 6 and Figures S4-S5.
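The average power of Definition 2 is an expectation, so in a simulation it would typically be estimated by averaging a per-dataset ratio over replications. The sketch below computes that ratio for one simulated dataset from boolean rejection and truth masks. The function name, the mask layout, and the choice to ignore whether the declared sign is correct are assumptions of this sketch; the text does not specify how directional errors enter the count.

```python
import numpy as np

def average_power_ratio(screen_rejected, screen_false_null,
                        pair_rejected, pair_false_null):
    """Ratio inside the expectation of Definition 2 for ONE dataset.

    screen_rejected, screen_false_null: boolean arrays of shape (m,)
    pair_rejected, pair_false_null:     boolean arrays of shape (m, q)
    Pairwise rejections are counted only for discovered genes.
    """
    sr = np.asarray(screen_rejected, dtype=bool)
    sf = np.asarray(screen_false_null, dtype=bool)
    pr = np.asarray(pair_rejected, dtype=bool) & sr[:, None]
    pf = np.asarray(pair_false_null, dtype=bool)
    R = int(sr.sum())                 # rejected screening hypotheses
    S = int((sr & sf).sum())          # ... that are false nulls
    R_pairs = int(pr.sum())           # sum of R_j over discovered genes
    S_pairs = int((pr & pf).sum())    # sum of S_j over discovered genes
    return (S + S_pairs) / max(R + R_pairs, 1)

# Toy example with m = 3 genes and q = 2 comparisons (made-up masks).
ratio = average_power_ratio(
    screen_rejected=[True, True, False],
    screen_false_null=[True, False, False],
    pair_rejected=[[True, False], [True, False], [True, True]],
    pair_false_null=[[True, True], [False, False], [False, False]],
)
```

Here two genes are discovered (one of them a false null), and each contributes one pairwise rejection (one of them a false null), so the ratio is (1 + 1) / (2 + 2).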
Again, all five procedures control the mdFDR at less than $\alpha = 0.05$ and, as in the case of independence, the proposed method gains in power compared to the other methods.

References

1. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B 57.
2. Shaffer, J.P.: Control of directional errors with stagewise multiple test procedures. Annals of Statistics 8.
3. Finner, H.: Stepwise multiple test procedures and control of directional errors. Annals of Statistics 27.
4. Heller, R., Manduchi, E., Grant, G., Ewens, W.: A flexible two-stage procedure for identifying gene sets that are differentially expressed. Bioinformatics 25.
5. Benjamini, Y., Heller, R.: Screening for partial conjunction hypotheses. Biometrics 64.
6. Guo, W., Sarkar, S.K., Peddada, S.D.: Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics 66.
7. Dunnett, C.W.: A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 50.
8. Dunnett, C.W., Tamhane, A.C.: Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Statistics in Medicine 10.
9. Dunnett, C.W., Tamhane, A.C.: A step-up multiple test procedure. Journal of the American Statistical Association 87.
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More informationFeature Extraction Techniques
Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that
More informationMSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE
Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL
More informationEffective joint probabilistic data association using maximum a posteriori estimates of target states
Effective joint probabilistic data association using axiu a posteriori estiates of target states 1 Viji Paul Panakkal, 2 Rajbabu Velurugan 1 Central Research Laboratory, Bharat Electronics Ltd., Bangalore,
More informationA method to determine relative stroke detection efficiencies from multiplicity distributions
A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher
More informationScale Invariant Conditional Dependence Measures
Sashank J. Reddi sjakkar@cs.cu.edu Machine Learning Departent, School of Coputer Science, Carnegie Mellon University Barnabás Póczos bapoczos@cs.cu.edu Machine Learning Departent, School of Coputer Science,
More informationAn Introduction to Meta-Analysis
An Introduction to Meta-Analysis Douglas G. Bonett University of California, Santa Cruz How to cite this work: Bonett, D.G. (2016) An Introduction to Meta-analysis. Retrieved fro http://people.ucsc.edu/~dgbonett/eta.htl
More informationWhen Short Runs Beat Long Runs
When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long
More informationChapter 6 1-D Continuous Groups
Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:
More informationLinguistic majorities with difference in support
Linguistic ajorities with difference in support Patrizia Pérez-Asurendi a, Francisco Chiclana b,c, a PRESAD Research Group, SEED Research Group, IMUVA, Universidad de Valladolid, Valladolid, Spain b Centre
More informationConstrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008
LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We
More informationMeta-Analytic Interval Estimation for Bivariate Correlations
Psychological Methods 2008, Vol. 13, No. 3, 173 181 Copyright 2008 by the Aerican Psychological Association 1082-989X/08/$12.00 DOI: 10.1037/a0012868 Meta-Analytic Interval Estiation for Bivariate Correlations
More informationTight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography
Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it
More informationESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics
ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents
More informationCombining Classifiers
Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/
More informationWeighted- 1 minimization with multiple weighting sets
Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University
More informationInteractive Markov Models of Evolutionary Algorithms
Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary
More information3.3 Variational Characterization of Singular Values
3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and
More informationarxiv: v1 [stat.me] 1 Aug 2016
Null Models and Modularity Based Counity Detection in Multi-Layer Networks Subhadeep Paul and Yuguo Chen University of Illinois at Urbana-Chapaign Abstract arxiv:168.623v1 [stat.me] 1 Aug 16 Multi-layer
More informationEstimation of the Mean of the Exponential Distribution Using Maximum Ranked Set Sampling with Unequal Samples
Open Journal of Statistics, 4, 4, 64-649 Published Online Septeber 4 in SciRes http//wwwscirporg/ournal/os http//ddoiorg/436/os4486 Estiation of the Mean of the Eponential Distribution Using Maiu Ranked
More informationUniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval
Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,
More informationA Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)
1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu
More informationA general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics, EPFL, Lausanne Phone: Fax:
A general forulation of the cross-nested logit odel Michel Bierlaire, EPFL Conference paper STRC 2001 Session: Choices A general forulation of the cross-nested logit odel Michel Bierlaire, Dpt of Matheatics,
More informationThe Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters
journal of ultivariate analysis 58, 96106 (1996) article no. 0041 The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Paraeters H. S. Steyn
More informationRAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5
Course STA0: Statistics and Probability Lecture No to 5 Multiple Choice Questions:. Statistics deals with: a) Observations b) Aggregates of facts*** c) Individuals d) Isolated ites. A nuber of students
More informationUnderstanding Machine Learning Solution Manual
Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y
More informationON LEAST FAVORABLE CONFIGURATIONS FOR STEP-UP-DOWN TESTS
Statistica Sinica 24 (2014), 1-23 doi:http://dx.doi.org/10.5705/ss.2011.205 ON LEAST FAVORABLE CONFIGURATIONS FOR STEP-UP-DOWN TESTS Gilles Blanchard 1, Thorsten Dickhaus 2, Étienne Roquain3 and Fanny
More informationRandomized Recovery for Boolean Compressed Sensing
Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch
More informationBest Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon, Mohaad Ghavazadeh, Alessandro Lazaric To cite this version: Victor Gabillon, Mohaad Ghavazadeh, Alessandro
More informationInspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information
Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationBayesian Approach for Fatigue Life Prediction from Field Inspection
Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan
More informationSTOPPING SIMULATED PATHS EARLY
Proceedings of the 2 Winter Siulation Conference B.A.Peters,J.S.Sith,D.J.Medeiros,andM.W.Rohrer,eds. STOPPING SIMULATED PATHS EARLY Paul Glasseran Graduate School of Business Colubia University New Yor,
More informationASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical
IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul
More informationCorrecting a Significance Test for Clustering in Designs With Two Levels of Nesting
Institute for Policy Research Northwestern University Working Paper Series WP-07-4 orrecting a Significance est for lustering in Designs With wo Levels of Nesting Larry V. Hedges Faculty Fellow, Institute
More informationCSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13
CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture
More informationLeast Squares Fitting of Data
Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a
More informationSEISMIC FRAGILITY ANALYSIS
9 th ASCE Specialty Conference on Probabilistic Mechanics and Structural Reliability PMC24 SEISMIC FRAGILITY ANALYSIS C. Kafali, Student M. ASCE Cornell University, Ithaca, NY 483 ck22@cornell.edu M. Grigoriu,
More informationCOS 424: Interacting with Data. Written Exercises
COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well
More informationMechanics Physics 151
Mechanics Physics 5 Lecture Oscillations (Chapter 6) What We Did Last Tie Analyzed the otion of a heavy top Reduced into -diensional proble of θ Qualitative behavior Precession + nutation Initial condition
More informationMoments of the product and ratio of two correlated chi-square variables
Stat Papers 009 50:581 59 DOI 10.1007/s0036-007-0105-0 REGULAR ARTICLE Moents of the product and ratio of two correlated chi-square variables Anwar H. Joarder Received: June 006 / Revised: 8 October 007
More informationAn improved self-adaptive harmony search algorithm for joint replenishment problems
An iproved self-adaptive harony search algorith for joint replenishent probles Lin Wang School of Manageent, Huazhong University of Science & Technology zhoulearner@gail.co Xiaojian Zhou School of Manageent,
More informationIAENG International Journal of Computer Science, 42:2, IJCS_42_2_06. Approximation Capabilities of Interpretable Fuzzy Inference Systems
IAENG International Journal of Coputer Science, 4:, IJCS_4 6 Approxiation Capabilities of Interpretable Fuzzy Inference Systes Hirofui Miyajia, Noritaka Shigei, and Hiroi Miyajia 3 Abstract Many studies
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October
More information