arXiv: v1 [stat.OT] 7 Jul 2010
Hotelling's test for highly correlated data

P. Bubeliny
e-mail: bubeliny@karlin.mff.cuni.cz
Charles University, Faculty of Mathematics and Physics, KPMS, Sokolovska 83, Prague, Czech Republic

Abstract: This paper is motivated by the analysis of gene expression sets, especially by the problem of finding gene sets that are differentially expressed between two phenotypes. Gene $\log_2$ expression levels are highly correlated and, very likely, approximately normally distributed. Therefore, it seems reasonable to use the two-sample Hotelling's test for such data. We discover some unexpected properties of the test that make it different from the majority of tests previously used for such data. It appears that Hotelling's test does not always reach its maximal power when all marginal distributions are differentially expressed. For highly correlated data its maximal power is attained when about half of the marginal distributions are essentially different. When the correlation coefficient is greater than 0.5, this test is more powerful if only one marginal distribution is shifted than if all marginal distributions are equally shifted. Moreover, when the correlation coefficient increases, the power of Hotelling's test increases as well.

1 Introduction
In many situations statisticians need to test multidimensional hypotheses. In many cases the components of the observed random vectors are highly dependent, which may change the properties of the tests used. One example of such data is provided by gene expression levels. Gene expressions are highly correlated between genes (see for example Klebanov and Yakovlev (2007)). Moreover, genes are often investigated not just separately, but also as sets of dependent genes. Therefore one has to deal with multidimensional hypotheses, for instance when testing whether a gene set is differentially expressed. The most popular tests for gene sets are Hotelling's $T^2$ test, the N-test and tests derived from marginal t-statistics.
Ackermann and Strimmer (2009) and Glazko and Emmert-Streib (2009) compared these tests in various situations. Our goal is not to make another comparison, but rather to describe some interesting and seemingly unexpected properties of Hotelling's test.

2 Hotelling's test
The t-test is one of the best-known tests, and Hotelling's test is its multidimensional extension. As with the t-test, we can consider both a one-sample and a two-sample Hotelling's test. The one-sample case deals with the hypothesis that the expected value of a sample from a multidimensional normal distribution is equal to some given vector. The two-sample case deals with the hypothesis that the expected values of two samples from multidimensional normal distributions (with equal covariance structure) are equal. In this paper we focus on the two-sample Hotelling's test.

Suppose we have two independent samples (of sizes $n_x$ and $n_y$, respectively) from two $n$-dimensional normal distributions with identical covariance matrices equal to $\Sigma$. In other words, we consider $X_1, \dots, X_{n_x}$ as i.i.d. random vectors having $N_n(\mu_x, \Sigma)$ and $Y_1, \dots, Y_{n_y}$ as i.i.d. random vectors having $N_n(\mu_y, \Sigma)$ ($X_i$ and $Y_j$ are independent for all $i = 1, \dots, n_x$; $j = 1, \dots, n_y$). For simplicity we assume that $n < n_x + n_y - 1$. Our goal is to test the hypothesis $H: \mu_x = \mu_y$ against the alternative $A: \mu_x \neq \mu_y$. For this we use Hotelling's test based on the statistic
$$T^2 = \frac{n_x n_y}{n_x + n_y} (\bar X - \bar Y)^T S^{-1} (\bar X - \bar Y), \qquad (1)$$
where
$$\bar X = \frac{1}{n_x}\sum_{i=1}^{n_x} X_i, \qquad \bar Y = \frac{1}{n_y}\sum_{j=1}^{n_y} Y_j, \qquad S = \frac{\sum_{i=1}^{n_x}(X_i - \bar X)(X_i - \bar X)^T + \sum_{j=1}^{n_y}(Y_j - \bar Y)(Y_j - \bar Y)^T}{n_x + n_y - 2}.$$
$T^2$ is related to the F-distribution by
$$\frac{n_x + n_y - n - 1}{n(n_x + n_y - 2)}\, T^2 \sim F(n,\, n_x + n_y - n - 1). \qquad (2)$$
For more details about Hotelling's test see, for example, Chatfield and Collins (1980). We made the assumption $n < n_x + n_y - 1$ for two reasons. For $n \ge n_x + n_y - 1$ the estimate $S$ of $\Sigma$ is a singular matrix, so $S^{-1}$ does not exist. Moreover, the numerator of (2) is then non-positive, and so is the second number of degrees of freedom of the F-distribution. In such situations one can use a pseudo-inverse of $S$ and estimate the p-value of $H$ from permutations of $(X_1, \dots, X_{n_x}, Y_1, \dots, Y_{n_y})$.

3 Hotelling's test for strongly dependent data
As mentioned above, genes are highly dependent, and we will suppose that their $\log_2$ expression levels are approximately normally distributed. Many papers work with gene sets instead of single genes (for example Barry et al. (2008)) and therefore deal with multidimensional hypotheses. It seems reasonable to use Hotelling's test in this situation. Assume that we have two multidimensional samples and need to test the hypothesis that the expected values of these two samples are equal. Assume for simplicity that all elements on the main diagonal of the covariance matrix $\Sigma$ of both samples are equal to 1 and all other elements are equal to $\rho > 0$, i.e.
$$\Sigma = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$
From now on, we assume that $\mu_x = (0, \dots, 0)^T$, while $\mu_y$ has its first $m$ elements equal to 1 and the others equal to 0, i.e.
$$\mu_y = (\underbrace{1, \dots, 1}_{m}, \underbrace{0, \dots, 0}_{n-m})^T.$$
For large $n_x$ and $n_y$ the matrix $\Sigma$ and its estimate $S$ are approximately the same. Likewise, the difference of the expected values $\mu_x - \mu_y$ is approximately the same as the difference of the sample means $\bar X - \bar Y$. With real data, $n_x$ and $n_y$ might not be large enough, but for theoretical purposes we may use the approximations $S \approx \Sigma$ and $\bar X - \bar Y \approx \mu_x - \mu_y$. In this case $S^{-1} \approx \Sigma^{-1}$, where
$$\Sigma^{-1} = \begin{pmatrix} \alpha & -\beta & \cdots & -\beta \\ -\beta & \alpha & \cdots & -\beta \\ \vdots & \vdots & \ddots & \vdots \\ -\beta & -\beta & \cdots & \alpha \end{pmatrix}, \qquad \alpha = \frac{1 + (n-2)\rho}{(1-\rho)(1+(n-1)\rho)}, \qquad \beta = \frac{\rho}{(1-\rho)(1+(n-1)\rho)}.$$
For fixed $n_x$ and $n_y$ we can regard the factor $k = \frac{n_x n_y}{n_x + n_y}$ in Hotelling's statistic (1) as a normalizing constant. Let $\tilde T^2$ denote Hotelling's statistic computed with $\Sigma$ instead of $S$ and $\mu_x - \mu_y$ instead of $\bar X - \bar Y$, divided by the constant $k$. Therefore we have
$$T^2/k \approx \tilde T^2 = (\mu_x - \mu_y)^T \Sigma^{-1} (\mu_x - \mu_y) = (\underbrace{1, \dots, 1}_{m}, \underbrace{0, \dots, 0}_{n-m})\, \Sigma^{-1}\, (\underbrace{1, \dots, 1}_{m}, \underbrace{0, \dots, 0}_{n-m})^T$$
$$= m\alpha - (m^2 - m)\beta = \frac{m(1 + (n-2)\rho) - (m^2 - m)\rho}{(1-\rho)(1+(n-1)\rho)} = \frac{m(1 + (n-m-1)\rho)}{(1-\rho)(1+(n-1)\rho)}. \qquad (3)$$
Note that it does not matter whether $\mu_y$ consists of ones and zeros or of a constant $a$ and zeros. In the latter case $\tilde T^2$ would be multiplied by $a^2$. We now work with $\tilde T^2$ and investigate its behavior. If we changed $m$ to $m + 1$ (that is, added one more differing marginal distribution), we would expect $\tilde T^2$, and with it the power of Hotelling's test, to increase. We need to check whether this is indeed the case. For clarity, we use the number of ones in $\mu_y$ as the index of $\tilde T^2$ (writing it only when needed). Changing $m$ to $m + 1 = h$, we obtain
$$\tilde T^2_{m+1} = \tilde T^2_m + \alpha - 2m\beta.$$
If $\tilde T^2_m$ were an increasing function of $m$, then $\alpha - 2m\beta$ would have to be greater than zero. But
$$\alpha - 2m\beta = \frac{1 + (n-2)\rho}{(1-\rho)(1+(n-1)\rho)} - \frac{2m\rho}{(1-\rho)(1+(n-1)\rho)} = \frac{1 + (n - 2m - 2)\rho}{(1-\rho)(1+(n-1)\rho)}.$$
Since the denominator is positive and $n - 2m - 2 = -(2h - n)$, the difference is always positive for $2h \le n$. For $2h > n$, however, $\alpha - 2m\beta > 0$ only if $\rho < \frac{1}{2h - n}$. It follows that for values of $\rho$ that are not very small and for $m > n/2$, $\tilde T^2_m$ is a decreasing function of $m$. Hence the maximal power of Hotelling's test (as a function of $m$) is not always attained at $m = n$. For values of $\rho$ that are not very small, the maximum is attained near $m = n/2$. Some examples of the behavior of $\tilde T^2_m$ as a function of $m$ are illustrated in Figure 1.

However, this is not the only surprising property of Hotelling's test. We now check whether $\tilde T^2_1$ is always lower than $\tilde T^2_n$. If it were not, a single differing marginal distribution would influence the test more than all $n$ differing marginal distributions together. So we need to compare $\alpha$ with $n\alpha - n(n-1)\beta$. We have
$$\tilde T^2_1 - \tilde T^2_n = \alpha - n\alpha + n(n-1)\beta = (n-1)\frac{2\rho - 1}{(1-\rho)(1+(n-1)\rho)}.$$
So $\tilde T^2_1 - \tilde T^2_n < 0$ only if $\rho < 0.5$. Therefore, for $\rho > 0.5$, Hotelling's test has better power against the alternative with only one marginal shift than against the alternative in which all marginal distributions are equally shifted. This can also be seen in Figure 1. Moreover, $\tilde T^2_1 = \alpha$ is an increasing function of $\rho$, which may seem surprising as well.
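The closed form (3) and the two comparisons above are easy to check numerically. The following is a minimal sketch, assuming plain Python with exact rational arithmetic via `fractions.Fraction`; the function names are our own illustrative choices, not the author's code. It builds $\Sigma^{-1}$ from $\alpha$ and $\beta$, confirms $\Sigma\Sigma^{-1} = I$, evaluates the quadratic form directly, compares it with (3), and finds the $m$ that maximizes $\tilde T^2_m$.

```python
from fractions import Fraction


def alpha_beta(n, rho):
    """Entries of Sigma^{-1} for Sigma = (1 - rho) I + rho J:
    alpha on the diagonal, -beta off the diagonal (Section 3)."""
    d = (1 - rho) * (1 + (n - 1) * rho)
    return (1 + (n - 2) * rho) / d, rho / d


def sigma_times_inverse_is_identity(n, rho):
    """Exact check that Sigma @ Sigma^{-1} == I (pass rho as a Fraction)."""
    alpha, beta = alpha_beta(n, rho)
    sigma = [[1 if i == j else rho for j in range(n)] for i in range(n)]
    inv = [[alpha if i == j else -beta for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            entry = sum(sigma[i][k] * inv[k][j] for k in range(n))
            if entry != (1 if i == j else 0):
                return False
    return True


def t2_direct(n, m, rho):
    """delta^T Sigma^{-1} delta with delta = (1,...,1,0,...,0), m ones."""
    alpha, beta = alpha_beta(n, rho)
    delta = [1] * m + [0] * (n - m)
    return sum(
        delta[i] * (alpha if i == j else -beta) * delta[j]
        for i in range(n)
        for j in range(n)
    )


def t2_closed(n, m, rho):
    """Closed form (3): m (1 + (n - m - 1) rho) / ((1 - rho)(1 + (n - 1) rho))."""
    return m * (1 + (n - m - 1) * rho) / ((1 - rho) * (1 + (n - 1) * rho))


def best_m(n, rho):
    """Number of shifted marginals that maximizes T~^2_m (first one on ties)."""
    return max(range(1, n + 1), key=lambda m: t2_closed(n, m, rho))


if __name__ == "__main__":
    n = 10
    for rho in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        curve = [t2_closed(n, m, rho) for m in range(1, n + 1)]
        print(f"rho={float(rho):.1f} best m={best_m(n, rho)} "
              f"T1={float(curve[0]):.3f} Tn={float(curve[-1]):.3f}")
```

Exact fractions are used so that the equality checks, including the tie $\tilde T^2_1 = \tilde T^2_n$ at $\rho = 0.5$, hold without rounding error.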
4 Hotelling's test for two-dimensional data
Let us look at Hotelling's test in the two-dimensional case. As before, we consider the two-sample problem, but now we allow a general difference between the expected values of the two samples. Suppose that $\mu_x - \mu_y = (a_1, a_2)^T$ and that the covariance matrix is
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
Figure 1: Plots of $\tilde T^2_m$ for $n = 10, 15, 25, 40$; $\rho = 0.1, 0.3, 0.5, 0.7, 0.9$; and $m = 1, \dots, n$. Note: each plot is scaled differently.

The inverse of $\Sigma$ is then the matrix with diagonal elements $\alpha = \frac{1}{(1-\rho)(1+\rho)}$ and off-diagonal elements $-\beta$, where $\beta = \frac{\rho}{(1-\rho)(1+\rho)}$. Hence
$$\tilde T^2 = \alpha a_1^2 + \alpha a_2^2 - 2\beta a_1 a_2.$$
First consider $a_1 = 1$ and $a_2 = 0$. Then $\tilde T^2 = \alpha$. Now we investigate for which $a_1, a_2 \in \mathbb{R}$ we have $\tilde T^2 = \alpha$. That is, we need to solve the equation
$$\alpha a_1^2 + \alpha a_2^2 - 2\beta a_1 a_2 = \alpha. \qquad (4)$$
Dividing both sides of equation (4) by $\alpha$ (note that $\beta/\alpha = \rho$) gives
$$a_1^2 + a_2^2 - 2\rho a_1 a_2 - 1 = 0. \qquad (5)$$
For fixed $a_1$, equation (5) is quadratic in $a_2$ with the roots
$$a_{2(1,2)} = \frac{2\rho a_1 \pm \sqrt{(2\rho a_1)^2 - 4(a_1^2 - 1)}}{2}.$$
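The roots above can be evaluated directly. The following is a small sketch in plain Python with illustrative function names (not from the paper). It computes both roots of (5) for a given $a_1$ and checks that each resulting point $(a_1, a_2)$ satisfies (4), that is, gives $\tilde T^2 = \alpha$.

```python
import math


def a2_roots(a1, rho):
    """Roots a2 of a1^2 + a2^2 - 2 rho a1 a2 - 1 = 0 (equation (5)) for fixed a1.
    Returns an empty tuple when the discriminant is negative (no real roots)."""
    disc = (2 * rho * a1) ** 2 - 4 * (a1 ** 2 - 1)
    if disc < 0:
        return ()
    root = math.sqrt(disc)
    return ((2 * rho * a1 - root) / 2, (2 * rho * a1 + root) / 2)


def t2_tilde_2d(a1, a2, rho):
    """alpha a1^2 + alpha a2^2 - 2 beta a1 a2 with the 2x2 alpha and beta."""
    d = (1 - rho) * (1 + rho)
    alpha, beta = 1 / d, rho / d
    return alpha * a1 ** 2 + alpha * a2 ** 2 - 2 * beta * a1 * a2


if __name__ == "__main__":
    rho = 0.9
    alpha = 1 / ((1 - rho) * (1 + rho))
    for a1 in (-1.0, 0.0, 0.5, 1.0, 2.0):
        for a2 in a2_roots(a1, rho):
            # every root should reproduce T~^2 = alpha, i.e. lie on curve (4)
            print(f"a1={a1:+.2f} a2={a2:+.4f} "
                  f"T2={t2_tilde_2d(a1, a2, rho):.4f} alpha={alpha:.4f}")
```

Sweeping $a_1$ over a fine grid and plotting both roots traces one closed curve per value of $\rho$.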
The roots are real only if $(2\rho a_1)^2 - 4(a_1^2 - 1) \ge 0$, i.e. for $|a_1| \le \frac{1}{\sqrt{1-\rho^2}}$. Plots of the solutions of equation (5) for several values of the correlation coefficient $\rho$ are given in Figure 2. The solutions form ellipses. Let us rotate these ellipses clockwise by the angle $\varphi = \pi/4$. To do this, we use the transformation
$$a_1 = x\cos\varphi - y\sin\varphi = \tfrac{\sqrt2}{2}x - \tfrac{\sqrt2}{2}y, \qquad a_2 = x\sin\varphi + y\cos\varphi = \tfrac{\sqrt2}{2}x + \tfrac{\sqrt2}{2}y,$$
where $x$ and $y$ are the new rotated coordinates. Substituting into (5) gives
$$\left(\tfrac{\sqrt2}{2}x - \tfrac{\sqrt2}{2}y\right)^2 + \left(\tfrac{\sqrt2}{2}x + \tfrac{\sqrt2}{2}y\right)^2 - 2\rho\left(\tfrac{\sqrt2}{2}x - \tfrac{\sqrt2}{2}y\right)\left(\tfrac{\sqrt2}{2}x + \tfrac{\sqrt2}{2}y\right) = x^2(1-\rho) + y^2(1+\rho) = \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$
where $a = \frac{1}{\sqrt{1-\rho}}$ and $b = \frac{1}{\sqrt{1+\rho}}$ are the major and the minor radius of the ellipse, respectively. Since $a > b$, the power of Hotelling's test increases most slowly in the direction $a_1 = a_2$ and fastest in the direction $a_1 = -a_2$. For example, for $\rho = 0.9$ we have $a \approx 3.162$ and $b \approx 0.725$. This means that Hotelling's test has approximately the same power for $a_1 = a_2 = 3.162/\sqrt2 \approx 2.236$ as for $a_1 = 1$, $a_2 = 0$ (and the same holds for $a_1 = -a_2 = 0.725/\sqrt2 \approx 0.513$). So, if only one marginal distribution is shifted by one unit, the power of Hotelling's test is approximately the same as if both marginal distributions were equally shifted (in the same direction) by about 2.236 units. For shifts in opposite directions, the required shift would be only 0.513 units. These results contrast with the behavior of other multidimensional tests. Consider, for example, the test based on marginal t-statistics. Its power is higher if both distributions are shifted by the same amount (both t-statistics are large, regardless of the direction of the shift) than if only one marginal distribution is shifted (one t-statistic is then near zero).

5 Theory and reality
The analytical results obtained above should be verified by checking whether the outcomes of the actual Hotelling's test on data agree with them. In this section we compare the behavior of the theoretical Hotelling's statistic $\tilde T^2$ with that of the real Hotelling's statistic $T^2$. For large $n_x$ and $n_y$ we assumed that $\tilde T^2 \approx T^2/k$, where $k = \frac{n_x n_y}{n_x + n_y}$.
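The numbers quoted for $\rho = 0.9$ follow directly from the radii. The short sketch below (plain Python, illustrative names) reproduces $a \approx 3.162$, $b \approx 0.725$ and the equivalent shifts 2.236 and 0.513. It then checks that the three shift vectors all give the same $\tilde T^2 = \alpha$. The function `marginal_sum_of_squares` is a hypothetical stand-in for a statistic that combines the marginal shifts while ignoring the correlation. It is not one of the tests cited in the paper and serves only to show the direction-insensitivity described above.

```python
import math


def ellipse_radii(rho):
    """Radii a = 1/sqrt(1 - rho) and b = 1/sqrt(1 + rho) of x^2/a^2 + y^2/b^2 = 1."""
    return 1 / math.sqrt(1 - rho), 1 / math.sqrt(1 + rho)


def t2_tilde_2d(a1, a2, rho):
    """alpha a1^2 + alpha a2^2 - 2 beta a1 a2, written as (a1^2 + a2^2 - 2 rho a1 a2) / (1 - rho^2)."""
    return (a1 ** 2 + a2 ** 2 - 2 * rho * a1 * a2) / ((1 - rho) * (1 + rho))


def marginal_sum_of_squares(a1, a2):
    """Hypothetical correlation-blind statistic: it ignores the sign pattern of the shift."""
    return a1 ** 2 + a2 ** 2


if __name__ == "__main__":
    rho = 0.9
    a, b = ellipse_radii(rho)
    c_same, c_opp = a / math.sqrt(2), b / math.sqrt(2)
    print(f"a={a:.3f} b={b:.3f} same-direction={c_same:.3f} opposite={c_opp:.3f}")
    for a1, a2 in [(1.0, 0.0), (c_same, c_same), (c_opp, -c_opp)]:
        # all three shifts lie on the ellipse, so T~^2 equals alpha for each
        print(f"({a1:+.3f}, {a2:+.3f}) T2={t2_tilde_2d(a1, a2, rho):.4f}")
    # equal-size shifts in the same vs opposite direction
    print(t2_tilde_2d(1, 1, rho), t2_tilde_2d(1, -1, rho),
          marginal_sum_of_squares(1, 1), marginal_sum_of_squares(1, -1))
```

The last line shows the contrast drawn in the text. The correlation-blind statistic treats $(1, 1)$ and $(1, -1)$ alike, whereas $\tilde T^2$ is much larger for the opposite-direction shift when $\rho$ is large.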
The constant $k$ changes as $n_x$ and $n_y$ change. To compare how $T^2$ and $\tilde T^2$ differ for various $n_x$ and $n_y$, it is reasonable to divide Hotelling's $T^2$ by $k$ rather than multiply $\tilde T^2$ by $k$. To compare the actual results with the analytical ones, we performed the following simulations. All data were simulated from $n$-dimensional normal distributions. We consider three different numbers of genes in a gene set: $n = 10$, $n = 15$ and $n = 25$. All simulations were performed for three values of the correlation coefficient: $\rho = 0.1$, $\rho = 0.5$ and $\rho = 0.9$. To compare the behavior of Hotelling's test for various sample sizes, we took three choices of $n_x$ and $n_y$: $n_x = n_y = n$, $n_x = n_y = 1.4n$ and $n_x = n_y = 2.4n$. The number $m$ of shifted marginal distributions varies from one to $n$, and each shifted marginal distribution is shifted by one. The theoretical Hotelling's statistic is calculated according to (3). The real Hotelling's statistic is estimated from 1000 simulations for each case, as the mean of $T^2/k$ over the simulations. Plots of the simulated cases are shown in Figure 3.

Figure 2: Plots of the solutions of equation (5) in the two-dimensional case (axes $a_1$, $a_2$) for $\rho = 0.25$, $0.5$, $0.9$. Note: each plot is scaled differently.

For all simulated situations, the shapes of the real and theoretical Hotelling's statistics are similar. The only difference is in the heights of the curves. For small $n_x$ and $n_y$, $T^2/k$ has higher values than for large $n_x$ and $n_y$. This stems from the inaccurate estimates of the expected values and of the covariance matrix. However, as $n_x$ and $n_y$ increase, $T^2/k$ approaches $\tilde T^2$ relatively fast. Therefore, the behavior of Hotelling's test for real data is expected to be very similar to the behavior of $\tilde T^2$.

In the previous section we saw that, in the two-dimensional case, the shifts that give equal values of the theoretical Hotelling's statistic form ellipses when plotted. Real Hotelling's statistics $T^2$ are random variables, so we can only check whether their expected values form ellipses when plotted. To check this we ran the following simulations. Instead of calculating the shifts for which Hotelling's test has equal statistics, we took points on the ellipses obtained for the theoretical Hotelling's statistic. For each of these points $(a_1, a_2)$ we ran 1000 simulations and calculated Hotelling's statistic. We estimated the expected value $E\,T^2/k$ as the mean over these 1000 repetitions. We divided Hotelling's statistics by $k$ to show more clearly how fast they approach $\tilde T^2$. We ran this simulation for the correlation coefficients $\rho = 0.3$ and $\rho = 0.9$, with $n_x = n_y = 5$, $n_x = n_y = 10$ and $n_x = n_y = 20$ observations in each sample. The results are given in Table 1. The estimated mean values of $T^2/k$ are not very different from each other, they approach $\tilde T^2$, and their variance decreases as the number of observations increases.
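The simulation design above can be sketched in plain Python using only the standard library. This is not the author's code, and the following are our own assumptions: the Gaussian-elimination solver, the equicorrelated sampler, all function names, and the reduced demo settings. The sampler writes each vector as $\mu + \sqrt{1-\rho}\,Z + \sqrt{\rho}\,W\mathbf{1}$ with independent standard normal $Z$ and $W$, which is valid for $\rho \ge 0$. The demo uses $n = 5$ and 200 repetitions instead of 1000 to keep pure-Python run time short. `hotelling_two_sample` computes (1) together with the F-scaled value from (2), and `mean_t2_over_k` estimates the mean of $T^2/k$ for comparison with (3).

```python
import math
import random


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def hotelling_two_sample(xs, ys):
    """Two-sample T^2 of (1) and the F-scaled value of (2); returns (T2, F, df1, df2)."""
    nx, ny, n = len(xs), len(ys), len(xs[0])
    mean_x = [sum(v[i] for v in xs) / nx for i in range(n)]
    mean_y = [sum(v[i] for v in ys) / ny for i in range(n)]
    pooled = [[0.0] * n for _ in range(n)]
    for sample, mean in ((xs, mean_x), (ys, mean_y)):
        for v in sample:
            dev = [v[i] - mean[i] for i in range(n)]
            for i in range(n):
                for j in range(n):
                    pooled[i][j] += dev[i] * dev[j]
    s = [[pooled[i][j] / (nx + ny - 2) for j in range(n)] for i in range(n)]
    diff = [mean_x[i] - mean_y[i] for i in range(n)]
    k = nx * ny / (nx + ny)
    t2 = k * sum(d * w for d, w in zip(diff, solve(s, diff)))  # k * diff^T S^{-1} diff
    df2 = nx + ny - n - 1
    return t2, df2 / (n * (nx + ny - 2)) * t2, n, df2


def equicorrelated_sample(size, mean, rho, rng):
    """Draws from N_n(mean, Sigma) with unit variances and common correlation rho >= 0."""
    a, b = math.sqrt(1 - rho), math.sqrt(rho)
    out = []
    for _ in range(size):
        w = rng.gauss(0.0, 1.0)  # common factor shared by all coordinates
        out.append([mu + a * rng.gauss(0.0, 1.0) + b * w for mu in mean])
    return out


def mean_t2_over_k(n, m, rho, nx, ny, reps, seed=0):
    """Monte Carlo mean of T^2/k with mu_x = 0 and mu_y = (1,...,1,0,...,0), m ones."""
    rng = random.Random(seed)
    mu_y = [1.0] * m + [0.0] * (n - m)
    k = nx * ny / (nx + ny)
    total = 0.0
    for _ in range(reps):
        xs = equicorrelated_sample(nx, [0.0] * n, rho, rng)
        ys = equicorrelated_sample(ny, mu_y, rho, rng)
        total += hotelling_two_sample(xs, ys)[0] / k
    return total / reps


def t2_closed(n, m, rho):
    """Theoretical value (3)."""
    return m * (1 + (n - m - 1) * rho) / ((1 - rho) * (1 + (n - 1) * rho))


if __name__ == "__main__":
    n, rho = 5, 0.9
    print("theory (3):", [round(t2_closed(n, m, rho), 2) for m in range(1, n + 1)])
    for factor in (1.0, 1.4, 2.4):  # n_x = n_y = n, 1.4n, 2.4n as in the text
        size = round(factor * n)
        sim = [round(mean_t2_over_k(n, m, rho, size, size, reps=200), 2)
               for m in range(1, n + 1)]
        print(f"n_x = n_y = {size}:", sim)
```

With $n_x = n_y = n$ the simulated means should come out noticeably larger and noisier than (3). This is in line with the higher curves for small samples described above.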
Clearly, these points form ellipses. Hence, we can claim that the real Hotelling's test behaves very similarly to the theoretical one, and that the theory derived for the theoretical test holds for the real Hotelling's test as well.

6 Discussion
In this paper we have shown that the two-sample Hotelling's test has some unexpected properties. (The test checks the equality of the expected values of two samples from multidimensional normal distributions with equal covariance structure.) At first sight, one could expect the power of this test to increase with the number of false marginal hypotheses. We have shown that this is not true in general.

Figure 3: Comparisons of the theoretical statistics $\tilde T^2$ and the real Hotelling's statistics $T^2/k$ for the number of genes $n = 10, 15, 25$ (from top to bottom) and the correlation coefficient $\rho = 0.1, 0.5, 0.9$ (from left to right). The number of observations in each sample is $n_x = n_y = n$ (denoted by +), $n_x = n_y = 1.4n$ (denoted by x) or $n_x = n_y = 2.4n$, and the theoretical $\tilde T^2$ is also shown. The number of differing marginal distributions ranges from one to $n$. Note: each plot is scaled differently.

For highly correlated, high-dimensional data (such as gene expression data sets), the maximal power of Hotelling's test is reached when only about half of the marginal distributions are shifted. We have also found that when the correlation within the sample is greater than 0.5, Hotelling's test can have better power if only one marginal distribution is different than if all marginal hypotheses are false. Moreover, the power of Hotelling's test increases for higher correlations, which may also seem somewhat unexpected.

We have investigated Hotelling's test in detail in the two-dimensional case. The properties of this test are very different from those of tests based on marginal t-statistics. All reasonable tests based on marginal t-statistics are insensitive to the direction of the shift. By contrast, the power of Hotelling's test increases very slowly if both marginal distributions are equally shifted and much faster if the marginal distributions are shifted in opposite directions. Moreover, alternatives with equal values of the statistic form ellipsoids.
Table 1: Results of simulations of the two-dimensional adjusted Hotelling's statistics $T^2/k$ with $n_s = n_x = n_y$ observations in each sample and correlation coefficient $\rho$. $\tilde T^2$ stands for the theoretical Hotelling's statistic, and $(a_1, a_2)$ is the difference $\mu_x - \mu_y$ between the expected values of the two samples. The bottom line gives the estimated variance of each column.
Left half: $\rho = 0.3$, $\tilde T^2 = 1.0989$. Right half: $\rho = 0.9$, $\tilde T^2 = 1/(1 - 0.9^2) \approx 5.263$. Columns in each half: $a_1$, $a_2$, $n_s = 5$, $n_s = 10$, $n_s = 20$; bottom row: var. [Numerical entries not recoverable from the source.]

Acknowledgements
The author thanks Prof. Lev Klebanov, DrSc. for valuable comments, remarks and overall help. The work was supported by the grant SVV 2635/200.

References
Ackermann, M. and Strimmer, K. (2009), A general modular framework for gene set enrichment analysis, BMC Bioinformatics, 10, 47.
Barry, W. T., Nobel, A. B., and Wright, F. A. (2008), A statistical framework for testing functional categories in microarray data, The Annals of Applied Statistics, 2 No. 1.
Chatfield, C. and Collins, A. J. (1980), Introduction to Multivariate Analysis, Chapman & Hall/CRC.
Glazko, G. and Emmert-Streib, F. (2009), Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets, Bioinformatics, 25 No. 18.
Klebanov, L. and Yakovlev, A. (2007), Diverse correlation structures in gene expression data and their utility in improving statistical inference, The Annals of Applied Statistics, 1 No. 2.
More informationSupplement to: Subsampling Methods for Persistent Homology
Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation
More informationLower Bounds for Quantized Matrix Completion
Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &
More informationSupplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators
Suppleentary Inforation for Design of Bending Multi-Layer Electroactive Polyer Actuators Bavani Balakrisnan, Alek Nacev, and Elisabeth Sela University of Maryland, College Park, Maryland 074 1 Analytical
More informationCompressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements
1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer
More informationDERIVING TESTS OF THE REGRESSION MODEL USING THE DENSITY FUNCTION OF A MAXIMAL INVARIANT
DERIVING TESTS OF THE REGRESSION MODEL USING THE DENSITY FUNCTION OF A MAXIMAL INVARIANT Jahar L. Bhowik and Maxwell L. King Departent of Econoetrics and Business Statistics Monash University Clayton,
More informationIn this chapter, we consider several graph-theoretic and probabilistic models
THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions
More informationMultivariate Methods. Matlab Example. Principal Components Analysis -- PCA
Multivariate Methos Xiaoun Qi Principal Coponents Analysis -- PCA he PCA etho generates a new set of variables, calle principal coponents Each principal coponent is a linear cobination of the original
More informationSub-Gaussian estimators of the mean of a random vector
Sub-Gaussian estiators of the ean of a rando vector arxiv:702.00482v [ath.st] Feb 207 GáborLugosi ShaharMendelson February 3, 207 Abstract WestudytheprobleofestiatingtheeanofarandovectorX givena saple
More informationMeta-Analytic Interval Estimation for Bivariate Correlations
Psychological Methods 2008, Vol. 13, No. 3, 173 181 Copyright 2008 by the Aerican Psychological Association 1082-989X/08/$12.00 DOI: 10.1037/a0012868 Meta-Analytic Interval Estiation for Bivariate Correlations
More informationModel Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon
Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential
More informationTesting Properties of Collections of Distributions
Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the
More informationBootstrapping clustered data
J. R. Statist. Soc. B (2007) 69, Part 3, pp. 369 390 Bootstrapping clustered data C. A. Field Dalhousie University, Halifax, Canada A. H. Welsh Australian National University, Canberra, Australia [Received
More informationSaddle Points in Random Matrices: Analysis of Knuth Search Algorithms
Saddle Points in Rando Matrices: Analysis of Knuth Search Algoriths Micha Hofri Philippe Jacquet Dept. of Coputer Science INRIA, Doaine de Voluceau - Rice University Rocquencourt - B.P. 05 Houston TX 77005
More informationUniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval
Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,
More informationCSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13
CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture
More informationReed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product.
Coding Theory Massoud Malek Reed-Muller Codes An iportant class of linear block codes rich in algebraic and geoetric structure is the class of Reed-Muller codes, which includes the Extended Haing code.
More informationInspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information
Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub
More informationWeighted- 1 minimization with multiple weighting sets
Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University
More informationIntelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes
More informationarxiv: v1 [cs.ds] 3 Feb 2014
arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/
More informationLean Walsh Transform
Lean Walsh Transfor Edo Liberty 5th March 007 inforal intro We show an orthogonal atrix A of size d log 4 3 d (α = log 4 3) which is applicable in tie O(d). By applying a rando sign change atrix S to the
More informationPORTMANTEAU TESTS FOR ARMA MODELS WITH INFINITE VARIANCE
doi:10.1111/j.1467-9892.2007.00572.x PORTMANTEAU TESTS FOR ARMA MODELS WITH INFINITE VARIANCE By J.-W. Lin and A. I. McLeod The University of Western Ontario First Version received Septeber 2006 Abstract.
More informationOcean 420 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers
Ocean 40 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers 1. Hydrostatic Balance a) Set all of the levels on one of the coluns to the lowest possible density.
More informationGEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS
ACTA UIVERSITATIS LODZIESIS FOLIA OECOOMICA 3(3142015 http://dx.doi.org/10.18778/0208-6018.314.03 Olesii Doronin *, Rostislav Maiboroda ** GEE ESTIMATORS I MIXTURE MODEL WITH VARYIG COCETRATIOS Abstract.
More informationSolutions of some selected problems of Homework 4
Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next
More informationConstructing Locally Best Invariant Tests of the Linear Regression Model Using the Density Function of a Maximal Invariant
Aerican Journal of Matheatics and Statistics 03, 3(): 45-5 DOI: 0.593/j.ajs.03030.07 Constructing Locally Best Invariant Tests of the Linear Regression Model Using the Density Function of a Maxial Invariant
More informationTail Estimation of the Spectral Density under Fixed-Domain Asymptotics
Tail Estiation of the Spectral Density under Fixed-Doain Asyptotics Wei-Ying Wu, Chae Young Li and Yiin Xiao Wei-Ying Wu, Departent of Statistics & Probability Michigan State University, East Lansing,
More informationEstimation of the Mean of the Exponential Distribution Using Maximum Ranked Set Sampling with Unequal Samples
Open Journal of Statistics, 4, 4, 64-649 Published Online Septeber 4 in SciRes http//wwwscirporg/ournal/os http//ddoiorg/436/os4486 Estiation of the Mean of the Eponential Distribution Using Maiu Ranked
More informationW-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS
W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS. Introduction When it coes to applying econoetric odels to analyze georeferenced data, researchers are well
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationSupplemental Material for Correlation between Length and Tilt of Lipid Tails
Suppleental Material for Correlation between Length and Tilt of Lipid Tails Ditry I. Kopelevich and John F. Nagle I. RESULTS FOR ALTERNATIVE DIRECTOR DEFINITIONS A. Alternative Director Definitions The
More informationA proposal for a First-Citation-Speed-Index Link Peer-reviewed author version
A proposal for a First-Citation-Speed-Index Link Peer-reviewed author version Made available by Hasselt University Library in Docuent Server@UHasselt Reference (Published version): EGGHE, Leo; Bornann,
More informationCorrecting a Significance Test for Clustering in Designs With Two Levels of Nesting
Institute for Policy Research Northwestern University Working Paper Series WP-07-4 orrecting a Significance est for lustering in Designs With wo Levels of Nesting Larry V. Hedges Faculty Fellow, Institute
More informationarxiv: v1 [math.nt] 14 Sep 2014
ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row
More informationSeismic Analysis of Structures by TK Dutta, Civil Department, IIT Delhi, New Delhi.
Seisic Analysis of Structures by K Dutta, Civil Departent, II Delhi, New Delhi. Module 5: Response Spectru Method of Analysis Exercise Probles : 5.8. or the stick odel of a building shear frae shown in
More informationHighly Robust Error Correction by Convex Programming
Highly Robust Error Correction by Convex Prograing Eanuel J. Candès and Paige A. Randall Applied and Coputational Matheatics, Caltech, Pasadena, CA 9115 Noveber 6; Revised Noveber 7 Abstract This paper
More informationComparing Probabilistic Forecasting Systems with the Brier Score
1076 W E A T H E R A N D F O R E C A S T I N G VOLUME 22 Coparing Probabilistic Forecasting Systes with the Brier Score CHRISTOPHER A. T. FERRO School of Engineering, Coputing and Matheatics, University
More informationBest Procedures For Sample-Free Item Analysis
Best Procedures For Saple-Free Ite Analysis Benjain D. Wright University of Chicago Graha A. Douglas University of Western Australia Wright s (1969) widely used "unconditional" procedure for Rasch saple-free
More informationUncertainty of Measured Energy Savings from Statistical Baseline Models
4375 VOL. 6, NO. 1 HVAC&R RESEARCH JANUARY 2000 Uncertainty of Measured Energy Savings fro Statistical Baseline Models T. Agai Reddy, Ph.D., P.E. David E. Claridge, Ph.D., P.E. Meber ASHRAE Meber ASHRAE
More informationRecursive Algebraic Frisch Scheme: a Particle-Based Approach
Recursive Algebraic Frisch Schee: a Particle-Based Approach Stefano Massaroli Renato Myagusuku Federico Califano Claudio Melchiorri Atsushi Yaashita Hajie Asaa Departent of Precision Engineering, The University
More informationFast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns
Fast Structural Siilarity Search of Noncoding RNs Based on Matched Filtering of Ste Patterns Byung-Jun Yoon Dept. of Electrical Engineering alifornia Institute of Technology Pasadena, 91125, S Eail: bjyoon@caltech.edu
More informationSUSAN M. PITTS ABSTRACT
A FUNCTIONAL APPROACH TO APPROXIMATIONS FOR THE INDIVIDUAL RISK MODEL BY SUSAN M. PITTS ABSTRACT A functional approach is taken for the total clai aount distribution for the individual risk odel. Various
More information