The spectrum of kernel random matrices

Noureddine El Karoui
Department of Statistics, University of California, Berkeley

Abstract

We place ourselves in the setting of high-dimensional statistical inference, where the number of variables $p$ in a dataset of interest is of the same order of magnitude as the number of observations $n$. We consider the spectrum of certain kernel random matrices, in particular $n \times n$ matrices whose $(i,j)$-th entry is $f(X_i'X_j/p)$ or $f(\|X_i - X_j\|^2/p)$, where $p$ is the dimension of the data and the $X_i$ are independent data vectors. Here $f$ is assumed to be a locally smooth function. The study is motivated by questions arising in statistics and computer science, where these matrices are used to perform, among other things, non-linear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the usage of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.

I would like to thank Bin Yu for stimulating my interest in the questions considered in this paper and for interesting discussions on the topic. I would like to thank Elizabeth Purdom for discussions about kernel analysis and Peter Bickel for many stimulating discussions about random matrices and their relevance in statistics. I would also like to thank an anonymous referee for useful and constructive comments that resulted in an improved presentation of the paper. Support from NSF grant DMS is gratefully acknowledged.

AMS 2000 SC: Primary: 62H10; Secondary: 60F99.
Key words and phrases: covariance matrices, kernel matrices, eigenvalues of covariance matrices, multivariate statistical analysis, high-dimensional inference, random matrix theory, machine learning, Hadamard matrix functions, concentration of measure.
Contact: nkaroui@stat.berkeley.edu

1 Introduction

Recent years have seen newfound theoretical interest in the properties of large dimensional sample covariance matrices. With the increase in the size and dimensionality of datasets to be analyzed, questions have been raised about the practical relevance of information derived from classical asymptotic results concerning spectral properties of sample covariance matrices. To address these concerns, one line of analysis has been the consideration of asymptotics where both the sample size $n$ and the number of variables $p$ in the dataset go to infinity jointly, while assuming, for instance, that $p/n$ has a limit.

Questions of this type concerning the spectral properties of large dimensional matrices have been, and are being, addressed in a variety of fields, from physics to various areas of mathematics. While the topic is classical, with the seminal contribution of Wigner (1955) dating back to the 1950s, there has been renewed and vigorous interest in the study of large dimensional random matrices in the last decade or so. This has led to new insights and the appearance of new canonical distributions (Tracy and Widom (1994)), new tools (see Voiculescu (2000)) and, in Statistics, a sense that one needs to exert caution with familiar techniques of multivariate analysis when the dimension of the data gets large and the sample size is of the same order of magnitude as that dimension.

So far in Statistics, this line of work has been concerned mostly with the properties of sample covariance matrices. In a seminal paper, Marčenko and Pastur (1967) showed a result that, from a statistical standpoint, may be interpreted as saying, roughly, that asymptotically the histogram of the eigenvalues of a sample (i.e. random) covariance matrix is a deterministic non-linear deformation of the histogram of the eigenvalues of the population covariance matrix. Remarkably, they managed to characterize this deformation for fairly general population covariances. Their result was shown in great generality,
and introduced new tools to the field, including one that has become ubiquitous, the Stieltjes transform of a distribution. In its best known form, their result says that when the population covariance is the identity, and hence all the population eigenvalues are equal to 1, in the limit the sample eigenvalues are split and, if $p \le n$, they are spread over $[(1-\sqrt{p/n})^2, (1+\sqrt{p/n})^2]$ according to a fully explicit density, now known as the density of the Marčenko-Pastur law. Their result was later rediscovered independently in Wachter (1978) (under slightly weaker conditions), and generalized to the case of non-diagonal covariance matrices in Silverstein (1995), under some particular distributional assumptions, which we discuss later in the paper.

On the other hand, recent developments have been concerned with fine properties of the largest eigenvalue of random matrices, which became amenable to analysis after mathematical breakthroughs that happened in the 1990s (see Tracy and Widom (1994), Tracy and Widom (1996) and Tracy and Widom (1998)). Classical statistical work on the joint distribution of eigenvalues of sample covariance matrices (see Anderson (2003) for a good reference) then became usable for analysis in high dimensions. In particular, in the case of Gaussian distributions with identity covariance, it was shown in Johnstone (2001) and El Karoui (2003) that the largest eigenvalue of the sample covariance matrix is Tracy-Widom distributed. More recent progress (El Karoui (2007c)) managed to carry out the analysis for essentially general population covariance. Models for which the population covariance has a few separated eigenvalues have also been of interest: see for instance Paul (2007) and Baik and Silverstein (2006). Besides the particulars of the different types of fluctuations that can be encountered (Tracy-Widom, Gaussian or other), researchers have been able to precisely localize these largest eigenvalues. One interesting aspect of those results is the fact that in the high-dimensional setting of
interest to us, the largest eigenvalues are always positively biased, with the bias sometimes being large. (We also note that in the case of i.i.d. data, which naturally is less interesting in statistics, results on the localization of the largest eigenvalue have been available for quite some time now, after the works of Geman (1980) and Yin et al. (1988), to cite a few.) This is naturally in sharp contrast to classical results of multivariate analysis, which show $\sqrt{n}$-consistency of all sample eigenvalues, though the possibility of bias is a simple consequence of Jensen's inequality.

On the other hand, there has been much less theoretical work on kernel random matrices. By this term, we mean matrices with $(i,j)$ entry of the form $M_{i,j} = k(X_i, X_j)$, where $M$ is an $n \times n$ matrix, $X_i$ is a $p$-dimensional data vector, and $k$ is a function of two variables, often called a kernel, that may depend on $n$. Common choices of kernels include, for instance, $k(X_i, X_j) = f(\|X_i - X_j\|_2^2/t)$, where $f$ is a function and $t$ is a scalar, or $k(X_i, X_j) = f(X_i'X_j/t)$. For the function $f$, common choices include $f(x) = \exp(-x)$, $f(x) = \exp(-x^a)$ for a certain scalar $a$, $f(x) = (1+x)^a$, or $f(x) = \tanh(a + bx)$, where $b$ is a scalar. We refer the reader to Rasmussen and Williams (2006, Chapter 4) or Williams and Seeger (2000) for more examples. In particular, we are not aware of any work in the setting of high-dimensional data analysis, where $p$ grows with $n$. However, given the practical success and flexibility of these methods (we refer to Schölkopf and Smola (2002) for an introduction), it is natural to try to investigate their properties theoretically. Further, as illustrated in the data analytic part of Williams and Seeger (2000), the assumption that $n/p$ stays bounded is not unrealistic as far as applications of kernel methods are concerned.

One aim of the present paper is to shed some theoretical light on the properties of these kernel random matrices, and to do so in relatively wide generality. We note that the choice of renormalization that we make below is
motivated in part by the arguments of Williams and Seeger (2000) and their practical choices of kernels for data of varying dimensions.

Existing theory on kernel random matrices (see for instance the interesting Koltchinskii and Giné (2000)), for fixed-dimensional input data, predicts that the eigenvalues of kernel random matrices behave, at least for the largest ones, like the eigenvalues of the corresponding operator on $L^2(dP)$, if the data are i.i.d. with probability distribution $P$. To be more precise, if $X_i$ is a sequence of i.i.d. random variables with distribution $P$, under regularity conditions on the kernel $k(x,y)$, it was shown in Koltchinskii and Giné (2000) that, for any index $l$, the $l$-th largest eigenvalue of the kernel matrix $M$, with entries $M_{i,j} = \frac{1}{n} k(X_i, X_j)$,
converges to the $l$-th largest eigenvalue of the operator $K$ defined as $Kf(x) = \int k(x,y) f(y)\, dP(y)$. These insights have also been derived through more heuristic but nonetheless enlightening arguments in, for instance, Williams and Seeger (2000). Further, more precise fluctuation results are also given in Koltchinskii and Giné (2000). We also note interesting work on Laplacian eigenmaps (see e.g. Belkin and Niyogi (2008)) where, among other things, results have been obtained showing convergence of eigenvalues and eigenvectors of certain Laplacian random matrices (which are quite closely connected to kernel random matrices) computed from data sampled from a manifold, to the corresponding quantities for the Laplace-Beltrami operator on the manifold.

These results are in turn used in the literature to explain the behavior of non-linear versions of standard procedures of multivariate statistics, such as Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA) or Independent Component Analysis (ICA). We refer the reader to Schölkopf et al. (2004) for an introduction to kernel-PCA, and to Bach and Jordan (2003) for an introduction to kernel-CCA and kernel-ICA. At the heart of these techniques are the spectral properties of kernel random matrices. Because these techniques are used in bioinformatics, a field where large datasets are common and becoming the norm, it is natural to ask what can be said about these spectral properties for high-dimensional data.

We show that for the models we analyze (ICA-type models and generalizations that go beyond the linear setting of ICA), kernel random matrices essentially behave like sample covariance matrices, and hence their eigenvalues suffer from the same bias problems that affect sample covariance matrices in high dimensions. In particular, if one were to try to apply the heuristics of Williams and Seeger (2000), which were developed for low-dimensional problems, to the high-dimensional case, the predictions would be quite wildly wrong. (A simple example is provided by the Gaussian
kernel with i.i.d. Gaussian data, where the computations can be done completely explicitly, as explained in Williams and Seeger (2000).) We also note that the scaling we use is different from the one used in low dimensions, where the matrices are scaled by $1/n$. This is because the high-dimensional problem would be completely degenerate if we used this normalization in our setting. However, our results still give information about the problem when it is scaled by $1/n$.

From a random matrix point of view, our study is connected to the study of Euclidean random matrices and distance matrices, which is of some interest in, for instance, physics. We refer to Bogomolny et al. (2003) and Bordenave (2006) for work in this direction in the low (or fixed) dimensional setting. We also note that at the level of generality we place ourselves in, the random matrices we study do not seem to be amenable to study through the classical tools of random matrix theory. Hence, besides their obvious statistical interest, they are also interesting on purely mathematical grounds.

We now turn to the gist of our paper, which will show that high-dimensional kernel random matrices behave spectrally essentially like matrices closely connected to sample covariance matrices. We will get two types of results: in Theorems 1 and 2, we get a strong approximation result (in operator norm) for standard models (ICA-like) studied in random matrix theory. In Theorems 3 and 4, we characterize the limiting spectral distribution of our kernel random matrices, for a wider class of data distributions. In Section 2, we also state clearly the consequences of our theorems and review the relevant theory of high-dimensional sample covariance matrices. From a technical standpoint, we adopt a point of view centered on the concentration of measure phenomenon, as exposed for instance in Ledoux (2001), as it provides a unified way to treat the two types of results we are interested in. Finally, we discuss in our (self-contained) conclusion (Section 3) the consequences of our
results, and in particular some possible limitations of standard random matrix models as tools to model data encountered in practice, focusing on geometric properties of datasets drawn according to those models. As explained in more detail there, vectors drawn according to these standard random matrix models essentially live close to spheres and are almost orthogonal to one another, a property that may or may not be present in datasets to be analyzed and that can be seen as a key to many classical and less classical random matrix results (see also El Karoui (2007a)).
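
The geometric property just described can be seen in a small simulation. The sketch below is an illustration only, not part of the paper's argument: it assumes the simplest standard model ($\Sigma = \mathrm{Id}$, i.i.d. $N(0,1)$ entries), and the function name `geometry_of_gaussian_cloud` and the sizes used are hypothetical choices. It reports how far the scaled squared norms $\|X_i\|^2/p$ are from 1 (closeness to a sphere) and the largest scaled inner product $|X_i'X_j/p|$ (near-orthogonality).

```python
import random


def geometry_of_gaussian_cloud(n, p, seed=0):
    """Draw n vectors in R^p with i.i.d. N(0, 1) entries (so Sigma = Id) and return
    (max_i | ||X_i||^2 / p - 1 |, max_{i != j} |X_i'X_j / p|)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

    # Scaled squared norms: all close to trace(Id)/p = 1 means the points
    # lie near a sphere of radius sqrt(p).
    norm_dev = max(abs(sum(x * x for x in row) / p - 1.0) for row in X)

    # Scaled inner products between distinct vectors: all close to 0 means
    # the vectors are almost orthogonal to one another.
    max_inner = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            ip = sum(a * b for a, b in zip(X[i], X[j])) / p
            max_inner = max(max_inner, abs(ip))
    return norm_dev, max_inner


if __name__ == "__main__":
    for p in (5, 50, 2000):
        dev, inner = geometry_of_gaussian_cloud(n=20, p=p)
        print(f"p={p:5d}  max|norm^2/p - 1|={dev:.3f}  max|<Xi,Xj>/p|={inner:.3f}")
```

Both deviations are of order $p^{-1/2}$ (up to logarithmic factors in $n$), so they become small only when $p$ is large; in low dimension neither property holds.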

2 Spectrum of kernel random matrices

Kernel random matrices do not seem to be amenable to analysis through the usual tools of random matrix theory. In particular, for general $f$, it seems difficult to carry out either a method of moments proof, or a Stieltjes transform proof, or a proof that relies on knowing the density of the eigenvalues of the matrix. Hence, we take an indirect approach. Our strategy is to find approximations of the kernel random matrix that have two properties. First, the approximating matrix is analyzable or has already been analyzed in random matrix theory. Second, the quality of the approximation is good enough that spectral properties of the approximating matrix can be shown to carry over to the kernel matrix.

The strategy in the first two theorems is to derive an operator norm consistent approximation of our kernel matrix. In other words, if we call $M$ our kernel matrix, we will find $K$ such that $\|M - K\|_2 \to 0$ as $n$ and $p$ tend to $\infty$. Note that both $M$ and $K$ are real symmetric (and hence Hermitian) here. We explain after the statement of Theorem 1 why operator norm consistency is a desirable property; in a nutshell, it implies consistency for each individual eigenvalue, as well as for eigenspaces corresponding to separated eigenvalues. For the second set of theorems (Theorems 3 and 4), we will relax the distributional assumptions made on the data, but at the expense of the precision of the results we will obtain: we will characterize the limiting spectral distribution of our kernel random matrices.

Our theorems below show that kernel random matrices can be well approximated by matrices that are closely connected to large-dimensional covariance matrices. The spectral properties of those matrices have been the subject of a significant amount of work in recent and less recent years, and hence this knowledge, or at least part of it, can be transferred to kernel random matrices. In particular, we refer the reader to Marčenko and Pastur (1967), Wachter (1978), Geman (1980), Yin et al. (1988), Silverstein
(1995), Bai and Silverstein (1998), Johnstone (2001), Baik and Silverstein (2006), Paul (2007), El Karoui (2007c), Bai et al. (2007) and El Karoui (2007a) for some of the most statistically relevant results in this area. We review some of them now.

2.1 Some results on large dimensional sample covariance matrices

Since our main theorems are approximation theorems, we first wish to state some of the properties of the objects we will use to approximate kernel random matrices. In what follows, we consider an $n \times p$ data matrix, with, say, $p/n$ having a finite non-zero limit. Most of the results that have been obtained are of two types: either they are so-called bulk results and concern essentially the spectral distribution (or, loosely speaking, the histogram of eigenvalues) of the random matrices of interest, or they concern the localization and fluctuation behavior of extreme eigenvalues of these random matrices.

2.1.1 Spectral distribution results

An object of interest in random matrix theory is the spectral distribution of random matrices. Let us call $l_i$ the decreasingly ordered eigenvalues of our random matrix, and let us assume we are working with an $n \times n$ matrix $M_n$. The empirical spectral distribution of an $n \times n$ matrix is the probability measure which puts mass $1/n$ at each of its eigenvalues. In other words, if we call $F_n$ this probability measure, we have
$$dF_n(x) = \frac{1}{n}\sum_{i=1}^{n} \delta_{l_i}(x).$$
Note that the histogram of eigenvalues represents an integrated version of this measure. For random matrices, this measure $F_n$ is naturally a random measure. A key result in the area of covariance matrices is that if we observe i.i.d. data vectors $X_i$, with $X_i = \Sigma^{1/2} Y_i$, where $\Sigma$ is a positive semi-definite matrix and $Y_i$ is a vector with i.i.d. entries, then, under weak moment conditions on $Y_i$ and assuming that the spectral distribution of $\Sigma$ has a limit (in the sense of weak convergence of distributions), $F_n$ converges to a non-random measure, which we call $F$.
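
The empirical spectral distribution can be probed numerically without computing any eigenvalue, through its moments: since $F_n$ puts mass $1/n$ at each eigenvalue of $M_n$, $\int x^k\, dF_n(x) = \mathrm{trace}(M_n^k)/n$. The sketch below is an illustration under stated assumptions (Gaussian data, $\Sigma = \mathrm{Id}$; the function names are hypothetical). It applies this identity to the $p \times p$ sample covariance matrix $S = X'X/n$ and compares the first two moments with those of the Marčenko-Pastur limit, which are $1$ and $1 + \rho$ for $\rho = p/n$. Matching two moments is a consistency check, not a proof of convergence.

```python
import random


def esd_moments(n, p, seed=0):
    """First two moments of the empirical spectral distribution of S = X'X/n,
    where X is n x p with i.i.d. N(0, 1) entries (Sigma = Id).

    The k-th moment of the ESD of a p x p matrix S is trace(S^k)/p, and
    trace(S^2) = sum_{a,b} S_ab^2, so no eigenvalues are needed."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    cols = list(zip(*X))  # cols[a] = variable a across the n observations

    trace_s = 0.0
    trace_s2 = 0.0
    for a in range(p):
        for b in range(a, p):
            s_ab = sum(x * y for x, y in zip(cols[a], cols[b])) / n
            if a == b:
                trace_s += s_ab
                trace_s2 += s_ab * s_ab
            else:
                trace_s2 += 2.0 * s_ab * s_ab  # S is symmetric: count (a,b) and (b,a)
    return trace_s / p, trace_s2 / p


def marchenko_pastur_moments(rho):
    """First two moments of the Marchenko-Pastur law (Sigma = Id), ratio rho = p/n."""
    return 1.0, 1.0 + rho


if __name__ == "__main__":
    n, p = 200, 100
    print("empirical:", esd_moments(n, p))
    print("MP limit: ", marchenko_pastur_moments(p / n))
```

For Gaussian data the exact expectation of the second empirical moment is $1 + (p+1)/n$, so the finite-sample value sits slightly above the limit $1 + p/n$.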

We call the models $X_i = \Sigma^{1/2} Y_i$ the standard models of random matrix theory because most results have been derived under these assumptions. In particular, various results (Geman (1980), Bai and Silverstein (1998), Bai and Silverstein (1999)) show, among many other things, that when the entries of the vector $Y$ have 4 (absolute) moments, the largest eigenvalues of the sample covariance matrix $X'X/n$, where $X_i'$ now occupies the $i$-th row of the $n \times p$ matrix $X$, stay close to the endpoint of the support of $F$.

A natural question is therefore to try to characterize $F$. Except in particular situations, it is difficult to do so explicitly. However, it is possible to characterize a certain transformation of $F$. The tool of choice in this context is the Stieltjes transform of a distribution. If we call $St_F$ the Stieltjes transform of $F$, it is the function defined on $\mathbb{C}^+$ by the formula
$$St_F(z) = \int \frac{dF(\lambda)}{\lambda - z}, \qquad \mathrm{Im}[z] > 0.$$
In particular, for empirical spectral distributions, we see that, if $F_n$ is the spectral distribution of the matrix $M_n$,
$$St_{F_n}(z) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{l_i - z} = \frac{1}{n}\,\mathrm{trace}\left((M_n - z\,\mathrm{Id})^{-1}\right).$$
The importance of the Stieltjes transform in the context of random matrix theory stems from two facts: on the one hand, it is connected fairly explicitly to the matrices that are being analyzed; on the other hand, pointwise convergence of Stieltjes transforms implies weak convergence of distributions, if a certain mass preservation condition is satisfied. This is how a number of bulk results are proved. For a clear and self-contained introduction to the connection between Stieltjes transforms and weak convergence of probability measures, we refer the reader to Geronimo and Hill (2003).

The result of Marčenko and Pastur (1967), later generalized by Silverstein (1995) for standard random matrix models with non-diagonal covariance, and more recently by e.g. El Karoui (2007a) away from those standard models, is a functional characterization of the limit $F$. If one calls $w_n(z)$ the Stieltjes transform of the empirical spectral distribution
of $XX'/n$, then $w_n(z)$ converges pointwise (and almost surely, after Silverstein (1995)) to a non-random $w(z)$, which, as a function, is a Stieltjes transform. Moreover, $w$, the Stieltjes transform of $F$, satisfies the following equation, if $p/n \to \rho$, $\rho > 0$:
$$w(z) = -\frac{1}{z - \rho \displaystyle\int \frac{\lambda\, dH(\lambda)}{1 + \lambda w(z)}},$$
where $H$ is the limiting spectral distribution of $\Sigma$, assuming that such a distribution exists. We note that Silverstein (1995) proved the result under a second moment condition on the entries of $Y_i$. From this result, Marčenko and Pastur (1967) derived that in the case where $\Sigma = \mathrm{Id}$, and hence $dH = \delta_1$, the empirical spectral distribution has a limit whose density is, if $\rho \le 1$,
$$f_\rho(x) = \frac{1}{2\pi\rho x}\sqrt{(b - x)(x - a)}, \qquad a \le x \le b,$$
where $a = (1 - \rho^{1/2})^2$ and $b = (1 + \rho^{1/2})^2$. The difference between the population spectral distribution (a point mass of mass 1 at 1) and the limit of the empirical spectral distribution is quite striking.

2.1.2 Largest eigenvalues results

Another line of work has been focused on the behavior of extreme eigenvalues of sample covariance matrices. In particular, Geman (1980) showed, under some moment conditions, that when $\Sigma = \mathrm{Id}$,
$$l_1(X'X/n) \to (1 + \sqrt{p/n})^2 \quad \text{almost surely}.$$
In other words, the largest eigenvalue stays close to the endpoint of the limiting spectral distribution of $X'X/n$. This result was later generalized in Yin et al. (1988), and shown to be true under the assumption of finite 4th moment only, for data with mean 0. In recent years, fluctuation results have been obtained for this largest eigenvalue, which is of practical interest in Principal Components Analysis (PCA). Under Gaussian assumptions, Johnstone (2001) and El Karoui (2003) (see also Forrester (1993) and Johansson (2000)) showed that the fluctuations of the largest eigenvalue are Tracy-Widom distributed. For the general covariance case, similar results, as well as localization information, were
recently obtained in El Karoui (2007c). We note that the localization information (i.e. a formula) that was discovered in this latter paper was shown to hold for a wide variety of standard random matrix models, through appeal to Bai and Silverstein (1998). We refer the interested reader to Fact 2 in El Karoui (2007c) for more information.

Interesting work has also been done on so-called spiked models, where a few population eigenvalues are separated from the bulk of them. In particular, in the case where all population eigenvalues are equal except for one that is significantly larger (see Baik et al. (2005) for the discovery of an interesting phase transition), Paul (2007) showed, in the Gaussian case, inconsistency of the largest sample eigenvalue, as well as the fact that the angle between the population and sample principal eigenvectors is bounded away from 0. Paul (2007) also obtained fluctuation information about the largest empirical eigenvalue. Finally, we note that the same inconsistency of eigenvalue result was also obtained in Baik and Silverstein (2006), beyond the Gaussian case.

2.1.3 Notations

Let us now define some notations and add some clarifications. We denote by $A'$ the transpose of $A$. The matrices we will be working with all have real entries. We remind the reader that if $A$ and $B$ are two rectangular matrices, $AB$ and $BA$ have the same eigenvalues, except possibly for a certain number of zeros. We will make repeated use of this fact, e.g. for matrices like $X'X$ and $XX'$. In the case where $A$ and $B$ are both square, $AB$ and $BA$ have exactly the same eigenvalues.

We will also need various norms on matrices. We will use the so-called operator norm, which we denote by $\|A\|_2$, which corresponds to the largest singular value of $A$, i.e. $\max_i \sqrt{l_i(A'A)}$. We occasionally denote the largest singular value of $A$ by $\sigma_1(A)$. Clearly, for positive semi-definite matrices, the largest singular value is equal to the largest eigenvalue. Finally, we will sometimes need to use the Frobenius (or Hilbert-Schmidt) norm of a matrix $A$. We denote
it by $\|A\|_F$. By definition, because we are working with matrices with real entries, it is simply given by
$$\|A\|_F^2 = \sum_{i,j} A_{i,j}^2.$$
Further, we use $\circ$ to denote the Hadamard (i.e. entrywise) product of two matrices. We denote by $\mu_m$ the $m$-th moment of a random variable. Note that by a slight abuse of notation, we might also use the same notation to refer to the $m$-th absolute moment (i.e. $E|X|^m$) of a random variable, but if there is any ambiguity, we will naturally make precise which definition we are using.

Finally, in the discussion of standard random matrix models that follows, there will be arrays of random variables and a.s. convergence. We work with random variables defined on a common probability space. To each $\omega$ corresponds an infinite dimensional array of numbers. Unless otherwise noted, the $n \times p$ matrices we will use in what follows are the upper-left corner of this array.

We now turn to the study of kernel random matrices. We will show that we can approximate them by matrices that are closely connected to sample covariance matrices in high dimensions and, therefore, that a number of the results we just reviewed also apply to them.

2.2 Inner-product kernel matrices: $f(X_i'X_j/p)$

Theorem 1 (Spectrum of inner product kernel random matrices). Let us assume that we observe $n$ i.i.d. random vectors $X_i$ in $\mathbb{R}^p$. Let us consider the kernel matrix $M$ with entries
$$M_{i,j} = f\left(\frac{X_i'X_j}{p}\right).$$
We assume that:
a) $n \asymp p$, i.e. $n/p$ and $p/n$ remain bounded as $p \to \infty$.
b) $\Sigma_p$ is a positive semi-definite matrix, and $\|\Sigma_p\|_2 = \sigma_1(\Sigma_p)$ remains bounded in $p$, i.e. there exists $K > 0$ such that $\sigma_1(\Sigma_p) \le K$ for all $p$.
c) $\mathrm{trace}(\Sigma_p)/p$ has a finite limit, i.e. there exists $l \in \mathbb{R}$ such that $\lim_{p\to\infty} \mathrm{trace}(\Sigma_p)/p = l$.
d) $X_i = \Sigma_p^{1/2} Y_i$.
e) The entries of $Y_i$, a $p$-dimensional random vector, are i.i.d. Also, denoting by $Y_i(k)$ the $k$-th entry of $Y_i$, we assume that $E(Y_i(k)) = 0$, $\mathrm{var}(Y_i(k)) = 1$ and $E(|Y_i(k)|^{4+\epsilon}) < \infty$ for some $\epsilon > 0$. (We say that $Y_i$ has $4+\epsilon$ absolute moments.)
f) $f$ is a $C^1$ function in a neighborhood of $l = \lim_{p\to\infty} \mathrm{trace}(\Sigma_p)/p$ and a $C^3$ function in a neighborhood of 0.

Under these assumptions, the kernel matrix $M$ can (in probability) be approximated consistently in operator norm, when $p$ and $n$ tend to $\infty$, by the matrix $K$, where
$$K = \left(f(0) + f''(0)\,\frac{\mathrm{trace}(\Sigma_p^2)}{2p^2}\right) 11' + f'(0)\,\frac{XX'}{p} + \upsilon\, \mathrm{Id}_n,$$
$$\upsilon = f\left(\frac{\mathrm{trace}(\Sigma_p)}{p}\right) - f(0) - f'(0)\,\frac{\mathrm{trace}(\Sigma_p)}{p},$$
and $1$ denotes the $n$-dimensional vector whose entries are all equal to 1. In other words, $\|M - K\|_2 \to 0$ in probability when $p \to \infty$.

The advantages of obtaining an operator norm consistent estimator are many. We list some here:
- Asymptotically, $M$ and $K$ have the same $j$-th largest eigenvalue, for any $j$: this is simply because, for symmetric matrices, if $l_j$ is the $j$-th largest eigenvalue of a matrix, Weyl's inequality (see e.g. Corollary III.2.6 in Bhatia (1997)) implies that $|l_j(M) - l_j(K)| \le \|M - K\|_2$. Hence our result implies that $|l_j(M) - l_j(K)| \to 0$ in probability as $p$ and $n$ go to infinity.
- The limiting spectral distributions of $M$ and $K$ (if they exist) are the same. This is a consequence of Lemma 1, p. 21, below. So, in particular, when $K$ has a limiting spectral distribution (in the sense of weak convergence of probability measures), the empirical spectral distribution of $M$ converges to that distribution (in the sense of weak convergence of distributions) in probability.
- We have subspace consistency for eigenspaces corresponding to separated eigenvalues. (For a proof, we refer to El Karoui (2007b, Corollary 3).) So, when $K$ has eigenvalues that stay separated from the bulk of this matrix's eigenvalues, then $M$ has in probability the same property, and the angle between the corresponding eigenspaces for $K$ and $M$ goes to 0 in probability.

(Note that the statements we just made assume that both $M$ and $K$ are symmetric, which is the case here.)

The strategy for the proof is the following. According to the results of Lemma A-3, the matrix $X_i'X_j/p$
has small entries off the diagonal, whereas on the diagonal the entries are essentially constant and equal to $\mathrm{trace}(\Sigma)/p$. Hence, it is natural to try to use the δ-method (i.e. do a Taylor expansion entry by entry). By contrast with standard problems in Statistics, the fact that we have to perform $n^2$ of those Taylor expansions means that the second order term is not negligible a priori. The proof shows that this approach can be carried out rigorously, and that, perhaps surprisingly, the second order term is not too complicated to approximate in operator norm. It is also shown that the third order term plays essentially no role.

Before we start the proof, we want to mention that we will drop the index $p$ in $\Sigma_p$ below to avoid cumbersome notation. Let us also note, more technically, that an important step of the proof is to show that, when the $Y_i$'s have enough moments, they can be treated without much error in spectral results as bounded random variables, the bound depending on the number of moments, $n$ and $p$. This then enables us to use concentration results for convex Lipschitz functions of independent bounded random variables at various important points of the proof and also in Lemma A-3, whose results underlie much of the approach taken here.
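
The entry-by-entry Taylor heuristic can be checked in a small simulation. The following sketch is an illustration, not the paper's method of proof; the helper names are hypothetical, and the caller supplies $f(0)$, $f'(0)$, $f''(0)$, $\tau = \mathrm{trace}(\Sigma)/p$ and $\mathrm{trace}(\Sigma^2)/p^2$. It builds $M$ and the matrix $K$ of Theorem 1 and reports the largest off-diagonal discrepancy. An entrywise check says nothing by itself about the operator norm, which is exactly why the argument below is needed. For a linear $f$, $K$ reproduces $M$ exactly (diagonal included), which makes a convenient sanity check.

```python
import math
import random


def kernel_and_approximation(X, f, f0, f1, f2, tau, tr_sigma2_over_p2):
    """Return (M, K) with M_ij = f(X_i'X_j / p) and, following Theorem 1,
        K = (f(0) + f''(0) trace(Sigma^2) / (2 p^2)) 11' + f'(0) XX'/p + upsilon Id,
        upsilon = f(tau) - f(0) - f'(0) tau,  tau = trace(Sigma)/p.
    f0, f1, f2 are the values f(0), f'(0), f''(0) supplied by the caller."""
    n, p = len(X), len(X[0])
    # Gram matrix XX'/p.
    G = [[sum(a * b for a, b in zip(X[i], X[j])) / p for j in range(n)]
         for i in range(n)]
    upsilon = f(tau) - f0 - f1 * tau
    const = f0 + f2 * tr_sigma2_over_p2 / 2.0
    M = [[f(G[i][j]) for j in range(n)] for i in range(n)]
    K = [[const + f1 * G[i][j] + (upsilon if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return M, K


def max_offdiag_gap(M, K):
    """Largest |M_ij - K_ij| over i != j (an entrywise check only)."""
    n = len(M)
    return max(abs(M[i][j] - K[i][j]) for i in range(n) for j in range(n) if i != j)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 60, 400
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Assumed setting: Sigma = Id, so tau = 1 and trace(Sigma^2)/p^2 = 1/p;
    # f = exp, so f(0) = f'(0) = f''(0) = 1.
    M, K = kernel_and_approximation(X, math.exp, 1.0, 1.0, 1.0, 1.0, 1.0 / p)
    print("max off-diagonal |M - K|:", max_offdiag_gap(M, K))
```

Off the diagonal the remaining gap is essentially $\frac{f''(0)}{2}\left((X_i'X_j/p)^2 - \mathrm{trace}(\Sigma^2)/p^2\right)$, of order $1/p$ per entry; the proof shows that these $n^2$ small errors do not add up in operator norm.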

Proof. First, let us call $\tau = \mathrm{trace}(\Sigma)/p$. Using Taylor expansions, we can rewrite our kernel matrix as:
$$f(X_i'X_j/p) = f(0) + f'(0)\frac{X_i'X_j}{p} + \frac{f''(0)}{2}\left(\frac{X_i'X_j}{p}\right)^2 + \frac{f^{(3)}(\xi_{i,j})}{6}\left(\frac{X_i'X_j}{p}\right)^3, \quad \text{if } i \ne j,$$
$$f(\|X_i\|_2^2/p) = f(\tau) + f'(\xi_{i,i})\left(\frac{\|X_i\|_2^2}{p} - \tau\right) \quad \text{on the diagonal}.$$
The proof can be separated into different steps. We will break the kernel matrix into a diagonal term and an off-diagonal term. The results of Lemma A-3, after they are shown, will allow us to take care of the diagonal matrix at relatively low cost. So we postpone that part of the analysis to the end of the proof and we first focus on the off-diagonal matrix.

In what follows, we call second order term the matrix $A$ with entries
$$A_{i,j} = \frac{f''(0)}{2}\left(\frac{X_i'X_j}{p}\right)^2 1_{i \ne j}.$$
We call third order term the matrix $B$ with entries
$$B_{i,j} = \frac{f^{(3)}(\xi_{i,j})}{6}\left(\frac{X_i'X_j}{p}\right)^3 1_{i \ne j}.$$
The off-diagonal matrix is the sum $A + B$.

A. Study of the off-diagonal matrix

Truncation and centralization. Following the arguments of Lemma 2.2 in Yin et al. (1988), we see that because we have assumed that we have $4+\epsilon$ absolute moments, and $n \asymp p$, the array $Y = \{Y_{i,j}\}_{1 \le i \le n,\, 1 \le j \le p}$ is almost surely equal to the array $\tilde{Y}$ of same dimensions, with $\tilde{Y}_{i,j} = Y_{i,j} 1_{|Y_{i,j}| \le B}$, where $B = p^{1/2-\delta}$ and $\delta > 0$. We will therefore carry out the analysis on this $\tilde{Y}$ array.

Note that most of the results we will rely on require vectors of i.i.d. entries with mean 0. Of course, $\tilde{Y}_{i,j}$ has in general a mean different from 0. In other words, if we call $\mu = E(\tilde{Y}_{i,j})$, we need to show that we do not lose anything in operator norm by replacing the $\tilde{Y}_i$'s by $U_i$'s, with $U_i = \tilde{Y}_i - \mu 1$. Note that, as seen in Lemma A-3, by plugging in $t = 1/2 - \delta$ in the notation of this lemma, which corresponds to the $4+\epsilon$ moment assumption here, we have $|\mu| \le p^{-3/2-\delta}$.

Now let us call $S$ the matrix $XX'/p$, except that its diagonal is replaced by zeros. From Yin et al. (1988), and the fact that $n/p$ stays bounded, we know that $\|XX'/p\|_2 \le \sigma_1(\Sigma)\|YY'\|_2/p$ stays bounded. Using Lemma A-3, we see that the diagonal of $XX'/p$ stays bounded a.s. in operator norm. Therefore, $\|S\|_2$ is bounded a.s. Now, as in the proof of Lemma
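A-3, we have, for $i \ne j$,
$$S_{i,j} = \frac{U_i'\Sigma U_j}{p} + \mu\,\frac{1'\Sigma U_j + 1'\Sigma U_i}{p} + \mu^2\,\frac{1'\Sigma 1}{p} = \frac{U_i'\Sigma U_j}{p} + R_{i,j} \quad \text{a.s.}$$
Note that this equality is true a.s. only because it involves replacing $Y$ by $\tilde{Y}$. The proof of Lemma A-3 shows that
$$|R_{i,j}| \le 2|\mu|\,\sigma_1^{1/2}(\Sigma)\left(\sigma_1^{1/2}(\Sigma) + p^{-\delta/2}\right) + \mu^2\sigma_1(\Sigma) \quad \text{a.s.}$$
We conclude that, for some constant $C$,
$$\|R\|_F^2 \le C n^2 \mu^2 \le C n^2 p^{-3-2\delta} \quad \text{a.s.}$$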
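
For completeness, here is the step from this Frobenius bound to the operator norm that the next sentence uses. It relies only on $\|R\|_2 \le \|R\|_F$ (the largest singular value is at most the square root of the sum of all squared singular values, which equals $\|R\|_F$) and on assumption a), that $n/p$ stays bounded:

```latex
\|R\|_2 \;\le\; \|R\|_F \;\le\; \sqrt{C}\, n\, p^{-3/2-\delta}
        \;=\; \sqrt{C}\,\frac{n}{p}\, p^{-1/2-\delta}
        \;\longrightarrow\; 0 \quad \text{a.s., since } n/p \text{ is bounded and } \delta > 0.
```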

Therefore $\|R\|_2 \to 0$ a.s. In other words, if we call $S_U$ the matrix with $(i,j)$ entry $U_i'\Sigma U_j/p$ off the diagonal and 0 on the diagonal, $\|S - S_U\|_2 \to 0$ a.s.

Recall that, because $S$ has a zero diagonal, the second order term is $A = \frac{f''(0)}{2}\, S \circ S$. Now it is a standard result on Hadamard products (see for instance Bhatia (1997, Problem I.6.13) or Horn and Johnson (1994, Theorems 5.5.1 and 5.5.15)) that for two matrices $A$ and $B$, $\|A \circ B\|_2 \le \|A\|_2 \|B\|_2$. Since the Hadamard product is commutative, we have
$$S \circ S - S_U \circ S_U = (S + S_U) \circ (S - S_U).$$
We conclude that
$$\|S \circ S - S_U \circ S_U\|_2 \le \|S - S_U\|_2 \left(\|S\|_2 + \|S_U\|_2\right) \to 0 \quad \text{a.s.},$$
since $\|S - S_U\|_2 \to 0$ a.s., and $\|S\|_2$, and hence $\|S_U\|_2$, stay bounded a.s.

The conclusion of this study is that, to approximate the second order term in operator norm, it is enough to work with $S_U$ and not $S$, and hence, very importantly, with bounded random variables with zero mean. Further, the proof of Lemma A-3 makes clear that $\sigma_U^2$, the variance of the $U_{i,j}$'s, goes to 1, the variance of the $Y_{i,j}$'s, very fast. So if we can approximate the matrix with $(i,j)$ entry $U_i'\Sigma U_j/(p\,\sigma_U^2)$ consistently in operator norm by a matrix whose operator norm is bounded, this same matrix will constitute an operator norm approximation of $U_i'\Sigma U_j/p$. In other words, we can assume without loss of generality that, when working with matrices of dimension $n \times p$, the random variables we will be working with have variance 1, have mean 0 and are bounded by $B$, with $B$ depending on $p$ and going to infinity.

Control of the second order term. We now focus on approximating in operator norm the matrix with $(i,j)$-th entry $\frac{f''(0)}{2}(X_i'X_j/p)^2 1_{i \ne j}$. As we just explained, we assume from now on, in all the work concerning the second order term, that the vectors $Y_i$ have mean 0, and that their entries have variance 1 and are bounded by $B = p^{1/2-\delta}$. This is because we just saw that replacing $Y_i$ by $U_i/\sigma_U$ would not change (a.s. and asymptotically) the operator norm of the matrix to be studied. We note that to make clear that the truncation depends on $p$, we might have wanted to use the notation $Y_i^{(p)}$, but since there will be no ambiguity in the proof, we chose to use the less cumbersome
notation Y i The control of the second order term turns out to be the most delicate art of the analysis, and the only lace where we need the assumtion that X i = Σ 1/2 Y i Let us call W the matrix with entries Note that, when i j, W i,j = { (X i X j 2, if i j 2 0, if i = j E (W i,j = E ( trace ( X ix j X jx i / 2 = E ( trace ( X j X jx i X i / 2 = trace ( Σ 2 / 2 Because we assume that trace (Σ / has a finite limit, and n/ stays bounded away from 0, we see that the matrix E (W has a largest eigenvalue that, in general, does not go to 0 Note also that under our assumtions, E (W i,j = O(1/ Our aim is to show that W can be aroximated in oerator norm by this constant matrix So let us consider the matrix W with entries W i,j = { (X i X j 2 trace ( Σ 2 / 2, if i j 2 0, if i = j Simle comutations show that the exected Frobenius norm squared of this matrix does ( not go to 0 Hence more subtle arguments are needed to control its oerator norm We will show that E trace ( W 4 ( goes to zero, which imlies that E W 4 2 goes to zero, because W is real symmetric The elements contributing to trace ( W 4 are generally of the form W i,j Wj,k Wk,l Wl,i We are going to study these terms according to how many indices are equal to each other 9
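The argument above uses two facts. One is the factorization $S\circ S-S_U\circ S_U=(S+S_U)\circ(S-S_U)$. The other is the resulting operator norm bound. Both can be checked on small random symmetric matrices. The sketch below computes spectral norms with a hand-written cyclic Jacobi eigenvalue routine (`jacobi_eigenvalues`, a hypothetical helper using only the standard library). The matrices are arbitrary stand-ins for $S$ and $S_U$.

```python
import math
import random

def jacobi_eigenvalues(A, sweeps=100, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(sweeps):
        if sum(a[r][c] ** 2 for r in range(n) for c in range(n) if r != c) < tol:
            break
        for r in range(n - 1):
            for c in range(r + 1, n):
                if a[r][c] == 0.0:
                    continue
                theta = (a[c][c] - a[r][r]) / (2.0 * a[r][c])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # columns r, c:  A <- A J
                    akr, akc = a[k][r], a[k][c]
                    a[k][r], a[k][c] = cs * akr - sn * akc, sn * akr + cs * akc
                for k in range(n):  # rows r, c:  A <- J' A
                    ark, ack = a[r][k], a[c][k]
                    a[r][k], a[c][k] = cs * ark - sn * ack, sn * ark + cs * ack
    return [a[k][k] for k in range(n)]

def op_norm(A):
    """Spectral norm of a symmetric matrix: the largest absolute eigenvalue."""
    return max(abs(x) for x in jacobi_eigenvalues(A))

def hadamard(A, B):
    """Entrywise (Hadamard) product of two matrices of the same shape."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B, c=1.0):
    """A + c * B, entrywise."""
    return [[x + c * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def random_symmetric(n, rng):
    G = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    return [[(G[r][c] + G[c][r]) / 2 for c in range(n)] for r in range(n)]

rng = random.Random(1)
n = 6
S = random_symmetric(n, rng)                          # stand-in for S
S_U = add(S, random_symmetric(n, rng), 0.01)          # a nearby stand-in for S_U

lhs = add(hadamard(S, S), hadamard(S_U, S_U), -1.0)   # S o S - S_U o S_U
rhs = hadamard(add(S, S_U), add(S, S_U, -1.0))        # (S + S_U) o (S - S_U)
identity_ok = all(abs(x - y) < 1e-12 for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))

bound = op_norm(add(S, S_U, -1.0)) * (op_norm(S) + op_norm(S_U))
print(identity_ok, op_norm(lhs) <= bound + 1e-9)      # True True
```

The Jacobi method was chosen only to keep the example free of third-party dependencies. A linear algebra library would normally be used instead.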

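i. Terms involving 4 different indices: $i\neq j\neq k\neq l$. We first focus on the case where all these indices $(i,j,k,l)$ are different. Recall that $X_i=\Sigma^{1/2}Y_i$, where $Y_i$ has i.i.d. entries. We want to compute $E(\widetilde{W}_{i,j}\widetilde{W}_{j,k}\widetilde{W}_{k,l}\widetilde{W}_{l,i})$, so it is natural to focus first on $E(\widetilde{W}_{i,j}\widetilde{W}_{j,k}\widetilde{W}_{k,l}\widetilde{W}_{l,i}\mid Y_i,Y_k)$. Now, note that
$$\widetilde{W}_{i,j}=\frac{1}{p^2}\left\{Y_i'\Sigma Y_jY_j'\Sigma Y_i-\operatorname{trace}(\Sigma^2)\right\}=\frac{1}{p^2}\left\{Y_i'\Sigma(Y_jY_j'-\mathrm{Id})\Sigma Y_i+\operatorname{trace}\left(\Sigma^2(Y_iY_i'-\mathrm{Id})\right)\right\}.$$
Hence, calling $M_j=Y_jY_j'-\mathrm{Id}$, we have
$$p^4\,\widetilde{W}_{i,j}\widetilde{W}_{j,k}=(Y_i'\Sigma M_j\Sigma Y_i)(Y_k'\Sigma M_j\Sigma Y_k)+(Y_i'\Sigma M_j\Sigma Y_i)\operatorname{trace}(\Sigma^2M_k)+(Y_k'\Sigma M_j\Sigma Y_k)\operatorname{trace}(\Sigma^2M_i)+\operatorname{trace}(\Sigma^2M_i)\operatorname{trace}(\Sigma^2M_k).$$
Now, of course, we have $E(M_j)=E(M_j\mid Y_i,Y_k)=0$. Hence,
$$p^4\,E(\widetilde{W}_{i,j}\widetilde{W}_{j,k}\mid Y_i,Y_k)=Y_i'\Sigma\,E\left(M_j\Sigma Y_iY_k'\Sigma M_j\mid Y_i,Y_k\right)\Sigma Y_k+\operatorname{trace}(\Sigma^2M_i)\operatorname{trace}(\Sigma^2M_k).$$
If $M$ is a deterministic matrix, we have, since $E(Y_jY_j')=\mathrm{Id}$,
$$E(M_jMM_j)=E(Y_jY_j'MY_jY_j')-M.$$
If we now use Lemma A-1, and in particular Equation (A-1), page 28, we finally have, recalling that here $\sigma^2=1$,
$$E(M_jMM_j)=(M+M')+(\mu_4-3)\operatorname{diag}(M)+\operatorname{trace}(M)\mathrm{Id}-M=M'+(\mu_4-3)\operatorname{diag}(M)+\operatorname{trace}(M)\mathrm{Id}.$$
In the case of interest here, we have $M=\Sigma Y_iY_k'\Sigma$, and the expectation is to be understood conditionally on $Y_i,Y_k$. However, because we have assumed that the indices are different and the $Y_m$'s are independent, we can compute the conditional expectation as if $M$ were deterministic. Therefore, we have
$$\begin{aligned}Y_i'\Sigma\,E\left(M_j\Sigma Y_iY_k'\Sigma M_j\mid Y_i,Y_k\right)\Sigma Y_k&=Y_i'\Sigma\left[\Sigma Y_kY_i'\Sigma+(\mu_4-3)\operatorname{diag}(\Sigma Y_iY_k'\Sigma)+(Y_k'\Sigma^2Y_i)\mathrm{Id}\right]\Sigma Y_k\\&=(Y_i'\Sigma^2Y_k)^2+(\mu_4-3)\,Y_i'\Sigma\operatorname{diag}(\Sigma Y_iY_k'\Sigma)\Sigma Y_k+(Y_i'\Sigma^2Y_k)^2.\end{aligned}$$
Naturally, we have $E(\widetilde{W}_{i,j}\widetilde{W}_{j,k}\mid Y_i,Y_k)=E(\widetilde{W}_{k,l}\widetilde{W}_{l,i}\mid Y_i,Y_k)$. Since all the indices are different, properties of conditional expectation then give
$$p^8\,E(\widetilde{W}_{i,j}\widetilde{W}_{j,k}\widetilde{W}_{k,l}\widetilde{W}_{l,i})=E\left(\left[2(Y_i'\Sigma^2Y_k)^2+(\mu_4-3)\,Y_i'\Sigma\operatorname{diag}(\Sigma Y_iY_k'\Sigma)\Sigma Y_k+\operatorname{trace}(\Sigma^2M_i)\operatorname{trace}(\Sigma^2M_k)\right]^2\right).$$
By convexity, we have $(a+b+c)^2\le 3(a^2+b^2+c^2)$, so to control the above expression, we just need to control the square of each of the terms appearing in it. In other words, we need to understand the terms
$$T_1=E\left((Y_i'\Sigma^2Y_k)^4\right),\qquad T_2=E\left(\left[Y_i'\Sigma\operatorname{diag}(\Sigma Y_iY_k'\Sigma)\Sigma Y_k\right]^2\right),\qquad T_3=E\left(\left[\operatorname{trace}(\Sigma^2M_i)\operatorname{trace}(\Sigma^2M_k)\right]^2\right).$$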
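Equation (A-1) is what turns $E(M_jMM_j)$ into $M'+(\mu_4-3)\operatorname{diag}(M)+\operatorname{trace}(M)\mathrm{Id}$. The sketch below checks this identity exactly, by enumeration rather than simulation. It uses a hypothetical three-point entry distribution with mean 0, variance 1 and $\mu_4=2$. The distribution, the matrix `M` and the helper names are assumptions made for the example.

```python
import itertools
import math

# Hypothetical entry law: -sqrt(2), 0, sqrt(2) with probabilities 1/4, 1/2, 1/4,
# so mean 0, variance 1 and mu4 = E(Y^4) = 2.
VALUES = [(-math.sqrt(2), 0.25), (0.0, 0.5), (math.sqrt(2), 0.25)]
MU4 = sum(prob * v ** 4 for v, prob in VALUES)

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expected_MjMMj(M):
    """Exact E[(Y Y' - Id) M (Y Y' - Id)] by enumerating every outcome of Y."""
    p = len(M)
    out = [[0.0] * p for _ in range(p)]
    for combo in itertools.product(VALUES, repeat=p):
        y = [v for v, _ in combo]
        prob = math.prod(pr for _, pr in combo)
        Mj = [[y[a] * y[b] - (1.0 if a == b else 0.0) for b in range(p)] for a in range(p)]
        term = matmul(matmul(Mj, M), Mj)
        for a in range(p):
            for b in range(p):
                out[a][b] += prob * term[a][b]
    return out

def formula(M):
    """Right-hand side M' + (mu4 - 3) diag(M) + trace(M) Id from the text."""
    p = len(M)
    tr = sum(M[a][a] for a in range(p))
    return [[M[b][a] + (((MU4 - 3) * M[a][a] + tr) if a == b else 0.0) for b in range(p)]
            for a in range(p)]

M = [[1.0, 2.0, -0.5], [0.3, -1.0, 4.0], [2.5, 0.0, 0.7]]   # arbitrary, not symmetric
lhs, rhs = expected_MjMMj(M), formula(M)
print(max(abs(lhs[a][b] - rhs[a][b]) for a in range(3) for b in range(3)) < 1e-9)
```

A non-symmetric `M` is used on purpose, so that the transpose $M'$ in the formula is actually exercised. In the text, $M=\Sigma Y_iY_k'\Sigma$ is not symmetric either.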

Study of $T_1$. Let us start with the term $T_1=E((Y_i'\Sigma^2Y_k)^4)$. A simple rewriting shows that
$$(Y_i'\Sigma^2Y_k)^4=Y_i'\Sigma^2Y_kY_k'\Sigma^2Y_iY_i'\Sigma^2Y_kY_k'\Sigma^2Y_i.$$
Using Equation (A-1) in Lemma A-1 and the fact that $\Sigma^2Y_iY_i'\Sigma^2$ is symmetric, we therefore have
$$\begin{aligned}E\left((Y_i'\Sigma^2Y_k)^4\mid Y_i\right)&=Y_i'\Sigma^2\left[2\Sigma^2Y_iY_i'\Sigma^2+(\mu_4-3)\operatorname{diag}(\Sigma^2Y_iY_i'\Sigma^2)+\operatorname{trace}(\Sigma^2Y_iY_i'\Sigma^2)\mathrm{Id}\right]\Sigma^2Y_i\\&=3(Y_i'\Sigma^4Y_i)^2+(\mu_4-3)\,Y_i'\Sigma^2\operatorname{diag}(\Sigma^2Y_iY_i'\Sigma^2)\Sigma^2Y_i.\end{aligned}$$
Finally, using Equation (A-2) in Lemma A-1, we have
$$E\left((Y_i'\Sigma^2Y_k)^4\right)=3\left[2\operatorname{trace}(\Sigma^8)+(\operatorname{trace}(\Sigma^4))^2+(\mu_4-3)\operatorname{trace}(\Sigma^4\circ\Sigma^4)\right]+(\mu_4-3)\,E\left(Y_i'\Sigma^2\operatorname{diag}(\Sigma^2Y_iY_i'\Sigma^2)\Sigma^2Y_i\right).$$
Now, we have
$$Y_i'\Sigma^2\operatorname{diag}(\Sigma^2Y_iY_i'\Sigma^2)\Sigma^2Y_i=\operatorname{trace}\left(\Sigma^2Y_iY_i'\Sigma^2\operatorname{diag}(\Sigma^2Y_iY_i'\Sigma^2)\right)=\operatorname{trace}\left(\Sigma^2Y_iY_i'\Sigma^2\circ\Sigma^2Y_iY_i'\Sigma^2\right).$$
Calling $v_i=\Sigma^2Y_i$, we note that the matrix whose trace is taken is $(v_iv_i')\circ(v_iv_i')=(v_i\circ v_i)(v_i\circ v_i)'$ (see Horn and Johnson (1990), p. 458, or Horn and Johnson (1994), p. 307). Hence,
$$Y_i'\Sigma^2\operatorname{diag}(\Sigma^2Y_iY_i'\Sigma^2)\Sigma^2Y_i=\|v_i\circ v_i\|_2^2.$$
Now let us call $m_k$ the $k$-th column of the matrix $\Sigma^2$. Using the fact that $\Sigma^2$ is symmetric, we see that the $k$-th entry of the vector $v_i$ is $v_i(k)=m_k'Y_i$. So $v_i(k)^4=Y_i'm_km_k'Y_iY_i'm_km_k'Y_i$. Calling $M_k=m_km_k'$, we see, using Equation (A-2) in Lemma A-1, that
$$E\left(v_i(k)^4\right)=2\operatorname{trace}(M_k^2)+[\operatorname{trace}(M_k)]^2+(\mu_4-3)\operatorname{trace}(M_k\circ M_k).$$
Using the definition of $M_k$, we finally get that
$$E\left(v_i(k)^4\right)=3\|m_k\|_2^4+(\mu_4-3)\|m_k\circ m_k\|_2^2.$$
Now, let $C$ be a generic matrix with $k$-th column $C_k$, and denote by $e_k$ the $k$-th vector of the canonical basis. Then $C_k=Ce_k$, and hence $\|C_k\|_2^2=e_k'C'Ce_k\le\sigma_1^2(C)$, where $\sigma_1(C)$ is the largest singular value of $C$. Call $\lambda_1(D)$ the largest eigenvalue of a positive semi-definite matrix $D$; in particular, we have $\|m_k\|_2^4\le\lambda_1(\Sigma^4)\|m_k\|_2^2$. Recalling the definition of $m_k$ and using the fact that $\sum_k\|m_k\circ m_k\|_2^2=\|\Sigma^2\circ\Sigma^2\|_F^2$, we deduce that
$$E\left(\|v_i\circ v_i\|_2^2\right)=\sum_k\left[3\|m_k\|_2^4+(\mu_4-3)\|m_k\circ m_k\|_2^2\right].$$
Therefore, we can conclude that
$$E\left(\|v_i\circ v_i\|_2^2\right)\le 3\lambda_1(\Sigma^4)\operatorname{trace}(\Sigma^4)+(\mu_4-3)\operatorname{trace}\left([\Sigma^2\circ\Sigma^2]^2\right).$$
Now recall Theorem 5.5.19 in Horn and Johnson (1994). If $C$ and $D$ are positive semidefinite matrices, then
$$\lambda(C\circ D)\prec_w d(C)\circ\lambda(D),$$
where $\lambda(D)$ is the vector of decreasingly ordered eigenvalues of $D$, and $d(C)$ denotes the vector of decreasingly ordered diagonal entries of $C$. Because all the matrices are positive semidefinite, their eigenvalues are their singular values. Here $\prec_w$ denotes weak (sub)majorization. In our case, of course, $C=D=\Sigma^2$. Using the results of Example II.3.5 (iii) in Bhatia (1997), with the function $\varphi(x)=x^2$, we see that
$$\operatorname{trace}\left((\Sigma^2\circ\Sigma^2)^2\right)=\sum_i\lambda_i^2(\Sigma^2\circ\Sigma^2)\le\sum_i d_i^2(\Sigma^2)\lambda_i^2(\Sigma^2)\le\lambda_1(\Sigma^4)\operatorname{trace}(\Sigma^4).$$
Finally, this bounds the first term, $T_1$, in our upper bound:
$$T_1=E\left((Y_i'\Sigma^2Y_k)^4\right)\le(3+|\mu_4-3|)\,\lambda_1(\Sigma^4)\operatorname{trace}(\Sigma^4).\qquad(1)$$
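Two ingredients of the $T_1$ computation lend themselves to a quick check. The first is the fourth moment $E(v_i(k)^4)=3\|m_k\|_2^4+(\mu_4-3)\|m_k\circ m_k\|_2^2$. The second is the Hadamard identity $(v_iv_i')\circ(v_iv_i')=(v_i\circ v_i)(v_i\circ v_i)'$, together with its trace. The sketch below verifies both. The first is checked by exact enumeration over a hypothetical three-point distribution (mean 0, variance 1, $\mu_4=2$). The vectors `m` and `v` are arbitrary stand-ins.

```python
import itertools
import math

# Hypothetical entry law: -sqrt(2), 0, sqrt(2) with probabilities 1/4, 1/2, 1/4
# (mean 0, variance 1, mu4 = 2).
VALUES = [(-math.sqrt(2), 0.25), (0.0, 0.5), (math.sqrt(2), 0.25)]
MU4 = sum(prob * v ** 4 for v, prob in VALUES)

def exact_fourth_moment(m):
    """Exact E[(m'Y)^4] for Y with i.i.d. entries drawn from VALUES."""
    total = 0.0
    for combo in itertools.product(VALUES, repeat=len(m)):
        prob = math.prod(pr for _, pr in combo)
        total += prob * sum(mk * v for mk, (v, _) in zip(m, combo)) ** 4
    return total

m = [0.5, -1.2, 2.0, 0.3]                  # stand-in for a column m_k of Sigma^2
norm2 = sum(x * x for x in m)               # ||m||_2^2
predicted = 3 * norm2 ** 2 + (MU4 - 3) * sum(x ** 4 for x in m)
moment_ok = abs(exact_fourth_moment(m) - predicted) < 1e-9

v = [1.5, -0.4, 2.2]                        # stand-in for v_i = Sigma^2 Y_i
vv = [[a * b for b in v] for a in v]        # v v'
had = [[x * x for x in row] for row in vv]  # (v v') o (v v')
vsq = [a * a for a in v]                    # v o v
outer_ok = all(abs(had[a][b] - vsq[a] * vsq[b]) < 1e-12 for a in range(3) for b in range(3))
# trace((v v') o (v v')) = sum of squared diagonal entries of v v' = ||v o v||_2^2
trace_ok = abs(sum(had[a][a] for a in range(3)) - sum(x * x for x in vsq)) < 1e-12
print(moment_ok, outer_ok, trace_ok)
```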

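Study of $T_3$. Let us now turn to the third term, $T_3=E([\operatorname{trace}(\Sigma^2M_i)\operatorname{trace}(\Sigma^2M_k)]^2)$. We remind the reader that $M_i=Y_iY_i'-\mathrm{Id}$. By independence of $Y_i$ and $Y_k$, it is enough to understand $E([\operatorname{trace}(\Sigma^2M_i)]^2)$. Note that
$$E\left([\operatorname{trace}(\Sigma^2M_i)]^2\right)=E\left([Y_i'\Sigma^2Y_i-\operatorname{trace}(\Sigma^2)]^2\right)=E\left(Y_i'\Sigma^2Y_iY_i'\Sigma^2Y_i\right)-\left(\operatorname{trace}(\Sigma^2)\right)^2.$$
Using Equation (A-2) in Lemma A-1, we conclude that
$$E\left([\operatorname{trace}(\Sigma^2M_i)]^2\right)=2\operatorname{trace}(\Sigma^4)+(\mu_4-3)\operatorname{trace}(\Sigma^2\circ\Sigma^2).$$
Using the fact that we know the diagonal of $\Sigma^2\circ\Sigma^2$, we conclude that
$$T_3=E\left([\operatorname{trace}(\Sigma^2M_i)]^2\right)E\left([\operatorname{trace}(\Sigma^2M_k)]^2\right)\le\left\{2\operatorname{trace}(\Sigma^4)+|\mu_4-3|\,\lambda_1(\Sigma^2)\operatorname{trace}(\Sigma^2)\right\}^2.\qquad(2)$$
So we have an upper bound on $T_3$.

Study of $T_2$. Finally, let us turn to the middle term, $T_2=E([Y_i'\Sigma\operatorname{diag}(\Sigma Y_iY_k'\Sigma)\Sigma Y_k]^2)$. Before we square it, the argument of the expectation has the form $Y_i'\Sigma\operatorname{diag}(\Sigma Y_kY_i'\Sigma)\Sigma Y_k$, since a matrix and its transpose have the same diagonal. Call $u_k=\Sigma Y_k$. Making the same computations as above, we find that
$$\begin{aligned}Y_i'\Sigma\operatorname{diag}(\Sigma Y_kY_i'\Sigma)\Sigma Y_k&=\operatorname{trace}\left(\operatorname{diag}(\Sigma Y_kY_i'\Sigma)\Sigma Y_kY_i'\Sigma\right)=\operatorname{trace}\left((\Sigma Y_kY_i'\Sigma)\circ(\Sigma Y_kY_i'\Sigma)\right)\\&=\operatorname{trace}\left((u_ku_i')\circ(u_ku_i')\right)=\operatorname{trace}\left((u_k\circ u_k)(u_i\circ u_i)'\right)=(u_i\circ u_i)'(u_k\circ u_k).\end{aligned}$$
We deduce, using independence and the Cauchy-Schwarz inequality, that
$$E\left(\left[Y_i'\Sigma\operatorname{diag}(\Sigma Y_kY_i'\Sigma)\Sigma Y_k\right]^2\right)\le E\left(\|u_i\circ u_i\|_2^2\right)E\left(\|u_k\circ u_k\|_2^2\right).$$
Note that to arrive at Equation (1), we studied expressions similar to $E(\|u_i\circ u_i\|_2^2)$. So we can similarly conclude that
$$T_2=E\left(\left[Y_i'\Sigma\operatorname{diag}(\Sigma Y_kY_i'\Sigma)\Sigma Y_k\right]^2\right)\le\left\{(3+|\mu_4-3|)\,\lambda_1(\Sigma^2)\operatorname{trace}(\Sigma^2)\right\}^2.\qquad(3)$$
With our assumptions, the terms (1), (2) and (3) are $O(p^2)$. Note that in the computation of the trace, there are $O(n^4)$ such terms. Finally, note that the expectation of interest to us corresponds to the sum of the three quadratic terms divided by $p^8$. So the total contribution of these terms is in expectation $O(p^{-2})$. This takes care of the contribution of the terms involving four different indices, as it shows that
$$0\le\sum_{i\neq j\neq k\neq l}E\left(\widetilde{W}_{i,j}\widetilde{W}_{j,k}\widetilde{W}_{k,l}\widetilde{W}_{l,i}\right)=O(p^{-2}).$$

ii. Terms involving three different indices: $i\neq j\neq k$. Because $\widetilde{W}_{i,i}=0$, terms involving 3 different indices with a non-zero contribution are necessarily of the form $(\widetilde{W}_{i,j})^2(\widetilde{W}_{i,k})^2$. Indeed, terms with a cycle of length 3 all involve a term of the form $\widetilde{W}_{i,i}$ and hence contribute 0. Let us now focus on those terms, assuming that $j\neq k$. Note that we have $O(n^3)$ such terms. It is enough to focus on the $W_{i,j}^2W_{i,k}^2$, because the contribution of the other terms is, in expectation, of order $p^{-4}$ (with our assumptions, $\operatorname{trace}(\Sigma^2)/p^2=O(1/p)$). Since there are only $n^3$ terms in the sum, this extra contribution is asymptotically zero. Now, we clearly have
$$E\left(W_{i,j}^2W_{i,k}^2\mid Y_i\right)=\left[E\left(W_{i,j}^2\mid Y_i\right)\right]^2$$
by conditional independence of the two terms. The computation of $E(W_{i,j}^2\mid Y_i)$ is similar to the ones we have made above, and we have
$$p^4E\left(W_{i,j}^2\mid Y_i\right)=2(Y_i'\Sigma^2Y_i)^2+(\mu_4-3)\,Y_i'\Sigma\operatorname{diag}(\Sigma Y_iY_i'\Sigma)\Sigma Y_i+\left(\operatorname{trace}(\Sigma Y_iY_i'\Sigma)\right)^2.$$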
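Two small examples check the variance formula behind the bound on $T_3$ and the trace identity used for $T_2$. In the sketch below, `A` stands in for $\Sigma^2$ and `S` for $\Sigma$. Both matrices, the three-point entry distribution (mean 0, variance 1, $\mu_4=2$) and the helper names are assumptions chosen for illustration.

```python
import itertools
import math

# Hypothetical entry law with mean 0, variance 1 and mu4 = 2.
VALUES = [(-math.sqrt(2), 0.25), (0.0, 0.5), (math.sqrt(2), 0.25)]
MU4 = sum(prob * v ** 4 for v, prob in VALUES)

def quad(x, A, y):
    """Bilinear form x' A y."""
    return sum(x[a] * A[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def exact_centered_second_moment(A):
    """Exact E[(Y'AY - trace A)^2] by enumerating every outcome of Y."""
    tr = sum(A[k][k] for k in range(len(A)))
    total = 0.0
    for combo in itertools.product(VALUES, repeat=len(A)):
        y = [v for v, _ in combo]
        prob = math.prod(pr for _, pr in combo)
        total += prob * (quad(y, A, y) - tr) ** 2
    return total

# T_3 ingredient: E([trace(A M_i)]^2) = 2 trace(A^2) + (mu4 - 3) trace(A o A).
A = [[2.0, 0.5, -0.3], [0.5, 1.0, 0.2], [-0.3, 0.2, 1.5]]   # symmetric stand-in for Sigma^2
tr_A2 = sum(A[a][b] * A[b][a] for a in range(3) for b in range(3))
predicted = 2 * tr_A2 + (MU4 - 3) * sum(A[k][k] ** 2 for k in range(3))
t3_ok = abs(exact_centered_second_moment(A) - predicted) < 1e-9

# T_2 identity: Y_i' S diag(S Y_k Y_i' S) S Y_k = (u_i o u_i)'(u_k o u_k), u = S Y.
S = [[1.0, 0.4, 0.0], [0.4, 2.0, -0.7], [0.0, -0.7, 1.2]]   # symmetric stand-in for Sigma
Y_i, Y_k = [0.3, -1.1, 0.8], [1.4, 0.2, -0.5]
P = matmul(matmul(S, [[Y_k[a] * Y_i[b] for b in range(3)] for a in range(3)]), S)
D = [[P[a][a] if a == b else 0.0 for b in range(3)] for a in range(3)]  # diag(S Y_k Y_i' S)
lhs_T2 = quad(Y_i, matmul(matmul(S, D), S), Y_k)
u_i = [sum(S[a][b] * Y_i[b] for b in range(3)) for a in range(3)]
u_k = [sum(S[a][b] * Y_k[b] for b in range(3)) for a in range(3)]
rhs_T2 = sum(u_i[a] ** 2 * u_k[a] ** 2 for a in range(3))
print(t3_ok, abs(lhs_T2 - rhs_T2) < 1e-12)
```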

The matrix $K_i=\Sigma Y_iY_i'\Sigma$ is positive semidefinite, and hence its diagonal entries are non-negative. Therefore $\operatorname{trace}(K_i\circ K_i)\le(\operatorname{trace}(K_i))^2$, and we conclude that
$$p^4E\left(W_{i,j}^2\mid Y_i\right)\le(3+|\mu_4-3|)(Y_i'\Sigma^2Y_i)^2\le(3+|\mu_4-3|)\,\sigma_1(\Sigma)^4\|Y_i\|_2^4.$$
Hence,
$$p^8E\left(W_{i,j}^2W_{i,k}^2\right)\le(3+|\mu_4-3|)^2\,\sigma_1(\Sigma)^8E\left(\|Y_i\|_2^8\right).$$
Now, the map $F$ which takes a vector and returns its Euclidean norm is trivially a convex 1-Lipschitz function with respect to the Euclidean norm. Because the entries of $Y_i$ are bounded by $B_p$, Corollary 4.10 in Ledoux (2001) shows that $F(Y_i)=\|Y_i\|_2$ satisfies a concentration inequality. Namely, for $r>0$,
$$P\left(\left|\|Y_i\|_2-m_F\right|>r\right)\le 4\exp\left(-r^2/(16B_p^2)\right),$$
where $m_F$ is a median of $F(Y_i)=\|Y_i\|_2$ (hence $m_F$ is a deterministic quantity). A simple integration then shows that $E(|\|Y_i\|_2-m_F|^8)=O(B_p^8)$; see for instance the proof of Proposition 1.9 in Ledoux (2001), changing the power from 2 to 8. Now, according to Proposition 1.9 in Ledoux (2001), the mean $\mu_F=E(\|Y_i\|_2)$ of $F(Y_i)$ exists and $|m_F-\mu_F|=O(B_p)$. Since $\mu_F^2\le E(\|Y_i\|_2^2)=p$, we conclude the following, where $C$ denotes a generic constant that may change from display to display:
$$\begin{aligned}E\left(\|Y_i\|_2^8\right)&\le E\left(\left(\left|\|Y_i\|_2-m_F\right|+m_F\right)^8\right)\le C\left(E\left(\left|\|Y_i\|_2-m_F\right|^8\right)+m_F^8\right)\\&\le C\left(E\left(\left|\|Y_i\|_2-m_F\right|^8\right)+|m_F-\mu_F|^8+\mu_F^8\right)\le C\left(B_p^8+p^4\right).\end{aligned}$$
Now, our original assumption about the number of absolute moments of the random variables of interest implies that $B_p=O(p^{1/2-\delta})$. Consequently, $E(\|Y_i\|_2^8)=O(p^4)$. Therefore,
$$E\left(W_{i,j}^2W_{i,k}^2\right)=O(p^{-4})\quad\text{and}\quad\sum_i\sum_{j\neq i,\,k\neq i,\,j\neq k}E\left(W_{i,j}^2W_{i,k}^2\right)=O(p^{-1}).$$
Hence, we also have
$$\sum_i\sum_{j\neq i,\,k\neq i,\,j\neq k}E\left(\widetilde{W}_{i,j}^2\widetilde{W}_{i,k}^2\right)=O(p^{-1}).$$

iii. Terms involving two different indices: $i\neq j$. The last terms we have to focus on to control $E(\operatorname{trace}(\widetilde{W}^4))$ are of the form $\widetilde{W}_{i,j}^4$. Note that we have $n^2$ terms like this. By convexity, $(a+b)^4\le 8(a^4+b^4)$. So, to show that $\sum_{i,j}E(\widetilde{W}_{i,j}^4)$ tends to zero, it is enough to understand the contribution of $W_{i,j}^4$. Now, let us call for a moment $v=\Sigma Y_i$ and $u=Y_j$. The quantity of interest to us is basically of the form $E((u'v)^8)$. Let us do computations conditional on $v$. Since the entries of $u$ are independent and have mean 0, consider the expansion of $(u'v)^8$. The only terms that contribute a non-zero quantity to the expectation are those in which every entry of $u$ that appears is raised to a power at least 2. We can decompose the sum representing $E((u'v)^8\mid v)$ into subterms, according to which powers of the terms are involved. There are 7 such subterms:

- (2,2,2,2), i.e. all terms are raised to the power 2;
- (3,3,2), i.e. two terms are raised to the power 3 and one to the power 2;
- (4,2,2), (4,4), (5,3), (6,2) and (8).

For instance, the subterm corresponding to (2,2,2,2) is, before taking expectations,
$$\sum_{i_1\neq i_2\neq i_3\neq i_4}u_{i_1}^2u_{i_2}^2u_{i_3}^2u_{i_4}^2\left(v_{i_1}v_{i_2}v_{i_3}v_{i_4}\right)^2.$$
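The list of power patterns in the expansion of $E((u'v)^8\mid v)$ is exactly the list of partitions of 8 into parts of size at least 2. A short enumeration confirms that there are seven such patterns. This is a sketch, and the function name `partitions` is our own.

```python
def partitions(n, min_part=2, max_part=None):
    """Yield the partitions of n into parts >= min_part, parts in non-increasing order."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), min_part - 1, -1):
        for rest in partitions(n - first, min_part, first):
            yield (first,) + rest

patterns = list(partitions(8))
print(patterns)
# [(8,), (6, 2), (5, 3), (4, 4), (4, 2, 2), (3, 3, 2), (2, 2, 2, 2)]
```

Parts of size 1 are excluded because a single factor $u_{i}$ has mean 0 and makes the whole product vanish in expectation.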

After taking expectations conditional on $v$, we see that this subterm is obviously non-negative and contributes
$$(\sigma^2)^4\sum_{i_1\neq i_2\neq i_3\neq i_4}\left(v_{i_1}v_{i_2}v_{i_3}v_{i_4}\right)^2\le\left(\sum_i v_i^2\right)^4=(Y_i'\Sigma^2Y_i)^4\le\sigma_1(\Sigma)^8\|Y_i\|_2^8.$$
Note that we just saw that $E(\|Y_i\|_2^8)=O(p^4)$ in our context. Similarly, the term (3,3,2) will contribute
$$\mu_3^2\sigma^2\sum_{i_1\neq i_2\neq i_3}v_{i_1}^3v_{i_2}^3v_{i_3}^2.$$
In absolute value, this term is less than $\mu_3^2\sigma^2\left(\sum_i|v_i|^3\right)^2\left(\sum_i v_i^2\right)$. Now, note that if $\|z\|_2=1$, then for $q\ge 2$ we have $\sum_i|z_i|^q\le\sum_i z_i^2=1$. Applied to $z=v/\|v\|_2$, this gives $\sum_i|v_i|^q\le\|v\|_2^q$. Consequently, the term (3,3,2) contributes in absolute value less than $\mu_3^2\sigma^2\|v\|_2^8$. The same analysis can be repeated for all the other terms. Each of them is found to be less than $\|v\|_2^8$ times the moments of $u$ involved. Because we have assumed that our original random variables had $4+\epsilon$ absolute moments, the moments of order less than 4 cause no problem. The moments of order higher than 4, say $4+k$, can be bounded by $\mu_4B_p^k$. Consequently, we see that
$$E\left(W_{i,j}^4\right)=E\left(E\left(W_{i,j}^4\mid Y_i\right)\right)\le CB_p^4\,\frac{E\left(\|Y_i\|_2^8\right)}{p^8}=O(B_p^4/p^4)=O(p^{-(2+4\delta)}).$$
Since we have $n^2$ such terms, we see that $\sum_{i\neq j}E(W_{i,j}^4)\to 0$ as $p\to\infty$. Using our earlier convexity remark, we finally conclude that
$$\sum_{i\neq j}E\left(\widetilde{W}_{i,j}^4\right)\to 0\quad\text{as }p\to\infty.$$

iv. Second order term: combining all the elements. We have therefore established control of the second order term. Using Chebyshev's inequality, we have seen that the largest singular value of $\widetilde{W}$ goes to 0 in probability. Note that we have also shown that the operator norm of $W$ is bounded in probability and that
$$\left\|W-\frac{\operatorname{trace}(\Sigma^2)}{p^2}\left(\mathbf{1}\mathbf{1}'-\mathrm{Id}\right)\right\|_2\to 0\quad\text{in probability}.$$

Control of the third order term. We note that the third order term is of the form $f^{(3)}(\xi_{i,j})\frac{X_i'X_j}{p}W_{i,j}$. According to Lemma B-1, if $M$ is a real symmetric matrix with non-negative entries, and $E$ is a symmetric matrix such that $\max_{i,j}|E_{i,j}|=\zeta$, then $\sigma_1(E\circ M)\le\zeta\sigma_1(M)$. Note that $W$ is a real symmetric matrix with non-negative entries. We have just established that $\|W\|_2$ remains bounded in probability. So, to prove that the third order term goes to zero in operator norm, all we have to show is that $\max_{i\neq j}|X_i'X_j/p|$ goes to 0. We are going to make use of Lemma A-3, page 31, in the Appendix. In our setting, we have $B_p=p^{1/2-\delta}$, or $p^{2/m}=p^{1/2-\delta}$. The lemma implies, for instance, that
$$\max_{i\neq j}\left|\frac{X_i'X_j}{p}\right|\le p^{-\delta}\log(p)\quad\text{a.s.}$$
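Lemma B-1 is the tool that reduces the third order term to a bound on $\max_{i\neq j}|X_i'X_j/p|$. The following sketch checks the inequality $\sigma_1(E\circ M)\le\max_{i,j}|E_{i,j}|\,\sigma_1(M)$ on one random example. Here $M$ is symmetric and entrywise non-negative, built as a Hadamard square like $W$, and $E$ is symmetric with entries in $[-1,1]$. The Jacobi eigenvalue helper is a hypothetical standard-library stand-in for a linear algebra package. A single random instance illustrates the lemma but of course does not prove it.

```python
import math
import random

def jacobi_eigenvalues(A, sweeps=100, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(sweeps):
        if sum(a[r][c] ** 2 for r in range(n) for c in range(n) if r != c) < tol:
            break
        for r in range(n - 1):
            for c in range(r + 1, n):
                if a[r][c] == 0.0:
                    continue
                theta = (a[c][c] - a[r][r]) / (2.0 * a[r][c])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # columns r, c:  A <- A J
                    akr, akc = a[k][r], a[k][c]
                    a[k][r], a[k][c] = cs * akr - sn * akc, sn * akr + cs * akc
                for k in range(n):  # rows r, c:  A <- J' A
                    ark, ack = a[r][k], a[c][k]
                    a[r][k], a[c][k] = cs * ark - sn * ack, sn * ark + cs * ack
    return [a[k][k] for k in range(n)]

def op_norm(A):
    """Largest singular value of a symmetric matrix = largest |eigenvalue|."""
    return max(abs(x) for x in jacobi_eigenvalues(A))

def hadamard(A, B):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

rng = random.Random(2)
n = 7
G = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
S = [[(G[r][c] + G[c][r]) / 2 for c in range(n)] for r in range(n)]
M = hadamard(S, S)                   # symmetric, entrywise non-negative (like W = S o S)
F = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
E = [[(F[r][c] + F[c][r]) / 2 for c in range(n)] for r in range(n)]  # symmetric weights
zeta = max(abs(x) for row in E for x in row)

lhs = op_norm(hadamard(E, M))
rhs = zeta * op_norm(M)
print(lhs <= rhs + 1e-9)             # True
```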

So $\max_{i\neq j}|X_i'X_j/p|\to 0$ a.s. Note that this implies that $\max_{i\neq j}|\xi_{i,j}|\to 0$ a.s. We have assumed that $f^{(3)}$ exists and is continuous, and hence bounded, in a neighborhood of 0. We conclude that
$$\max_{i,j}\left|f^{(3)}(\xi_{i,j})X_i'X_j/p\right|=o(p^{-\delta/2})\quad\text{a.s.}$$
Call $E$ the matrix with entries $E_{i,j}=f^{(3)}(\xi_{i,j})X_i'X_j/p$ off the diagonal and 0 on the diagonal. Then $E$ satisfies the conditions put forth in our discussion earlier in this section, and we conclude that
$$\|E\circ W\|_2\le\max_{i,j}|E_{i,j}|\,\|W\|_2=o(p^{-\delta/2})\quad\text{a.s.}$$
Hence, the operator norm of the third order term goes to 0 almost surely. (To clarify our arguments, let us repeat that we analyzed the second order term by replacing the $Y_i$'s by $U_i$, in the notation of the truncation and centralization discussion. Let us call $W_U=S_U\circ S_U$, again using notation introduced in that discussion. As we saw, $\|W-W_U\|_2\to 0$ a.s. So showing, as we did, that $\|W_U\|_2$ remains bounded (a.s.) implies that $\|W\|_2$ does too. This is the only thing we need in our argument controlling the third order term.)

B. Control of the diagonal term. The proof here is divided into two parts. First, we show that the error term coming from the first order expansion of the diagonal is easily controlled. Then we show that the terms added when replacing the off-diagonal matrix by $XX'/p+\operatorname{trace}(\Sigma^2)/p^2\,\mathbf{1}\mathbf{1}'$ can also be controlled. Recall the notation $\tau=\operatorname{trace}(\Sigma)/p$.

Errors induced by diagonal approximation. Note that Lemma A-3 guarantees that for all $i$, $|\xi_{i,i}-\tau|\le p^{-\delta/2}$ a.s. We have assumed that $f'$ is continuous, and hence bounded, in a neighborhood of $\tau$, so $f'(\xi_{i,i})$ is uniformly bounded in $p$. Now Lemma A-3 also guarantees that $\max_i|\|X_i\|_2^2/p-\tau|\le p^{-\delta}$ a.s. Hence, the diagonal matrix with entries $f(\|X_i\|_2^2/p)$ can be approximated consistently in operator norm by $f(\tau)\mathrm{Id}$ a.s.

Errors induced by off-diagonal approximation. When we replace the off-diagonal matrix by $f'(0)XX'/p+[f(0)+f''(0)\operatorname{trace}(\Sigma^2)/(2p^2)]\mathbf{1}\mathbf{1}'$, we add a diagonal matrix. Its $(i,i)$ entry is $f(0)+f'(0)\|X_i\|_2^2/p+f''(0)\operatorname{trace}(\Sigma^2)/(2p^2)$, and we need to subtract it eventually. We note that $0\le\operatorname{trace}(\Sigma^2)/p^2\le\sigma_1^2(\Sigma)/p\to 0$ when $\sigma_1(\Sigma)$ remains bounded in $p$. So this term does not create any problem. Now, we just saw that the diagonal matrix with entries $\|X_i\|_2^2/p$ can be consistently approximated in operator norm by $(\operatorname{trace}(\Sigma)/p)\mathrm{Id}$. So the diagonal matrix with $(i,i)$ entry $f(0)+f'(0)\|X_i\|_2^2/p+f''(0)\operatorname{trace}(\Sigma^2)/(2p^2)$ can be approximated consistently in operator norm by $(f(0)+f'(0)\operatorname{trace}(\Sigma)/p)\mathrm{Id}$ a.s. This finishes the proof.

2.3 Kernel random matrices of the type $f(\|X_i-X_j\|_2^2/p)$

As is to be expected, the properties of such matrices can be deduced from the study of inner product kernel matrices, with a little bit of extra work. We need to slightly modify the distributional assumptions under which we work, and consider the case where the entries of $Y_i$ have $5+\epsilon$ absolute moments. We also need to assume that $f$ is regular in the neighborhood of different points. Otherwise, the assumptions are the same as those of Theorem 1. We have the following theorem:

Theorem 2 (Spectrum of Euclidean distance kernel matrices). Consider the $n\times n$ kernel matrix $M$ with entries
$$M_{i,j}=f\left(\frac{\|X_i-X_j\|_2^2}{p}\right).$$
Let us call
$$\tau=2\,\frac{\operatorname{trace}(\Sigma)}{p}.$$
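The diagonal approximation rests on $\max_i|\|X_i\|_2^2/p-\tau|$ being small. Then the diagonal matrix with entries $f(\|X_i\|_2^2/p)$ is close to $f(\tau)\mathrm{Id}$ in operator norm; for a diagonal matrix, the operator norm is the largest absolute entry. The Monte Carlo sketch below illustrates this shrinkage as $p$ grows. The choice $f=\exp$, the diagonal $\Sigma$ with alternating entries 1 and 2, the uniform entry distribution and the $(n,p)$ pairs are all assumptions made for the example. They do not cover the paper's setting in full generality.

```python
import math
import random

def diag_error(n, p, f, rng):
    """max_i |f(||X_i||^2/p) - f(tau)|: operator-norm error of the diagonal approximation.

    Hypothetical setup: Sigma = diag(1, 2, 1, 2, ...), and Y_i has i.i.d. entries that
    are uniform on [-sqrt(3), sqrt(3)] (mean 0, variance 1, bounded).
    """
    sigma = [1.0 if k % 2 == 0 else 2.0 for k in range(p)]
    tau = sum(sigma) / p                       # tau = trace(Sigma) / p
    half = math.sqrt(3.0)
    worst = 0.0
    for _ in range(n):
        # ||X_i||^2 = Y_i' Sigma Y_i when Sigma is diagonal
        sq = sum(s * rng.uniform(-half, half) ** 2 for s in sigma)
        worst = max(worst, abs(f(sq / p) - f(tau)))
    return worst

rng = random.Random(3)
sizes = [(5, 10), (50, 100), (200, 1000)]
errors = [diag_error(n, p, math.exp, rng) for n, p in sizes]
for (n, p), err in zip(sizes, errors):
    print(f"n={n:4d}  p={p:5d}  max diagonal error={err:.4f}")
```

The printed errors are random, so no specific values are claimed here. Only the downward trend as $p$ grows is the point of the illustration.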

Let us call $\psi$ the vector with $i$-th entry $\psi_i=\|X_i\|_2^2/p-\operatorname{trace}(\Sigma)/p$. Suppose that the assumptions of Theorem 1 hold, but that conditions (e) and (f) are replaced by:

(e') The entries of $Y_i$, a $p$-dimensional random vector, are i.i.d. Also, denoting by $Y_i(k)$ the $k$-th entry of $Y_i$, we assume that $E(Y_i(k))=0$, $\operatorname{var}(Y_i(k))=1$ and $E(|Y_i(k)|^{5+\epsilon})<\infty$ for some $\epsilon>0$. (We say that $Y_i$ has $5+\epsilon$ absolute moments.)

(f') $f$ is $C^3$ in a neighborhood of $\tau$.

Then $M$ can be approximated consistently in operator norm (and in probability) by the matrix $K$, defined by
$$K=f(\tau)\mathbf{1}\mathbf{1}'+f'(\tau)\left[\mathbf{1}\psi'+\psi\mathbf{1}'-2\frac{XX'}{p}\right]+\frac{f''(\tau)}{2}\left[\mathbf{1}(\psi\circ\psi)'+(\psi\circ\psi)\mathbf{1}'+2\psi\psi'+4\frac{\operatorname{trace}(\Sigma^2)}{p^2}\mathbf{1}\mathbf{1}'\right]+\upsilon\,\mathrm{Id},$$
$$\upsilon=f(0)+\tau f'(\tau)-f(\tau).$$
In other words, $\|M-K\|_2\to 0$ in probability.

Proof. Note that here the diagonal is just $f(0)\mathrm{Id}$ and it will cause no trouble. The work therefore focuses on the off-diagonal matrix. In what follows, we call $\tau=2\operatorname{trace}(\Sigma)/p$. Let us define
$$A_{i,j}=\frac{\|X_i\|_2^2+\|X_j\|_2^2}{p}-\tau\quad\text{and}\quad S_{i,j}=\frac{X_i'X_j}{p}.$$
With these notations, we have, off the diagonal, i.e. when $i\neq j$, by a Taylor expansion:
$$M_{i,j}=f(\tau)+[A_{i,j}-2S_{i,j}]f'(\tau)+\frac{1}{2}[A_{i,j}-2S_{i,j}]^2f''(\tau)+\frac{1}{6}f^{(3)}(\xi_{i,j})[A_{i,j}-2S_{i,j}]^3.$$
We note that the matrix $A$ with entries $A_{i,j}$ is a rank 2 matrix. As a matter of fact, if $\psi$ is the vector with entries $\psi_i=\|X_i\|_2^2/p-\tau/2$, it can be written $A=\mathbf{1}\psi'+\psi\mathbf{1}'$. Recall the well-known identity (see e.g. Gohberg et al. (2000), Chapter 1, Theorem 3.2)
$$\det(\mathrm{Id}+uv'+vu')=\det\begin{pmatrix}1+u'v&\|u\|_2^2\\\|v\|_2^2&1+v'u\end{pmatrix}.$$
From it, we see immediately that the non-zero eigenvalues of $A$ are $\mathbf{1}'\psi\pm\sqrt{n}\|\psi\|_2$. After these preliminary remarks, we are ready to start the proof per se.

Truncation and centralization. Since we assume $5+\epsilon$ absolute moments, Lemma 2.2 in Yin et al. (1988) applies. It shows that we can truncate the $Y_i$'s at level $B_p=p^{2/5-\delta}$, with $\delta>0$, without changing the data matrix, a.s. We then need to centralize the vectors truncated at $p^{2/5-\delta}$. Because we work with $X_i-X_j=\Sigma^{1/2}(Y_i-Y_j)$, centralization creates absolutely no problem here: it is absorbed in the difference. So in what follows we can assume without loss of generality that we are working with vectors $X_i=\Sigma^{1/2}Y_i$, where the entries of $Y_i$ are bounded by $p^{2/5-\delta}$ and $E(Y_i)=0$. The issue of variance 1 is addressed as before, so we can assume that the entries of $Y_i$ have variance 1.

Concentration of $\|X_i-X_j\|_2^2/p$. By plugging in the results of Corollary A-2, with $p^{2/m}=p^{2/5-\delta}$, we get that
$$\max_{i\neq j}\left|\frac{\|X_i-X_j\|_2^2}{p}-2\,\frac{\operatorname{trace}(\Sigma)}{p}\right|\le 16\log(p)\,p^{-1/10-\delta}$$
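The eigenvalue claim for the rank 2 matrix $A=\mathbf{1}\psi'+\psi\mathbf{1}'$ can be checked directly. Eigenvectors for the non-zero eigenvalues lie in the span of $\mathbf{1}$ and $\psi$; one can take $x_\pm=\mathbf{1}\pm(\sqrt{n}/\|\psi\|_2)\psi$. The sketch below verifies $Ax_\pm=(\mathbf{1}'\psi\pm\sqrt{n}\|\psi\|_2)x_\pm$. Since $A$ has rank at most 2, it also confirms through $\operatorname{trace}(A^2)$ that there is no other non-zero eigenvalue. The random `psi` is only a stand-in for the vector defined in the proof.

```python
import math
import random

rng = random.Random(4)
n = 8
psi = [rng.gauss(0, 1) for _ in range(n)]                   # stand-in for psi
A = [[psi[j] + psi[i] for j in range(n)] for i in range(n)]  # A = 1 psi' + psi 1'

s = sum(psi)                                   # 1' psi
norm_psi = math.sqrt(sum(x * x for x in psi))  # ||psi||_2
r = math.sqrt(n) * norm_psi                    # sqrt(n) ||psi||_2

def residual(sign):
    """max_i |(A x)_i - lambda x_i| for lambda = s + sign*r, x = 1 + sign*(sqrt(n)/||psi||) psi."""
    lam = s + sign * r
    x = [1.0 + sign * math.sqrt(n) / norm_psi * p_k for p_k in psi]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max(abs(Ax[i] - lam * x[i]) for i in range(n))

# rank(A) <= 2, so trace(A^2) must equal the sum of the two squared non-zero eigenvalues
trace_A2 = sum(A[i][j] ** 2 for i in range(n) for j in range(n))
print(residual(+1) < 1e-9, residual(-1) < 1e-9,
      abs(trace_A2 - ((s + r) ** 2 + (s - r) ** 2)) < 1e-9)
```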


More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS CASEY BRUCK 1. Abstract The goal of this aer is to rovide a concise way for undergraduate mathematics students to learn about how rime numbers behave

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Asymptotically Optimal Simulation Allocation under Dependent Sampling Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

John Weatherwax. Analysis of Parallel Depth First Search Algorithms

John Weatherwax. Analysis of Parallel Depth First Search Algorithms Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel

More information
