Scale Invariant Conditional Dependence Measures


Sashank J. Reddi, Machine Learning Department, School of Computer Science, Carnegie Mellon University
Barnabás Póczos, Machine Learning Department, School of Computer Science, Carnegie Mellon University

Abstract

In this paper we develop new dependence and conditional dependence measures and provide their estimators. An attractive property of these measures and estimators is that they are invariant to any monotone increasing transformations of the random variables, which is important in many applications including feature selection. Under certain conditions we show the consistency of these estimators, derive upper bounds on their convergence rates, and show that the estimators do not suffer from the curse of dimensionality. However, when the conditions are less restrictive, we derive a lower bound which proves that in the worst case the convergence can be arbitrarily slow, similarly to some other estimators. Numerical illustrations demonstrate the applicability of our method.

1. Introduction

Measuring dependencies and conditional dependencies is of great importance in many scientific fields, including machine learning and statistics. There are numerous problems where we want to know how large the dependence is between random variables, and how this dependence changes if we observe other random variables. Correlated random variables might become independent when we observe a third random variable, and the opposite situation is also possible, where independent variables become dependent after observing other random variables. Thanks to the wide potential application range (e.g., in bioinformatics, pharmacoinformatics, epidemiology, psychology, econometrics), finding efficient dependence and conditional dependence measures has been an active research area for decades. These measures have been used, for example, in causality detection, feature selection, active learning, structure learning, boosting, image registration, and independent component and subspace analysis. Although the theory of these estimators is actively researched, there are still several fundamental open questions.

The estimation of certain dependence and conditional dependence measures is easy in a few cases: for example, (i) when the random variables have discrete distributions with finitely many possible values, (ii) when there is a known simple relationship between them (e.g., a linear model describes their behavior), or (iii) if they have joint distributions that belong to a parametric family that is easy to estimate (e.g., normal distributions). In this paper we consider the more challenging nonparametric estimation problem, when the random variables have continuous distributions and we do not have any other information about them.

Numerical experiments indicate that a recently proposed kernel measure based on normalized cross-covariance operators ($D_{HS}$) appears to be very powerful in measuring both dependencies and conditional dependencies (Fukumizu et al., 2008). Nonetheless, even this method has a few drawbacks and several open fundamental questions. In particular, lower and upper bounds on the convergence rates of the estimators are not known. It is also not clear whether the $D_{HS}$ measure or its existing estimator $\hat{D}_{HS}$ is invariant to invertible transformations of the random variables. This kind of invariance property is so important that Schweizer & Wolff (1981) even put it into the axioms of dependence. One reason for this is that in many scenarios we need to compare the estimated dependencies. If certain variables are measured on different scales, the dependence can be much different in the absence of this invariance property. As a result, it might happen that in a dependence-based feature selection algorithm different features would be selected depending on whether we measured a quantity, e.g., in grams or kilograms, or used a log-scale. This is an odd situation that can be avoided with dependence measures that are invariant to invertible transformations of the variables.

Main contributions: The goal of this paper is to provide new theoretical insights in this field. Our contributions can be summarized in the following points: (i) We prove that the dependence and conditional dependence measures ($D_{HS}$) are invariant to any invertible transformations, but the estimator $\hat{D}_{HS}$ does not have this property. (ii) Under some conditions we derive an upper bound on the rate of convergence of $\hat{D}_{HS}$. (iii) We show that if we apply $\hat{D}_{HS}$ to the empirical copula transformed points, then the resulting estimator $\hat{D}_C$ is invariant to any monotone increasing transformations. (iv) We show that under some conditions this estimator is consistent and derive an upper bound on its rate. (v) We prove that if the conditions are less restrictive, then the convergence of both $\hat{D}_{HS}$ and $\hat{D}_C$ can be arbitrarily slow. (vi) We also generalize these dependence measures, as well as their estimators, to sets of random variables and provide an upper bound on the convergence rate of the estimators.

Related work: Since the literature on dependence measures is huge, we mention only a few prominent examples here. The most well-known dependence measure is probably the Shannon mutual information, which has been generalized to the Rényi-α (Rényi, 1961) and Tsallis-α mutual information (Tsallis, 1988). Other interesting dependence measures are the maximal correlation coefficient (Rényi, 1959), the kernel mutual information (Gretton et al., 2003), the generalized variance and kernel canonical correlation analysis (Bach, 2002), the Hilbert-Schmidt independence criterion (Gretton et al., 2005), the Schweizer-Wolff measure (Schweizer & Wolff, 1981), the maximum mean discrepancy (MMD) (Borgwardt et al., 2006; Fortet & Mourier, 1953), the Copula-MMD (Póczos et al., 2012), and the distance-based correlation (Székely et al., 2007). Some of these measures, e.g., the Shannon mutual information, are invariant to any invertible transformations of the random variables. The Copula-MMD and Schweizer-Wolff measures are invariant to monotone increasing transformations, while the MMD and many other dependence measures are not invariant to any of these transformations. Conditional dependence estimation is an even more challenging problem, and only very few dependence measures have been generalized to the conditional case (Fukumizu et al., 2008; Poczos & Schneider, 2012).

Notation: We use $X \sim P$ to denote that the random variable $X$ has probability distribution $P$. The symbol $X \perp Y \mid Z$ indicates the conditional independence of $X$ and $Y$ given $Z$. Let $X_{i:j}$ denote the tuple $(X_i, \dots, X_j)$. $E[X]$ stands for the expectation of the random variable $X$. The symbols $\mu_X$ and $\mathcal{B}_X$ denote the measure and the Borel σ-field on $\mathcal{X}$, respectively. We use $D(X_i, \dots, X_j)$ to denote the dependence measure of the set of random variables $\{X_i, \dots, X_j\}$. With a slight abuse of notation, we will use $D(X, Y)$ to denote the dependence measure between the sets of random variables $X$ and $Y$. $\{A_j\}$ denotes the set $\{A_1, \dots, A_k\}$, where $k$ will be clear from the context. The null space and the range of an operator $L$ are denoted by $\mathcal{N}(L)$ and $\mathcal{R}(L)$ respectively, and $\bar{A}$ stands for the closure of a set $A$.

2. Dependence Measure using Hilbert-Schmidt Norm

In this section we review the theory behind the Hilbert-Schmidt (HS) norm and its use in defining dependence measures.
Suppose $(X, Y)$ is a random variable on $\mathcal{X} \times \mathcal{Y}$. Let $\mathcal{H}_X = \{f : \mathcal{X} \to \mathbb{R}\}$ be a reproducing kernel Hilbert space (RKHS) associated with $\mathcal{X}$, with feature map $\phi(x) \in \mathcal{H}_X$ ($x \in \mathcal{X}$) and kernel $k_X(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}_X}$ ($x, y \in \mathcal{X}$). The kernel satisfies the property $f(x) = \langle f, k_X(x, \cdot) \rangle_{\mathcal{H}_X}$ for $f \in \mathcal{H}_X$, which is called the reproducing property of the kernel. We can similarly define an RKHS $\mathcal{H}_Y$ and kernel $k_Y$ associated with $\mathcal{Y}$. Let us define the class of kernels known as universal kernels, which are critical to this paper.

Definition 1. A kernel $k_X : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called universal whenever the associated RKHS $\mathcal{H}_X$ is dense in $C(\mathcal{X})$, the space of bounded continuous functions over $\mathcal{X}$, with respect to the $L_\infty$ norm.

The Gaussian and Laplace kernels are two popular kernels which belong to the class of universal kernels. The cross-covariance operators on these RKHSs capture the dependence of the random variables $X$ and $Y$ (Baker, 1973). In order to ensure the existence of these operators, we assume that $E[k_X(X, X)] < \infty$ and $E[k_Y(Y, Y)] < \infty$; we also assume that all the kernels defined in this paper satisfy this assumption. The cross-covariance operator (COCO) $\Sigma_{YX} : \mathcal{H}_X \to \mathcal{H}_Y$ is the operator such that $\langle g, \Sigma_{YX} f \rangle_{\mathcal{H}_Y} = E_{XY}[f(X)g(Y)] - E_X[f(X)]\,E_Y[g(Y)]$ holds for all $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$. The existence and uniqueness of such an operator can be shown using the Riesz representation theorem (Reed & Simon, 1980).
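Since the Gaussian kernel is the running example in this paper (it is universal, and also bounded and Lipschitz continuous, as required in Section 4), a small numpy sketch of its Gram matrix may be useful; the later sketches in this section build on it. This is our own illustration, not the authors' code: the function name and the median-heuristic default (the bandwidth choice reported in Section 6) are assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=None):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X is an (m, d) array of samples. If sigma is None, the median
    heuristic (median of the pairwise distances) sets the bandwidth,
    mirroring the choice reported in Section 6.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against negative round-off
    if sigma is None:
        sigma = np.median(np.sqrt(d2[np.triu_indices_from(d2, k=1)]))
    return np.exp(-d2 / (2.0 * sigma ** 2))
```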

If $X$ and $Y$ are identical, then the operator $\Sigma_{XX}$ is called the covariance operator. Both operators above are natural generalizations of the covariance matrix in Euclidean space to Hilbert space. Analogous to the correlation matrix in Euclidean space, we can define the operator $V_{YX}$,

$$V_{YX} = \Sigma_{YY}^{-1/2} \, \Sigma_{YX} \, \Sigma_{XX}^{-1/2},$$

where $\mathcal{R}(V_{YX}) \subseteq \overline{\mathcal{R}(\Sigma_{YY})}$ and $\mathcal{N}(V_{YX}) \supseteq \mathcal{N}(\Sigma_{XX})$. This operator is called the normalized cross-covariance operator (NOCCO). Similarly to $\Sigma_{YX}$, the existence of NOCCO can be proved using the Riesz representation theorem. Intuitively, the normalized cross-covariance operator captures the dependence of the random variables $X$ and $Y$, discounting the influence of the marginals. We would like to point out that the notation we adopted leads to a few technical difficulties, which can easily be addressed (see Grünewälder et al., 2012).

We also define the conditional covariance operators, which will be useful to capture conditional independence. Suppose we have another random variable $Z$ on $\mathcal{Z}$ with RKHS $\mathcal{H}_Z$ and kernel $k_Z$. The normalized conditional cross-covariance operator is defined as

$$V_{YX|Z} = V_{YX} - V_{YZ} V_{ZX}.$$

Similarly to the cross-covariance operator, the conditional cross-covariance operator is a natural extension of the conditional covariance matrix to Hilbert space. An interesting aspect of the normalized conditional cross-covariance operator is that it can be expressed in terms of simple products of normalized cross-covariance operators (Fukumizu et al., 2008).

It is not surprising that the covariance operators described above can be used for measuring dependence between random variables, since they capture the dependence between them. While one can use either $\Sigma_{YX}$ or $V_{YX}$ in defining the dependence measure (Gretton et al., 2005; Fukumizu et al., 2008), we will use the latter in this paper. Let the variable $\ddot{X}$ denote $(X, Z)$, which will be useful for defining conditional dependence measures. We define the HS norm of a linear operator $L : \mathcal{H}_X \to \mathcal{H}_Y$ as follows.

Definition 2. (Hilbert-Schmidt norm) The Hilbert-Schmidt norm of $L$ is defined as $\|L\|_{HS}^2 = \sum_{i,j} \langle v_j, L u_i \rangle_{\mathcal{H}_Y}^2$, where $\{u_i\}$ and $\{v_j\}$ are orthonormal bases of $\mathcal{H}_X$ and $\mathcal{H}_Y$ respectively, provided the sum converges.

The HS norm is a generalization of the Frobenius norm on matrices and is independent of the choice of the orthonormal bases. An operator is called Hilbert-Schmidt if its HS norm is finite. The covariance operators defined in this paper are assumed to be Hilbert-Schmidt operators. Fukumizu et al. (2008) define the following dependence measures:

$$D_{HS}(X, Y) = \|V_{YX}\|_{HS}^2, \qquad D_{HS}(X, Y \mid Z) = \|V_{Y\ddot{X}|Z}\|_{HS}^2.$$
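A finite-dimensional analogy (our addition, not stated in the paper) may help interpret these norms. For random vectors, the singular values of the matrix $\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2}$ are exactly the canonical correlations $\rho_i$ between $X$ and $Y$, so

$$\|V_{YX}\|_{HS}^2 = \sum_i \rho_i^2,$$

i.e., $D_{HS}$ aggregates the squared (kernel) canonical correlations over all directions, whereas $\|\Sigma_{YX}\|_{HS}^2$ would weight them by the marginal covariance spectra.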
Note that the measures above are defined for a pair of random variables $X$ and $Y$. In Section 5 we will generalize this approach and provide dependence and conditional dependence measures, with estimators, that operate on sets of random variables. Theorem 10 (in the supplementary material) justifies the use of the above dependence measures. The result can be equivalently stated in terms of the HS norms of the covariance operators, since the HS norm of an operator is zero if and only if the operator itself is the null operator. At first it might appear that these measures are strongly linked to the kernel used in constructing the measure, but Fukumizu et al. (2008) show the remarkable property that the measures are independent of the kernels, which is captured by the following result. Let $E_Z[P_{X|Z} P_{Y|Z}](A \times B) = \int E[\mathbf{1}_B(Y) \mid Z = z] \, E[\mathbf{1}_A(X) \mid Z = z] \, dP_Z(z)$ for $A \in \mathcal{B}_X$ and $B \in \mathcal{B}_Y$.

Theorem 1. (Kernel-Free property) Assume that the probabilities $P_{XY}$ and $E_Z[P_{X|Z} P_{Y|Z}]$ are absolutely continuous with respect to $\mu_X \times \mu_Y$, with probability density functions $p_{XY}$ and $p_{X \perp Y|Z}$, respectively. Then we have

$$D_{HS}(X, Y \mid Z) = \int_{\mathcal{X} \times \mathcal{Y}} \left( \frac{p_{XY}(x, y)}{p_X(x) p_Y(y)} - \frac{p_{X \perp Y|Z}(x, y)}{p_X(x) p_Y(y)} \right)^2 p_X(x) p_Y(y) \, d\mu_X \, d\mu_Y.$$

Suppose $Z = \emptyset$; then we have

$$D_{HS}(X, Y) = \int_{\mathcal{X} \times \mathcal{Y}} \left( \frac{p_{XY}(x, y)}{p_X(x) p_Y(y)} - 1 \right)^2 p_X(x) p_Y(y) \, d\mu_X \, d\mu_Y.$$

The above result shows that these measures bear an uncanny resemblance to mutual information and might possibly inherit some of its desirable properties. We show that this intuition is, in fact, true by proving an important consequence of the above result. In particular, the following result holds when the assumptions stated in Theorem 1 are satisfied.

Theorem 2. (Invariance of dependence measure) Assume that the probabilities $P_{XY}$ and $E_Z[P_{X|Z} P_{Y|Z}]$ are absolutely continuous. Let $\mathcal{X} \subseteq \mathbb{R}^d$, and let $\Gamma_X : \mathcal{X} \to \mathbb{R}^d$ be a differentiable injective function. Let $\Gamma_Y$ and $\Gamma_Z$ be defined similarly. Under the above assumptions, we have

$$D_{HS}(X, Y \mid Z) = D_{HS}(\Gamma_X(X), \Gamma_Y(Y) \mid \Gamma_Z(Z)).$$

As a special case of $Z = \emptyset$, we have $D_{HS}(X, Y) = D_{HS}(\Gamma_X(X), \Gamma_Y(Y))$.

The proof is in the supplementary material. Though we proved the result for Euclidean spaces, it can be generalized to certain measurable spaces under mild conditions (Hewitt & Stromberg, 1975).

We now look at the empirical estimators of the dependence measures defined above. Let $X_{1:m}$, $Y_{1:m}$ and $Z_{1:m}$ be i.i.d. samples from the joint distribution. Let $\hat{\mu}_X^{(m)} = \frac{1}{m} \sum_{i=1}^m k_X(\cdot, X_i)$ and $\hat{\mu}_Y^{(m)} = \frac{1}{m} \sum_{i=1}^m k_Y(\cdot, Y_i)$ denote the empirical mean maps. The empirical estimator of $\Sigma_{YX}$ is

$$\hat{\Sigma}_{YX}^{(m)} = \frac{1}{m} \sum_{i=1}^m \left( k_Y(\cdot, Y_i) - \hat{\mu}_Y^{(m)} \right) \left\langle k_X(\cdot, X_i) - \hat{\mu}_X^{(m)}, \, \cdot \, \right\rangle_{\mathcal{H}_X}.$$

The empirical covariance operators $\hat{\Sigma}_{XX}^{(m)}$ and $\hat{\Sigma}_{YY}^{(m)}$ can be defined in a similar fashion. The empirical normalized cross-covariance operator is

$$\hat{V}_{YX}^{(m)} = \left( \hat{\Sigma}_{YY}^{(m)} + \varepsilon_m I \right)^{-1/2} \hat{\Sigma}_{YX}^{(m)} \left( \hat{\Sigma}_{XX}^{(m)} + \varepsilon_m I \right)^{-1/2},$$

where $\varepsilon_m > 0$ is a regularization constant (Bach, 2002; Fukumizu et al., 2004). In the later sections we look at a particular choice of $\varepsilon_m$ which provides good convergence rates. The empirical normalized conditional cross-covariance operator is $\hat{V}_{YX|Z}^{(m)} = \hat{V}_{YX}^{(m)} - \hat{V}_{YZ}^{(m)} \hat{V}_{ZX}^{(m)}$. Let $G_X$ be the centered Gram matrix with $G_{X,ij} = \langle k_X(\cdot, X_i) - \hat{\mu}_X, \, k_X(\cdot, X_j) - \hat{\mu}_X \rangle_{\mathcal{H}_X}$, and let $R_X = G_X (G_X + m \varepsilon_m I)^{-1}$. Similarly, we can define $G_Y, G_Z$ and $R_Y, R_Z$ for the random variables $Y$ and $Z$. The empirical dependence measures are then

$$\hat{D}_{HS}(X, Y) = \|\hat{V}_{YX}^{(m)}\|_{HS}^2 = \mathrm{Tr}[R_Y R_X],$$
$$\hat{D}_{HS}(X, Y \mid Z) = \|\hat{V}_{Y\ddot{X}|Z}^{(m)}\|_{HS}^2 = \mathrm{Tr}\left[ R_Y R_{\ddot{X}} - 2 R_Y R_{\ddot{X}} R_Z + R_Y R_Z R_{\ddot{X}} R_Z \right].$$
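These trace formulas translate directly into a few lines of numpy. The sketch below is our own illustration, not the authors' code: the helper names and the product-kernel construction of the Gram matrix for $\ddot{X} = (X, Z)$ are assumptions. It reuses `gaussian_gram` from above, and the default `eps` matches the $\varepsilon_m = 10^{-6}$ used in the experiments of Section 6.

```python
import numpy as np

def center(G):
    """Centered Gram matrix HGH with H = I - (1/m) 11^T, i.e. G_X in the text."""
    m = G.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ G @ H

def r_matrix(G, eps):
    """R = G_c (G_c + m*eps*I)^{-1} for the centered Gram matrix G_c."""
    m = G.shape[0]
    Gc = center(G)
    return Gc @ np.linalg.inv(Gc + m * eps * np.eye(m))

def dhs(Kx, Ky, eps=1e-6):
    """Empirical D_HS(X, Y) = Tr[R_Y R_X], from raw (uncentered) Gram matrices."""
    return np.trace(r_matrix(Ky, eps) @ r_matrix(Kx, eps))

def dhs_cond(Kx, Ky, Kz, eps=1e-6):
    """Empirical D_HS(X, Y | Z). The Gram matrix of (X, Z) is taken as the
    elementwise product Kx * Kz (a product kernel -- our choice)."""
    Rxz = r_matrix(Kx * Kz, eps)
    Ry, Rz = r_matrix(Ky, eps), r_matrix(Kz, eps)
    return np.trace(Ry @ Rxz - 2.0 * Ry @ Rxz @ Rz + Ry @ Rz @ Rxz @ Rz)
```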

Fukumizu et al. (2008) show the consistency of the above estimators. We now provide an upper bound on the convergence rates of these estimators under certain assumptions.

Theorem 3. (Consistency of operators) Assume $\Sigma_{YY}^{-3/4} \Sigma_{YX} \Sigma_{XX}^{-3/4}$ is Hilbert-Schmidt. Suppose $\varepsilon_m$ satisfies $\varepsilon_m \to 0$ and $\varepsilon_m^3 m \to \infty$. Then we have convergence in probability in HS norm, i.e., $\|\hat{V}_{YX}^{(m)} - V_{YX}\|_{HS} \to_P 0$. An upper bound on the convergence rate of $\|\hat{V}_{YX}^{(m)} - V_{YX}\|_{HS}$ is $O_p(\varepsilon_m^{-3/2} m^{-1/2} + \varepsilon_m^{1/4})$.

Proof Sketch (refer to the supplementary material for the complete proof). We can upper bound $\|\hat{V}_{YX}^{(m)} - V_{YX}\|_{HS}$ by

$$\left\| \hat{V}_{YX}^{(m)} - (\Sigma_{YY} + \varepsilon_m I)^{-1/2} \Sigma_{YX} (\Sigma_{XX} + \varepsilon_m I)^{-1/2} \right\|_{HS} + \left\| (\Sigma_{YY} + \varepsilon_m I)^{-1/2} \Sigma_{YX} (\Sigma_{XX} + \varepsilon_m I)^{-1/2} - V_{YX} \right\|_{HS}.$$

The first term can be shown to be $O_p(\varepsilon_m^{-3/2} m^{-1/2})$ using Lemma 3 (in the supplementary material). To bound the second term, consider complete orthonormal systems $\{\xi_i\}_{i=1}^{\infty}$ of $\mathcal{H}_X$ and $\{\psi_j\}_{j=1}^{\infty}$ of $\mathcal{H}_Y$, with eigenvalues $\lambda_i \geq 0$ and $\gamma_j \geq 0$ respectively. Using the definition of the Hilbert-Schmidt norm, we can bound the square of the second term by

$$\sum_{i,j=1}^{\infty} \frac{\lambda_i \gamma_j + (\lambda_i + \varepsilon_m)(\gamma_j + \varepsilon_m) - 2\sqrt{\lambda_i \gamma_j (\lambda_i + \varepsilon_m)(\gamma_j + \varepsilon_m)}}{\lambda_i \gamma_j (\lambda_i + \varepsilon_m)(\gamma_j + \varepsilon_m)} \, \langle \psi_j, \Sigma_{YX} \xi_i \rangle^2.$$

Using the AM-GM inequality, and assuming $\lambda_i \leq 1$, $\gamma_j \leq 1$, and that $\Sigma_{YY}^{-3/4} \Sigma_{YX} \Sigma_{XX}^{-3/4}$ is Hilbert-Schmidt, the theorem follows.

Theorem 4. (Consistency of estimators) Assume $\Sigma_{YY}^{-3/4} \Sigma_{Y\ddot{X}} \Sigma_{\ddot{X}\ddot{X}}^{-3/4}$, $\Sigma_{ZZ}^{-3/4} \Sigma_{Z\ddot{X}} \Sigma_{\ddot{X}\ddot{X}}^{-3/4}$ and $\Sigma_{YY}^{-3/4} \Sigma_{YZ} \Sigma_{ZZ}^{-3/4}$ are Hilbert-Schmidt. Suppose $\varepsilon_m$ satisfies $\varepsilon_m \to 0$ and $\varepsilon_m^3 m \to \infty$. Then we have $\hat{D}_{HS}(X, Y) \to_P D_{HS}(X, Y)$ and $\hat{D}_{HS}(X, Y \mid Z) \to_P D_{HS}(X, Y \mid Z)$. An upper bound on the convergence rate of the estimator $\hat{D}_{HS}(X, Y \mid Z)$ is $O_p(\varepsilon_m^{-3/2} m^{-1/2} + \varepsilon_m^{1/4})$.

The proof is in the supplementary material. The assumptions used in Theorems 3 and 4 depend on the probability distribution and on the kernel. A natural question is whether such assumptions are necessary for estimating these measures. The following result answers this question affirmatively. The crux of the result lies in the fact that these dependence measures are intricately connected to functionals of probability distributions that are typically hard to estimate. In particular, we have

$$D_{HS}(X, Y) = \int_{\mathcal{X} \times \mathcal{Y}} \frac{p_{XY}^2(x, y)}{p_X(x) p_Y(y)} \, d\mu_X \, d\mu_Y - 1.$$

Since the estimation of such functionals is tightly coupled with smoothness assumptions on the probability distributions of the random variables, it is reasonable to assume that our assumptions have a similar effect. We prove the result for the special case where $X, Y \in \mathbb{R}^d$ and $Z = \emptyset$. Let $\mathcal{P}$ denote the set of all distributions on $[0, 1]^d$.

Theorem 5. Let $\hat{D}_n$ denote an estimator of $D_{HS}$ based on a sample of size $n$. For any sequence of estimators $\{\hat{D}_n\}$ and any sequence $\{a_n\}$ converging to 0, there exists a compact subset $\mathcal{P}_0 \subset \mathcal{P}$ for which the uniform rate of convergence is slower than $\{a_n\}$. In other words,

$$\liminf_{n \to \infty} \, \sup_{P \in \mathcal{P}_0} \, P\left( |\hat{D}_n - D| > a_n \right) > 0,$$

where $D = D_{HS}(X, Y)$.

The proof is in the supplementary material. An important point to note is that while the dependence measures themselves are invariant to invertible transformations, the estimators do not possess this property. It is often desirable to have this invariance property for the estimators as well, since we generally deal with finite-sample estimators. This property can be achieved using the copula transformation, which is the focus of the next section. Along with the theoretical analysis, we also provide a compelling practical justification for using the copula transformation in Section 6.

3. Copula Transformation

In this section we review important properties of the copula of multivariate distributions. The use of the transformation will become clear in the later sections. The copula plays an important role in the study of dependence among random variables. Copulas not only capture the dependence between random variables, but also help us construct dependence measures which are invariant to any strictly increasing transformation of the marginal variables. Sklar's theorem is central to the theory of copulas. It gives the relationship between a multivariate random variable and its univariate marginals. Suppose $X = (X^1, \dots, X^d) \in \mathbb{R}^d$ is a $d$-dimensional multivariate random variable. Let us denote the marginal cumulative distribution function (cdf) of $X^j$ by $F_X^j : \mathbb{R} \to [0, 1]$. In this paper we assume that the marginals $F_X^j$ are invertible. The copula is defined by Sklar's theorem as follows:

Theorem 6. (Sklar's theorem) Let $H(x_1, \dots, x_d) = \Pr(X^1 \leq x_1, \dots, X^d \leq x_d)$ be a multivariate cumulative distribution function with continuous marginals $F_X^j$. Then there exists a unique copula $C$ such that

$$H(x_1, \dots, x_d) = C(F_X^1(x_1), \dots, F_X^d(x_d)). \qquad (1)$$

Conversely, if $C$ is a copula and the $F_X^j$ are marginal cdfs, then $H$ given in Equation (1) is a joint distribution with marginals $F_X^j$.

Let $T_X = (T_X^1, \dots, T_X^d)$ denote the transformed variables, where $T_X = F_X(X) = (F_X^1(X^1), \dots, F_X^d(X^d)) \in [0, 1]^d$. Here $F_X$ is called the copula transformation. The above theorem gives a one-to-one correspondence between the joint distribution of $X$ and that of $T_X$. Furthermore, it provides a way to construct dependence measures over the transformed variables, since we have information about the copula distribution. An interesting consequence of the copula transformation is that we obtain invariance of the dependence measures to any strictly increasing transformations of the marginal variables: if $g$ is strictly increasing, then $F_{g(X^j)}(g(x)) = \Pr(g(X^j) \leq g(x)) = \Pr(X^j \leq x) = F_X^j(x)$, so a variable and its transformed version share the same copula variables $T_X$.

4. Hilbert-Schmidt Dependence Measure using Copulas

Consider random variables $X = (X^1, \dots, X^d)$, $Y = (Y^1, \dots, Y^d)$ and $Z = (Z^1, \dots, Z^d)$. We focus on the problem of defining the dependence measure $D_C(X, Y)$ and the conditional dependence measure $D_C(X, Y \mid Z)$ using the copula transformation. The next section generalizes the dependence measure to any set of random variables.
Note that we have assumed that the random variables $X$, $Y$ and $Z$ are all $d$-dimensional for simplicity, but our results hold for random variables with different dimensions. Let $T_X$, $T_Y$ and $T_Z$ be the copula transformed variables of $X$, $Y$ and $Z$, respectively. With a slight abuse of notation, we use $k_X$, $k_Y$ and $k_Z$ to denote the kernels over the transformed variables $T_X$, $T_Y$ and $T_Z$, respectively. In what follows, the kernels are functions of the form $[0, 1]^d \times [0, 1]^d \to \mathbb{R}$, since they are defined over the transformed random variables. We define the dependence between the random variables $X$ and $Y$ as

$$D_C(X, Y) = D_{HS}(T_X, T_Y) = \|V_{T_Y T_X}\|_{HS}^2.$$

The conditional dependence measure is defined as

$$D_C(X, Y \mid Z) = D_{HS}(T_X, T_Y \mid T_Z) = \|V_{T_Y T_{\ddot{X}} \mid T_Z}\|_{HS}^2,$$

where $T_{\ddot{X}} = (T_X, T_Z)$. By Theorem 10 (in the supplementary material) and the fact that the marginal cdfs are invertible, it is easy to see that $D_C(X, Y) = 0 \Leftrightarrow X \perp Y$ and $D_C(X, Y \mid Z) = 0 \Leftrightarrow X \perp Y \mid Z$. Our goal is to estimate the dependence measures $D_C(X, Y)$ and $D_C(X, Y \mid Z)$ using the i.i.d. samples $X_{1:m}$, $Y_{1:m}$ and $Z_{1:m}$. If we had the copula transformed variables $T_X$, $T_Y$ and $T_Z$, then we could use the estimators

$$\hat{D}_C(X, Y) = \|\hat{V}_{T_Y T_X}^{(m)}\|_{HS}^2, \qquad \hat{D}_C(X, Y \mid Z) = \|\hat{V}_{T_Y T_{\ddot{X}} \mid T_Z}^{(m)}\|_{HS}^2$$

for the dependence and conditional dependence measures, respectively. However, we only have the i.i.d. samples $X_{1:m}$, $Y_{1:m}$, $Z_{1:m}$, and the marginal cdfs are unknown to us. We have to obtain the empirical copula transformed variables from these samples by estimating the marginal distribution functions $F_X^j$, $F_Y^j$ and $F_Z^j$. These distribution functions can be estimated efficiently using rank statistics. For $x \in \mathbb{R}$ and $x^j \in \mathbb{R}$ for $1 \leq j \leq d$, let

$$\hat{F}_X^j(x) = \frac{1}{m} \left| \left\{ i : 1 \leq i \leq m, \; x \geq X_i^j \right\} \right|, \qquad \hat{F}_X(x^1, \dots, x^d) = \left( \hat{F}_X^1(x^1), \dots, \hat{F}_X^d(x^d) \right).$$

$\hat{F}_X$ is called the empirical copula transformation of $X$. The samples $(\hat{T}_{X_1}, \dots, \hat{T}_{X_m}) = (\hat{F}_X(X_1), \dots, \hat{F}_X(X_m))$, called the empirical copula, are estimates of the true copula transformation. We can similarly define empirical copula transformations and empirical copulas for the random variables $Y$ and $Z$. It should be noted that the samples of the empirical copula $(\hat{T}_{X_1}, \dots, \hat{T}_{X_m})$ are not independent, even though $X_{1:m}$ are i.i.d. samples. We can now use the dependence estimators of Fukumizu et al. (2008) with the empirical copula $(\hat{T}_{X_1}, \dots, \hat{T}_{X_m})$ in place of the samples $(X_1, \dots, X_m)$. Lemma 2 (in the supplementary material) shows that the empirical copula is a good approximation of the i.i.d. samples $(T_{X_1}, \dots, T_{X_m})$.
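The rank-statistic form of $\hat{F}_X$ makes the empirical copula a one-liner in numpy, and combined with the earlier sketches it gives the full $\hat{D}_C$ pipeline. Again a minimal sketch under our own naming assumptions, not the authors' implementation:

```python
import numpy as np

def empirical_copula(X):
    """Empirical copula transform: T[k, j] = (1/m) #{i : X[i, j] <= X[k, j]}.

    X is an (m, d) sample array; each row of the result lies in [0, 1]^d.
    (O(m^2 d) memory; argsort-based ranks are preferable for large m.)
    """
    m = X.shape[0]
    return (X[None, :, :] <= X[:, None, :]).sum(axis=1) / m

# D_C is just the D_HS estimator applied to copula-transformed samples:
# dc = dhs(gaussian_gram(empirical_copula(X)), gaussian_gram(empirical_copula(Y)))
```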
It is important to note the relationship between the measures $D_{HS}$ and $D_C$. The copula transformation can itself be viewed as an invertible transformation, and hence, by Theorem 2, we have $D_{HS} = D_C$. Though the measures are identical, their corresponding estimators $\hat{D}_{HS}$ and $\hat{D}_C$ are different. At this point we should also emphasize the difference between our work and Póczos et al. (2012). Although both works use the copula trick to obtain invariance, in contrast to Póczos et al. (2012) we essentially get the same measure even after the copula transformation. In other words, the copula transformation in our case does not change the dependence measure, and therefore provides an invariant finite-sample estimator of $D_{HS}$. Thus we provide a compelling case for using copulas with $D_{HS}$. Moreover, in our case the invariance property extends naturally to the conditional dependence measure.

We now focus on the consistency of the proposed estimators $\hat{D}_C(X, Y)$ and $\hat{D}_C(X, Y \mid Z)$. We assume that the kernel functions $k_X$, $k_Y$ and $k_Z$ are bounded and Lipschitz continuous on $[0, 1]^d$, i.e., there exists a $B > 0$ such that

$$|k_X(x_1, x) - k_X(x_2, x)| \leq B \|x_1 - x_2\| \quad \text{for all } x, x_1, x_2 \in [0, 1]^d.$$

The Gaussian kernel is a popular kernel which is not only universal but also bounded and Lipschitz continuous. In what follows, we assume that the conditions required for Theorems 3 and 4 hold for the transformed variables as well. We now show the consistency of the dependence estimators and provide upper bounds on their rates of convergence.

Theorem 7. (Consistency of copula dependence estimators) Assume the kernels $k_X$, $k_Y$ and $k_Z$ are bounded and Lipschitz continuous. Suppose $\varepsilon_m$ satisfies $\varepsilon_m \to 0$ and $\varepsilon_m^3 m \to \infty$. Then (i) $\hat{D}_C(X, Y) \to_P D_C(X, Y)$, and (ii) $\hat{D}_C(X, Y \mid Z) \to_P D_C(X, Y \mid Z)$. An upper bound on the convergence rate of the estimators $\hat{D}_C(X, Y)$ and $\hat{D}_C(X, Y \mid Z)$ is $O_p(\varepsilon_m^{-3/2} m^{-1/2} + \varepsilon_m^{1/4})$.

Proof Sketch (refer to the supplementary material for the complete proof). We first show that $\|\hat{V}_{\hat{T}_Y \hat{T}_X}^{(m)} - V_{T_Y T_X}\|_{HS} \to_P 0$. Consider the following upper bound on $\|\hat{V}_{\hat{T}_Y \hat{T}_X}^{(m)} - V_{T_Y T_X}\|_{HS}$:

$$\left\| \hat{V}_{\hat{T}_Y \hat{T}_X}^{(m)} - \hat{V}_{T_Y T_X}^{(m)} \right\|_{HS} + \left\| \hat{V}_{T_Y T_X}^{(m)} - V_{T_Y T_X} \right\|_{HS}. \qquad (2)$$

From Theorem 3 it is easy to see that the second term converges at rate $O_p(\varepsilon_m^{-3/2} m^{-1/2} + \varepsilon_m^{1/4})$. The first term can be shown to be bounded by $C \varepsilon_m^{-3/2} \|\hat{\Sigma}_{\hat{T}_Y \hat{T}_X}^{(m)} - \hat{\Sigma}_{T_Y T_X}^{(m)}\|_{HS}$ along similar lines as Lemma 3. We then prove that $\|\hat{\Sigma}_{\hat{T}_Y \hat{T}_X}^{(m)} - \hat{\Sigma}_{T_Y T_X}^{(m)}\|_{HS}$ converges at rate $O_p(m^{-1/2})$ (see Lemma 1 in the supplementary material), thereby proving the overall rate of the first term to be $O_p(\varepsilon_m^{-3/2} m^{-1/2})$. The convergence of the dependence estimators follows from the convergence of the operators.

The above result shows that by using the copula transformation we can ensure that the invariance property holds for finite-sample estimators without any loss in statistical efficiency. It should be noted that slightly better rates of convergence can be obtained under more restrictive assumptions. It is also noteworthy that under the conditions assumed in this paper, the estimators do not suffer from the curse of dimensionality, and hence can be used in high dimensions. Moreover, the estimators use only the rank statistics rather than the actual values of $X_{1:m}$, $Y_{1:m}$ and $Z_{1:m}$. This provides us with robust estimators, since an arbitrarily large outlier sample cannot affect the statistics badly. In addition, this also makes the estimators invariant to monotone increasing transformations. Let us call the dependence measure defined above pairwise dependence, since it measures the dependence between two random variables. The question arises whether this approach can be generalized to measure dependence among a set of random variables rather than just two. We answer this question affirmatively in the next section.

5. Generalized Dependence Measures

Suppose $S = \{S_1, \dots, S_n\}$ is a set of random variables. Similarly to our previous assumptions, we assume that the random variables $\{S_j\}$ are $d$-dimensional. We would like to measure the dependence among this set of random variables. Recall that $S_{i:j}$ represents the random variable $(S_i, \dots, S_j)$; note that $S_{i:j}$ is a random variable of $(j - i + 1)d$ dimensions. With a slight abuse of notation, the kernel $k_{i:j}$ corresponding to the variable $S_{i:j}$ is defined appropriately. We now express the generalized dependence measure as a sum of pairwise dependence measures. For simplicity, we denote $D_C(S_1, \dots, S_n)$ by $D_C(S)$. The dependence measures are defined as:

$$D_C(S) = \sum_{j=1}^{n-1} D_C(S_j, S_{j+1:n}), \qquad (3)$$
$$D_C(S \mid Z) = \sum_{j=1}^{n-1} D_C(S_j, S_{j+1:n} \mid Z). \qquad (4)$$

Note that the set dependence measure is a sum of $n - 1$ pairwise dependence measures. The following result justifies the use of these dependence measures.

Theorem 8. (Generalized dependence measure) If the product kernels $k_j k_{j+1:n}$ for $j \in \{1, \dots, n-1\}$ are universal, we have (i) $D_C(S) = 0 \Leftrightarrow (S_1, \dots, S_n)$ are independent, and (ii) $D_C(S \mid Z) = 0 \Leftrightarrow (S_1, \dots, S_n)$ are independent given $Z$.

The proof is in the supplementary material. We can now use the pairwise dependence estimators for estimating the set dependence. The following estimators are used for measuring the dependence (see the sketch after Theorem 9 below):

$$\hat{D}_C(S) = \sum_{j=1}^{n-1} \hat{D}_C(S_j, S_{j+1:n}), \qquad (5)$$
$$\hat{D}_C(S \mid Z) = \sum_{j=1}^{n-1} \hat{D}_C(S_j, S_{j+1:n} \mid Z). \qquad (6)$$

Let us assume that the conditions required for Theorem 7 are satisfied. The following theorem states the consistency of the dependence estimators proposed above, and provides an upper bound on their rates of convergence.

Theorem 9. (Consistency of generalized estimators) Suppose the kernels defined above are bounded and Lipschitz continuous. Then (i) $\hat{D}_C(S) \to_P D_C(S)$, and (ii) $\hat{D}_C(S \mid Z) \to_P D_C(S \mid Z)$. An upper bound on the convergence rate of $\hat{D}_C(S)$ and $\hat{D}_C(S \mid Z)$ is $O_p(n \varepsilon_m^{-3/2} m^{-1/2} + n \varepsilon_m^{1/4})$.

Proof. The theorem follows easily from Theorem 7 and Equations (3), (4), (5) and (6).
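Equations (5) and (6) chain directly onto the pairwise estimator. A sketch under the same naming assumptions as before (our helpers `empirical_copula`, `gaussian_gram`, `dhs`; the conditional version of Equation (6) would use `dhs_cond` analogously):

```python
import numpy as np

def dc_set(S, eps=1e-6):
    """Empirical set dependence, Equation (5): sum_j D_C(S_j, S_{j+1:n}).

    S is a list of (m, d_j) sample arrays for S_1, ..., S_n.
    """
    total = 0.0
    for j in range(len(S) - 1):
        left = empirical_copula(S[j])
        right = empirical_copula(np.hstack(S[j + 1:]))  # samples of S_{j+1:n}
        total += dhs(gaussian_gram(left), gaussian_gram(right), eps)
    return total
```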
6. Experimental Results

In this section we empirically illustrate the theoretical contributions of the paper. We compare the performance of $\hat{D}_{HS}(X, Y)$ and $\hat{D}_{HS}(X, Y \mid Z)$ (referred to as NHS) with $\hat{D}_C(X, Y)$ and $\hat{D}_C(X, Y \mid Z)$ (referred to as CHS), that is, the estimators without and with the copula transformation, respectively. In the following experiments we use Gaussian kernels and choose $\sigma$ by the median heuristic. We fix $\varepsilon_m = 10^{-6}$ in our experiments.

6.1. Synthetic Dataset

In the first simulation we constructed the following random variables: $X_1 \sim U[0, 4\pi]$, $X_2 = (500 V_1, 1000 \tanh(V_2), 500 \sinh(V_3))$, where $V_1, V_2, V_3 \sim U[0, 1]$, and $Y = 1000 \tanh(X_1)$. 200 sample points were generated using the distributions specified above. The task in this experiment was to choose the feature among $X_1$ and $X_2$ that contains the most information about $Y$. Note that $Y$ is a deterministic function of $X_1$ and independent of $X_2$. Therefore, we expect the dependence of $(X_1, Y)$ to be high and that of $(X_2, Y)$ to be low. Figure 1 shows the dependence measure (DM) comparison of NHS and CHS. It can be seen that while CHS chooses the correct feature $X_1$, NHS chooses the incorrect feature $X_2$.
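This first simulation is straightforward to reproduce with the sketches above (our helper names; the paper's actual implementation may of course differ):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
X1 = rng.uniform(0.0, 4.0 * np.pi, size=(m, 1))
V = rng.uniform(0.0, 1.0, size=(m, 3))
X2 = np.column_stack((500 * V[:, 0], 1000 * np.tanh(V[:, 1]), 500 * np.sinh(V[:, 2])))
Y = 1000 * np.tanh(X1)

def chs(A, B):
    """CHS: the D_HS estimator applied to empirical-copula-transformed samples."""
    Ka = gaussian_gram(empirical_copula(A))
    Kb = gaussian_gram(empirical_copula(B))
    return dhs(Ka, Kb)

# Y is a deterministic function of X1, so CHS(X1, Y) should dominate CHS(X2, Y).
print(chs(X1, Y), chs(X2, Y))
```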

Figure 1. Columns (A) and (B) represent the dependence of $(X_1, Y)$ and $(X_2, Y)$ as measured by NHS, while columns (C) and (D) represent the dependence of $(X_1, Y)$ and $(X_2, Y)$ as measured by CHS.

The next simulation is designed to demonstrate the significance of the invariance property on finite samples. As mentioned earlier, though NHS is invariant to invertible transformations, its estimator does not retain this property. Thanks to the copula transformation, CHS does not suffer from this issue. Let $X \sim U[0, 4\pi]$, $Y = 50 X$ and $Z = 10 \tanh(Y/100)$. Note that $Z$ is an invertible transformation of $Y$. The dependence of $(X, Z)$ under CHS is not reported here, as CHS is invariant to any monotone increasing transformation. We now demonstrate the asymptotic nature of the invariance property of NHS on the same data. Figure 2 clearly shows that while both methods have almost the same dependence measure for $(X, Y)$, the dependence measured by NHS is reduced significantly when $Y$ is transformed to $Z$. We can clearly see that NHS requires a large sample before it exhibits the invariance property. This problem amplifies further as we move to higher dimensions, making NHS undesirable for higher dimensional tasks.

Figure 2. DM with varying number of samples, for NHS(X, Y), NHS(X, Z) and CHS(X, Y).

We demonstrate the performance of the conditional dependence measures in the following experiment. Let $X, V \sim U[0, 4\pi]$, $Y = 50 \sin(V)$ and $Z = \log(X + Y)$. Observe that though $X$ and $Y$ are independent, they become dependent when conditioned on $Z$. The results in Figure 3 clearly indicate that NHS fails to detect the dependence between $X$ and $Y$ when conditioned on $Z$, while CHS successfully captures this dependence.

Figure 3. Columns (A) and (B) represent the dependence of $(X, Y)$ and $(X, Y) \mid Z$ as measured by NHS, while columns (C) and (D) represent the dependence of $(X, Y)$ and $(X, Y) \mid Z$ as measured by CHS.

6.2. Housing Dataset

We evaluate the performance of the dependence measures on the Housing dataset from the UCI repository. The importance of scale invariance on real-world data is demonstrated through this experiment. This dataset consists of 506 sample points, each of which has 12 real-valued attributes and 1 integer-valued attribute. We consider only the real-valued attributes for this experiment and discard the integer attribute. Our aim is to predict the median value of owner-occupied homes based on other attributes such as per capita crime and the percentage of lower-status population. This dataset is particularly interesting since it contains features of different nature and scale. In this experiment we would like to find the single most relevant feature for predicting the median value. Features 13 and 6 achieve the lowest prediction errors among all features using linear regressors (see the supplementary material for more details). While CHS selects these two features as the most relevant, NHS performs poorly by selecting features 1 and 6 as the most relevant.

7. Conclusion

In this paper we developed new dependence and conditional dependence measures, and estimators for them, which are invariant to any monotone increasing transformations of the variables. We showed that under certain conditions the convergence rates of the estimators are polynomial, but when the conditions are less restrictive, then, similarly to other existing estimators, the convergence can be arbitrarily slow.
We generalized these measures and estimators to sets of variables as well, and illustrated the applicability of our method with a few numerical experiments.

References

Bach, Francis R. Kernel independent component analysis. JMLR, 3:1-48, 2002.

Baker, Charles R. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186, 1973.

Borgwardt, K., Gretton, A., Rasch, M., Kriegel, H., Schölkopf, B., and Smola, A. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22(14):e49-e57, 2006.

Fortet, R. and Mourier, E. Convergence de la répartition empirique vers la répartition théorique. Ann. Scient. École Norm. Sup., 70, 1953.

Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. Kernel measures of conditional dependence. In NIPS 20, Cambridge, MA, 2008. MIT Press.

Fukumizu, Kenji, Bach, Francis R., and Jordan, Michael I. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73-99, 2004.

Fukumizu, Kenji, Bach, Francis R., and Gretton, Arthur. Statistical convergence of kernel CCA. In Advances in Neural Information Processing Systems 18, 2006.

Gretton, A., Herbrich, R., and Smola, A. The kernel mutual information. In Proc. ICASSP, 2003.

Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, 2005.

Grünewälder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., and Pontil, M. Conditional mean embeddings as regressors. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, 2012.

Hewitt, E. and Stromberg, K. Real and Abstract Analysis: A Modern Treatment of the Theory of Functions of a Real Variable. Graduate Texts in Mathematics. Springer, 1975.

Poczos, B. and Schneider, J. Nonparametric estimation of conditional information and divergences. In International Conference on AI and Statistics (AISTATS), JMLR Workshop and Conference Proceedings, 2012.

Póczos, Barnabás, Ghahramani, Zoubin, and Schneider, Jeff G. Copula-based kernel dependency measures. In ICML, 2012.

Reed, M. and Simon, B. Functional Analysis. Academic Press, 1980.

Rényi, A. On measures of dependence. Acta Math. Acad. Sci. Hungar., 10, 1959.

Rényi, A. On measures of entropy and information. In 4th Berkeley Symposium on Math., Stat., and Prob., 1961.

Ritov, Y. and Bickel, P. J. Achieving information bounds in non and semiparametric models. The Annals of Statistics, 18(2), 1990.

Schweizer, B. and Wolff, E. On nonparametric measures of dependence for random variables. The Annals of Statistics, 9, 1981.

Székely, G. J., Rizzo, M. L., and Bakirov, N. K. Measuring and testing dependence by correlation of distances. Annals of Statistics, 35, 2007.

Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Statist. Phys., 52(1-2), 1988.


More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

Understanding Machine Learning Solution Manual

Understanding Machine Learning Solution Manual Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Lecture 12: Ensemble Methods. Introduction. Weighted Majority. Mixture of Experts/Committee. Σ k α k =1. Isabelle Guyon

Lecture 12: Ensemble Methods. Introduction. Weighted Majority. Mixture of Experts/Committee. Σ k α k =1. Isabelle Guyon Lecture 2: Enseble Methods Isabelle Guyon guyoni@inf.ethz.ch Introduction Book Chapter 7 Weighted Majority Mixture of Experts/Coittee Assue K experts f, f 2, f K (base learners) x f (x) Each expert akes

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

GEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS

GEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS ACTA UIVERSITATIS LODZIESIS FOLIA OECOOMICA 3(3142015 http://dx.doi.org/10.18778/0208-6018.314.03 Olesii Doronin *, Rostislav Maiboroda ** GEE ESTIMATORS I MIXTURE MODEL WITH VARYIG COCETRATIOS Abstract.

More information

G G G G G. Spec k G. G Spec k G G. G G m G. G Spec k. Spec k

G G G G G. Spec k G. G Spec k G G. G G m G. G Spec k. Spec k 12 VICTORIA HOSKINS 3. Algebraic group actions and quotients In this section we consider group actions on algebraic varieties and also describe what type of quotients we would like to have for such group

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical and Matheatical Sciences 04,, p. 7 5 ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD M a t h e a t i c s Yu. A. HAKOPIAN, R. Z. HOVHANNISYAN

More information

Best Procedures For Sample-Free Item Analysis

Best Procedures For Sample-Free Item Analysis Best Procedures For Saple-Free Ite Analysis Benjain D. Wright University of Chicago Graha A. Douglas University of Western Australia Wright s (1969) widely used "unconditional" procedure for Rasch saple-free

More information

Lean Walsh Transform

Lean Walsh Transform Lean Walsh Transfor Edo Liberty 5th March 007 inforal intro We show an orthogonal atrix A of size d log 4 3 d (α = log 4 3) which is applicable in tie O(d). By applying a rando sign change atrix S to the

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I Contents 1. Preliinaries 2. The ain result 3. The Rieann integral 4. The integral of a nonnegative

More information

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation

More information

The Hilbert Schmidt version of the commutator theorem for zero trace matrices

The Hilbert Schmidt version of the commutator theorem for zero trace matrices The Hilbert Schidt version of the coutator theore for zero trace atrices Oer Angel Gideon Schechtan March 205 Abstract Let A be a coplex atrix with zero trace. Then there are atrices B and C such that

More information

Introduction to Kernel methods

Introduction to Kernel methods Introduction to Kernel ethods ML Workshop, ISI Kolkata Chiranjib Bhattacharyya Machine Learning lab Dept of CSA, IISc chiru@csa.iisc.ernet.in http://drona.csa.iisc.ernet.in/~chiru 19th Oct, 2012 Introduction

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

Yongquan Zhang a, Feilong Cao b & Zongben Xu a a Institute for Information and System Sciences, Xi'an Jiaotong. Available online: 11 Mar 2011

Yongquan Zhang a, Feilong Cao b & Zongben Xu a a Institute for Information and System Sciences, Xi'an Jiaotong. Available online: 11 Mar 2011 This article was downloaded by: [Xi'an Jiaotong University] On: 15 Noveber 2011, At: 18:34 Publisher: Taylor & Francis Infora Ltd Registered in England and Wales Registered Nuber: 1072954 Registered office:

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Empirical phi-divergence test-statistics for the equality of means of two populations

Empirical phi-divergence test-statistics for the equality of means of two populations Epirical phi-divergence test-statistics for the equality of eans of two populations N. Balakrishnan, N. Martín and L. Pardo 3 Departent of Matheatics and Statistics, McMaster University, Hailton, Canada

More information

Fast and Memory Optimal Low-Rank Matrix Approximation

Fast and Memory Optimal Low-Rank Matrix Approximation Fast and Meory Optial Low-Rank Matrix Approxiation Yun Se-Young, Marc Lelarge, Alexandre Proutière To cite this version: Yun Se-Young, Marc Lelarge, Alexandre Proutière. Fast and Meory Optial Low-Rank

More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

Lecture 21 Nov 18, 2015

Lecture 21 Nov 18, 2015 CS 388R: Randoized Algoriths Fall 05 Prof. Eric Price Lecture Nov 8, 05 Scribe: Chad Voegele, Arun Sai Overview In the last class, we defined the ters cut sparsifier and spectral sparsifier and introduced

More information