Robust Low Rank Kernel Embeddings of Multivariate Distributions


Robust Low Rank Kernel Embeddings of Multivariate Distributions
Le Song, Bo Dai
College of Computing, Georgia Institute of Technology

Abstract

Kernel embedding of distributions has led to many recent advances in machine learning. However, latent and low rank structures prevalent in real world distributions have rarely been taken into account in this setting. Furthermore, no prior work in the kernel embedding literature has addressed the issue of robust embedding when the latent and low rank information are misspecified. In this paper, we propose a hierarchical low rank decomposition of kernel embeddings which can exploit such low rank structures in the data while being robust to model misspecification. We also illustrate with empirical evidence that the estimated low rank embeddings lead to improved performance in density estimation.

1 Introduction

Many applications of machine learning, ranging from computer vision to computational biology, require the analysis of large volumes of high-dimensional continuous-valued measurements. Complex statistical features are commonplace, including multi-modality, skewness, and rich dependency structures. Kernel embedding of distributions is an effective framework for addressing challenging problems in this regime [1, 2]. Its key idea is to implicitly map distributions into potentially infinite dimensional feature spaces using kernels, such that subsequent comparison and manipulation of these distributions can be achieved via feature space operations (e.g., inner product, distance, projection and spectral analysis). This framework has led to many recent advances in machine learning, such as kernel independence tests [3] and kernel belief propagation [4].

However, algorithms designed with kernel embeddings have rarely taken into account the latent and low rank structures prevalent in high dimensional data arising from applications such as gene expression analysis. While such information has been extensively exploited in other learning contexts, such as graphical models and collaborative filtering, its use in kernel embeddings remains scarce and challenging. Intuitively, these intrinsically low dimensional structures of the data should reduce the effective number of parameters in kernel embeddings, and allow us to obtain better estimators when facing high dimensional problems.

As a demonstration of the above intuition, we illustrate the behavior of low rank kernel embeddings (which we will explain in more detail later) when applied to density estimation (Figure 1). 100 data points are sampled i.i.d. from a mixture of 2 spherical Gaussians, where the latent variable is the cluster indicator. The fitted density based on an ordinary kernel density estimator has quite different contours from the ground truth (Figure 1(b)), while those provided by the low rank embedding appear to be much closer to the ground truth (Figure 1(c)). Essentially, the low rank approximation step endows kernel embeddings with an additional mechanism to smooth the estimator, which can be beneficial when the number of data points is small and there are clusters in the data. In our later, more systematic experiments, we show that low rank embeddings lead to density estimators which can significantly improve over alternative approaches in terms of held-out likelihood.

While there are a handful of exceptions [5, 6] in the kernel embedding literature which have exploited latent and low rank information, these algorithms are not robust in the sense that, when such information is misspecified, no performance guarantee can be provided and the algorithms can fail drastically.
The hierarchical low rank kernel embeddings we propose in this paper can be considered as a kernel generalization of the discrete-valued tree-structured latent variable models studied in [7].

Figure 1: We draw 100 samples from a mixture of 2 spherical Gaussians with equal mixing weights. (a) The contour plot of the ground truth density, (b) of the ordinary kernel density estimator (KDE), (c) of the low rank KDE. We use cross-validation to find the best kernel bandwidth for both the KDE and the low rank KDE. The latter produces a density which is visibly closer to the ground truth, and its integrated squared error is smaller than that of the KDE (vs. 0.02).

The objective of the current paper is to address the previous limitations of kernel embeddings as applied to graphical models and make them more practically useful. Furthermore, we provide both theoretical and empirical support for the new approach. Another key contribution of the paper is a novel view of kernel embeddings of multivariate distributions as infinite dimensional higher order tensors, and of the low rank structure of these tensors in the presence of latent variables. This view allows us to introduce modern multi-linear algebra and tensor decomposition tools to address challenging problems at the interface between kernel methods and latent variable models. We believe our work will play a synergistic role in bridging together largely separate areas in machine learning research, including kernel methods, latent variable models, and tensor data analysis.

In the remainder of the paper, we first present the tensor view of kernel embeddings of multivariate distributions and their low rank structure in the presence of latent variables. Then we present our algorithm for hierarchical low rank decomposition of kernel embeddings, which makes use of a sequence of nested kernel singular value decompositions. Last, we provide both theoretical and empirical support for our proposed approach.

2 Kernel Embeddings of Distributions

We focus on continuous domains, and denote by X a random variable with domain Ω and density p(X). The instantiations of X are denoted by lower case characters, x. A reproducing kernel Hilbert space (RKHS) F on Ω with a kernel k(x, x′) is a Hilbert space of functions f : Ω → ℝ with inner product ⟨·, ·⟩_F. Its element k(x, ·) satisfies the reproducing property ⟨f(·), k(x, ·)⟩_F = f(x), and consequently ⟨k(x, ·), k(x′, ·)⟩_F = k(x, x′), meaning that we can view the evaluation of a function f at any point x ∈ Ω as an inner product. Alternatively, k(x, ·) can be viewed as an implicit feature map φ(x), where k(x, x′) = ⟨φ(x), φ(x′)⟩_F. For simplicity of notation, we assume that the domains of all variables are the same and that the same kernel function is applied to all variables.

A kernel embedding represents a density by its expected features, i.e., μ_X := E_X[φ(X)] = ∫_Ω φ(x) p(x) dx, which is a point in a potentially infinite-dimensional and implicit feature space of a kernel [8, 1, 2]. The embedding μ_X has the property that the expectation of any RKHS function f ∈ F can be evaluated as an inner product in F, ⟨μ_X, f⟩_F := E_X[f(X)].

Kernel embeddings can be readily generalized to the joint density of d variables, X_1, ..., X_d, using the d-th order tensor product feature space F^d. In this feature space, the feature map is defined as ⊗_{i=1}^d φ(x_i) := φ(x_1) ⊗ φ(x_2) ⊗ ... ⊗ φ(x_d), and the inner product in this space satisfies ⟨⊗_{i=1}^d φ(x_i), ⊗_{i=1}^d φ(x′_i)⟩_{F^d} = ∏_{i=1}^d ⟨φ(x_i), φ(x′_i)⟩_F = ∏_{i=1}^d k(x_i, x′_i). Then we can embed a joint density p(X_1, ..., X_d) into the tensor product feature space F^d by

C_{X_{1:d}} := E_{X_{1:d}}[⊗_{i=1}^d φ(X_i)] = ∫_{Ω^d} (⊗_{i=1}^d φ(x_i)) p(x_1, ..., x_d) ∏_{i=1}^d dx_i,   (1)

where we use X_{1:d} to denote the set of variables {X_1, ..., X_d}.
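In the finite-sample setting used later in the paper, the joint embedding is estimated by averaging the tensor product features over a sample, and applying it to the feature map of a query point reduces, by the inner product formula above, to an average of products of kernel evaluations, (1/n) Σ_i ∏_j k(x_j^i, x_j). The following is a minimal numpy sketch of this evaluation; the Gaussian kernel, the bandwidth s and the synthetic data are illustrative assumptions, not part of the paper.

import numpy as np

def gauss_k(a, b, s):
    # un-normalized Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 s^2)), elementwise
    return np.exp(-(a - b) ** 2 / (2.0 * s ** 2))

def joint_embedding_eval(X, x_query, s):
    # evaluate the empirical joint embedding (1/n) sum_i ⊗_j φ(x_j^i) applied to
    # ⊗_j φ(x_j), i.e. (1/n) sum_i prod_j k(x_j^i, x_j)
    vals = np.prod(gauss_k(X, x_query[None, :], s), axis=1)
    return vals.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # n = 100 samples, d = 2 variables
print(joint_embedding_eval(X, np.zeros(2), s=0.5))

With the normalized Gaussian kernel used in the experiments (Section 8), this quantity is exactly the ordinary product-kernel density estimate at the query point.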

The kernel embedding can also be generalized to conditional densities p(x|z) [9]:

μ_{X|z} := E_{X|z}[φ(X)] = ∫_Ω φ(x) p(x|z) dx.   (2)

Given this embedding, the conditional expectation of a function f ∈ F can be computed as E_{X|z}[f(X)] = ⟨f, μ_{X|z}⟩_F. Unlike the ordinary embeddings, an embedding of a conditional distribution is not a single element of the RKHS, but instead sweeps out a family of points in the RKHS, each indexed by a fixed value z of the conditioning variable Z. It is only by fixing Z to a particular value z that we obtain a single RKHS element, μ_{X|z} ∈ F. In other words, the conditional embedding is an operator, denoted C_{X|Z}, which takes as input z and outputs an embedding, i.e., μ_{X|z} = C_{X|Z} φ(z). Likewise, kernel embeddings of conditional distributions can also be generalized to joint distributions of d variables.

We will represent an observation from a discrete variable Z taking r possible values using the standard basis in ℝ^r (the one-of-r representation). That is, when Z takes the i-th value, the i-th dimension of the vector z is 1 and the other dimensions are 0. For instance, when r = 3, Z can take three possible values: (1, 0, 0)^⊤, (0, 1, 0)^⊤ and (0, 0, 1)^⊤. In this case, we let φ(Z) = Z and use the linear kernel k(Z, Z′) = Z^⊤ Z′. Then the conditional embedding operator reduces to a separate embedding μ_{X|z} for each conditional density p(x|z). Conceptually, we can concatenate these μ_{X|z} for different values of z in columns, C_{X|Z} := (μ_{X|z=(1,0,0)^⊤}, μ_{X|z=(0,1,0)^⊤}, μ_{X|z=(0,0,1)^⊤}). The operation C_{X|Z} φ(z) essentially picks out the corresponding embedding (or column).

3 Kernel Embeddings as Infinite Dimensional Higher Order Tensors

The above kernel embedding C_{X_{1:d}} can also be viewed as a multi-linear operator (tensor) of order d mapping from F × ... × F to ℝ. (For a generic introduction to tensors and tensor notation, please see [10].) The operator is linear in each argument (mode) when the other arguments are fixed. Furthermore, the application of the operator to a set of elements {f_i ∈ F}_{i=1}^d can be defined using the inner product from the tensor product feature space, i.e.,

C_{X_{1:d}} •_1 f_1 •_2 f_2 ... •_d f_d := ⟨C_{X_{1:d}}, ⊗_{i=1}^d f_i⟩_{F^d} = E_{X_{1:d}}[∏_{i=1}^d ⟨φ(X_i), f_i⟩_F],   (3)

where •_i means applying f_i to the i-th argument of C_{X_{1:d}}. Furthermore, we can define the generalized Frobenius norm ‖·‖ of C_{X_{1:d}} as ‖C_{X_{1:d}}‖² = Σ_{i_1=1}^∞ ... Σ_{i_d=1}^∞ (C_{X_{1:d}} •_1 e_{i_1} •_2 ... •_d e_{i_d})², using an orthonormal basis {e_i}_{i=1}^∞ of F. We can also define an inner product on the space of such operators with ‖C_{X_{1:d}}‖ < ∞ as

⟨C_{X_{1:d}}, C̃_{X_{1:d}}⟩ = Σ_{i_1=1}^∞ ... Σ_{i_d=1}^∞ (C_{X_{1:d}} •_1 e_{i_1} ... •_d e_{i_d}) (C̃_{X_{1:d}} •_1 e_{i_1} ... •_d e_{i_d}).   (4)

When C̃_{X_{1:d}} has the form E_{X_{1:d}}[⊗_{i=1}^d φ(X_i)], the above inner product reduces to E_{X_{1:d}}[C_{X_{1:d}} •_1 φ(X_1) •_2 ... •_d φ(X_d)]. In this paper, the ordering of the tensor modes is not essential, so we simply label them using the corresponding random variables.

We can reshape a higher order tensor into a lower order tensor by partitioning its modes into several disjoint groups. For instance, let I_1 = {X_1, ..., X_s} be the set of modes corresponding to the first s variables and I_2 = {X_{s+1}, ..., X_d}. Similarly to the Matlab function, we can obtain a 2nd order tensor by

C_{I_1;I_2} = reshape(C_{X_{1:d}}, I_1, I_2) : F^s × F^{d−s} → ℝ.   (5)

In the reverse direction, we can also reshape a lower order tensor into a higher order one by further partitioning certain modes of the tensor. For instance, we can partition I_1 into I′_1 = {X_1, ..., X_t} and I″_1 = {X_{t+1}, ..., X_s}, and turn C_{I_1;I_2} into a 3rd order tensor by

C_{I′_1;I″_1;I_2} = reshape(C_{I_1;I_2}, I′_1, I″_1, I_2) : F^t × F^{s−t} × F^{d−s} → ℝ.   (6)
Note that given an orthonormal basis {e_i}_{i=1}^∞ of F, we can readily obtain an orthonormal basis for, e.g., F^t, as {e_{i_1} ⊗ ... ⊗ e_{i_t}}_{i_1,...,i_t=1}^∞, and hence define the generalized Frobenius norm for C_{I_1;I_2} and C_{I′_1;I″_1;I_2}. This also implies that the generalized Frobenius norms are the same for all these reshaped tensors, i.e., ‖C_{X_{1:d}}‖ = ‖C_{I_1;I_2}‖ = ‖C_{I′_1;I″_1;I_2}‖.
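In finite dimensions, the mode-grouping reshapings in (5) and (6) and the invariance of the generalized Frobenius norm are exactly what ordinary array reshaping gives. The sketch below illustrates this on a random order-4 tensor; the feature dimension p is a stand-in for the (infinite-dimensional) RKHS and is purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
p = 5                                      # stand-in feature dimension
C = rng.normal(size=(p, p, p, p))          # order-4 tensor C_{X_{1:4}}

C_I1_I2 = C.reshape(p * p, p * p)          # I_1 = {X_1, X_2}, I_2 = {X_3, X_4}: 2nd order
C_I1a_I1b_I2 = C.reshape(p, p, p * p)      # I'_1 = {X_1}, I''_1 = {X_2}, I_2 = {X_3, X_4}: 3rd order

# the (generalized) Frobenius norm is unchanged by reshaping
print(np.linalg.norm(C), np.linalg.norm(C_I1_I2), np.linalg.norm(C_I1a_I1b_I2))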

Figure 2: Three latent variable models with different tree topologies: (a) two observed variables X_1, X_2 with a single latent variable Z_1; (b) four observed variables X_{1:2}, X_{3:4} with two latent variables Z_{1:2}; (c) a caterpillar tree (hidden Markov model).

The 2nd order tensor C_{I_1;I_2} can also be viewed as the cross-covariance operator between the two sets of variables in I_1 and I_2. In this case, we can essentially use the notation and operations of matrices. For instance, we can perform a singular value decomposition C_{I_1;I_2} = Σ_{i=1}^∞ s_i (u_i ⊗ v_i), where the s_i ∈ ℝ are ordered in nonincreasing manner, and {u_i}_{i=1}^∞ ⊂ F^s and {v_i}_{i=1}^∞ ⊂ F^{d−s} are singular vectors. The rank of C_{I_1;I_2} is the smallest r such that s_i = 0 for i ≥ r. In this case, we also define U_r = (u_1, u_2, ..., u_r), V_r = (v_1, v_2, ..., v_r) and S_r = diag(s_1, s_2, ..., s_r), and denote the low rank approximation by C_{I_1;I_2} ≈ U_r S_r V_r^⊤. Finally, a 1st order tensor, reshape(C_{X_{1:d}}, {X_{1:d}}, ∅), is simply a vector, for which we use vector notation.

4 Low Rank Kernel Embeddings Induced by Latent Variables

In the presence of latent variables, the kernel embedding C_{X_{1:d}} will be low rank. For example, the two observed variables X_1 and X_2 in the example of Figure 1 are conditionally independent given the latent cluster indicator variable Z_1. That is, the joint density factorizes as p(X_1, X_2) = Σ_{z_1} p(z_1) p(X_1|z_1) p(X_2|z_1) (see Figure 2(a) for the graphical model). Throughout the paper, we assume that z_1 is discrete and takes r possible values. Then the embedding C_{X_1 X_2} of p(X_1, X_2) has rank at most r. Let z_1 be represented as the standard basis in ℝ^r. Then

C_{X_1 X_2} = E_{Z_1}[(E_{X_1|Z_1}[φ(X_1)] Z_1) ⊗ (E_{X_2|Z_1}[φ(X_2)] Z_1)] = C_{X_1|Z_1} E_{Z_1}[Z_1 ⊗ Z_1] (C_{X_2|Z_1})^⊤,   (7)

where E_{Z_1}[Z_1 ⊗ Z_1] is an r × r matrix, which restricts the rank of C_{X_1 X_2} to be at most r.

In our second example, four observed variables are connected via two latent variables Z_1 and Z_2, each taking r possible values. The conditional independence structure implies that the density p(X_1, X_2, X_3, X_4) factorizes as Σ_{z_1, z_2} p(X_1|z_1) p(X_2|z_1) p(z_1, z_2) p(X_3|z_2) p(X_4|z_2) (see Figure 2(b) for the graphical model). Reshaping its kernel embedding C_{X_{1:4}}, we obtain C_{X_{1:2};X_{3:4}} = reshape(C_{X_{1:4}}, {X_{1:2}}, {X_{3:4}}), which factorizes as

C_{X_{1:2};X_{3:4}} = E_{X_{1:2}|Z_1}[φ(X_1) ⊗ φ(X_2)] E_{Z_1 Z_2}[Z_1 ⊗ Z_2] (E_{X_{3:4}|Z_2}[φ(X_3) ⊗ φ(X_4)])^⊤,   (8)

where E_{Z_1 Z_2}[Z_1 ⊗ Z_2] is an r × r matrix. Hence the intrinsic rank of the reshaped embedding is only r, although the original kernel embedding C_{X_{1:4}} is a 4th order tensor with infinite dimensions.

In general, for a latent variable model p(X_1, ..., X_d) whose conditional independence structure is a tree T, various reshapings of its kernel embedding C_{X_{1:d}} according to edges of the tree will be low rank. More specifically, each edge in the latent tree corresponds to a pair of latent variables (Z_s, Z_t) (or an observed and a hidden variable (X_s, Z_t)), which induces a partition of the observed variables into two groups, I_1 and I_2. One can imagine splitting the latent tree into two subtrees by cutting the edge: one group of variables resides in the first subtree, and the other group in the second subtree. If we reshape the tensor according to this partitioning, then

Theorem 1 Assume that all observed variables are leaves in the latent tree structure, and that all latent variables take r possible values. Then rank(C_{I_1;I_2}) ≤ r.

Proof Due to the conditional independence structure induced by the latent tree, p(X_1, ..., X_d) = Σ_{z_s} Σ_{z_t} p(I_1|z_s) p(z_s, z_t) p(I_2|z_t). Then its embedding can be written as

C_{I_1;I_2} = C_{I_1|Z_s} E_{Z_s Z_t}[Z_s ⊗ Z_t] (C_{I_2|Z_t})^⊤,   (9)

where C_{I_1|Z_s} and C_{I_2|Z_t} are the conditional embedding operators for p(I_1|z_s) and p(I_2|z_t), respectively.
Since E_{Z_s Z_t}[Z_s ⊗ Z_t] is an r × r matrix, rank(C_{I_1;I_2}) ≤ r.

Theorem 1 implies that, given a latent tree model, we obtain a collection of low rank reshapings {C_{I_1;I_2}} of the kernel embedding C_{X_{1:d}}, each corresponding to an edge (Z_s, Z_t) of the tree.
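The rank bound of Theorem 1 is easy to check numerically in a finite-dimensional analogue of eq. (7), where the conditional embedding operators become p × r matrices and E[Z_1 ⊗ Z_1] is the diagonal matrix of state probabilities under the one-of-r coding. The factors, the prior and the dimensions in the sketch below are synthetic.

import numpy as np

rng = np.random.default_rng(0)
p, r = 50, 3                                   # feature dimension, number of latent states
C_x1_given_z = rng.normal(size=(p, r))         # columns: conditional embeddings, one per state
C_x2_given_z = rng.normal(size=(p, r))
prior = rng.dirichlet(np.ones(r))              # p(z_1)
E_zz = np.diag(prior)                          # E[Z_1 Z_1^T] under one-of-r coding

C_x1x2 = C_x1_given_z @ E_zz @ C_x2_given_z.T  # p x p matrix, cf. eq. (7)
print(np.linalg.matrix_rank(C_x1x2))           # at most r = 3, regardless of p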

We will denote by H(T, r) the class of kernel embeddings C_{X_{1:d}} whose various reshapings according to the latent tree T have rank at most r. We will also write C_{X_{1:d}} ∈ H(T, r) to indicate such a relation.

In practice, the latent tree model assumption may be misspecified for a joint density p(X_1, ..., X_d), and consequently the various reshapings of its kernel embedding C_{X_{1:d}} are only approximately low rank. In this case, we instead impose a (potentially misspecified) latent structure T and a fixed rank r on the data and obtain an approximate low rank decomposition of the kernel embedding. The goal is to obtain a low rank embedding C̃_{X_{1:d}} ∈ H(T, r), while at the same time ensuring that ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖ is small. In the following, we present such a decomposition algorithm.

5 Low Rank Decomposition of Kernel Embeddings

For simplicity of exposition, we focus on the case where the latent tree structure T has a caterpillar shape (Figure 2(c)). This decomposition can be viewed as a kernel generalization of the hierarchical tensor decompositions in [11, 12, 7]. The decomposition proceeds by reshaping the kernel embedding C_{X_{1:d}} according to the first edge (Z_1, Z_2), resulting in A_1 := C_{X_1;X_{2:d}}. We then perform a rank r approximation of it, resulting in A_1 ≈ U_r S_r V_r^⊤. This leads to the first intermediate tensor G_1 = U_r, and we reshape S_r V_r^⊤ and recursively decompose it. We note that Algorithm 1 contains only pseudo code and is not implementable in practice, since the kernel embedding to be decomposed can have infinite dimensions. We will design a practical kernel algorithm in the next section.

Algorithm 1 Low Rank Decomposition of Kernel Embeddings
In: A kernel embedding C_{X_{1:d}}, the caterpillar tree T and the desired rank r
Out: A low rank embedding C̃_{X_{1:d}} ∈ H(T, r), given as intermediate tensors {G_1, ..., G_d}
1: A_1 = reshape(C_{X_{1:d}}, {X_1}, {X_{2:d}}) according to tree T.
2: A_1 ≈ U_r S_r V_r^⊤, approximating A_1 using its r leading singular vectors.
3: G_1 = U_r and B_1 = S_r V_r^⊤. G_1 can be viewed as a model with two variables, X_1 and Z_1; B_1 as a new caterpillar tree model T_2 with variable X_1 removed from T.
4: for j = 2, ..., d − 1 do
5:   A_j = reshape(B_{j−1}, {Z_{j−1}, X_j}, {X_{j+1:d}}) according to tree T_j.
6:   A_j ≈ U_r S_r V_r^⊤, approximating A_j using its r leading singular vectors.
7:   G_j = reshape(U_r, {Z_{j−1}}, {X_j}, {Z_j}) and B_j = S_r V_r^⊤. G_j can be viewed as a model with three variables, X_j, Z_{j−1} and Z_j; B_j as a new caterpillar tree model T_{j+1} with variables Z_{j−1} and X_j removed from T_j.
8: end for
9: G_d = B_{d−1}

Once we finish the decomposition, we obtain the low rank representation of the kernel embedding as a set of intermediate tensors {G_1, ..., G_d}. In particular, we can think of G_1 as a second order tensor of dimension ∞ × r, G_d as a second order tensor of dimension r × ∞, and G_j for 2 ≤ j ≤ d − 1 as third order tensors of dimension r × ∞ × r. We can then apply the low rank kernel embedding C̃_{X_{1:d}} to a set of elements {f_i ∈ F}_{i=1}^d as

C̃_{X_{1:d}} •_1 f_1 •_2 ... •_d f_d = (G_1 •_1 f_1) (G_2 •_2 f_2) ... (G_{d−1} •_2 f_{d−1}) (G_d •_2 f_d).

Based on the above decomposition, one can obtain a low rank density estimate by p̃(x_1, ..., x_d) = C̃_{X_{1:d}} •_1 φ(x_1) •_2 ... •_d φ(x_d). We can also compute the difference between C_{X_{1:d}} and the operator C̃_{X_{1:d}} using the generalized Frobenius norm ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖.

6 Kernel Algorithm

In practice, we are only provided with a finite number of samples {(x_1^i, ..., x_d^i)}_{i=1}^n drawn i.i.d. from p(X_1, ..., X_d), and we want to obtain an empirical low rank decomposition of the kernel embedding. In this case, we perform a low rank decomposition of the empirical kernel embedding C̄_{X_{1:d}} = (1/n) Σ_{i=1}^n (⊗_{j=1}^d φ(x_j^i)).
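If the feature map were finite dimensional, Algorithm 1 could be carried out directly with dense linear algebra. The following numpy sketch runs the sequential reshape-and-truncate procedure on a small synthetic tensor and evaluates the result with the product formula above; the feature dimension p = 4 and rank r = 2 are arbitrary illustrative choices, not part of the paper. The kernel construction developed in this section removes the finite-dimensionality restriction.

import numpy as np

def low_rank_decompose(C, r):
    # Finite-dimensional sketch of Algorithm 1: sequentially reshape the order-d
    # tensor C (shape p x ... x p) along the caterpillar tree and truncate each
    # SVD to rank r; returns the intermediate tensors G_1, ..., G_d.
    d, p = C.ndim, C.shape[0]
    G = []
    B = C.reshape(p, -1)                        # A_1 = reshape(C, {X_1}, {X_{2:d}})
    for j in range(1, d):
        rows = p if j == 1 else r * p           # mode group {X_1} or {Z_{j-1}, X_j}
        A = B.reshape(rows, -1)
        U, S, Vt = np.linalg.svd(A, full_matrices=False)
        U, S, Vt = U[:, :r], S[:r], Vt[:r]      # rank-r truncation
        G.append(U if j == 1 else U.reshape(r, p, r))
        B = S[:, None] * Vt                     # B_j = S_r V_r^T, decomposed further
    G.append(B)                                 # G_d = B_{d-1}, an r x p matrix
    return G

def apply_decomposition(G, fs):
    # (G_1 ._1 f_1)(G_2 ._2 f_2)...(G_d ._2 f_d): a chain of vector/matrix products
    out = G[0].T @ fs[0]
    for Gj, f in zip(G[1:-1], fs[1:-1]):
        out = out @ np.einsum('apb,p->ab', Gj, f)
    return float(out @ (G[-1] @ fs[-1]))

rng = np.random.default_rng(0)
C = rng.normal(size=(4, 4, 4, 4))               # synthetic order-4 "embedding"
G = low_rank_decompose(C, r=2)
fs = [rng.normal(size=4) for _ in range(4)]
print(apply_decomposition(G, fs))               # approximates C ._1 f_1 ... ._4 f_4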
Although the empirical kernel embedding still has infinite dimensions, we will show that we can carry out the decomposition using just the kernel matrices. Let us denote the kernel matrix for dimension j of the data by K_j, where j ∈ {1, ..., d}. The (i, i′)-th entry of K_j can be computed as K_j^{ii′} = k(x_j^i, x_j^{i′}). (One can readily generalize this notation to decompositions where different reshapings have different ranks.)

Alternatively, one can think of implicitly forming the feature matrix Φ_j = (φ(x_j^1), ..., φ(x_j^n)), so that the corresponding kernel matrix is K_j = Φ_j^⊤ Φ_j. Furthermore, we denote the tensor feature matrix formed from dimensions j + 1 to d of the data by Ψ_j = (⊗_{j′=j+1}^d φ(x_{j′}^1), ..., ⊗_{j′=j+1}^d φ(x_{j′}^n)). The corresponding kernel matrix is L_j = Ψ_j^⊤ Ψ_j, with the (i, i′)-th entry of L_j defined as L_j^{ii′} = ∏_{j′=j+1}^d k(x_{j′}^i, x_{j′}^{i′}).

Steps 1-3 in Algorithm 1. The key building block of the algorithm is a kernel singular value decomposition (Algorithm 2), which we explain in more detail using the example in step 2 of Algorithm 1. Using the implicitly defined feature matrices, A_1 can be expressed as A_1 = (1/n) Φ_1 Ψ_1^⊤. For the low rank approximation A_1 ≈ U_r S_r V_r^⊤ using singular value decomposition, the r leading singular vectors U_r = (u_1, ..., u_r) lie in the span of Φ_1, i.e., U_r = Φ_1 (β_1, ..., β_r) with β ∈ ℝ^n. We can therefore transform the singular value decomposition problem for an infinite dimensional matrix into a generalized eigenvalue problem involving kernel matrices:

A_1 A_1^⊤ u = λ u  ⟺  (1/n²) Φ_1 Ψ_1^⊤ Ψ_1 Φ_1^⊤ Φ_1 β = λ Φ_1 β  ⟺  (1/n²) K_1 L_1 K_1 β = λ K_1 β.

Let the Cholesky decomposition of K_1 be R^⊤ R. Then the generalized eigenvalue problem can be solved by redefining β̃ = R β, solving the ordinary eigenvalue problem

(1/n²) R L_1 R^⊤ β̃ = λ β̃,  and obtaining  β = R^† β̃.   (10)

The resulting singular vectors satisfy ⟨u_l, u_{l′}⟩ = β_l^⊤ Φ_1^⊤ Φ_1 β_{l′} = β_l^⊤ K_1 β_{l′} = β̃_l^⊤ β̃_{l′} = δ_{ll′}. We can then obtain B_1 := S_r V_r^⊤ = U_r^⊤ A_1 by projecting the columns of A_1 onto the singular vectors U_r,

B_1 = (1/n) (β_1, ..., β_r)^⊤ Φ_1^⊤ Φ_1 Ψ_1^⊤ = (1/n) (β_1, ..., β_r)^⊤ K_1 Ψ_1^⊤ =: (γ_1, ..., γ_n) Ψ_1^⊤,   (11)

where γ_i ∈ ℝ^r can be treated as a reduced r-dimensional feature representation for each feature mapped data point φ(x_1^i). The first intermediate tensor is then G_1 = U_r = Φ_1 (β_1, ..., β_r) =: Φ_1 (θ_1, ..., θ_n)^⊤, where θ_i ∈ ℝ^r. The kernel singular value decomposition can then be carried out recursively on the reshaped tensor B_1.

Steps 5-7 in Algorithm 1. When j = 2, we first reshape B_1 = S_r V_r^⊤ to obtain A_2 = (1/n) Φ̃_2 Ψ_2^⊤, where Φ̃_2 = (γ_1 ⊗ φ(x_2^1), ..., γ_n ⊗ φ(x_2^n)). We then carry out a similar singular value decomposition as before, obtaining U_r = Φ̃_2 (β_1, ..., β_r) =: Φ̃_2 (θ_1, ..., θ_n)^⊤. The second intermediate tensor is G_2 = Σ_{i=1}^n γ_i ⊗ φ(x_2^i) ⊗ θ_i. Last, we define B_2 := S_r V_r^⊤ = U_r^⊤ A_2 as

B_2 = (1/n) (β_1, ..., β_r)^⊤ Φ̃_2^⊤ Φ̃_2 Ψ_2^⊤ = (1/n) (β_1, ..., β_r)^⊤ (Γ ⊙ K_2) Ψ_2^⊤ =: (γ_1, ..., γ_n) Ψ_2^⊤,   (12)

where Γ is the matrix with entries Γ^{ii′} = γ_i^⊤ γ_{i′} and ⊙ denotes the element-wise (Hadamard) product, and we carry out the recursive decomposition further. The result of the algorithm is an empirical low rank kernel embedding, Ĉ_{X_{1:d}}, represented as a collection of intermediate tensors {G_1, ..., G_d}. The overall algorithm is summarized in Algorithm 3. More details about the derivation can be found in Appendix A.

The application of the set of intermediate tensors {G_1, ..., G_d} to a set of elements {f_i ∈ F}_{i=1}^d can be expressed via kernel operations. For instance, we can obtain a density estimate by

p̂(x_1, ..., x_d) = Ĉ_{X_{1:d}} •_1 φ(x_1) •_2 ... •_d φ(x_d) = Σ_{z_1, ..., z_{d−1}} g_1(x_1, z_1) g_2(z_1, x_2, z_2) ... g_d(z_{d−1}, x_d),

where (see Appendix A for more details)

g_1(x_1, z_1) = G_1 •_1 φ(x_1) •_2 z_1 = Σ_{i=1}^n (z_1^⊤ θ_i) k(x_1^i, x_1),   (13)
g_j(z_{j−1}, x_j, z_j) = G_j •_1 z_{j−1} •_2 φ(x_j) •_3 z_j = Σ_{i=1}^n (z_{j−1}^⊤ γ_i) k(x_j^i, x_j) (z_j^⊤ θ_i),   (14)
g_d(z_{d−1}, x_d) = G_d •_1 z_{d−1} •_2 φ(x_d) = Σ_{i=1}^n (z_{d−1}^⊤ γ_i) k(x_d^i, x_d).   (15)

In the above formulas, each term is a weighted combination of kernel functions, and the weights are determined by the kernel singular value decomposition and the values of the latent variables {z_j}.
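The kernel singular value decomposition used in these steps (Algorithm 2 below) amounts to dense linear algebra on n × n kernel matrices. The following is a minimal numpy sketch of the transformation in eq. (10); the Gaussian kernel and the small jitter term (added so that K_1 is strictly positive definite and its Cholesky factor is invertible) are illustrative choices rather than part of the paper.

import numpy as np

def kernel_svd(K, L, r):
    # Sketch of the kernel SVD step, cf. eq. (10): returns an n x r matrix of
    # coefficients (beta_1, ..., beta_r) such that U_r = Phi (beta_1, ..., beta_r).
    n = K.shape[0]
    R = np.linalg.cholesky(K).T                 # K = R^T R with R upper triangular
    M = (R @ L @ R.T) / n ** 2                  # ordinary symmetric eigenproblem
    _, evecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    beta_tilde = evecs[:, ::-1][:, :r]          # leading r eigenvectors
    return np.linalg.solve(R, beta_tilde)       # beta = R^{-1} beta_tilde

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                    # n = 50 samples, d = 3 variables
sqdist = lambda a: (a[:, None] - a[None, :]) ** 2
K1 = np.exp(-sqdist(X[:, 0]) / 2.0)             # kernel matrix for dimension 1
L1 = np.exp(-sqdist(X[:, 1]) / 2.0) * np.exp(-sqdist(X[:, 2]) / 2.0)  # dims 2..d
beta = kernel_svd(K1 + 1e-8 * np.eye(50), L1, r=2)
print(beta.shape)                               # (50, 2)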
7 Performance Guarantees

As we mentioned in the introduction, the latent structure imposed in the low rank decomposition of kernel embeddings may be misspecified, and the decomposition of empirical embeddings may suffer from sampling error. In this section, we provide finite sample guarantees for Algorithm 3 even when the latent structures are misspecified. More specifically, we bound, in terms of the generalized Frobenius norm ‖C_{X_{1:d}} − Ĉ_{X_{1:d}}‖, the difference between the true kernel embedding and the low rank kernel embedding estimated from a set of n i.i.d. samples {(x_1^i, ..., x_d^i)}_{i=1}^n.

Algorithm 2 KernelSVD(K, L, r)
Out: A collection of vectors (θ_1, ..., θ_n)
1: Perform the Cholesky decomposition K = R^⊤ R
2: Solve the eigendecomposition (1/n²) R L R^⊤ β̃ = λ β̃, and keep the r leading eigenvectors (β̃_1, ..., β̃_r)
3: Compute β_1 = R^† β̃_1, ..., β_r = R^† β̃_r, and reorganize (θ_1, ..., θ_n)^⊤ = (β_1, ..., β_r)

Algorithm 3 Kernel Low Rank Decomposition of the Empirical Embedding C̄_{X_{1:d}}
In: A sample {(x_1^i, ..., x_d^i)}_{i=1}^n, the desired rank r, a query point (x_1, ..., x_d)
Out: A low rank embedding Ĉ_{X_{1:d}} ∈ H(T, r), given as intermediate operators {G_1, ..., G_d}
1: L_d = 1 1^⊤ (the all-ones matrix)
2: for j = d, d − 1, ..., 1 do
3:   Compute the matrix K_j with K_j^{ii′} = k(x_j^i, x_j^{i′}); furthermore, if j < d, then L_j = L_{j+1} ⊙ K_{j+1}
4: end for
5: (θ_1, ..., θ_n) = KernelSVD(K_1, L_1, r)
6: G_1 = Φ_1 (θ_1, ..., θ_n)^⊤, and compute (γ_1, ..., γ_n) = (θ_1, ..., θ_n) K_1
7: for j = 2, ..., d − 1 do
8:   Γ = (γ_1, ..., γ_n)^⊤ (γ_1, ..., γ_n), and compute (θ_1, ..., θ_n) = KernelSVD(K_j ⊙ Γ, L_j, r)
9:   G_j = Σ_{i=1}^n γ_i ⊗ φ(x_j^i) ⊗ θ_i, and compute (γ_1, ..., γ_n) = (θ_1, ..., θ_n) (K_j ⊙ Γ)
10: end for
11: G_d = (γ_1, ..., γ_n) Φ_d^⊤

First, we observe that the difference can be decomposed into two terms,

‖C_{X_{1:d}} − Ĉ_{X_{1:d}}‖ ≤ ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖ + ‖C̃_{X_{1:d}} − Ĉ_{X_{1:d}}‖,   (16)

where the first term (E_1: model error) is due to the fact that the latent structures may be misspecified, while the second term (E_2: estimation error) is due to estimation from a finite number of data points. We bound these two sources of error separately (the proof is deferred to Appendix B).

Theorem 2 Suppose each reshaping C_{I_1;I_2} of C_{X_{1:d}} according to an edge in the latent tree structure has a rank r approximation U_r S_r V_r^⊤ with error ‖C_{I_1;I_2} − U_r S_r V_r^⊤‖ ≤ ε. Then the low rank decomposition C̃_{X_{1:d}} from Algorithm 1 satisfies ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖ ≤ c_1 ε, where c_1 is a constant depending only on the number of variables d.

Although previous works [5, 6] have also used hierarchical decompositions of kernel embeddings, their decompositions make the strong assumption that the latent tree model is correctly specified. When the model is misspecified, these algorithms have no guarantees whatsoever, and may fail drastically, as we show in later experiments. In contrast, the decomposition we propose here is robust in the sense that, even when the latent tree structure is misspecified, we can still provide an approximation guarantee for the algorithm. Furthermore, when the latent tree structure is correctly specified and the rank r is also correct, then C_{I_1;I_2} has rank r, hence ε = 0 and our decomposition algorithm does not incur any modeling error.

Next, we provide a bound for the estimation error. The estimation error arises from decomposing the empirical estimate C̄_{X_{1:d}} of the kernel embedding, and the error can accumulate as we combine the intermediate tensors {G_1, ..., G_d} to form the final low rank kernel embedding. More specifically, we have the following bound (the proof is deferred to Appendix C).

Theorem 3 Suppose the r-th singular value of each reshaping C_{I_1;I_2} of C_{X_{1:d}} according to an edge in the latent tree structure is lower bounded by λ. Then, with probability at least 1 − δ,

‖C_{X_{1:d}} − Ĉ_{X_{1:d}}‖ ≤ c_2 ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖ + c_2 c / √n,

where c_2 grows with d as a power of (1 + λ)²/λ², and c is a constant associated with the kernel and the probability δ.

From the above theorem, we can see that the smaller the r-th singular value, the more difficult it is to estimate the low rank kernel embedding. Although in the bound the error can grow exponentially with d through the factor involving (1 + λ)²/λ², in our experiments we did not observe such exponential degradation of performance, even on relatively high dimensional datasets.

8 Experiments

Besides the synthetic dataset shown in Figure 1, where the low rank kernel embedding leads to a significant improvement in estimating the density, we also experimented with real world datasets from the UCI data repository. We took datasets with varying dimensions and numbers of data points, whose attributes are continuous-valued. We whitened the data and compared the low rank kernel embeddings (Low Rank) obtained from Algorithm 3 to 3 alternatives for continuous density estimation, namely, a mixture of Gaussians with full covariance matrices, the ordinary kernel density estimator (KDE), and the kernel spectral algorithm for latent trees (Spectral) [6]. We used the Gaussian kernel k(x, x′) = 1/(√(2π) s) exp(−‖x − x′‖²/(2s²)) for KDE, Spectral and our method (Low Rank). We split each dataset into 10 subsets, and used nested cross-validation based on held-out likelihood to choose hyperparameters: the kernel parameter s for KDE, Spectral and Low Rank ({2^{−3}, 2^{−2}, 2^{−1}, 1, 2, 4, 8} times the median pairwise distance), the rank parameter r for Spectral and Low Rank (ranging from 2 to 30), and the number of components in the Gaussian mixture (ranging from 2 to #Sample/30). For both Spectral and Low Rank, we used the caterpillar tree in Figure 2(c) as the structure for the latent variable model.

From Table 1, we can see that low rank kernel embeddings provide the best or comparable held-out negative log-likelihood across the datasets we experimented with. On some datasets, low rank kernel embeddings lead to drastic improvements over the alternatives. For instance, on the datasets sonar and yeast, the improvement is dramatic. The Spectral approach sometimes performs even worse than the other alternatives. This makes sense, since the caterpillar tree supplied to the algorithm may be far from the true structure, and Spectral is not robust to model misspecification. The Spectral algorithm also caused numerical problems in practice. In contrast, our method Low Rank uses the same latent structure, but achieves much more robust results.

Table 1: Negative log-likelihood on held-out data (the lower the better). Columns: Data Set, # Sample, Dim., Gaussian mixture, KDE, Spectral, Low rank; rows: australian, bupa, german, heart, ionosphere, pima, parkinsons, sonar, wpbc, wine, yeast (results reported as mean ± standard deviation).

9 Discussion and Conclusion

In this paper, we presented a robust kernel embedding algorithm which can make use of the low rank structure of the data, and provided both theoretical and empirical support for it. However, there are still a number of issues which deserve further research. First, the algorithm requires a sequence of kernel singular value decompositions, which can be computationally intensive for high dimensional and large datasets. Developing efficient algorithms that retain the theoretical guarantees will be interesting future research. Second, the statistical analysis could be sharpened. For the moment, the analysis does not seem to suggest that the estimator obtained by our algorithm is better than the ordinary KDE. Third, it will be interesting empirical work to explore other applications of low rank kernel embeddings, such as kernel two-sample tests, kernel independence tests and kernel belief propagation.

References
[1] A. J. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In Proceedings of the International Conference on Algorithmic Learning Theory, volume 4754, pages 13-31. Springer, 2007.
[2] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Injective Hilbert space embeddings of probability measures. In Proc. Annual Conf. Computational Learning Theory, 2008.
[3] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.
[4] L. Song, A. Gretton, D. Bickson, Y. Low, and C. Guestrin. Kernel belief propagation. In Proc. Intl. Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2011.
[5] L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. In International Conference on Machine Learning, 2010.
[6] L. Song, A. Parikh, and E. P. Xing. Kernel embeddings of latent tree graphical models. In Advances in Neural Information Processing Systems, volume 25, 2011.
[7] L. Song, M. Ishteva, H. Park, A. Parikh, and E. Xing. Hierarchical tensor decomposition of latent tree graphical models. In International Conference on Machine Learning (ICML), 2013.
[8] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, 2004.
[9] L. Song, J. Huang, A. J. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions. In Proceedings of the International Conference on Machine Learning, 2009.
[10] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.
[11] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, 31(4):2029-2054, 2010.
[12] I. Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5):2295-2317, 2011.
[13] L. Rosasco, M. Belkin, and E. D. Vito. On learning with integral operators. Journal of Machine Learning Research, 11:905-934, 2010.


More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

inflow outflow Part I. Regular tasks for MAE598/494 Task 1

inflow outflow Part I. Regular tasks for MAE598/494 Task 1 MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the

More information

EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES

EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES MATHEMATICS OF COMPUTATION Volume 69, Number 231, Pages 1117 1130 S 0025-5718(00)01120-0 Article electronically publishe on February 17, 2000 EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION

More information

Monte Carlo Methods with Reduced Error

Monte Carlo Methods with Reduced Error Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for

More information

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University

More information

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,

More information

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Course Project for CDS 05 - Geometric Mechanics John M. Carson III California Institute of Technology June

More information

Tractability results for weighted Banach spaces of smooth functions

Tractability results for weighted Banach spaces of smooth functions Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

d-dimensional Arrangement Revisited

d-dimensional Arrangement Revisited -Dimensional Arrangement Revisite Daniel Rotter Jens Vygen Research Institute for Discrete Mathematics University of Bonn Revise version: April 5, 013 Abstract We revisit the -imensional arrangement problem

More information

On conditional moments of high-dimensional random vectors given lower-dimensional projections

On conditional moments of high-dimensional random vectors given lower-dimensional projections Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of

More information

Jointly continuous distributions and the multivariate Normal

Jointly continuous distributions and the multivariate Normal Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs Mathias Fuchs, Norbert Krautenbacher A variance ecomposition an a Central Limit Theorem for empirical losses associate with resampling esigns Technical Report Number 173, 2014 Department of Statistics

More information