Robust Low Rank Kernel Embeddings of Multivariate Distributions
Le Song, Bo Dai
College of Computing, Georgia Institute of Technology

Abstract

Kernel embedding of distributions has led to many recent advances in machine learning. However, latent and low rank structures prevalent in real world distributions have rarely been taken into account in this setting. Furthermore, no prior work in the kernel embedding literature has addressed the issue of robust embedding when the latent and low rank information are misspecified. In this paper, we propose a hierarchical low rank decomposition of kernel embeddings which can exploit such low rank structures in data while being robust to model misspecification. We also illustrate with empirical evidence that the estimated low rank embeddings lead to improved performance in density estimation.

1 Introduction

Many applications of machine learning, ranging from computer vision to computational biology, require the analysis of large volumes of high-dimensional continuous-valued measurements. Complex statistical features are commonplace, including multi-modality, skewness, and rich dependency structures. Kernel embedding of distributions is an effective framework for addressing challenging problems in this regime [1, 2]. Its key idea is to implicitly map distributions into potentially infinite dimensional feature spaces using kernels, such that subsequent comparison and manipulation of these distributions can be achieved via feature space operations (e.g., inner product, distance, projection and spectral analysis). This framework has led to many recent advances in machine learning, such as the kernel independence test [3] and kernel belief propagation [4]. However, algorithms designed with kernel embeddings have rarely taken into account the latent and low rank structures prevalent in high dimensional data arising from applications such as gene expression analysis.
While such information has been extensively exploited in other learning contexts such as graphical models and collaborative filtering, its use in kernel embeddings remains scarce and challenging. Intuitively, these intrinsically low dimensional structures of the data should reduce the effective number of parameters in kernel embeddings, and allow us to obtain better estimators when facing high dimensional problems.

As a demonstration of the above intuition, we illustrate the behavior of low rank kernel embeddings (which we explain in more detail later) when applied to density estimation (Figure 1). 100 data points are sampled i.i.d. from a mixture of 2 spherical Gaussians, where the latent variable is the cluster indicator. The fitted density based on an ordinary kernel density estimator has quite different contours from the ground truth (Figure 1(b)), while those provided by low rank embeddings appear much closer to the ground truth (Figure 1(c)). Essentially, the low rank approximation step endows kernel embeddings with an additional mechanism to smooth the estimator, which can be beneficial when the number of data points is small and there are clusters in the data. In our later, more systematic experiments, we show that low rank embeddings lead to density estimators which significantly improve over alternative approaches in terms of held-out likelihood.

While there are a handful of exceptions [5, 6] in the kernel embedding literature which have exploited latent and low rank information, these algorithms are not robust in the sense that, when such information is misspecified, no performance guarantee can be provided and these algorithms can fail drastically. The hierarchical low rank kernel embeddings we propose in this paper can be
Figure 1: We draw 100 samples from a mixture of 2 spherical Gaussians with equal mixing weights. (a) the contour plot of the ground truth density, (b) of the ordinary kernel density estimator (KDE), (c) of the low rank KDE. We use cross-validation to find the best kernel bandwidth for both the KDE and the low rank KDE. The latter produces a density which is visibly closer to the ground truth, and in terms of the integrated squared error it is smaller than that of the KDE (0.02).

considered as a kernel generalization of the discrete valued tree-structured latent variable models studied in [7]. The objective of the current paper is to address previous limitations of kernel embeddings as applied to graphical models and make them more practically useful. Furthermore, we provide both theoretical and empirical support for the new approach. Another key contribution of the paper is a novel view of kernel embeddings of multivariate distributions as infinite dimensional higher order tensors, and of the low rank structure of these tensors in the presence of latent variables. This view allows us to introduce modern multi-linear algebra and tensor decomposition tools to address challenging problems at the interface between kernel methods and latent variable models. We believe our work will play a synergistic role in bridging together largely separate areas in machine learning research, including kernel methods, latent variable models, and tensor data analysis.

In the remainder of the paper, we first present the tensor view of kernel embeddings of multivariate distributions and its low rank structure in the presence of latent variables. Then we present our algorithm for hierarchical low rank decomposition of kernel embeddings, which makes use of a sequence of nested kernel singular value decompositions. Last, we provide both theoretical and empirical support for our proposed approach.
2 Kernel Embeddings of Distributions

We focus on continuous domains, and denote by X a random variable with domain Ω and density p(X). Instantiations of X are denoted by the lower case character x. A reproducing kernel Hilbert space (RKHS) F on Ω with kernel k(x, x') is a Hilbert space of functions f : Ω → ℝ with inner product ⟨·,·⟩_F. Its element k(x, ·) satisfies the reproducing property ⟨f(·), k(x, ·)⟩_F = f(x), and consequently ⟨k(x, ·), k(x', ·)⟩_F = k(x, x'), meaning that we can view the evaluation of a function f at any point x ∈ Ω as an inner product. Alternatively, k(x, ·) can be viewed as an implicit feature map φ(x), where k(x, x') = ⟨φ(x), φ(x')⟩_F. For simplicity of notation, we assume that the domains of all variables are the same and that the same kernel is applied to all variables.

A kernel embedding represents a density by its expected features, i.e., μ_X := E_X[φ(X)] = ∫_Ω φ(x) p(x) dx, a point in the potentially infinite-dimensional and implicit feature space of a kernel [8, 1, 2]. The embedding μ_X has the property that the expectation of any RKHS function f ∈ F can be evaluated as an inner product in F: ⟨μ_X, f⟩_F := E_X[f(X)].

Kernel embeddings can be readily generalized to the joint density of d variables, X_1, ..., X_d, using the d-th order tensor product feature space F^d. In this feature space, the feature map is defined as ⊗_{i=1}^d φ(x_i) := φ(x_1) ⊗ φ(x_2) ⊗ ... ⊗ φ(x_d), and the inner product satisfies ⟨⊗_{i=1}^d φ(x_i), ⊗_{i=1}^d φ(x'_i)⟩_{F^d} = ∏_{i=1}^d ⟨φ(x_i), φ(x'_i)⟩_F = ∏_{i=1}^d k(x_i, x'_i). We can then embed a joint density p(X_1, ..., X_d) into the tensor product feature space F^d by

C_{X_{1:d}} := E_{X_{1:d}}[⊗_{i=1}^d φ(X_i)] = ∫_Ω (⊗_{i=1}^d φ(x_i)) p(x_1, ..., x_d) ∏_{i=1}^d dx_i,   (1)

where we use X_{1:d} to denote the set of variables {X_1, ..., X_d}.
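To make the definition concrete, the empirical embedding μ̂_X = (1/n) Σ_i φ(x^i) evaluates RKHS expectations by averaging kernel values: ⟨μ̂_X, k(x_0, ·)⟩ = (1/n) Σ_i k(x^i, x_0). A minimal numpy sketch (toy data, a unit-bandwidth Gaussian kernel and a standard normal density of our choosing, not from the paper) compares this Monte Carlo evaluation against the closed form of E_X[k(X, x_0)]:

```python
import numpy as np

# Evaluating <mu_hat, phi(x0)> reduces to averaging kernel values over the
# sample, and approximates E_X[k(X, x0)] (an illustrative sketch only).

def gauss_kernel(x, y, s=1.0):
    return np.exp(-(x - y) ** 2 / (2 * s ** 2))

rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)        # samples from p(x) = N(0, 1)
x0 = 0.3

# <mu_hat, phi(x0)> = (1/n) sum_i k(x_i, x0)
emp = gauss_kernel(x, x0).mean()

# Closed form of E_X[k(X, x0)] for X ~ N(0, 1) and unit-bandwidth Gaussian kernel
exact = np.sqrt(0.5) * np.exp(-x0 ** 2 / 4.0)
print(emp, exact)                 # the two values agree to Monte Carlo accuracy
```

With 20000 samples the two values typically agree to two decimal places, illustrating that the embedding captures all the expectations needed to manipulate the distribution.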
The kernel embedding can also be generalized to conditional densities p(X|z) [9]:

μ_{X|z} := E_{X|z}[φ(X)] = ∫_Ω φ(x) p(x|z) dx.   (2)

Given this embedding, the conditional expectation of a function f ∈ F can be computed as E_{X|z}[f(X)] = ⟨f, μ_{X|z}⟩_F. Unlike the ordinary embeddings, an embedding of a conditional distribution is not a single element of the RKHS, but instead sweeps out a family of points in the RKHS, each indexed by a fixed value z of the conditioning variable Z. It is only by fixing Z to a particular value z that we obtain a single RKHS element, μ_{X|z} ∈ F. In other words, the conditional embedding is an operator, denoted C_{X|Z}, which takes as input a z and outputs an embedding, i.e., μ_{X|z} = C_{X|Z} φ(z). Likewise, kernel embeddings of conditional distributions can be generalized to joint distributions of d variables.

We represent an observation from a discrete variable Z taking r possible values using the standard basis in ℝ^r (the one-of-r representation): when Z takes the i-th value, the i-th dimension of the vector z is 1 and the other dimensions are 0. For instance, when r = 3, Z can take the three possible values (1,0,0)ᵀ, (0,1,0)ᵀ and (0,0,1)ᵀ. In this case, we let φ(Z) = Z and use the linear kernel k(Z, Z') = Zᵀ Z'. The conditional embedding operator then reduces to a separate embedding μ_{X|z} for each conditional density p(X|z). Conceptually, we can concatenate these μ_{X|z} for different values of z as columns, C_{X|Z} := (μ_{X|z=(1,0,0)ᵀ}, μ_{X|z=(0,1,0)ᵀ}, μ_{X|z=(0,0,1)ᵀ}). The operation C_{X|Z} φ(z) then simply picks out the corresponding embedding (column).

3 Kernel Embeddings as Infinite Dimensional Higher Order Tensors

The above kernel embedding C_{X_{1:d}} can also be viewed as a multi-linear operator (tensor) of order d mapping from F × ... × F to ℝ. (For a generic introduction to tensors and tensor notation, see [10].) The operator is linear in each argument (mode) when the other arguments are fixed.
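The column-picking behavior of C_{X|Z} for a discrete Z can be seen in a small finite-dimensional sketch. Here a hypothetical explicit feature map φ(x) = (1, x, x²) stands in for the infinite-dimensional φ, and the three cluster densities are toy Gaussians of our choosing:

```python
import numpy as np

# Finite-dimensional sketch of the conditional embedding operator C_{X|Z}
# for a discrete Z in one-of-r representation. The feature map phi is a toy
# stand-in: its conditional expectations are the moments (1, E[X|z], E[X^2|z]).

rng = np.random.default_rng(1)
r = 3
means = [-2.0, 0.0, 2.0]                    # p(x|z): three Gaussian clusters

def phi(x):
    return np.stack([np.ones_like(x), x, x ** 2])   # explicit 3-d feature map

# mu_{X|z} = E[phi(X) | Z = z], estimated per cluster and stacked as columns
cols = []
for m in means:
    xs = rng.normal(m, 0.5, size=5000)
    cols.append(phi(xs).mean(axis=1))
C_XgivenZ = np.column_stack(cols)

# Applying the operator to a standard-basis z picks out the matching column
z = np.eye(r)[1]                            # one-of-r encoding of the 2nd value
mu = C_XgivenZ @ z
print(mu)                                   # approx (1, 0, 0.25) for the middle cluster
```

Because φ(z) = z with the linear kernel, the operator application C_{X|Z} φ(z) is exactly the matrix-vector product above.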
Furthermore, the application of the operator to a set of elements {f_i ∈ F}_{i=1}^d can be defined using the inner product from the tensor product feature space, i.e.,

C_{X_{1:d}} •_1 f_1 •_2 ... •_d f_d := ⟨C_{X_{1:d}}, ⊗_{i=1}^d f_i⟩_{F^d} = E_{X_{1:d}}[∏_{i=1}^d ⟨φ(X_i), f_i⟩_F],   (3)

where •_i means applying f_i to the i-th argument of C_{X_{1:d}}. Furthermore, we can define the generalized Frobenius norm ‖·‖_* of C_{X_{1:d}} by ‖C_{X_{1:d}}‖_*² = Σ_{i_1=1}^∞ ⋯ Σ_{i_d=1}^∞ (C_{X_{1:d}} •_1 e_{i_1} •_2 ... •_d e_{i_d})², using an orthonormal basis {e_i}_{i=1}^∞ ⊂ F. We can also define the inner product for the space of such operators with ‖C_{X_{1:d}}‖_* < ∞ as

⟨C_{X_{1:d}}, C̃_{X_{1:d}}⟩_* = Σ_{i_1=1}^∞ ⋯ Σ_{i_d=1}^∞ (C_{X_{1:d}} •_1 e_{i_1} ... •_d e_{i_d})(C̃_{X_{1:d}} •_1 e_{i_1} ... •_d e_{i_d}).   (4)

When C̃_{X_{1:d}} has the form E_{X_{1:d}}[⊗_{i=1}^d φ(X_i)], the above inner product reduces to E_{X_{1:d}}[C_{X_{1:d}} •_1 φ(X_1) •_2 ... •_d φ(X_d)]. In this paper, the ordering of the tensor modes is not essential, so we simply label them using the corresponding random variables.

We can reshape a higher order tensor into a lower order tensor by partitioning its modes into several disjoint groups. For instance, let I_1 = {X_1, ..., X_s} be the set of modes corresponding to the first s variables and I_2 = {X_{s+1}, ..., X_d}. Analogously to the Matlab function, we can obtain a 2nd order tensor by

C_{I_1;I_2} = reshape(C_{X_{1:d}}, I_1, I_2) : F^s × F^{d−s} → ℝ.   (5)

In the reverse direction, we can also reshape a lower order tensor into a higher order one by further partitioning certain modes. For instance, we can partition I_1 into I'_1 = {X_1, ..., X_t} and I''_1 = {X_{t+1}, ..., X_s}, and turn C_{I_1;I_2} into a 3rd order tensor by

C_{I'_1;I''_1;I_2} = reshape(C_{I_1;I_2}, I'_1, I''_1, I_2) : F^t × F^{s−t} × F^{d−s} → ℝ.   (6)

Note that given an orthonormal basis {e_i}_{i=1}^∞ ⊂ F, we can readily obtain an orthonormal basis for, e.g., F^t as {e_{i_1} ⊗ ... ⊗ e_{i_t}}_{i_1,...,i_t=1}^∞, and hence define the generalized Frobenius norm for C_{I_1;I_2} and C_{I'_1;I''_1;I_2}. This also implies that the generalized Frobenius norm is the same for all these reshaped tensors, i.e., ‖C_{X_{1:d}}‖_* = ‖C_{I_1;I_2}‖_* = ‖C_{I'_1;I''_1;I_2}‖_*.
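A finite-dimensional analogue makes the reshaping operations and the norm invariance easy to check numerically. In the sketch below, a small random order-4 array (with a toy feature dimension of our choosing) plays the role of C_{X_{1:4}}:

```python
import numpy as np

# Finite-dimensional analogue of the reshapings: grouping modes of an order-4
# tensor into a matrix, then splitting a grouped mode again, and verifying
# that the (generalized) Frobenius norm is invariant under these reshapings.

rng = np.random.default_rng(2)
m = 4                                     # stand-in for the feature-space dimension
C = rng.standard_normal((m, m, m, m))     # order-4 tensor C_{X_{1:4}}

# reshape(C, I1 = {X1, X2}, I2 = {X3, X4}): a 2nd order tensor (matrix)
C_I1_I2 = C.reshape(m * m, m * m)

# further split I1 into I1' = {X1}, I1'' = {X2}: a 3rd order tensor
C_I1p_I1pp_I2 = C_I1_I2.reshape(m, m, m * m)

# all reshapings share the same Frobenius norm
norms = [np.linalg.norm(t.ravel()) for t in (C, C_I1_I2, C_I1p_I1pp_I2)]
print(norms)
```

In numpy's row-major layout, grouping consecutive modes with `reshape` is exactly the mode-partitioning operation in equations (5) and (6).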
Figure 2: Three latent variable models with different tree topologies: (a) two observed variables X_1, X_2 with a single latent variable Z_1; (b) four observed variables X_{1:2}, X_{3:4} with two latent variables Z_{1:2}; (c) a caterpillar tree (hidden Markov model).

The 2nd order tensor C_{I_1;I_2} can also be viewed as the cross-covariance operator between the two sets of variables I_1 and I_2. In this case, we can essentially use matrix notation and operations. For instance, we can perform a singular value decomposition C_{I_1;I_2} = Σ_{i=1}^∞ s_i (u_i ⊗ v_i), where the s_i ∈ ℝ are ordered in a nonincreasing manner, and {u_i}_{i=1}^∞ ⊂ F^s and {v_i}_{i=1}^∞ ⊂ F^{d−s} are the singular vectors. The rank of C_{I_1;I_2} is the smallest r such that s_i = 0 for i > r. In this case, we also define U_r = (u_1, u_2, ..., u_r), V_r = (v_1, v_2, ..., v_r) and S_r = diag(s_1, s_2, ..., s_r), and denote the low rank approximation by C̃_{I_1;I_2} = U_r S_r V_rᵀ. Finally, a 1st order tensor, reshape(C_{X_{1:d}}, {X_{1:d}}, ∅), is simply a vector, for which we use vector notation.

4 Low Rank Kernel Embeddings Induced by Latent Variables

In the presence of latent variables, the kernel embedding C_{X_{1:d}} will be low rank. For example, the two observed variables X_1 and X_2 in the example from Figure 1 are conditionally independent given the latent cluster indicator variable Z: the joint density factorizes as p(X_1, X_2) = Σ_z p(z) p(X_1|z) p(X_2|z) (see Figure 2(a) for the graphical model). Throughout the paper, we assume that z is discrete and takes r possible values. Then the embedding C_{X_1 X_2} of p(X_1, X_2) has rank at most r. Let Z be represented as the standard basis in ℝ^r. Then

C_{X_1 X_2} = E_Z[(E_{X_1|Z}[φ(X_1)] Z) ⊗ (E_{X_2|Z}[φ(X_2)] Z)] = C_{X_1|Z} E_Z[Z ⊗ Z] (C_{X_2|Z})ᵀ,   (7)

where E_Z[Z ⊗ Z] is an r × r matrix, hence restricting the rank of C_{X_1 X_2} to be at most r.

In our second example, four observed variables are connected via two latent variables Z_1 and Z_2, each taking r possible values. The conditional independence structure implies that the density p(X_1, X_2, X_3, X_4) factorizes as Σ_{z_1, z_2} p(x_1|z_1) p(x_2|z_1) p(z_1, z_2) p(x_3|z_2) p(x_4|z_2) (see Figure 2(b) for the graphical model).
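The rank bound in equation (7) can be illustrated numerically in finite dimensions: with one-hot Z, E[Z ⊗ Z] is the diagonal matrix diag(p(z)), and the factorization caps the rank at r. The dimensions below are arbitrary stand-ins for the feature space:

```python
import numpy as np

# Numerical illustration of Eq. (7): with a discrete latent Z taking r values,
# a finite-dimensional stand-in for the embedding C_{X1 X2} factorizes as
# C_{X1|Z} E[Z Z^T] C_{X2|Z}^T and therefore has rank at most r.

rng = np.random.default_rng(3)
m, r = 10, 2                                 # feature dim (stand-in), latent states

C_X1_given_Z = rng.standard_normal((m, r))   # columns play the role of mu_{X1|z}
C_X2_given_Z = rng.standard_normal((m, r))
pz = np.array([0.4, 0.6])
E_ZZ = np.diag(pz)                           # for one-hot Z, E[Z Z^T] = diag(p(z))

C_X1X2 = C_X1_given_Z @ E_ZZ @ C_X2_given_Z.T   # a 10 x 10 matrix ...
print(np.linalg.matrix_rank(C_X1X2))            # ... whose rank is at most r = 2
```

However large the ambient (feature) dimension m, the sandwiched r × r matrix bottlenecks the rank, which is exactly the structure the decomposition algorithm exploits.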
Reshaping its kernel embedding C_{X_{1:4}}, we obtain C_{X_{1:2};X_{3:4}} = reshape(C_{X_{1:4}}, {X_{1:2}}, {X_{3:4}}), which factorizes as

C_{X_{1:2};X_{3:4}} = E_{X_{1:2}|Z_1}[φ(X_1) ⊗ φ(X_2)] E_{Z_1 Z_2}[Z_1 ⊗ Z_2] (E_{X_{3:4}|Z_2}[φ(X_3) ⊗ φ(X_4)])ᵀ,   (8)

where E_{Z_1 Z_2}[Z_1 ⊗ Z_2] is an r × r matrix. Hence the intrinsic rank of the reshaped embedding is only r, although the original kernel embedding C_{X_{1:4}} is a 4th order tensor with infinite dimensions.

In general, for a latent variable model p(X_1, ..., X_d) whose conditional independence structure is a tree T, various reshapings of its kernel embedding C_{X_{1:d}} according to edges of the tree will be low rank. More specifically, each edge in the latent tree corresponds to a pair of latent variables (Z_s, Z_t) (or an observed and a hidden variable (X_s, Z_t)), and induces a partition of the observed variables into two groups, I_1 and I_2. One can imagine splitting the latent tree into two subtrees by cutting the edge; one group of variables resides in the first subtree, and the other group in the second subtree. If we reshape the tensor according to this partitioning, then

Theorem 1 Assume that all observed variables are leaves in the latent tree structure, and that all latent variables take r possible values. Then rank(C_{I_1;I_2}) ≤ r.

Proof Due to the conditional independence structure induced by the latent tree, p(X_1, ..., X_d) = Σ_{z_s} Σ_{z_t} p(I_1|z_s) p(z_s, z_t) p(I_2|z_t). Its embedding can therefore be written as

C_{I_1;I_2} = C_{I_1|Z_s} E_{Z_s Z_t}[Z_s ⊗ Z_t] (C_{I_2|Z_t})ᵀ,   (9)

where C_{I_1|Z_s} and C_{I_2|Z_t} are the conditional embedding operators for p(I_1|z_s) and p(I_2|z_t) respectively. Since E_{Z_s Z_t}[Z_s ⊗ Z_t] is an r × r matrix, rank(C_{I_1;I_2}) ≤ r.

Theorem 1 implies that, given a latent tree model, we obtain a collection of low rank reshapings {C_{I_1;I_2}} of the kernel embedding C_{X_{1:d}}, each corresponding to an edge (Z_s, Z_t) of the tree. We
will denote by H(T, r) the class of kernel embeddings C_{X_{1:d}} whose various reshapings according to the latent tree T have rank at most r. We also write C_{X_{1:d}} ∈ H(T, r) to indicate such a relation.

In practice, the latent tree model assumption may be misspecified for a joint density p(X_1, ..., X_d), and consequently the various reshapings of its kernel embedding C_{X_{1:d}} are only approximately low rank. In this case, we instead impose a (potentially misspecified) latent structure T and a fixed rank r on the data, and obtain an approximate low rank decomposition of the kernel embedding. The goal is to obtain a low rank embedding C̃_{X_{1:d}} ∈ H(T, r) while at the same time ensuring that ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖_* is small. In the following, we present such a decomposition algorithm.

5 Low Rank Decomposition of Kernel Embeddings

For simplicity of exposition, we focus on the case where the latent tree structure T has a caterpillar shape (Figure 2(c)). This decomposition can be viewed as a kernel generalization of the hierarchical tensor decompositions in [11, 12, 7]. The decomposition proceeds by reshaping the kernel embedding C_{X_{1:d}} according to the first edge (Z_1, Z_2), resulting in A_1 := C_{X_1;X_{2:d}}. We then perform a rank r approximation of it, A_1 ≈ U_r S_r V_rᵀ. This leads to the first intermediate tensor G_1 = U_r; we reshape S_r V_rᵀ and recursively decompose it. We note that Algorithm 1 contains only pseudo code, and is not implementable in practice, since the kernel embedding to be decomposed can have infinite dimensions. We design a practical kernel algorithm in the next section.

Algorithm 1 Low Rank Decomposition of Kernel Embeddings
In: A kernel embedding C_{X_{1:d}}, the caterpillar tree T and a desired rank r
Out: A low rank embedding C̃_{X_{1:d}} ∈ H(T, r), as intermediate tensors {G_1, ..., G_d}
1: A_1 = reshape(C_{X_{1:d}}, {X_1}, {X_{2:d}}) according to tree T.
2: A_1 ≈ U_r S_r V_rᵀ, approximating A_1 using its r leading singular vectors.
3: G_1 = U_r, and B_1 = S_r V_rᵀ.
   G_1 can be viewed as a model with two variables, X_1 and Z_1; and B_1 as a new caterpillar tree model T_2 with variable X_1 removed from T.
4: for j = 2, ..., d − 1 do
5:   A_j = reshape(B_{j−1}, {Z_{j−1}, X_j}, {X_{j+1:d}}) according to tree T_j.
6:   A_j ≈ U_r S_r V_rᵀ, approximating A_j using its r leading singular vectors.
7:   G_j = reshape(U_r, {Z_{j−1}}, {X_j}, {Z_j}), and B_j = S_r V_rᵀ.
     G_j can be viewed as a model with three variables, X_j, Z_{j−1} and Z_j; and B_j as a new caterpillar tree model T_{j+1} with variables Z_{j−1} and X_j removed from T_j.
8: end for
9: G_d = B_{d−1}

Once we finish the decomposition, we obtain a low rank representation of the kernel embedding as a set of intermediate tensors {G_1, ..., G_d}. In particular, we can think of G_1 as a 2nd order tensor of dimensions ∞ × r, G_d as a 2nd order tensor of dimensions r × ∞, and G_j for 2 ≤ j ≤ d − 1 as a 3rd order tensor of dimensions r × ∞ × r. We can then apply the low rank kernel embedding C̃_{X_{1:d}} to a set of elements {f_i ∈ F}_{i=1}^d as follows:

C̃_{X_{1:d}} •_1 f_1 •_2 ... •_d f_d = (G_1 •_1 f_1)ᵀ (G_2 •_2 f_2) ⋯ (G_{d−1} •_2 f_{d−1}) (G_d •_2 f_d).

Based on the above decomposition, one can obtain a low rank density estimate by p̃(x_1, ..., x_d) = C̃_{X_{1:d}} •_1 φ(x_1) •_2 ... •_d φ(x_d). We can also compute the difference between C_{X_{1:d}} and the operator C̃_{X_{1:d}} using the generalized Frobenius norm ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖_*.

6 Kernel Algorithm

In practice, we are only provided with a finite number of samples {(x_1^i, ..., x_d^i)}_{i=1}^n drawn i.i.d. from p(X_1, ..., X_d), and we want to obtain an empirical low rank decomposition of the kernel embedding. In this case, we perform a low rank decomposition of the empirical kernel embedding C̄_{X_{1:d}} = (1/n) Σ_{i=1}^n (⊗_{j=1}^d φ(x_j^i)). Although the empirical kernel embedding still has infinite dimensions, we will show that the decomposition can be carried out using just the kernel matrices. Let us denote the kernel matrix for each dimension of the data by K_j, where j ∈ {1, ..., d}. The (i, i')-th entry of K_j can be computed as K_j^{ii'} = k(x_j^i, x_j^{i'}).
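In finite dimensions, the sequence of nested SVDs in Algorithm 1 can be sketched directly (a tensor-train style decomposition; the dimensions and random chain model below are small stand-ins of our choosing, not the paper's infinite-dimensional operators):

```python
import numpy as np

# Finite-dimensional sketch of Algorithm 1: sequential truncated SVDs turn a
# d-th order tensor into a chain of intermediate tensors {G_1, ..., G_d}.

def low_rank_decompose(C, r):
    d, m = C.ndim, C.shape[0]
    G = []
    B = C.reshape(m, -1)                       # A_1 = reshape(C, {X1}, {X2:d})
    U, S, Vt = np.linalg.svd(B, full_matrices=False)
    G.append(U[:, :r])                         # G_1
    B = S[:r, None] * Vt[:r]                   # B_1 = S_r V_r^T
    for j in range(1, d - 1):
        A = B.reshape(B.shape[0] * m, -1)      # A_j = reshape(B_{j-1}, {Z_{j-1}, X_j}, ...)
        U, S, Vt = np.linalg.svd(A, full_matrices=False)
        k = min(r, U.shape[1])
        G.append(U[:, :k].reshape(B.shape[0], m, k))   # G_j
        B = S[:k, None] * Vt[:k]
    G.append(B)                                # G_d
    return G

def reconstruct(G):
    M = G[0]
    for Gj in G[1:-1]:
        M = np.tensordot(M, Gj, axes=([M.ndim - 1], [0]))
    return np.tensordot(M, G[-1], axes=([M.ndim - 1], [0]))

rng = np.random.default_rng(4)
m, d, r = 3, 4, 2
# build an exactly low rank tensor from a chain model, then decompose it
cores = ([rng.standard_normal((m, r))]
         + [rng.standard_normal((r, m, r)) for _ in range(d - 2)]
         + [rng.standard_normal((r, m))])
C = reconstruct(cores)
G = low_rank_decompose(C, r)
err = np.linalg.norm(C - reconstruct(G)) / np.linalg.norm(C)
print(err)   # near zero: the chain structure is recovered exactly
```

When the tensor truly lies in H(T, r) for the caterpillar tree, the truncation at each step discards nothing, which mirrors the ε = 0 case discussed in Section 7.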
(One can readily generalize this notation to decompositions where different reshapings have different ranks.)

Alternatively, one can think of implicitly forming
the feature matrix Φ_j = (φ(x_j^1), ..., φ(x_j^n)), with corresponding kernel matrix K_j = Φ_jᵀ Φ_j. Furthermore, we denote the tensor feature matrix formed from dimensions j+1 to d of the data by Ψ_j = (⊗_{j'=j+1}^d φ(x_{j'}^1), ..., ⊗_{j'=j+1}^d φ(x_{j'}^n)). The corresponding kernel matrix is L_j = Ψ_jᵀ Ψ_j, with the (i, i')-th entry of L_j defined as L_j^{ii'} = ∏_{j'=j+1}^d k(x_{j'}^i, x_{j'}^{i'}).

Steps 1–3 in Algorithm 1. The key building block of the algorithm is a kernel singular value decomposition (Algorithm 2), which we explain in more detail using the example of step 2 of Algorithm 1. Using the implicitly defined feature matrices, A_1 can be expressed as A_1 = (1/n) Φ_1 Ψ_1ᵀ. For the low rank approximation A_1 ≈ U_r S_r V_rᵀ using singular value decomposition, the leading r singular vectors U_r = (u_1, ..., u_r) lie in the span of Φ_1, i.e., U_r = Φ_1 (β_1, ..., β_r) with β ∈ ℝ^n. We can therefore transform the singular value decomposition problem for an infinite dimensional matrix into a generalized eigenvalue problem involving kernel matrices:

A_1 A_1ᵀ u = λ u  ⟺  (1/n²) Φ_1 Ψ_1ᵀ Ψ_1 Φ_1ᵀ Φ_1 β = λ Φ_1 β  ⟺  (1/n²) K_1 L_1 K_1 β = λ K_1 β.

Let the Cholesky decomposition of K_1 be Rᵀ R. The generalized eigenvalue problem can then be solved by redefining β̃ = R β and solving an ordinary eigenvalue problem

(1/n²) R L_1 Rᵀ β̃ = λ β̃, and obtaining β = R⁻¹ β̃.   (10)

The resulting singular vectors satisfy u_lᵀ u_{l'} = β_lᵀ Φ_1ᵀ Φ_1 β_{l'} = β_lᵀ K_1 β_{l'} = β̃_lᵀ β̃_{l'} = δ_{ll'}. We can then obtain B_1 := S_r V_rᵀ = U_rᵀ A_1 by projecting the columns of A_1 onto the singular vectors U_r,

B_1 = (1/n) (β_1, ..., β_r)ᵀ Φ_1ᵀ Φ_1 Ψ_1ᵀ = (1/n) (β_1, ..., β_r)ᵀ K_1 Ψ_1ᵀ =: (γ_1, ..., γ_n) Ψ_1ᵀ,   (11)

where γ_i ∈ ℝ^r can be treated as a reduced r-dimensional feature representation for each feature mapped data point φ(x_1^i). We then have the first intermediate tensor G_1 = U_r = Φ_1 (β_1, ..., β_r) =: Φ_1 (θ_1, ..., θ_n)ᵀ, where θ_i ∈ ℝ^r. The kernel singular value decomposition can then be carried out recursively on the reshaped tensor B_1.

Steps 5–7 in Algorithm 1. When j = 2, we first reshape B_1 = S_r V_rᵀ to obtain A_2 = (1/n) Φ_2 Ψ_2ᵀ, where Φ_2 = (γ_1 ⊗ φ(x_2^1), ..., γ_n ⊗ φ(x_2^n)).
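The Cholesky reduction in equation (10) is easy to verify numerically. The sketch below (toy one-dimensional data and a small jitter of our choosing; not the paper's reference implementation) solves the reduced symmetric eigenproblem and checks the orthonormality relation u_lᵀ u_{l'} = β_lᵀ K_1 β_{l'} = δ_{ll'}:

```python
import numpy as np

# Sketch of the kernel SVD step, Eq. (10): the generalized eigenvalue problem
# (1/n^2) K1 L1 K1 beta = lambda K1 beta is reduced, via the Cholesky factor
# K1 = R^T R, to the ordinary symmetric eigenproblem
# (1/n^2) R L1 R^T bt = lambda bt, with beta = R^{-1} bt.

def gauss_gram(x, s=1.0):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * s ** 2))

rng = np.random.default_rng(5)
n, r = 40, 3
x1 = rng.standard_normal(n)
x_rest = rng.standard_normal(n)

K1 = gauss_gram(x1) + 1e-8 * np.eye(n)    # jitter keeps the Cholesky well posed
L1 = gauss_gram(x_rest)                   # kernel matrix of the remaining modes

R = np.linalg.cholesky(K1).T              # K1 = R^T R (R upper triangular)
M = (R @ L1 @ R.T) / n ** 2               # symmetric reduced problem
lam, Bt = np.linalg.eigh(M)
lam, Bt = lam[::-1], Bt[:, ::-1]          # reorder to nonincreasing eigenvalues
betas = np.linalg.solve(R, Bt[:, :r])     # beta_l = R^{-1} beta_tilde_l

# the implicit singular vectors u_l = Phi_1 beta_l are orthonormal:
print(betas.T @ K1 @ betas)               # approx the 3 x 3 identity
```

The product βᵀ K_1 β collapses to β̃ᵀ β̃ algebraically, so the printed matrix is the identity up to floating point error.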
We can then carry out a similar singular value decomposition as before, and obtain U_r = Φ_2 (β_1, ..., β_r) =: Φ_2 (θ_1, ..., θ_n)ᵀ. This yields the second intermediate tensor G_2 = Σ_{i=1}^n γ_i ⊗ φ(x_2^i) ⊗ θ_i. Last, we define B_2 := S_r V_rᵀ = U_rᵀ A_2, i.e.,

B_2 = (1/n) (β_1, ..., β_r)ᵀ Φ_2ᵀ Φ_2 Ψ_2ᵀ = (1/n) (β_1, ..., β_r)ᵀ (Γ ⊙ K_2) Ψ_2ᵀ =: (γ_1, ..., γ_n) Ψ_2ᵀ,   (12)

where Γ is the matrix with entries Γ^{ii'} = γ_iᵀ γ_{i'} and ⊙ denotes the elementwise (Hadamard) product, and then carry out the recursive decomposition further. The result of the algorithm is an empirical low rank kernel embedding, Ĉ_{X_{1:d}}, represented as a collection of intermediate tensors {G_1, ..., G_d}. The overall algorithm is summarized in Algorithm 3; more details of the derivation can be found in Appendix A.

The application of the set of intermediate tensors {G_1, ..., G_d} to a set of elements {f_i ∈ F}_{i=1}^d can be expressed in terms of kernel operations. For instance, we obtain a density estimate by

p̂(x_1, ..., x_d) = Ĉ_{X_{1:d}} •_1 φ(x_1) •_2 ... •_d φ(x_d) = Σ_{z_1,...,z_{d−1}} g_1(x_1, z_1) g_2(z_1, x_2, z_2) ⋯ g_d(z_{d−1}, x_d),

where each z_j ranges over the standard basis of ℝ^r and (see Appendix A for more details)

g_1(x_1, z_1) = G_1 •_1 φ(x_1) •_2 z_1 = Σ_{i=1}^n (z_1ᵀ θ_i) k(x_1^i, x_1),   (13)
g_j(z_{j−1}, x_j, z_j) = G_j •_1 z_{j−1} •_2 φ(x_j) •_3 z_j = Σ_{i=1}^n (z_{j−1}ᵀ γ_i) k(x_j^i, x_j) (z_jᵀ θ_i),   (14)
g_d(z_{d−1}, x_d) = G_d •_1 z_{d−1} •_2 φ(x_d) = Σ_{i=1}^n (z_{d−1}ᵀ γ_i) k(x_d^i, x_d).   (15)

In the above formulas, each term is a weighted combination of kernel functions, with the weights determined by the kernel singular value decompositions and the values of the latent variables {z_j}.

Algorithm 2 KernelSVD(K, L, r)
Out: A collection of vectors (θ_1, ..., θ_n)
1: Perform the Cholesky decomposition K = Rᵀ R
2: Solve the eigendecomposition (1/n²) R L Rᵀ β̃ = λ β̃, and keep the leading r eigenvectors (β̃_1, ..., β̃_r)
3: Compute β_1 = R⁻¹ β̃_1, ..., β_r = R⁻¹ β̃_r, and reorganize (θ_1, ..., θ_n) = (β_1, ..., β_r)ᵀ

Algorithm 3 Kernel Low Rank Decomposition of the Empirical Embedding C̄_{X_{1:d}}
In: A sample {(x_1^i, ..., x_d^i)}_{i=1}^n and a desired rank r
Out: A low rank embedding Ĉ_{X_{1:d}} ∈ H(T, r), as intermediate operators {G_1, ..., G_d}
1: L_d = 𝟙𝟙ᵀ (the n × n matrix of all ones)
2: for j = d, d−1, ..., 1 do
3:   Compute the matrix K_j with K_j^{ii'} = k(x_j^i, x_j^{i'}); furthermore, if j < d, then L_j = L_{j+1} ⊙ K_{j+1}
4: end for
5: (θ_1, ..., θ_n) = KernelSVD(K_1, L_1, r)
6: G_1 = Φ_1 (θ_1, ..., θ_n)ᵀ, and compute (γ_1, ..., γ_n) = (1/n)(θ_1, ..., θ_n) K_1
7: for j = 2, ..., d − 1 do
8:   Γ = (γ_1, ..., γ_n)ᵀ (γ_1, ..., γ_n), and compute (θ_1, ..., θ_n) = KernelSVD(K_j ⊙ Γ, L_j, r)
9:   G_j = Σ_{i=1}^n γ_i ⊗ φ(x_j^i) ⊗ θ_i, and compute (γ_1, ..., γ_n) = (1/n)(θ_1, ..., θ_n)(K_j ⊙ Γ)
10: end for
11: G_d = (γ_1, ..., γ_n) Φ_dᵀ

7 Performance Guarantees

As mentioned in the introduction, the latent structure imposed in the low rank decomposition of kernel embeddings may be misspecified, and the decomposition of empirical embeddings may suffer from sampling error. In this section, we provide finite sample guarantees for Algorithm 3 even when the latent structures are misspecified. More specifically, we bound, in terms of the generalized Frobenius norm ‖C_{X_{1:d}} − Ĉ_{X_{1:d}}‖_*, the difference between the true kernel embedding and the low rank kernel embedding estimated from the set of n i.i.d. samples {(x_1^i, ..., x_d^i)}_{i=1}^n. First, we observe that the difference can be decomposed into two terms,

‖C_{X_{1:d}} − Ĉ_{X_{1:d}}‖_* ≤ ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖_* + ‖C̃_{X_{1:d}} − Ĉ_{X_{1:d}}‖_*,   (16)
                                 (E_1: model error)       (E_2: estimation error)

where the first term is due to the fact that the latent structures may be misspecified, while the second term is due to estimation from a finite number of data points. We bound these two sources of error separately (the proof is deferred to Appendix B).

Theorem 2 Suppose each reshaping C_{I_1;I_2} of C_{X_{1:d}} according to an edge of the latent tree structure has a rank r approximation U_r S_r V_rᵀ with error ‖C_{I_1;I_2} − U_r S_r V_rᵀ‖_* ≤ ε. Then the low rank decomposition C̃_{X_{1:d}} from Algorithm 1 satisfies ‖C_{X_{1:d}} − C̃_{X_{1:d}}‖_* ≤ √(d−1) ε.

Although previous work [5, 6] has also used hierarchical decompositions for kernel embeddings, those decompositions make the strong assumption that the latent tree models are correctly specified.
When the models are misspecified, these algorithms have no guarantees whatsoever, and may fail drastically, as we show in later experiments. In contrast, the decomposition we propose here is robust in the sense that, even when the latent tree structure is misspecified, we can still provide an approximation guarantee for the algorithm. Furthermore, when the latent tree structure is correctly specified and the rank r is correct as well, C_{I_1;I_2} has rank r, hence ε = 0 and our decomposition algorithm does not incur any modeling error.

Next, we provide a bound for the estimation error. The estimation error arises from decomposing the empirical estimate C̄_{X_{1:d}} of the kernel embedding, and the error can accumulate as we combine the intermediate tensors {G_1, ..., G_d} to form the final low rank kernel embedding. More specifically, we have the following bound (the proof is deferred to Appendix C).

Theorem 3 Suppose the r-th singular value of each reshaping C_{I_1;I_2} of C_{X_{1:d}} according to an edge of the latent tree structure is lower bounded by λ. Then, with probability at least 1 − δ,

‖C̃_{X_{1:d}} − Ĉ_{X_{1:d}}‖_* ≤ ((1+λ)^{d−2}/λ^{d−2}) ‖C_{X_{1:d}} − C̄_{X_{1:d}}‖_* ≤ ((1+λ)^{d−2}/λ^{d−2}) (c/√n),

with some constant c associated with the kernel and the probability δ.
From the above theorem, we can see that the smaller the r-th singular value, the more difficult it is to estimate the low rank kernel embedding. Although the bound grows exponentially in d through the factor ((1+λ)/λ)^{d−2}, in our experiments we did not observe such exponential degradation of performance, even on relatively high dimensional datasets.

8 Experiments

Besides the synthetic dataset shown in Figure 1, where the low rank kernel embedding leads to a significant improvement in estimating the density, we also experimented with real world datasets from the UCI data repository. We chose datasets with varying dimensions and numbers of data points whose attributes are continuous-valued. We whitened the data and compared the low rank kernel embeddings (Low Rank) obtained from Algorithm 3 to three alternatives for continuous density estimation, namely a mixture of Gaussians with full covariance matrices, the ordinary kernel density estimator (KDE), and the kernel spectral algorithm for latent trees (Spectral) [6]. We used the Gaussian kernel k(x, x') = (1/(√(2π) s)) exp(−‖x − x'‖²/(2s²)) for KDE, Spectral and our method (Low Rank). We split each dataset into 10 subsets, and used nested cross-validation based on held-out likelihood to choose the hyperparameters: the kernel parameter s for KDE, Spectral and Low Rank ({2⁻³, 2⁻², 2⁻¹, 1, 2, 4, 8} times the median pairwise distance), the rank parameter r for Spectral and Low Rank (ranging from 2 to 30), and the number of components in the Gaussian mixture (ranging from 2 to #Sample/30). For both Spectral and Low Rank, we used the caterpillar tree in Figure 2(c) as the structure of the latent variable model.

From Table 1, we can see that low rank kernel embeddings provide the best or comparable held-out negative log-likelihood across the datasets we experimented with. On some datasets, low rank kernel embeddings lead to drastic improvements over the alternatives; for instance, on the datasets sonar and yeast the improvement is dramatic. The Spectral approach sometimes performs even worse.
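The bandwidth grid described above is built from the median heuristic, i.e., multiples of the median pairwise distance of the (whitened) data. A small sketch of that construction (toy data of our choosing):

```python
import numpy as np

# The experiments choose the kernel bandwidth s from a grid of multiples
# {2^-3, ..., 8} of the median pairwise distance (the "median heuristic").

def median_heuristic(X):
    # X: (n, d) data matrix; median of pairwise Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(len(X), k=1)       # strictly upper triangle: i < i'
    return np.median(np.sqrt(d2[iu]))

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 5))           # toy whitened data
s0 = median_heuristic(X)
grid = np.array([2.0 ** p for p in range(-3, 4)]) * s0
print(s0, grid)                             # 7 candidate bandwidths for CV
```

Each candidate bandwidth in `grid` would then be scored by held-out likelihood in the nested cross-validation loop.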
This makes sense, since the caterpillar tree supplied to the algorithm may be far from reality, and Spectral is not robust to model misspecification. The Spectral algorithm also caused numerical problems in practice. In contrast, our method Low Rank uses the same latent structure, but achieves much more robust results.

Table 1: Negative log-likelihood on held-out data (the lower the better) for the four methods (Gaussian mixture, KDE, Spectral, Low Rank) on eleven UCI datasets of varying sample size and dimension: australian, bupa, german, heart, ionosphere, pima, parkinsons, sonar, wpbc, wine and yeast. (The numerical entries of the table were not recoverable from this transcription.)

9 Discussion and Conclusion

In this paper, we presented a robust kernel embedding algorithm which can make use of the low rank structure of the data, and provided both theoretical and empirical support for it. However, a number of issues still deserve further research. First, the algorithm requires a sequence of kernel singular value decompositions, which can be computationally intensive for high dimensional and large datasets; developing efficient algorithms that retain theoretical guarantees will be interesting future research. Second, the statistical analysis could be sharpened; for the moment, the analysis does not suggest that the estimator obtained by our algorithm is better than ordinary KDE. Third, it will be interesting empirical work to explore other applications of low rank kernel embeddings, such as kernel two-sample tests, kernel independence tests and kernel belief propagation.
References

[1] A. J. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In Proceedings of the International Conference on Algorithmic Learning Theory, volume 4754, pages 13–31. Springer, 2007.
[2] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Injective Hilbert space embeddings of probability measures. In Proc. Annual Conf. Computational Learning Theory, 2008.
[3] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.
[4] L. Song, A. Gretton, D. Bickson, Y. Low, and C. Guestrin. Kernel belief propagation. In Proc. Intl. Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2011.
[5] L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. In International Conference on Machine Learning, 2010.
[6] L. Song, A. Parikh, and E. P. Xing. Kernel embeddings of latent tree graphical models. In Advances in Neural Information Processing Systems, volume 25, 2011.
[7] L. Song, M. Ishteva, H. Park, A. Parikh, and E. Xing. Hierarchical tensor decomposition of latent tree graphical models. In International Conference on Machine Learning (ICML), 2013.
[8] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, 2004.
[9] L. Song, J. Huang, A. J. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions. In Proceedings of the International Conference on Machine Learning, 2009.
[10] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3), 2009.
[11] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, 31(4), 2010.
[12] I. Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5), 2011.
[13] L. Rosasco, M. Belkin, and E. De Vito. On learning with integral operators. Journal of Machine Learning Research, 11, 2010.
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More informationMulti-View Clustering via Canonical Correlation Analysis
Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms
More informationAnalyzing Tensor Power Method Dynamics in Overcomplete Regime
Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical
More informationMulti-View Clustering via Canonical Correlation Analysis
Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu
More informationLecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012
CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration
More informationA PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,
More informationWUCHEN LI AND STANLEY OSHER
CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability
More informationMulti-View Clustering via Canonical Correlation Analysis
Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu
More information19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control
19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior
More informationComputing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions
Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.
More informationGeneralizing Kronecker Graphs in order to Model Searchable Networks
Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu
More informationTable of Common Derivatives By David Abraham
Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec
More informationLower Bounds for the Smoothed Number of Pareto optimal Solutions
Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.
More informationQuantum Mechanics in Three Dimensions
Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.
More informationEstimating Causal Direction and Confounding Of Two Discrete Variables
Estimating Causal Direction an Confouning Of Two Discrete Variables This inspire further work on the so calle aitive noise moels. Hoyer et al. (2009) extene Shimizu s ientifiaarxiv:1611.01504v1 [stat.ml]
More informationMath Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors
Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+
More informationGaussian processes with monotonicity information
Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process
More informationHybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion
Hybri Fusion for Biometrics: Combining Score-level an Decision-level Fusion Qian Tao Raymon Velhuis Signals an Systems Group, University of Twente Postbus 217, 7500AE Enschee, the Netherlans {q.tao,r.n.j.velhuis}@ewi.utwente.nl
More informationBalancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling
Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January
More informationarxiv: v1 [cs.lg] 22 Mar 2014
CUR lgorithm with Incomplete Matrix Observation Rong Jin an Shenghuo Zhu Dept. of Computer Science an Engineering, Michigan State University, rongjin@msu.eu NEC Laboratories merica, Inc., zsh@nec-labs.com
More informationMODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS
MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS BY BENJAMIN AVANZI, LUKE C. CASSAR AND BERNARD WONG ABSTRACT In this paper we investigate the potential of Lévy copulas as a tool for
More informationInter-domain Gaussian Processes for Sparse Inference using Inducing Features
Inter-omain Gaussian Processes for Sparse Inference using Inucing Features Miguel Lázaro-Greilla an Aníbal R. Figueiras-Vial Dep. Signal Processing & Communications Universia Carlos III e Mari, SPAIN {miguel,arfv}@tsc.uc3m.es
More informationA. Exclusive KL View of the MLE
A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function
More informationA Review of Multiple Try MCMC algorithms for Signal Processing
A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications
More informationExpected Value of Partial Perfect Information
Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo
More informationTopic 7: Convergence of Random Variables
Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information
More informationLagrangian and Hamiltonian Mechanics
Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical
More informationParameter estimation: A new approach to weighting a priori information
Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a
More informationThe derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)
Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)
More informationu!i = a T u = 0. Then S satisfies
Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace
More informationTutorial on Maximum Likelyhood Estimation: Parametric Density Estimation
Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing
More informationTime-of-Arrival Estimation in Non-Line-Of-Sight Environments
2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor
More informationLectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs
Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationChapter 6: Energy-Momentum Tensors
49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.
More informationCHAPTER 1 : DIFFERENTIABLE MANIFOLDS. 1.1 The definition of a differentiable manifold
CHAPTER 1 : DIFFERENTIABLE MANIFOLDS 1.1 The efinition of a ifferentiable manifol Let M be a topological space. This means that we have a family Ω of open sets efine on M. These satisfy (1), M Ω (2) the
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationKernel-based Conditional Independence Test and Application in Causal Discovery
Kernel-base Conitional Inepenence Test an Application in Causal iscovery Kun Zhang Jonas Peters ominik Janzing Bernhar Schölkopf Max Planck Institute for Intelligent Systems Spemannstr. 38, 72076 Tübingen
More informationModelling and simulation of dependence structures in nonlife insurance with Bernstein copulas
Moelling an simulation of epenence structures in nonlife insurance with Bernstein copulas Prof. Dr. Dietmar Pfeifer Dept. of Mathematics, University of Olenburg an AON Benfiel, Hamburg Dr. Doreen Straßburger
More informationLower bounds on Locality Sensitive Hashing
Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,
More informationA Sketch of Menshikov s Theorem
A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p
More informationTHE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,
More informationLeaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes
Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationPermanent vs. Determinant
Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an
More informationIntroduction to the Vlasov-Poisson system
Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its
More information'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21
Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting
More informationarxiv: v1 [math-ph] 5 May 2014
DIFFERENTIAL-ALGEBRAIC SOLUTIONS OF THE HEAT EQUATION VICTOR M. BUCHSTABER, ELENA YU. NETAY arxiv:1405.0926v1 [math-ph] 5 May 2014 Abstract. In this work we introuce the notion of ifferential-algebraic
More informationDiscrete Mathematics
Discrete Mathematics 309 (009) 86 869 Contents lists available at ScienceDirect Discrete Mathematics journal homepage: wwwelseviercom/locate/isc Profile vectors in the lattice of subspaces Dániel Gerbner
More informationSTATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING
STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING Mark A. Kon Department of Mathematics an Statistics Boston University Boston, MA 02215 email: mkon@bu.eu Anrzej Przybyszewski
More informationLecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations
Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:
More informationEquilibrium in Queues Under Unknown Service Times and Service Value
University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University
More informationSparse Reconstruction of Systems of Ordinary Differential Equations
Sparse Reconstruction of Systems of Orinary Differential Equations Manuel Mai a, Mark D. Shattuck b,c, Corey S. O Hern c,a,,e, a Department of Physics, Yale University, New Haven, Connecticut 06520, USA
More informationProof of SPNs as Mixture of Trees
A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a
More informationApproximate Constraint Satisfaction Requires Large LP Relaxations
Approximate Constraint Satisfaction Requires Large LP Relaxations oah Fleming April 19, 2018 Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoi
More informationPartial Differential Equations
Chapter Partial Differential Equations. Introuction Have solve orinary ifferential equations, i.e. ones where there is one inepenent an one epenent variable. Only orinary ifferentiation is therefore involve.
More informationCascaded redundancy reduction
Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,
More informationIntroduction to Markov Processes
Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav
More informationSYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS
SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS GEORGE A HAGEDORN AND CAROLINE LASSER Abstract We investigate the iterate Kronecker prouct of a square matrix with itself an prove an invariance
More informationFlexible High-Dimensional Classification Machines and Their Asymptotic Properties
Journal of Machine Learning Research 16 (2015) 1547-1572 Submitte 1/14; Revise 9/14; Publishe 8/15 Flexible High-Dimensional Classification Machines an Their Asymptotic Properties Xingye Qiao Department
More informationOptimal LQR Control of Structures using Linear Modal Model
Optimal LQR Control of Structures using Linear Moal Moel I. Halperin,2, G. Agranovich an Y. Ribakov 2 Department of Electrical an Electronics Engineering 2 Department of Civil Engineering Faculty of Engineering,
More informationFast Resampling Weighted v-statistics
Fast Resampling Weighte v-statistics Chunxiao Zhou Mar O. Hatfiel Clinical Research Center National Institutes of Health Bethesa, MD 20892 chunxiao.zhou@nih.gov Jiseong Par Dept of Math George Mason Univ
More informationThermal conductivity of graded composites: Numerical simulations and an effective medium approximation
JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University
More informationA Modification of the Jarque-Bera Test. for Normality
Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam
More informationNOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,
NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which
More information. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.
S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial
More informationA Novel Decoupled Iterative Method for Deep-Submicron MOSFET RF Circuit Simulation
A Novel ecouple Iterative Metho for eep-submicron MOSFET RF Circuit Simulation CHUAN-SHENG WANG an YIMING LI epartment of Mathematics, National Tsing Hua University, National Nano evice Laboratories, an
More informationConvergence of Random Walks
Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of
More informationA Unified Approach for Learning the Parameters of Sum-Product Networks
A Unifie Approach for Learning the Parameters of Sum-Prouct Networks Han Zhao Machine Learning Dept. Carnegie Mellon University han.zhao@cs.cmu.eu Pascal Poupart School of Computer Science University of
More informationLocal Linear ICA for Mutual Information Estimation in Feature Selection
Local Linear ICA for Mutual Information Estimation in Feature Selection Tian Lan, Deniz Erogmus Department of Biomeical Engineering, OGI, Oregon Health & Science University, Portlan, Oregon, USA E-mail:
More informationA new proof of the sharpness of the phase transition for Bernoulli percolation on Z d
A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition
More informationEntanglement is not very useful for estimating multiple phases
PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The
More informationinflow outflow Part I. Regular tasks for MAE598/494 Task 1
MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the
More informationEVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES
MATHEMATICS OF COMPUTATION Volume 69, Number 231, Pages 1117 1130 S 0025-5718(00)01120-0 Article electronically publishe on February 17, 2000 EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION
More informationMonte Carlo Methods with Reduced Error
Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for
More informationRelative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation
Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University
More informationConcentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification
Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,
More informationExamining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing
Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Course Project for CDS 05 - Geometric Mechanics John M. Carson III California Institute of Technology June
More informationTractability results for weighted Banach spaces of smooth functions
Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March
More informationSturm-Liouville Theory
LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory
More information6 General properties of an autonomous system of two first order ODE
6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x
More informationd-dimensional Arrangement Revisited
-Dimensional Arrangement Revisite Daniel Rotter Jens Vygen Research Institute for Discrete Mathematics University of Bonn Revise version: April 5, 013 Abstract We revisit the -imensional arrangement problem
More informationOn conditional moments of high-dimensional random vectors given lower-dimensional projections
Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of
More informationJointly continuous distributions and the multivariate Normal
Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability
More informationRobustness and Perturbations of Minimal Bases
Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important
More informationSYNCHRONOUS SEQUENTIAL CIRCUITS
CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents
More informationA variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs
Mathias Fuchs, Norbert Krautenbacher A variance ecomposition an a Central Limit Theorem for empirical losses associate with resampling esigns Technical Report Number 173, 2014 Department of Statistics
More information