Multi-View Clustering via Canonical Correlation Analysis
Technical Report TTI-TR

Multi-View Clustering via Canonical Correlation Analysis

Kamalika Chaudhuri, UC San Diego
Sham M. Kakade, Toyota Technological Institute at Chicago
ABSTRACT

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Such techniques typically impose stringent requirements on the separation between the cluster means in order for the algorithm to be successful. Here, we show how using multiple views of the data can relax these stringent requirements. We use Canonical Correlation Analysis (CCA) to project the data in each view to a lower-dimensional subspace. Under the assumption that, conditioned on the cluster label, the views are uncorrelated, we show that the separation conditions required for the algorithm to be successful are rather mild (significantly weaker than those of prior results in the literature). We provide results for mixtures of Gaussians, mixtures of log-concave distributions, and mixtures of product distributions.
1 Introduction

The multi-view approach to learning is one in which we have views of the data (sometimes in a rather abstract sense) and, if we understand the underlying relationship between these views, the hope is that this relationship can be used to alleviate the difficulty of a learning problem of interest [BM98, KF07, AZ07]. In this work, we explore how having two views of the data makes the clustering problem significantly more tractable.

Much recent work has gone into understanding under what conditions we can learn a mixture model. The basic problem is as follows: we obtain i.i.d. samples from a mixture of k distributions, and our task is to either: 1) infer properties of the underlying mixture model (e.g. the mixing weights, means, etc.) or 2) classify a random sample according to which distribution it was generated from. Under no restrictions on the underlying distributions, this problem is considered to be hard. However, in many applications, we are only interested in clustering the data when the component distributions are well separated. In fact, the focus of recent clustering algorithms [Das99, VW02, AM05, BV08] is on efficiently learning with as little separation as possible. Typically, these separation conditions are such that, given a random sample from the mixture model, the Bayes optimal classifier is able to reliably (with high probability) recover which cluster generated that point.

This work assumes a rather natural multi-view assumption: that the views are (conditionally) uncorrelated, if we condition on which mixture distribution generated the views. There are many natural applications for which this assumption is applicable. For example, we can consider multi-modal views, with one view being a video stream and the other an audio stream; here, conditioned on the speaker identity and maybe the phoneme (both of which could label the generating cluster), the views may be uncorrelated.
Under this multi-view assumption, we provide a simple and efficient subspace learning method, based on Canonical Correlation Analysis (CCA). This algorithm is affine invariant and is able to learn with some of the weakest separation conditions to date. The intuitive reason for this is that, under our multi-view assumption, we are able to (approximately) find the subspace spanned by the means of the component distributions. Furthermore, the number of samples we need scales as O(d), where d is the ambient dimension. This shows how the multi-view framework can provide substantial improvements to the clustering problem, adding to the growing body of results which show how the multi-view framework can alleviate the difficulty of learning problems.

1.1 Related Work

Most of the provably efficient clustering algorithms first project the data down to some low-dimensional space and then cluster the data in this lower-dimensional space (typically, an algorithm such as single linkage suffices here). This projection is typically achieved either randomly or by a spectral method. One of the first provably efficient algorithms for learning mixture models is due to Dasgupta [Das99], who learns a mixture of spherical Gaussians by randomly projecting the mixture onto a low-dimensional subspace. The separation requirement for the algorithm in [Das99] is sigma sqrt(d), where d is the dimension and sigma is the maximal directional standard deviation (the maximum variance in any direction of one of the component distributions). Vempala and Wang [VW02] remove the dependence of the separation requirement on the dimension of the data: given a mixture of k spherical Gaussians, they project the mixture down to the k-dimensional subspace of highest variance, and as a result, they can learn such mixtures with a separation of sigma k^{1/4}.
[KSV05] and [AM05] extend this result to a mixture of general Gaussians; however, for their algorithm to work correctly, they require a separation of sigma / sqrt(w_min), where again sigma is the maximum directional standard deviation in any direction, and w_min is the minimum mixing weight. [CR08] use a canonical-correlations based
algorithm to learn mixtures of axis-aligned Gaussians with a separation of about sigma* sqrt(k), where sigma* is the maximum directional standard deviation in the subspace containing the centers of the distributions. However, their algorithm requires the coordinate-independence property, and requires an additional spreading condition, which states that the separation between any two centers should be spread along many coordinates. All these algorithms are not affine invariant (and one can show that a linear transformation of the data could cause these algorithms to fail). Finally, [BV08] provides an affine-invariant algorithm for learning mixtures of general Gaussians, so long as the mixture has a suitably low Fisher coefficient when in isotropic position. Their result is stronger than the results of [KSV05, AM05, Das99], and more general than [CR08]; however, their implied separation condition involves a rather large polynomial dependence on 1/w_min. Other than these, there are also some provably efficient algorithms which do not use projections, such as [DS00] and [AK05].

The two results most closely related to ours are the work of [VW02] and [CR08].
1. [VW02] shows that it is sufficient to find the subspace spanned by the means of the distributions in the mixture for effective clustering.
2. [CR08] is related to ours, because they use a projection onto the top-k singular value decomposition subspace of the canonical correlations matrix. They also provide a spreading condition, which is related to the requirement on the rank in our work.
We borrow techniques from both of these papers.

1.2 This Work

In this paper, we study the problem of multi-view clustering. In our setting, we have data on a fixed set of objects from two separate sources, which we call the two views, and our goal is to use this data to cluster more effectively than with data from a single source. The conditions required by our algorithm are as follows. In the sequel, we assume that the mixture is in an isotropic position in each view individually.
First, we require that, conditioned on the source distribution in the mixture, the two views are uncorrelated. Notice that this condition allows the distributions in the mixture within each view to be completely general, so long as they are uncorrelated across views. Second, we require the rank of the CCA matrix across the views to be at least k - 1, and the (k-1)-th singular value of this matrix to be at least lambda_min. This condition ensures that there is sufficient correlation between the views, and if this condition holds, then we can recover the subspace containing the means of the distributions in both views.

In addition, for the case of a mixture of Gaussians, if in at least one view, say view 1, we have that for every pair of distributions i and j in the mixture,

    ||mu^1_i - mu^1_j|| > C sigma* k^{1/4} sqrt(log(n/delta))

for some constant C, where mu^1_i is the mean of the i-th component distribution in view one and sigma* is the maximum directional standard deviation in the subspace containing the means of the distributions in view 1, then our algorithm can also cluster correctly, which means that it can determine which distribution each sample came from. This separation condition is considerably weaker than previous results in that sigma* only depends on the directional variance in the subspace spanned by the means, as opposed to the directional variance over all directions. Also, the only other affine invariant algorithm is that in [BV08]; while this result does not explicitly state results in terms of separation between the means (it uses a Fisher coefficient concept), the implied separation is rather large. We stress that our improved results are really due to the multi-view condition. We also emphasize that for our algorithm to cluster successfully, it is sufficient for the distributions in the mixture to obey the separation condition in one view; so long as our rank condition holds, and the separation
condition holds in one view, our algorithm produces a correct clustering of the input data in that view.

2 The Setting

We assume that our data is generated by a mixture of k distributions. In particular, we assume we obtain samples x = (x^(1), x^(2)), where x^(1) and x^(2) are the two views of the data, which live in the vector spaces V_1 of dimension d_1 and V_2 of dimension d_2, respectively. We let d = d_1 + d_2. Let mu^j_i, for i = 1, ..., k and j = 1, 2, be the center of distribution i in view j, and let w_i be the mixing weight (the probability) of cluster i. For simplicity, assume that the data have mean 0. We denote the covariance matrices of the data as:

    Sigma = E[x x^T],  Sigma_11 = E[x^(1) (x^(1))^T],  Sigma_22 = E[x^(2) (x^(2))^T],  Sigma_12 = E[x^(1) (x^(2))^T].

Hence, we have:

    Sigma = [ Sigma_11  Sigma_12 ]
            [ Sigma_21  Sigma_22 ].    (1)

The multi-view assumption we work with is as follows:

Assumption 1 (Multi-View Condition) We assume that, conditioned on the source distribution s in the mixture (where s = i with probability w_i), the two views are uncorrelated. More precisely, we assume that

    E[x^(1) (x^(2))^T | s = i] = E[x^(1) | s = i] E[(x^(2))^T | s = i]

for all i in [k].

This assumption implies that:

    Sigma_12 = sum_i w_i mu^1_i (mu^2_i)^T.

To see this, observe that

    E[x^(1) (x^(2))^T] = sum_i E_{D_i}[x^(1) (x^(2))^T] Pr[D_i]
                       = sum_i w_i E_{D_i}[x^(1)] E_{D_i}[x^(2)]^T
                       = sum_i w_i mu^1_i (mu^2_i)^T.    (2)

As the distributions are in isotropic position, we observe that sum_i w_i mu^1_i = sum_i w_i mu^2_i = 0. Therefore, the above equation shows that the rank of Sigma_12 is at most k - 1. We now assume that it has rank precisely k - 1.

Assumption 2 (Non-Degeneracy Condition) We assume that Sigma_12 has rank k - 1 and that the minimal non-zero singular value of Sigma_12 is lambda_min > 0 (where we are working in a coordinate system where Sigma_11 and Sigma_22 are identity matrices).

For clarity of exposition, we also work in an isotropic coordinate system in each view. Specifically, the expected covariance matrix of the data, in each view, is the identity matrix, i.e.
    Sigma_11 = I_{d_1},  Sigma_22 = I_{d_2}.

As our analysis shows, our algorithm is robust to errors, so we assume that the data is whitened as a pre-processing step.
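The whitening pre-processing step described above can be sketched as follows. This is a minimal numpy sketch, not code from the paper; the helper name `whiten` and the symmetric inverse-square-root construction are our own choices.

```python
import numpy as np

def whiten(X, eps=1e-10):
    """Center one view and transform it so its sample covariance is the identity.

    X: (n, d) array of samples from one view. Returns the whitened data and the
    transform W with cov(X @ W) ~ I. (Hypothetical helper, not from the paper.)
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    # Symmetric inverse square root via eigendecomposition: W = cov^{-1/2}
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return Xc @ W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3)) @ rng.normal(size=(3, 3))  # correlated toy data
Xw, W = whiten(X)
print(np.allclose(Xw.T @ Xw / len(Xw), np.eye(3), atol=1e-6))
```

In practice one would whiten each view with its own transform; the analysis above assumes exactly this isotropic coordinate system.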
One way to view the Non-Degeneracy Assumption is in terms of correlation coefficients. Recall that for two directions u in V_1 and v in V_2, the correlation coefficient is defined as

    rho(u, v) = E[(u . x^(1))(v . x^(2))] / sqrt(E[(u . x^(1))^2] E[(v . x^(2))^2]).

An alternative definition of lambda_min is just the minimal non-zero correlation coefficient, i.e.

    lambda_min = min_{u,v : rho(u,v) != 0} rho(u, v).

Note that 1 >= lambda_min > 0.

We use hat(Sigma)_11 and hat(Sigma)_22 to denote the sample covariance matrices in views 1 and 2 respectively, and hat(Sigma)_12 to denote the sample covariance matrix combined across views 1 and 2. We assume these are obtained through empirical averages from i.i.d. samples from the underlying distribution. For any matrix A, we use ||A|| to denote the L_2 norm (or maximum singular value) of A.

2.1 A Summary of Our Results

The following lemma provides the intuition for our algorithm:

Lemma 1 Under Assumption 2, if U D V^T is the thin SVD of Sigma_12 (where the thin SVD removes all zero entries from the diagonal), then the subspace spanned by the means in view 1 is precisely the column span of U (and we have the analogous statement for view 2).

This follows directly from Equation (2) and the rank assumption. Essentially, our algorithm uses CCA to (approximately) project the data down to the subspace spanned by the means. Our main theorem can be stated as follows.

Theorem 1 (Gaussians) Suppose the source distribution is a mixture of Gaussians, and suppose Assumptions 1 and 2 hold. Let sigma* be the maximum directional standard deviation of any distribution in the subspace spanned by {mu^1_i}_{i=1}^k. If, for each pair i and j and for a fixed constant C,

    ||mu^1_i - mu^1_j|| >= C sigma* k^{1/4} sqrt(log(kn/delta)),

then, with probability 1 - delta, Algorithm 1 correctly classifies the examples if the number of examples used is

    c d / ((sigma*)^2 lambda_min^2 w_min^2) . log^2(d / (sigma* lambda_min w_min)) . log^2(1/delta)

for some constant c. Here we assume that the separation condition holds in View 1, but a similar theorem also applies to View 2.
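Lemma 1 can be checked numerically on a small synthetic instance. The sketch below is our own construction, not from the paper: it builds the population cross-covariance Sigma_12 = sum_i w_i mu^1_i (mu^2_i)^T with the weighted means summing to zero (as in the isotropic setting), takes the thin SVD, and verifies that every view-1 mean lies in the column span of the top k - 1 left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
k, d1, d2 = 3, 6, 5
w = np.array([0.5, 0.3, 0.2])                   # mixing weights, sum to 1
mu1 = rng.normal(size=(k, d1)); mu1 -= w @ mu1  # enforce sum_i w_i mu^1_i = 0
mu2 = rng.normal(size=(k, d2)); mu2 -= w @ mu2  # enforce sum_i w_i mu^2_i = 0

# Population cross-covariance under Assumption 1 (Equation (2))
Sigma12 = sum(w[i] * np.outer(mu1[i], mu2[i]) for i in range(k))

# Thin SVD: the rank is k - 1, so keep the top k - 1 left singular vectors
U = np.linalg.svd(Sigma12)[0][:, :k - 1]

# Each view-1 mean lies in the column span of U, as Lemma 1 states
for m in mu1:
    assert np.allclose(U @ (U.T @ m), m, atol=1e-8)
print("means lie in span(U)")
```

The same check with the right singular vectors verifies the analogous statement for the view-2 means.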
Our next theorem is for mixtures of log-concave distributions, with a different separation condition, which is larger in terms of k (due to the different concentration properties of log-concave distributions).

Theorem 2 (Log-concave Distributions) Suppose the source distribution is a mixture of log-concave distributions, and suppose Assumptions 1 and 2 hold. Let sigma* be the maximum directional standard deviation of any distribution in the subspace spanned by {mu^1_i}_{i=1}^k. If, for each pair i and j and for a fixed constant C,

    ||mu^1_i - mu^1_j|| >= C sigma* sqrt(k) log(kn/delta),
then, with probability 1 - delta, Algorithm 1 correctly classifies the examples if the number of examples used is

    c d / ((sigma*)^2 lambda_min^2 w_min^2) . log^3(d / (sigma* lambda_min w_min)) . log^2(1/delta)

for some constant c.

An analogous theorem also holds for mixtures of product distributions, with the same k dependence as the case for mixtures of log-concave distributions.

Theorem 3 (Product Distributions) Suppose Assumptions 1 and 2 hold, and suppose that the distributions D_i in the mixture are product distributions, in which each coordinate has range at most R. If, for each pair i and j and for a fixed constant C,

    ||mu^1_i - mu^1_j|| >= C R sqrt(k log(kn/delta)),

then, with probability 1 - delta, Algorithm 1 correctly classifies the examples if the number of examples used is

    c d / (lambda_min^2 w_min) . log^2(d / (lambda_min w_min)) . log^2(1/delta),

where c is a constant.

3 Clustering Algorithms

In this section, we present our clustering algorithm, which clusters correctly with high probability when the data in at least one of the views obeys a separation condition, in addition to our assumptions. Our clustering algorithm is as follows. The input to the algorithm is a set of samples S and a number k, and the output is a clustering of these samples into k clusters. For this algorithm, we assume that the data obeys the separation condition in View 1; an analogous algorithm can be applied when the data obeys the separation condition in View 2 as well.

Algorithm 1
1. Randomly partition S into two subsets A and B of equal size.
2. Let hat(Sigma)_12(A) (hat(Sigma)_12(B), respectively) denote the empirical covariance matrix between views 1 and 2, computed from the sample set A (B, respectively). Compute the top k - 1 left singular vectors of hat(Sigma)_12(A) (hat(Sigma)_12(B), respectively), and project the samples in B (A, respectively) on the subspace spanned by these vectors.
3. Apply single linkage clustering (for mixtures of product distributions and log-concave distributions), or the algorithm in Section 3.5 of [AK05] (for mixtures of Gaussians), on the projected examples in View 1.
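The steps of Algorithm 1 can be sketched in code. The following is a minimal, self-contained numpy sketch of our own: Lloyd's k-means stands in for the single-linkage / [AK05] clustering step, and the whitening step is omitted because the toy views are constructed to be nearly isotropic, so this is an illustration of the split-project-cluster structure, not a faithful implementation of the paper's guarantees.

```python
import numpy as np

def multiview_cluster(X1, X2, k, rng):
    """Sketch of Algorithm 1 (k-means replaces the paper's Step-3 clusterer)."""
    n = len(X1)
    idx = rng.permutation(n)
    A, B = idx[: n // 2], idx[n // 2:]           # Step 1: random partition
    # Step 2: empirical cross-covariance from A, top k-1 left singular vectors
    S12 = X1[A].T @ X2[A] / len(A)
    U = np.linalg.svd(S12)[0][:, : k - 1]
    P = X1[B] @ U                                # project B's view-1 samples
    # Step 3 stand-in: Lloyd's k-means on the projected points
    C = P[rng.choice(len(P), k, replace=False)]
    for _ in range(50):
        lab = np.argmin(((P[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([P[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return B, lab

# Toy data: 2 clusters, views independent given the cluster label
rng = np.random.default_rng(0)
n, k = 2000, 2
z = rng.integers(k, size=n)
mu1 = np.array([[5.0, 0.0], [-5.0, 0.0]])
mu2 = np.array([[0.0, 5.0], [0.0, -5.0]])
X1 = mu1[z] + rng.normal(size=(n, 2))
X2 = mu2[z] + rng.normal(size=(n, 2))
B, lab = multiview_cluster(X1, X2, k, rng)
acc = max((lab == z[B]).mean(), (lab != z[B]).mean())  # accuracy up to label swap
print(acc > 0.99)
```

Note that, as in the analysis, the subspace is estimated from A while the clustering is performed on B, so the projection direction is independent of the points being clustered.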
4 Proofs

In this section, we present the proofs of our main theorems. First, the following two lemmas are useful; their proofs follow directly from our assumptions. We use S_1 (resp. S_2) to denote the subspace of V_1 (resp. V_2) spanned by {mu^1_i}_{i=1}^k (resp. {mu^2_i}_{i=1}^k). We use S_1^perp (resp. S_2^perp) to denote the orthogonal complement of S_1 (resp. S_2) in V_1 (resp. V_2).
Lemma 2 Let v_1 and v_2 be any unit vectors in S_1 and S_2 respectively. Then, (v_1)^T Sigma_12 v_2 > lambda_min.

Lemma 3 Let v_1 (resp. v_2) be any vector in S_1^perp (resp. S_2^perp). Then, for any u_1 in V_1 and u_2 in V_2, (v_1)^T Sigma_12 u_2 = (u_1)^T Sigma_12 v_2 = 0.

Lemma 4 (Sample Complexity Lemma) If the number of samples

    n > c d / (epsilon^2 w_min) . log^2(d / (epsilon w_min)) . log^2(1/delta)

for some constant c, then, with probability at least 1 - delta,

    ||hat(Sigma)_12 - Sigma_12|| <= epsilon,

where ||.|| denotes the L_2-norm of a matrix (equivalent to the maximum singular value).

Proof: To prove this lemma, we apply Lemma 5. Observe the block representation of Sigma in Equation (1). Moreover, with Sigma_11 and Sigma_22 in isotropic position, we have that the L_2 norm of Sigma_12 is at most 1. Using the triangle inequality, we can write:

    ||hat(Sigma)_12 - Sigma_12|| <= ||hat(Sigma) - Sigma|| + ||hat(Sigma)_11 - Sigma_11|| + ||hat(Sigma)_22 - Sigma_22||

(where we have applied the triangle inequality to the 2x2 block matrix with off-diagonal entries hat(Sigma)_12 - Sigma_12 and with 0 diagonal entries). We now apply Lemma 5 three times: on hat(Sigma)_11 - Sigma_11, on hat(Sigma)_22 - Sigma_22, and on a scaled version of hat(Sigma) - Sigma. The first two applications follow directly. For the third application, we observe that Lemma 5 is rotation invariant, and that scaling each covariance value by some factor s scales the norm of the matrix by at most s. We claim that we can apply Lemma 5 on hat(Sigma) - Sigma with s = 4. Since the covariance of any two random variables is at most the product of their standard deviations, and since Sigma_11 and Sigma_22 are I_{d_1} and I_{d_2} respectively, the maximum singular value of Sigma_12 is at most 1; the maximum singular value of Sigma is therefore at most 4. Our claim follows. The lemma now follows by plugging in n as a function of epsilon, d and w_min.

Lemma 5 Let X be a set of n points generated by a mixture of k Gaussians over R^d, scaled such that E[x x^T] = I. If M is the sample covariance matrix of X, then, for n large enough, with probability at least 1 - delta,

    ||M - E[M]|| <= C sqrt(d log n log(2n/delta) log(1/delta) / (w_min n)),

where C is a constant, and w_min is the minimum mixing weight of any Gaussian in the mixture.
Proof: To prove this lemma, we use a concentration result on the L_2-norms of matrices due to [RV07]. We observe that each vector x_i in the scaled space is generated by a Gaussian with some mean mu and maximum directional variance sigma^2. As the total variance of the mixture along any direction is at most 1,

    w_min (||mu||^2 + sigma^2) <= 1.    (3)
Therefore, for all samples x_i, with probability at least 1 - delta/2,

    ||x_i|| <= ||mu|| + sigma sqrt(d log(2n/delta)).    (4)

We condition on the event that ||x_i|| <= ||mu|| + sigma sqrt(d log(2n/delta)) holds for all i = 1, ..., n. The probability of this event is at least 1 - delta/2. Conditioned on this event, the distributions of the vectors x_i are independent. Therefore, we can apply Theorem 3.1 in [RV07] on these conditional distributions to conclude that

    Pr[||M - E[M]|| > t] <= 2 e^{-c n t^2 / (Lambda^2 log n)},

where c is a constant, and Lambda is an upper bound on the norm of any vector x_i. Plugging in t = Lambda sqrt(2 log(4/delta) log n / (c n)), we get that, conditioned on the event that ||x_i|| <= Lambda for all i, with probability 1 - delta/2,

    ||M - E[M]|| <= Lambda sqrt(2 log(4/delta) log n / (c n)).

From Equations (3) and (4), we get that with probability 1 - delta/2, for all the samples, Lambda <= sqrt(2 d log(2n/delta) / w_min). Therefore, with probability 1 - delta,

    ||M - E[M]|| <= C sqrt(d log n log(2n/delta) log(4/delta) / (n w_min)),

which completes the proof.

Lemma 6 (Projection Subspace Lemma) Let v_1 (resp. v_2) be any vector in S_1 (resp. S_2). If the number of samples

    n > c d / (tau^2 lambda_min^2 w_min) . log^2(d / (tau lambda_min w_min)) . log^2(1/delta)

for some constant c, then, with probability 1 - delta, the length of the projection of v_1 (resp. v_2) onto the subspace spanned by the top k - 1 left (resp. right) singular vectors of hat(Sigma)_12 is at least sqrt(1 - tau^2) ||v_1|| (resp. sqrt(1 - tau^2) ||v_2||).

Proof: For the sake of contradiction, suppose there exists a unit vector v_1 in S_1 such that the projection of v_1 on the top k - 1 left singular vectors of hat(Sigma)_12 is equal to sqrt(1 - tau'^2) ||v_1||, where tau' > tau. Then, there exists some unit vector u_1 in V_1 in the orthogonal complement of the space spanned by the top k - 1 left singular vectors of hat(Sigma)_12 such that the projection of v_1 on u_1 is equal to tau' ||v_1||. Since the projection of v_1 on u_1 is at least tau', and u_1 and v_1 are both unit vectors, u_1 can be written as u_1 = tau' v_1 + (1 - tau'^2)^{1/2} y_1, where y_1 is in the orthogonal complement of S_1. From Lemma 2, there exists some vector u_2 in S_2 such that (v_1)^T Sigma_12 u_2 >= lambda_min; from Lemma 3, as y_1 is in the orthogonal complement of S_1, for this vector u_2, (u_1)^T Sigma_12 u_2 >= tau' lambda_min >= tau lambda_min.
If n > c d / (tau^2 lambda_min^2 w_min) . log^2(d / (tau lambda_min w_min)) . log^2(1/delta), for some constant c, then, from Lemma 7, (u_1)^T hat(Sigma)_12 u_2 >= tau lambda_min / 2. Now, since u_1 is in the orthogonal complement of the subspace spanned by the top k - 1 left singular vectors of hat(Sigma)_12, for any vector y_2 in the subspace spanned by the top k - 1 right singular vectors of hat(Sigma)_12, (u_1)^T hat(Sigma)_12 y_2 = 0; this follows from the properties of the singular space of any matrix. This, in turn, means that there exists a vector z_2 in V_2, in the orthogonal complement of the subspace spanned by the top k - 1 right singular vectors of hat(Sigma)_12, such that (u_1)^T hat(Sigma)_12 z_2 >= tau lambda_min / 2. This implies that the k-th singular value of hat(Sigma)_12 is at least tau lambda_min / 2. However, if n > c d / (tau^2 lambda_min^2 w_min) . log^2(d / (tau lambda_min w_min)) . log^2(1/delta), for some constant c, then, from Lemma 7, all except the top k - 1 singular values of hat(Sigma)_12 are at most tau lambda_min / 3, which is a contradiction.

Lemma 7 Let n > C d / (epsilon^2 w_min) . log^2(d / (epsilon w_min)) . log^2(1/delta), for some constant C. Then, with probability 1 - delta, the top k - 1 singular values of hat(Sigma)_12 have value at least lambda_min - epsilon. The remaining min(d_1, d_2) - k + 1 singular values of hat(Sigma)_12 have value at most epsilon.
Proof: From Lemmas 2 and 3, Sigma_12 has rank exactly k - 1, and the (k-1)-th singular value of Sigma_12 is at least lambda_min. Let e_1, ..., e_{k-1} and g_1, ..., g_{k-1} be the top k - 1 left and right singular vectors of hat(Sigma)_12. Then, using Lemma 2, for any unit vectors e and g in the subspaces spanned by e_1, ..., e_{k-1} and g_1, ..., g_{k-1} respectively,

    e^T hat(Sigma)_12 g >= e^T Sigma_12 g - |e^T (hat(Sigma)_12 - Sigma_12) g| >= lambda_min - epsilon.

Therefore, there is a subspace of rank k - 1 with singular values at least lambda_min - epsilon, from which the first part of the lemma follows. The second part of the lemma follows similarly from the fact that Sigma_12 has a (min(d_1, d_2) - k + 1)-dimensional subspace with singular value 0.

Now we are ready to prove our main theorem.

Proof (of Theorem 1): From Lemma 6, if n > C d / (tau^2 lambda_min^2 w_min) . log^2(d / (tau lambda_min w_min)) . log^2(1/delta), for some constant C, then, with probability at least 1 - delta, for any vector v in the subspace containing the centers, the projection of v onto the subspace returned by Step 2 of Algorithm 1 has length at least sqrt(1 - tau^2) ||v||. Therefore, the maximum directional variance of any distribution D_i in the mixture along any direction in this subspace is at most (1 - tau^2)(sigma*)^2 + tau^2 sigma^2, where sigma^2 is the maximum directional variance of any distribution D_i in the mixture. When tau <= sigma*/sigma, this variance is at most 2(sigma*)^2. Since the directional variance of the entire mixture is 1 in any direction, w_min sigma^2 <= 1, which means that sigma <= 1/sqrt(w_min). Therefore, when n > C d / ((sigma*)^2 lambda_min^2 w_min^2) . log^2(d / (sigma* lambda_min w_min)) . log^2(1/delta), for some constant C, the maximum directional variance of any distribution D_i in the mixture in the space output by Step 2 of the Algorithm is at most 2(sigma*)^2.

Since A and B are random partitions of the sample set S, the subspace produced by the action of Step 2 of Algorithm 1 on the set A is independent of B, and vice versa. Therefore, when projected onto the top k - 1 singular value decomposition subspace of hat(Sigma)_12(A), the samples from B are distributed as a mixture of (k-1)-dimensional Gaussians.
From the previous paragraph, in this subspace, the separation between the centers of any two distributions in the mixture is at least c sigma* k^{1/4} sqrt(log(kn/delta)), for some constant c, and the maximum directional standard deviation of any distribution in the mixture is at most sqrt(2) sigma*. The theorem now follows from Theorem 1 of [AK05].

Similar theorems with slightly worse bounds hold when the distributions in the mixture are not Gaussian, but possess certain distance concentration properties. The following lemma is useful for the proof of Theorem 2.

Lemma 8 Let D_i be a log-concave distribution over R^d, and let S be a fixed r-dimensional subspace, such that the maximum directional standard deviation of D_i along S is at most sigma. If x is a point sampled from D_i, and if P_S(x) denotes the projection of x onto S, then, with probability 1 - delta,

    ||P_S(x - mu^1_i)|| <= sigma sqrt(r) log(r/delta).

Proof: From Lemma 2 in [KSV05], if v is a unit vector and D_i is a log-concave distribution, then, for x sampled from D_i,

    Pr[|v . (x - mu^1_i)| > sigma t] <= e^{-t}.

Let v_1, ..., v_r be an orthonormal basis of S. Plugging in t = log(r/delta), for each v_l in this basis, with probability 1 - delta/r,

    |v_l . (x - mu^1_i)| <= sigma log(r/delta).

The lemma now follows by a union bound over the basis and the observation that ||P_S(x - mu^1_i)||^2 = sum_l (v_l . (x - mu^1_i))^2.

Now the proof of Theorem 2 follows.

Proof (of Theorem 2): We can use Lemma 8, along with an argument similar to the first part of the proof of Theorem 1, to show that if the number of samples

    n > c d / ((sigma*)^2 lambda_min^2 w_min^2) . log^3(d / (sigma* lambda_min w_min)) . log^2(1/delta),
then one can find a subspace such that, for any vector v in the subspace containing the centers, (1) the projection of v onto this subspace has norm at least c ||v||, where c is a constant, and (2) the maximum directional standard deviation of any D_i in the mixture along v is at most sqrt(2) sigma*.

We now apply Lemma 8 on the subspace returned by our Algorithm in View 1. We remark that we can do this because, due to the independence between the sample sets A and B, the top k - 1 singular value decomposition subspace of the covariance matrix of A across the views is independent of B, and vice versa. Applying this lemma and a triangle inequality, if D_i is a log-concave distribution such that the maximum directional variance of D_i in S_1 is at most (sigma*)^2, then the distance between all pairs of points from D_i, when projected onto the (k-1)-dimensional subspace returned by Step 2 of Algorithm 1, is at most O(sigma* sqrt(k) log(nk/delta)) with probability at least 1 - delta. This statement implies that single linkage will succeed (due to the fact that the interclass distances are larger than the intraclass distances), which concludes the proof of the theorem.

Finally, we prove Theorem 3. The following lemma is useful for the proof of Theorem 3.

Lemma 9 Let D_i be a product distribution over R^d, in which each coordinate has range R, and let S be a fixed r-dimensional subspace. Let x be a point drawn from D_i, and let P_S(x) be the projection of x onto S. Then, with probability at least 1 - delta,

    ||P_S(x - mu^1_i)|| <= R sqrt(2 r log(2r/delta)).

Proof: Let v_1, ..., v_r be an orthonormal basis of S. For any vector v_l, and any x drawn from distribution D_i,

    Pr[|v_l . (x - mu^1_i)| > R t] <= 2 e^{-t^2 / 2}.

Plugging in t = sqrt(2 log(2r/delta)), we get that with probability at least 1 - delta/r,

    |v_l . (x - mu^1_i)| <= R sqrt(2 log(2r/delta)).

The rest follows from the observation that, in the subspace returned by our algorithm, ||P_S(x - mu^1_i)||^2 = sum_{l=1}^r (v_l . (x - mu^1_i))^2, and a union bound over all r vectors in this basis.
Proof (of Theorem 3): Since the distribution of each coordinate has range R, and the coordinates are distributed independently, for any distribution D_i in the mixture, the maximum directional variance is at most R^2. We can now use Lemma 9 and an argument similar to the first part of the proof of Theorem 1 to show that if the number of samples n > c d / (lambda_min^2 w_min) . log^2(d / (lambda_min w_min)) . log^2(1/delta), then one can find a subspace such that, for any vector v in the subspace containing the centers, the projection of v onto this subspace has norm at least c ||v||, where c is a constant.

We now apply Lemma 9 on the subspace returned by our Algorithm in View 1. We remark that we can do this because, due to the independence between the sample sets A and B, the top k - 1 singular value decomposition subspace of the covariance matrix of A across the views is independent of B, and vice versa. Applying Lemma 9 and a triangle inequality, if D_i is a product distribution in which each coordinate has range R, then the distance between all pairs of points from D_i, when projected onto the (k-1)-dimensional subspace returned by Step 2 of Algorithm 1, is at most O(R sqrt(k log(nk/delta))) with probability at least 1 - delta. This statement implies that single linkage will succeed (due to the fact that the interclass distances are larger than the intraclass distances), which concludes the proof of the theorem.

References

[AK05] S. Arora and R. Kannan. Learning mixtures of separated nonspherical Gaussians. Annals of Applied Probability, 15(1A):69-92, 2005.
[AM05] D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. In Proceedings of the 18th Annual Conference on Learning Theory, 2005.

[AZ07] Rie Kubota Ando and Tong Zhang. Two-view feature generation model for semi-supervised learning. In ICML '07: Proceedings of the 24th International Conference on Machine Learning, pages 25-32, New York, NY, USA, 2007. ACM.

[BM98] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, 1998.

[BV08] S. C. Brubaker and S. Vempala. Isotropic PCA and affine-invariant clustering. In Proc. of Foundations of Computer Science, 2008.

[CR08] K. Chaudhuri and S. Rao. Learning mixtures of distributions using correlations and independence. In Proc. of Conference on Learning Theory, 2008.

[Das99] S. Dasgupta. Learning mixtures of Gaussians. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, 1999.

[DS00] S. Dasgupta and L. Schulman. A two-round variant of EM for Gaussian mixtures. In Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2000.

[KF07] Sham M. Kakade and Dean P. Foster. Multi-view regression via canonical correlation analysis. In Nader H. Bshouty and Claudio Gentile, editors, COLT, volume 4539 of Lecture Notes in Computer Science. Springer, 2007.

[KSV05] R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In Proceedings of the 18th Annual Conference on Learning Theory, 2005.

[RV07] M. Rudelson and R. Vershynin. Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM, 2007.

[VW02] S. Vempala and G. Wang. A spectral algorithm for learning mixtures of distributions. In Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science, 2002.
More informationDatabase-friendly Random Projections
Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional
More information7.1 Support Vector Machine
67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to
More informationTime-of-Arrival Estimation in Non-Line-Of-Sight Environments
2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor
More informationLECTURE NOTES ON DVORETZKY S THEOREM
LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationInfluence of weight initialization on multilayer perceptron performance
Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -
More informationLearning Mixtures of Gaussians with Maximum-a-posteriori Oracle
Satyaki Mahalanabis Dept of Computer Science University of Rochester smahalan@csrochestereu Abstract We consier the problem of estimating the parameters of a mixture of istributions, where each component
More informationarxiv: v1 [cs.lg] 22 Mar 2014
CUR lgorithm with Incomplete Matrix Observation Rong Jin an Shenghuo Zhu Dept. of Computer Science an Engineering, Michigan State University, rongjin@msu.eu NEC Laboratories merica, Inc., zsh@nec-labs.com
More informationPermanent vs. Determinant
Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an
More informationLower Bounds for the Smoothed Number of Pareto optimal Solutions
Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationConvergence of Random Walks
Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of
More informationPDE Notes, Lecture #11
PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =
More informationThe Informativeness of k-means for Learning Mixture Models
The Informativeness of k-means for Learning Mixture Models Vincent Y. F. Tan (Joint work with Zhaoqiang Liu) National University of Singapore June 18, 2018 1/35 Gaussian distribution For F dimensions,
More informationTHE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,
More informationBinary Discrimination Methods for High Dimensional Data with a. Geometric Representation
Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas
More informationarxiv: v1 [math.mg] 10 Apr 2018
ON THE VOLUME BOUND IN THE DVORETZKY ROGERS LEMMA FERENC FODOR, MÁRTON NASZÓDI, AND TAMÁS ZARNÓCZ arxiv:1804.03444v1 [math.mg] 10 Apr 2018 Abstract. The classical Dvoretzky Rogers lemma provies a eterministic
More informationLogarithmic spurious regressions
Logarithmic spurious regressions Robert M. e Jong Michigan State University February 5, 22 Abstract Spurious regressions, i.e. regressions in which an integrate process is regresse on another integrate
More informationIntroduction to the Vlasov-Poisson system
Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its
More informationThis module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics
This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.
More informationA Unified Theorem on SDP Rank Reduction
A Unifie heorem on SDP Ran Reuction Anthony Man Cho So, Yinyu Ye, Jiawei Zhang November 9, 006 Abstract We consier the problem of fining a low ran approximate solution to a system of linear equations in
More informationIsotropic PCA and Affine-Invariant Clustering
Isotropic PCA and Affine-Invariant Clustering (Extended Abstract) S. Charles Brubaker Santosh S. Vempala Georgia Institute of Technology Atlanta, GA 30332 {brubaker,vempala}@cc.gatech.edu Abstract We present
More informationA note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz
A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae
More informationThe total derivative. Chapter Lagrangian and Eulerian approaches
Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function
More informationParameter estimation: A new approach to weighting a priori information
Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a
More informationFLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction
FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number
More informationProblem Sheet 2: Eigenvalues and eigenvectors and their use in solving linear ODEs
Problem Sheet 2: Eigenvalues an eigenvectors an their use in solving linear ODEs If you fin any typos/errors in this problem sheet please email jk28@icacuk The material in this problem sheet is not examinable
More informationApproximate Constraint Satisfaction Requires Large LP Relaxations
Approximate Constraint Satisfaction Requires Large LP Relaxations oah Fleming April 19, 2018 Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoi
More informationLecture 6: Calculus. In Song Kim. September 7, 2011
Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear
More informationOn combinatorial approaches to compressed sensing
On combinatorial approaches to compresse sensing Abolreza Abolhosseini Moghaam an Hayer Raha Department of Electrical an Computer Engineering, Michigan State University, East Lansing, MI, U.S. Emails:{abolhos,raha}@msu.eu
More informationDiagonalization of Matrices Dr. E. Jacobs
Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More informationImproving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers
International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize
More informationEstimation of the Maximum Domination Value in Multi-Dimensional Data Sets
Proceeings of the 4th East-European Conference on Avances in Databases an Information Systems ADBIS) 200 Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets Eleftherios Tiakas, Apostolos.
More informationComputing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions
Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5
More informationSpeaker Adaptation Based on Sparse and Low-rank Eigenphone Matrix Estimation
INTERSPEECH 2014 Speaker Aaptation Base on Sparse an Low-rank Eigenphone Matrix Estimation Wen-Lin Zhang 1, Dan Qu 1, Wei-Qiang Zhang 2, Bi-Cheng Li 1 1 Zhengzhou Information Science an Technology Institute,
More informationLecture 5. Symmetric Shearer s Lemma
Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here
More information6 General properties of an autonomous system of two first order ODE
6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x
More informationAlgorithms and matching lower bounds for approximately-convex optimization
Algorithms an matching lower bouns for approximately-convex optimization Yuanzhi Li Department of Computer Science Princeton University Princeton, NJ, 08450 yuanzhil@cs.princeton.eu Anrej Risteski Department
More informationModeling of Dependence Structures in Risk Management and Solvency
Moeling of Depenence Structures in Risk Management an Solvency University of California, Santa Barbara 0. August 007 Doreen Straßburger Structure. Risk Measurement uner Solvency II. Copulas 3. Depenent
More informationCalculus of Variations
16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t
More informationarxiv: v2 [physics.data-an] 5 Jul 2012
Submitte to the Annals of Statistics OPTIMAL TAGET PLAE RECOVERY ROM OISY MAIOLD SAMPLES arxiv:.460v physics.ata-an] 5 Jul 0 By Daniel. Kaslovsky an rançois G. Meyer University of Colorao, Bouler Constructing
More informationTutorial on Maximum Likelyhood Estimation: Parametric Density Estimation
Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing
More informationELEC3114 Control Systems 1
ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.
More informationTractability results for weighted Banach spaces of smooth functions
Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March
More informationTopic 7: Convergence of Random Variables
Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information
More informationunder the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.
Assignment 13 Exercise 8.4 For the hypotheses consiere in Examples 8.12 an 8.13, the sign test is base on the statistic N + = #{i : Z i > 0}. Since 2 n(n + /n 1) N(0, 1) 2 uner the null hypothesis, the
More information15.1 Upper bound via Sudakov minorization
ECE598: Information-theoretic methos in high-imensional statistics Spring 206 Lecture 5: Suakov, Maurey, an uality of metric entropy Lecturer: Yihong Wu Scribe: Aolin Xu, Mar 7, 206 [E. Mar 24] In this
More informationOn colour-blind distinguishing colour pallets in regular graphs
J Comb Optim (2014 28:348 357 DOI 10.1007/s10878-012-9556-x On colour-blin istinguishing colour pallets in regular graphs Jakub Przybyło Publishe online: 25 October 2012 The Author(s 2012. This article
More informationChromatic number for a generalization of Cartesian product graphs
Chromatic number for a generalization of Cartesian prouct graphs Daniel Král Douglas B. West Abstract Let G be a class of graphs. The -fol gri over G, enote G, is the family of graphs obtaine from -imensional
More informationSelf-normalized Martingale Tail Inequality
Online-to-Confience-Set Conversions an Application to Sparse Stochastic Banits A Self-normalize Martingale Tail Inequality The self-normalize martingale tail inequality that we present here is the scalar-value
More informationNew Statistical Test for Quality Control in High Dimension Data Set
International Journal of Applie Engineering Research ISSN 973-456 Volume, Number 6 (7) pp. 64-649 New Statistical Test for Quality Control in High Dimension Data Set Shamshuritawati Sharif, Suzilah Ismail
More informationGeneralized Nonhomogeneous Abstract Degenerate Cauchy Problem
Applie Mathematical Sciences, Vol. 7, 213, no. 49, 2441-2453 HIKARI Lt, www.m-hikari.com Generalize Nonhomogeneous Abstract Degenerate Cauchy Problem Susilo Hariyanto Department of Mathematics Gajah Maa
More informationSpurious Significance of Treatment Effects in Overfitted Fixed Effect Models Albrecht Ritschl 1 LSE and CEPR. March 2009
Spurious Significance of reatment Effects in Overfitte Fixe Effect Moels Albrecht Ritschl LSE an CEPR March 2009 Introuction Evaluating subsample means across groups an time perios is common in panel stuies
More informationRobust Low Rank Kernel Embeddings of Multivariate Distributions
Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions
More informationd-dimensional Arrangement Revisited
-Dimensional Arrangement Revisite Daniel Rotter Jens Vygen Research Institute for Discrete Mathematics University of Bonn Revise version: April 5, 013 Abstract We revisit the -imensional arrangement problem
More informationCHAPTER 1 : DIFFERENTIABLE MANIFOLDS. 1.1 The definition of a differentiable manifold
CHAPTER 1 : DIFFERENTIABLE MANIFOLDS 1.1 The efinition of a ifferentiable manifol Let M be a topological space. This means that we have a family Ω of open sets efine on M. These satisfy (1), M Ω (2) the
More informationFinding Similar Users in Social Networks
heory of Computing Systems manuscript No. will be inserte by the eitor) Fining Similar Users in Social Networks Aviv Nisgav Boaz Patt-Shamir Receive: ate / Accepte: ate Abstract We consier a system where
More informationA Sketch of Menshikov s Theorem
A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p
More informationPractical Analysis of Key Recovery Attack against Search-LWE Problem
Practical Analysis of Key Recovery Attack against Search-LWE Problem Royal Holloway an Kyushu University Workshop on Lattice-base cryptography 7 th September, 2016 Momonari Kuo Grauate School of Mathematics,
More informationGeneralizing Kronecker Graphs in order to Model Searchable Networks
Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu
More information19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control
19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior
More informationBaum s Algorithm Learns Intersections of Halfspaces with respect to Log-Concave Distributions
Baum s Algorithm Learns Intersections of Halfspaces with respect to Log-Concave Distributions Adam R Klivans UT-Austin klivans@csutexasedu Philip M Long Google plong@googlecom April 10, 2009 Alex K Tang
More informationA Weak First Digit Law for a Class of Sequences
International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of
More informationPractical Analysis of Key Recovery Attack against Search-LWE Problem
Practical Analysis of Key Recovery Attack against Search-LWE Problem IMI Cryptography Seminar 28 th June, 2016 Speaker* : Momonari Kuo Grauate School of Mathematics, Kyushu University * This work is a
More informationHomework 3 - Solutions
Homework 3 - Solutions The Transpose an Partial Transpose. 1 Let { 1, 2,, } be an orthonormal basis for C. The transpose map efine with respect to this basis is a superoperator Γ that acts on an operator
More informationFlexible High-Dimensional Classification Machines and Their Asymptotic Properties
Journal of Machine Learning Research 16 (2015) 1547-1572 Submitte 1/14; Revise 9/14; Publishe 8/15 Flexible High-Dimensional Classification Machines an Their Asymptotic Properties Xingye Qiao Department
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More informationProof of SPNs as Mixture of Trees
A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a
More informationLinear Algebra- Review And Beyond. Lecture 3
Linear Algebra- Review An Beyon Lecture 3 This lecture gives a wie range of materials relate to matrix. Matrix is the core of linear algebra, an it s useful in many other fiels. 1 Matrix Matrix is the
More informationCollapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling
Case Stuy : Document Retrieval Collapse Gibbs an Variational Methos for LDA Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 7 th, 0 Example
More informationTEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE
TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,
More informationDissertation Defense
Clustering Algorithms for Random and Pseudo-random Structures Dissertation Defense Pradipta Mitra 1 1 Department of Computer Science Yale University April 23, 2008 Mitra (Yale University) Dissertation
More informationMath 342 Partial Differential Equations «Viktor Grigoryan
Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite
More informationSINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
More informationLecture 2: Correlated Topic Model
Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables
More informationChapter 2 Lagrangian Modeling
Chapter 2 Lagrangian Moeling The basic laws of physics are use to moel every system whether it is electrical, mechanical, hyraulic, or any other energy omain. In mechanics, Newton s laws of motion provie
More informationSYSTEMS OF DIFFERENTIAL EQUATIONS, EULER S FORMULA. where L is some constant, usually called the Lipschitz constant. An example is
SYSTEMS OF DIFFERENTIAL EQUATIONS, EULER S FORMULA. Uniqueness for solutions of ifferential equations. We consier the system of ifferential equations given by x = v( x), () t with a given initial conition
More informationHomework 2 Solutions EM, Mixture Models, PCA, Dualitys
Homewor Solutions EM, Mixture Moels, PCA, Dualitys CMU 0-75: Machine Learning Fall 05 http://www.cs.cmu.eu/~bapoczos/classes/ml075_05fall/ OUT: Oct 5, 05 DUE: Oct 9, 05, 0:0 AM An EM algorithm for a Mixture
More informationOn Decentralized Optimal Control and Information Structures
2008 American Control Conference Westin Seattle Hotel, Seattle, Washington, USA June 11-13, 2008 FrC053 On Decentralize Optimal Control an Information Structures Naer Motee 1, Ali Jababaie 1 an Bassam
More informationMonotonicity of facet numbers of random convex hulls
Monotonicity of facet numbers of ranom convex hulls Gilles Bonnet, Julian Grote, Daniel Temesvari, Christoph Thäle, Nicola Turchi an Florian Wespi arxiv:173.31v1 [math.mg] 7 Mar 17 Abstract Let X 1,...,
More informationComputational Lower Bounds for Statistical Estimation Problems
Computational Lower Bounds for Statistical Estimation Problems Ilias Diakonikolas (USC) (joint with Daniel Kane (UCSD) and Alistair Stewart (USC)) Workshop on Local Algorithms, MIT, June 2018 THIS TALK
More informationPseudo-Free Families of Finite Computational Elementary Abelian p-groups
Pseuo-Free Families of Finite Computational Elementary Abelian p-groups Mikhail Anokhin Information Security Institute, Lomonosov University, Moscow, Russia anokhin@mccme.ru Abstract We initiate the stuy
More informationInverse Theory Course: LTU Kiruna. Day 1
Inverse Theory Course: LTU Kiruna. Day Hugh Pumphrey March 6, 0 Preamble These are the notes for the course Inverse Theory to be taught at LuleåTekniska Universitet, Kiruna in February 00. They are not
More information