Multi-View Clustering via Canonical Correlation Analysis
Kamalika Chaudhuri, ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA
Sham M. Kakade, Karen Livescu, Karthik Sridharan, Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave., Chicago, IL

Abstract

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).

1. Introduction

The multi-view approach to learning is one in which we have views of the data (sometimes in a rather abstract sense) and the goal is to use the relationship between these views to alleviate the difficulty of a learning problem of interest (Blum & Mitchell, 1998; Kakade & Foster, 2007; Ando & Zhang, 2007). In this work, we explore how having two views makes the clustering problem significantly more tractable.

(Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).)

Much recent work has been done on understanding under what conditions we can learn a mixture model. The basic problem is as follows: we are given independent samples from a mixture of k distributions, and our task is to either 1) infer properties of the underlying mixture model (e.g. the mixing weights, means, etc.) or 2) classify a random sample according to which distribution in the mixture generated it. Under no restrictions on the underlying mixture, this problem is considered to be hard. However, in many applications, we are only interested in clustering the data when the component distributions are well separated. In fact, the focus of recent clustering algorithms (Dasgupta, 1999; Vempala & Wang, 2002; Achlioptas & McSherry, 2005; Brubaker & Vempala, 2008) is on efficiently learning with as little separation as possible. Typically, the separation conditions are such that when given a random sample from the mixture model, the Bayes optimal classifier is able to reliably recover which cluster generated that point.

This work makes a natural multi-view assumption: that the views are (conditionally) uncorrelated, conditioned on which mixture component generated the views. There are many natural applications for which this assumption applies. For example, we can consider multi-modal views, with one view being a video stream and the other an audio stream of a speaker; here, conditioned on the speaker identity and maybe the phoneme (both of which could label the generating cluster), the views may be uncorrelated. A second example is the words and link structure in a document from a corpus such as Wikipedia; here, conditioned on the category of each document, the words in it and its link structure may be uncorrelated. In this paper, we provide experiments for both settings.
Under this multi-view assumption, we provide a simple and efficient subspace learning method, based on Canonical Correlation Analysis (CCA). This algorithm is affine invariant and is able to learn with some of the weakest separation conditions to date. The intuitive reason for this is that under our multi-view assumption, we are able to (approximately) find the low-dimensional subspace spanned by the means of the component distributions. This subspace is important because, when projected onto it, the means of the distributions are well-separated, yet the typical distance between points from the same distribution is smaller than in the original space. The number of samples we require to cluster correctly scales as O(d), where d is the ambient dimension. Finally, we show through experiments that CCA-based algorithms consistently provide better performance than standard PCA-based clustering methods when applied to datasets in the two quite different domains of audio-visual speaker clustering and hierarchical Wikipedia document clustering by category. Our work adds to the growing body of results which show how the multi-view framework can alleviate the difficulty of learning problems.

Related Work. Most provably efficient clustering algorithms first project the data down to some low-dimensional space and then cluster the data in this lower-dimensional space (an algorithm such as single linkage usually suffices here). Typically, these algorithms also work under a separation requirement, which is measured by the minimum distance between the means of any two mixture components.

One of the first provably efficient algorithms for learning mixture models is due to Dasgupta (1999), who learns a mixture of spherical Gaussians by randomly projecting the mixture onto a low-dimensional subspace. Vempala & Wang (2002) provide an algorithm with an improved separation requirement that learns a mixture of k spherical Gaussians, by projecting the mixture down to the k-dimensional subspace of highest variance. Kannan et al. (2005) and Achlioptas & McSherry (2005) extend this result to mixtures of general Gaussians; however, they require a separation proportional to the maximum directional standard deviation of any mixture component. Chaudhuri & Rao (2008) use a canonical-correlations-based algorithm to learn mixtures of axis-aligned Gaussians with a separation proportional to σ*, the maximum directional standard deviation in the subspace containing the means of the distributions. Their algorithm requires a coordinate-independence property and an additional spreading condition. None of these algorithms are affine invariant. Finally, Brubaker & Vempala (2008) provide an affine-invariant algorithm for learning mixtures of general Gaussians, so long as the mixture has a suitably low Fisher coefficient when in isotropic position. However, their separation involves a large polynomial dependence on 1/w_min.

The two results most closely related to ours are the work of Vempala & Wang (2002) and Chaudhuri & Rao (2008). Vempala & Wang (2002) show that it is sufficient to find the subspace spanned by the means of the distributions in the mixture for effective clustering. Like our algorithm, Chaudhuri & Rao (2008) use a projection onto the top k−1 singular value decomposition subspace of the canonical correlations matrix. They also require a spreading condition, which is related to our requirement on the rank. We borrow techniques from both of these papers. Blaschko & Lampert (2008) propose a similar algorithm for multi-view clustering, in which data is projected onto the top directions obtained by kernel CCA across the views.
They show empirically that for clustering images using the associated text as a second view (where the target clustering is a human-defined category), CCA-based clustering methods out-perform PCA-based algorithms.

This Work. Our input is data on a fixed set of objects from two views, where View j is assumed to be generated by a mixture of k Gaussians (D_1^j, ..., D_k^j), for j = 1, 2. To generate a sample, a source i is picked with probability w_i, and x^(1) and x^(2) in Views 1 and 2 are drawn from distributions D_i^1 and D_i^2. Following prior theoretical work, our goal is to show that our algorithm recovers the correct clustering, provided the input mixture obeys certain conditions.

We impose two requirements on these mixtures. First, we require that, conditioned on the source, the two views are uncorrelated. Notice that this is a weaker restriction than the condition that given source i, the samples from D_i^1 and D_i^2 are drawn independently. Moreover, this condition allows the distributions in the mixture within each view to be completely general, so long as they are uncorrelated across views. Although we do not prove this, our algorithm seems robust to small deviations from this assumption. Second, we require the rank of the CCA matrix across the views to be at least k−1, when each view is in isotropic position, and the (k−1)-th singular value of this matrix to be at least λ_min. This condition ensures that there is sufficient correlation between the views. If these two conditions hold, then we can recover the subspace containing the means in both views.
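To make this generative model concrete, here is a small sampler (our illustration, not code from the paper; all names are ours). It draws the two views independently given the source, which is a stronger property than the uncorrelatedness required below:

```python
import numpy as np

def sample_two_view_mixture(n, means1, means2, weights, sigma=0.5, rng=None):
    """Draw n samples from a two-view mixture of spherical Gaussians.

    Conditioned on the source i, the two views are drawn independently
    (a stronger property than the cross-view uncorrelatedness the paper
    assumes). Returns the two views and the hidden source labels.
    """
    rng = np.random.default_rng(rng)
    k, d1 = means1.shape
    d2 = means2.shape[1]
    labels = rng.choice(k, size=n, p=weights)
    x1 = means1[labels] + sigma * rng.standard_normal((n, d1))
    x2 = means2[labels] + sigma * rng.standard_normal((n, d2))
    return x1, x2, labels
```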
In addition, for mixtures of Gaussians, if in at least one view, say View 1, we have that for every pair of distributions i and j in the mixture,

\[ \|\mu_i^1 - \mu_j^1\| > C \sigma^* k^{1/4} \sqrt{\log(n/\delta)} \]

for some constant C, then our algorithm can also determine which component each sample came from. Here μ_i^1 is the mean of the i-th component in View 1 and σ* is the maximum directional standard deviation in the subspace containing the means in View 1. Moreover, the number of samples required to learn this mixture grows (almost) linearly with d. This separation condition is considerably weaker than previous results in that σ* only depends on the directional variance in the subspace spanned by the means, which can be considerably lower than the maximum directional variance over all directions. The only other algorithm which provides affine-invariant guarantees is due to Brubaker & Vempala (2008); the implied separation in their work is rather large and grows with decreasing w_min, the minimum mixing weight. To get our improved sample complexity bounds, we use a result due to Rudelson & Vershynin (2007) which may be of independent interest.

We stress that our improved results are really due to the multi-view condition. Had we simply combined the data from both views and applied previous algorithms on the combined data, we could not have obtained our guarantees. We also emphasize that for our algorithm to cluster successfully, it is sufficient for the distributions in the mixture to obey the separation condition in one view, so long as the multi-view and rank conditions are obeyed.

Finally, we study through experiments the performance of CCA-based algorithms on data sets from two different domains. First, we experiment with audio-visual speaker clustering, in which the two views are audio and face images of a speaker, and the target cluster variable is the speaker. Our experiments show that CCA-based algorithms perform better than PCA-based algorithms on audio data and just as well on image data, and are more robust to occlusions of the images. For our second experiment, we cluster documents in Wikipedia. The two views are the words and the link structure in a document, and the target cluster is the category. Our experiments show that a CCA-based hierarchical clustering algorithm out-performs PCA-based hierarchical clustering for this data.

2. The Setting

We assume that our data is generated by a mixture of k distributions. In particular, we assume that we obtain samples x = (x^(1), x^(2)), where x^(1) and x^(2) are the two views, which live in the vector spaces V_1 of dimension d_1 and V_2 of dimension d_2, respectively. We let d = d_1 + d_2. Let μ_i^j, for i = 1, ..., k and j = 1, 2, be the mean of distribution i in view j, and let w_i be the mixing weight for distribution i. For simplicity, we assume that the data have mean 0. We denote the covariance matrices of the data as

\[ \Sigma = E[xx^\top], \quad \Sigma_{11} = E[x^{(1)}(x^{(1)})^\top], \quad \Sigma_{22} = E[x^{(2)}(x^{(2)})^\top], \quad \Sigma_{12} = E[x^{(1)}(x^{(2)})^\top]. \]

Hence, we have

\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \tag{1} \]

The multi-view assumption we work with is as follows:

Assumption 1 (Multi-View Condition) We assume that conditioned on the source distribution l in the mixture (where l = i is picked with probability w_i), the two views are uncorrelated. More precisely, we assume that for all i ∈ [k],

\[ E[x^{(1)}(x^{(2)})^\top \mid l = i] = E[x^{(1)} \mid l = i] \, E[x^{(2)} \mid l = i]^\top. \]

This assumption implies that Σ_12 = Σ_i w_i μ_i^1 (μ_i^2)^⊤. To see this, observe that

\[ E[x^{(1)}(x^{(2)})^\top] = \sum_i E_{D_i}[x^{(1)}(x^{(2)})^\top] \Pr[D_i] = \sum_i w_i E_{D_i}[x^{(1)}] \, E_{D_i}[x^{(2)}]^\top = \sum_i w_i \mu_i^1 (\mu_i^2)^\top. \tag{2} \]

As the distributions are in isotropic position, we observe that Σ_i w_i μ_i^1 = Σ_i w_i μ_i^2 = 0. Therefore, the above equation shows that the rank of Σ_12 is at most k−1. We now assume that it has rank precisely k−1.
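As a quick numerical sanity check of this identity (our sketch, with arbitrary illustrative parameters, not an experiment from the paper), the empirical cross-covariance of two conditionally independent views converges to Σ_i w_i μ_i^1 (μ_i^2)^⊤, whose rank is at most k−1 once the weighted means are centered:

```python
import numpy as np

# Hypothetical example: k = 3 sources, d1 = d2 = 5, spherical Gaussian views.
rng = np.random.default_rng(0)
k, d1, d2, n = 3, 5, 5, 200_000
w = np.array([0.5, 0.3, 0.2])
mu1 = rng.standard_normal((k, d1))
mu2 = rng.standard_normal((k, d2))
mu1 -= w @ mu1  # center so that sum_i w_i mu_i^1 = 0
mu2 -= w @ mu2  # and sum_i w_i mu_i^2 = 0

labels = rng.choice(k, size=n, p=w)
x1 = mu1[labels] + 0.3 * rng.standard_normal((n, d1))
x2 = mu2[labels] + 0.3 * rng.standard_normal((n, d2))

sigma12_hat = x1.T @ x2 / n  # empirical E[x1 x2^T]
sigma12 = sum(w[i] * np.outer(mu1[i], mu2[i]) for i in range(k))
print(np.linalg.norm(sigma12_hat - sigma12))  # small, shrinks with n
print(np.linalg.matrix_rank(sigma12))         # at most k - 1 = 2
```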
Assumption 2 (Non-Degeneracy Condition) We assume that Σ_12 has rank k−1 and that the minimal non-zero singular value of Σ_12 is λ_min > 0 (where we are working in a coordinate system where Σ_11 and Σ_22 are identity matrices).

For clarity of exposition, we also work in an isotropic coordinate system in each view. Specifically, the expected covariance matrix of the data in each view is the identity matrix, i.e. Σ_11 = I_{d_1}, Σ_22 = I_{d_2}. As our analysis shows, our algorithm is robust to errors, so we assume that data is whitened as a pre-processing step.
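A minimal sketch of this whitening step (ours, not the authors' code; it assumes the sample covariances are well-conditioned, with a small ridge eps guarding against singularity):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def whiten(x, eps=1e-8):
    """Map x to isotropic position: zero mean, identity sample covariance."""
    x = x - x.mean(axis=0)
    cov = x.T @ x / len(x)
    w = inv(sqrtm(cov + eps * np.eye(len(cov))))
    # sqrtm can return a complex array with a negligible imaginary part.
    return x @ np.real(w)

# After whitening both views, the empirical CCA matrix is simply the
# cross-covariance of the whitened views:
# sigma12_hat = whiten(x1).T @ whiten(x2) / len(x1)
```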
One way to view the Non-Degeneracy Assumption is in terms of correlation coefficients. Recall that for two directions u ∈ V_1 and v ∈ V_2, the correlation coefficient is defined as

\[ \rho(u, v) = \frac{E[(u \cdot x^{(1)})(v \cdot x^{(2)})]}{\sqrt{E[(u \cdot x^{(1)})^2] \, E[(v \cdot x^{(2)})^2]}}. \]

An alternative definition of λ_min is the minimal non-zero correlation coefficient, λ_min = min_{u,v : ρ(u,v) ≠ 0} ρ(u, v). Note 1 ≥ λ_min > 0.

We use Σ̂_11 and Σ̂_22 to denote the sample covariance matrices in views 1 and 2, respectively, and Σ̂_12 to denote the sample covariance matrix across views 1 and 2. We assume these are obtained through empirical averages from i.i.d. samples from the underlying distribution.

3. The Clustering Algorithm

The following lemma provides the intuition for our algorithm.

Lemma 1 Under Assumption 2, if U, D, V is the thin SVD of Σ_12 (where the thin SVD removes all zero entries from the diagonal), then the subspace spanned by the means in view 1 is precisely the column span of U (and we have the analogous statement for view 2).

The lemma is a consequence of Equation 2 and the rank assumption. Since samples from a mixture are well-separated in the space containing the means of the distributions, the lemma suggests the following strategy: use CCA to (approximately) project the data down to the subspace spanned by the means, to get an easier clustering problem, and then apply standard clustering algorithms in this space. Our clustering algorithm, based on the above idea, is stated below. We can show that this algorithm clusters correctly with high probability, when the data in at least one of the views obeys a separation condition, in addition to our assumptions. The input to the algorithm is a set of samples S and a number k, and the output is a clustering of these samples into k clusters. For this algorithm, we assume that the data obeys the separation condition in View 1; an analogous algorithm can be applied when the data obeys the separation condition in View 2 as well.

Algorithm 1
1. Randomly partition S into two subsets A and B of equal size.
2. Let Σ̂_12(A) (resp. Σ̂_12(B)) denote the empirical covariance matrix between views 1 and 2, computed from the sample set A (resp. B). Compute the top k−1 left singular vectors of Σ̂_12(A) (resp. Σ̂_12(B)), and project the samples in B (resp. A) on the subspace spanned by these vectors.
3. Apply single linkage clustering (Dunn & Everitt, 2004) (for mixtures of log-concave distributions), or the algorithm in Section 3.5 of Arora & Kannan (2005) (for mixtures of Gaussians), on the projected examples in View 1.

We note that in Step 3, we apply either single linkage or the algorithm of Arora & Kannan (2005); this allows us to show theoretically that if the distributions in the mixture are of a certain type, and given the right separation conditions, the clusters can be recovered correctly. In practice, however, these algorithms do not perform as well due to lack of robustness, and one would use an algorithm such as k-means or EM to cluster in this low-dimensional subspace. In particular, a variant of the EM algorithm has been shown (Dasgupta & Schulman, 2000) to correctly cluster mixtures of Gaussians under certain conditions. Moreover, in Step 1, we divide the data set into two halves to ensure independence between Steps 2 and 3 for our analysis; in practice, these steps can be executed on the same sample set.
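The following sketch (ours, not the authors' implementation) instantiates Algorithm 1 with the practical substitution discussed above: k-means in Step 3 in place of single linkage or the Arora-Kannan procedure. It assumes both views are already whitened:

```python
import numpy as np
from sklearn.cluster import KMeans

def multiview_cluster(x1, x2, k, rng=None):
    """Algorithm 1 (sketch): split-sample CCA projection, then clustering.

    x1, x2: (n, d1) and (n, d2) arrays, the two whitened views.
    Clustering is done in View 1, with k-means standing in for the
    single-linkage / Arora-Kannan step, as the text suggests for practice.
    """
    rng = np.random.default_rng(rng)
    n = len(x1)
    perm = rng.permutation(n)
    A, B = perm[: n // 2], perm[n // 2:]

    labels = np.empty(n, dtype=int)
    for fit_idx, apply_idx in ((A, B), (B, A)):
        # Step 2: top k-1 left singular vectors of the empirical
        # cross-covariance computed on one half of the sample.
        sigma12 = x1[fit_idx].T @ x2[fit_idx] / len(fit_idx)
        U, _, _ = np.linalg.svd(sigma12)
        proj = x1[apply_idx] @ U[:, : k - 1]
        # Step 3 (practical variant): k-means on the projected samples.
        labels[apply_idx] = KMeans(n_clusters=k, n_init=5).fit_predict(proj)
    # Note: cluster indices from the two halves are not aligned; a
    # matching step would be needed to merge them into one labeling.
    return labels
```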
Main Results. Our main theorem is as follows.

Theorem 1 (Gaussians) Suppose the source distribution is a mixture of Gaussians, and suppose Assumptions 1 and 2 hold. Let σ* be the maximum directional standard deviation of any distribution in the subspace spanned by {μ_i^1}_{i=1}^k. If, for each pair i and j and for a fixed constant C,

\[ \|\mu_i^1 - \mu_j^1\| \geq C \sigma^* k^{1/4} \sqrt{\log(kn/\delta)}, \]

then, with probability 1−δ, Algorithm 1 correctly classifies the examples if the number of examples used is

\[ n \geq c \, \frac{d}{(\sigma^*)^2 \lambda_{\min}^2 w_{\min}^2} \log^2\!\Big(\frac{d}{\sigma^* \lambda_{\min} w_{\min}}\Big) \log^2(1/\delta) \]

for some constant c.

Here we assume that the separation condition holds in View 1, but a similar theorem also applies to View 2. An analogous theorem can also be shown for mixtures of log-concave distributions.
Theorem 2 (Log-concave Distributions) Suppose the source distribution is a mixture of log-concave distributions, and suppose Assumptions 1 and 2 hold. Let σ* be the maximum directional standard deviation of any distribution in the subspace spanned by {μ_i^1}_{i=1}^k. If, for each pair i and j and for a fixed constant C,

\[ \|\mu_i^1 - \mu_j^1\| \geq C \sigma^* \sqrt{k} \log(kn/\delta), \]

then, with probability 1−δ, Algorithm 1 correctly classifies the examples if the number of examples used is

\[ n \geq c \, \frac{d}{(\sigma^*)^2 \lambda_{\min}^2 w_{\min}^2} \log^3\!\Big(\frac{d}{\sigma^* \lambda_{\min} w_{\min}}\Big) \log^2(1/\delta) \]

for some constant c.

The proof follows from the proof of Theorem 1, along with standard results on log-concave probability distributions (see Kannan et al., 2005; Achlioptas & McSherry, 2005). We do not provide a proof here due to space constraints.

4. Analyzing Our Algorithm

In this section, we prove our main theorems.

Notation. In the sequel, we assume that we are given samples from a mixture which obeys Assumptions 1 and 2. We use the notation S_1 (resp. S_2) to denote the subspace containing the centers of the distributions in the mixture in View 1 (resp. View 2), and S_1^⊥ (resp. S_2^⊥) to denote the orthogonal complement of this subspace in View 1 (resp. View 2). For any matrix A, we use ‖A‖ to denote the L_2 norm or maximum singular value of A.

Proofs. Now we are ready to prove our main theorem. First, we show the following two lemmas, which demonstrate properties of the expected cross-correlational matrix across the views. Their proofs are immediate from Assumptions 1 and 2.

Lemma 2 Let v^1 and v^2 be any unit vectors in S_1 and S_2, respectively. Then (v^1)^⊤ Σ_12 v^2 > λ_min.

Lemma 3 Let v^1 (resp. v^2) be any vector in S_1^⊥ (resp. S_2^⊥). Then, for any u^1 ∈ V_1 and u^2 ∈ V_2, (v^1)^⊤ Σ_12 u^2 = (u^1)^⊤ Σ_12 v^2 = 0.

Next, we show that given sufficiently many samples, the subspace spanned by the top k−1 singular vectors of Σ̂_12 still approximates the subspace containing the means of the distributions comprising the mixture. Finally, we use this fact, along with some results in Arora & Kannan (2005), to prove Theorem 1. Our main lemma of this section is the following.

Lemma 4 (Projection Subspace Lemma) Let v^1 (resp. v^2) be any vector in S_1 (resp. S_2). If the number of samples

\[ n > c \, \frac{d}{\tau^2 \lambda_{\min}^2 w_{\min}} \log^2\!\Big(\frac{d}{\tau \lambda_{\min} w_{\min}}\Big) \log^2(1/\delta) \]

for some constant c, then, with probability 1−δ, the length of the projection of v^1 (resp. v^2) onto the subspace spanned by the top k−1 left (resp. right) singular vectors of Σ̂_12 is at least √(1−τ²) ‖v^1‖ (resp. √(1−τ²) ‖v^2‖).

The main tool in the proof of Lemma 4 is the following lemma, which uses a result due to Rudelson & Vershynin (2007).

Lemma 5 (Sample Complexity Lemma) If the number of samples

\[ n > c \, \frac{d}{\epsilon^2 w_{\min}} \log^2\!\Big(\frac{d}{\epsilon w_{\min}}\Big) \log^2(1/\delta) \]

for some constant c, then, with probability at least 1−δ, ‖Σ̂_12 − Σ_12‖ ≤ ε.

A consequence of Lemmas 5, 2 and 3 is the following.

Lemma 6 Let n > C (d/(ε² w_min)) log²(d/(ε w_min)) log²(1/δ) for some constant C. Then, with probability 1−δ, the top k−1 singular values of Σ̂_12 have value at least λ_min − ε, and the remaining min(d_1, d_2) − k + 1 singular values of Σ̂_12 have value at most ε.

The proof follows by a combination of Lemmas 2, 3 and 5.

Proof (of Lemma 5): To prove this lemma, we apply Lemma 7. Observe the block representation of Σ in Equation 1. Moreover, with Σ_11 and Σ_22 in isotropic position, we have that the L_2 norm of Σ_12 is at most 1. Using the triangle inequality, we can write

\[ \|\hat{\Sigma}_{12} - \Sigma_{12}\| \leq \|\hat{\Sigma} - \Sigma\| + \|\hat{\Sigma}_{11} - \Sigma_{11}\| + \|\hat{\Sigma}_{22} - \Sigma_{22}\| \]

(where we applied the triangle inequality to the 2×2 block matrix with off-diagonal entries Σ̂_12 − Σ_12 and with 0 diagonal entries).
We now apply Lemma 7 three times, to Σ̂_11 − Σ_11, Σ̂_22 − Σ_22, and a scaled version of Σ̂ − Σ. The first two applications follow directly. For the third application, we observe that Lemma 7 is rotation invariant, and that scaling each covariance value by some factor s scales the norm of the matrix by at most s. We claim that we can apply Lemma 7 to Σ̂ − Σ with s = 4. Since the covariance of any two random variables is at most the product of their standard deviations, and since Σ_11 and Σ_22 are I_{d_1} and I_{d_2} respectively, the maximum singular value of Σ_12 is at most 1; so the maximum singular value of Σ is at most 4. Our claim follows. The lemma follows by plugging in n as a function of ε, d and w_min.
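A quick simulation (our illustration, on a synthetic pair of whitened views with known Σ_12 = ρI; not part of the paper) shows the concentration that Lemma 5 formalizes: the spectral-norm error of the empirical cross-covariance shrinks as the sample size grows.

```python
import numpy as np

# Two whitened views with Sigma_12 = rho * I, built from shared Gaussians.
rng = np.random.default_rng(1)
d, rho = 20, 0.5
for n in (1_000, 10_000, 100_000):
    z = rng.standard_normal((n, d))
    z2 = rng.standard_normal((n, d))
    x1 = z
    x2 = rho * z + np.sqrt(1 - rho**2) * z2
    err = np.linalg.norm(x1.T @ x2 / n - rho * np.eye(d), ord=2)
    print(n, err)  # spectral-norm error decays roughly like sqrt(d/n)
```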
Lemma 7 Let X be a set of n points generated by a mixture of k Gaussians over R^d, scaled such that E[x x^⊤] = I. If M is the sample covariance matrix of X, then, for n large enough, with probability at least 1−δ,

\[ \|M - E[M]\| \leq C \sqrt{\frac{d \, \log n \, \log(2n/\delta) \, \log(1/\delta)}{w_{\min} \, n}} \]

where C is a fixed constant, and w_min is the minimum mixing weight of any Gaussian in the mixture.

Proof: To prove this lemma, we use a concentration result on the L_2-norms of matrices due to Rudelson & Vershynin (2007). We observe that each vector x_i in the scaled space is generated by a Gaussian with some mean μ and maximum directional variance σ². As the total variance of the mixture along any direction is at most 1, w_min(‖μ‖² + σ²) ≤ 1. Therefore, with probability at least 1−δ/2, every sample x_i satisfies

\[ \|x_i\| \leq \|\mu\| + \sigma \sqrt{d \log(2n/\delta)}. \]

We condition on this event holding for all i = 1, ..., n, which has probability at least 1−δ/2. Conditioned on this event, the distributions of the vectors x_i are independent. Therefore, we can apply Theorem 3.1 in Rudelson & Vershynin (2007) to these conditional distributions to conclude that

\[ \Pr[\|M - E[M]\| > t] \leq 2 e^{-c n t^2 / (\Lambda^2 \log n)} \]

where c is a constant, and Λ is an upper bound on the norm of any vector x_i. The lemma follows by plugging in t = √(Λ² log(4/δ) log n / (cn)) and Λ² ≤ d log(2n/δ)/w_min.

Proof (of Lemma 4): For the sake of contradiction, suppose there exists a unit vector v^1 ∈ S_1 such that the projection of v^1 onto the top k−1 left singular vectors of Σ̂_12 is equal to √(1−τ'²)‖v^1‖, where τ' > τ. Then, there exists some unit vector u^1 in V_1, in the orthogonal complement of the space spanned by the top k−1 left singular vectors of Σ̂_12, such that the projection of v^1 on u^1 is equal to τ'‖v^1‖. This vector u^1 can be written as u^1 = τ'v^1 + (1−τ'²)^{1/2} y^1, where y^1 is in the orthogonal complement of S_1. From Lemma 2, there exists some vector u^2 in S_2 such that (v^1)^⊤ Σ_12 u^2 ≥ λ_min; from Lemma 3, for this vector u^2, (u^1)^⊤ Σ_12 u^2 ≥ τ'λ_min ≥ τλ_min. If n > c (d/(τ² λ_min² w_min)) log²(d/(τ λ_min w_min)) log²(1/δ), then, from Lemma 6, (u^1)^⊤ Σ̂_12 u^2 ≥ τλ_min/2. Now, since u^1 is in the orthogonal complement of the subspace spanned by the top k−1 left singular vectors of Σ̂_12, for any vector y^2 in the subspace spanned by the top k−1 right singular vectors of Σ̂_12, (u^1)^⊤ Σ̂_12 y^2 = 0. This means that there exists a vector z^2 in V_2, in the orthogonal complement of the subspace spanned by the top k−1 right singular vectors of Σ̂_12, such that (u^1)^⊤ Σ̂_12 z^2 ≥ τλ_min/2. This implies that the k-th singular value of Σ̂_12 is at least τλ_min/2. However, from Lemma 6, all but the top k−1 singular values of Σ̂_12 are at most τλ_min/3, which is a contradiction.

Proof (of Theorem 1): From Lemma 4, if n > C (d/(τ² λ_min² w_min)) log²(d/(τ λ_min w_min)) log²(1/δ), then, with probability at least 1−δ, the projection of any vector v in S_1 or S_2 onto the subspace returned by Step 2 of Algorithm 1 has length at least √(1−τ²)‖v‖. Therefore, the maximum directional variance of any D_i in this subspace is at most (1−τ²)(σ*)² + τ²σ², where σ² is the maximum directional variance of any D_i. When τ ≤ σ*/σ, this is at most 2(σ*)². From the isotropic condition, σ ≤ 1/√w_min. Therefore, when

\[ n > C \, \frac{d}{(\sigma^*)^2 \lambda_{\min}^2 w_{\min}^2} \log^2\!\Big(\frac{d}{\sigma^* \lambda_{\min} w_{\min}}\Big) \log^2(1/\delta), \]

the maximum directional variance of any D_i in the mixture in the space output by Step 2 is at most 2(σ*)². Since A and B are random partitions of the sample set S, the subspace produced by the action of Step 2 of Algorithm 1 on the set A is independent of B, and vice versa.
Therefore, when projected onto the top k−1 SVD subspace of Σ̂_12(A), the samples from B are distributed as a mixture of (k−1)-dimensional Gaussians. The theorem follows from the previous paragraph and Theorem 1 of Arora & Kannan (2005).

5. Experiments

5.1. Audio-visual speaker clustering

In the first set of experiments, we consider clustering either audio or face images of speakers. We use 41 speakers from the VidTIMIT database (Sanderson, 2008), speaking 10 sentences (about 20 seconds) each, recorded at 25 frames per second in a studio environment with no significant lighting or pose variation. The audio features are standard 12-dimensional mel cepstra (Davis & Mermelstein, 1980) and their derivatives and double derivatives, computed every 10ms over a 20ms window, and finally concatenated over a window of 440ms centered on the current frame, for a total of 1584 dimensions. The video features are pixels of the face region extracted from each image (2394 dimensions). We consider the target cluster variable to be the speaker.

We use either CCA or PCA to project the data to a lower dimensionality N. In the case of CCA, we initially project to an intermediate dimensionality M using PCA, to reduce the effects of spurious correlations. For the results reported here, typical values (selected using a held-out set) are N = 40, and M = 100 for images and 1000 for audio.
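A sketch of this projection pipeline (our reconstruction, not the authors' code; the scikit-learn calls, the per-view handling, and the default dimensionalities are our assumptions, with sklearn's iterative CCA merely standing in for whatever CCA solver was used):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA

def cca_speaker_clusters(audio, images, n_clusters=82, M=(1000, 100), N=40):
    """PCA per view to an intermediate dimensionality M, CCA across the
    views to a final dimensionality N, then k-means on the audio view."""
    a = PCA(n_components=M[0]).fit_transform(audio)
    v = PCA(n_components=M[1]).fit_transform(images)
    a_cca, _ = CCA(n_components=N, max_iter=500).fit_transform(a, v)
    # Best of 5 k-means runs (lowest objective), as in the experiments.
    return KMeans(n_clusters=n_clusters, n_init=5).fit_predict(a_cca)
```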
For CCA, we randomize the vectors of one view within each sentence, to reduce correlations between the views due to other latent variables such as the current phoneme. We then cluster either view using k-means into 82 clusters (2 per speaker). To alleviate the problem of local minima found by k-means, each clustering consists of 5 runs of k-means, and the one with the lowest score is taken as the final clustering. Similarly to Blaschko & Lampert (2008), we measure clustering performance using the conditional entropy of the speaker s given the cluster c, H(s|c). We report the results in terms of conditional perplexity, 2^{H(s|c)}, which is the mean number of speakers corresponding to each cluster.

Table 1 shows results on the raw data, as well as with synthetic occlusions and translations of the image data.

Table 1. Conditional perplexities of the speaker given the cluster, using PCA or CCA bases, for the rows Images, Audio, Images + occlusion, Audio + occlusion, Images + translation, and Audio + translation. "+ occlusion" and "+ translation" indicate that the images are corrupted with occlusion/translation; the audio is unchanged, however.

Considering the clean visual environment, we expect PCA to do very well on the image data. Indeed, PCA provides an almost perfect clustering of the raw images, and CCA does not improve on it. However, CCA far outperforms PCA when clustering the more challenging audio view. When synthetic occlusions or translations are applied to the images, the performance of PCA-based clustering is greatly degraded. CCA is unaffected in the case of occlusion; in the case of translation, CCA-based image clustering is degraded similarly to PCA, but audio clustering is almost unaffected. In other words, even when the image data are degraded, CCA is able to recover a good clustering in at least one of the views.[1] For a more detailed look at the clustering behavior, Figures 1(a-d) show the distributions of clusters for each speaker.

[1] The audio task is unusually challenging, as each feature vector corresponds to only a few phonemes. A typical speaker classification setting uses entire sentences. If we force the cluster identity to be constant over each sentence (taking the most frequent cluster label in the sentence), performance improves greatly; e.g., in the audio + occlusion case, the perplexity improves to 8.5 (PCA) and 2.1 (CCA).
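The conditional perplexity 2^{H(s|c)} used above (and again in the Wikipedia experiments below) is straightforward to compute from the joint counts of speakers and clusters; a minimal sketch (ours):

```python
import numpy as np

def conditional_perplexity(speakers, clusters):
    """Return 2**H(s|c), the mean number of speakers per cluster."""
    s_vals, s_idx = np.unique(speakers, return_inverse=True)
    c_vals, c_idx = np.unique(clusters, return_inverse=True)
    joint = np.zeros((len(s_vals), len(c_vals)))
    np.add.at(joint, (s_idx, c_idx), 1)
    p = joint / joint.sum()          # p(s, c)
    p_c = p.sum(axis=0)              # p(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / p_c), 0.0)
    h_s_given_c = -terms.sum()       # H(s|c) = -sum p(s,c) log2 p(s|c)
    return 2.0 ** h_s_given_c
```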
Next, we iteratively pick the largest cluster, reuce the imensionality using PCA or CCA on the points in this cluster, an use k-means to break the cluster into smaller sub-clusters for some fixe k), until we reach the total esire number of clusters. The intuition for this is that ifferent clusters may have ifferent natural subspaces. As before, we evaluate the clustering using the conitional perplexity of the article category a as given by Wikipeia) given the cluster c, 2 Ha c). For each article we use the first category liste in the article. The 128,327 articles inclue roughly 15,000 categories, of which we use the 500 most frequent ones, which cover 73,145 articles. While the clustering is performe on all 128,327 articles, the reporte entropies are for the 73,145 articles. Each sub-clustering consists of 10 runs of k-means, an the one with the lowest k-means score is taken as the final cluster assignment. Figure 1e) shows the conitional perplexity versus the number of clusters for PCA an CCA base hierarchical clustering. For any number of clusters, CCA prouces better clusterings, i.e. ones with lower perplexity. In aition, the tree structures of the PCA/CCAbase clusterings are qualitatively ifferent. With PCA base clustering, most points are assigne to a few large clusters, with the remaining clusters being very small. CCA-base hierarchical clustering prouces more balance clusters. To see this, in Figure 1f) we show the perplexity of the cluster istribu-
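A compact sketch of the hierarchical procedure just described (ours; single-view with PCA for brevity, where the paper's CCA variant would instead re-run CCA across the two views of the points in the cluster being split; d_proj and k are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def hierarchical_cluster(x, total_clusters, k=2, d_proj=10):
    """Iteratively split the largest cluster after re-projecting its
    points, so each split gets its own 'natural' subspace."""
    labels = np.zeros(len(x), dtype=int)
    n_clusters = 1
    while n_clusters < total_clusters:
        biggest = np.bincount(labels).argmax()          # largest cluster
        idx = np.flatnonzero(labels == biggest)
        d = min(d_proj, x.shape[1], len(idx) - 1)
        z = PCA(n_components=d).fit_transform(x[idx])
        # Best of 10 k-means runs on the re-projected points.
        sub = KMeans(n_clusters=k, n_init=10).fit_predict(z)
        for j in range(1, k):   # sub-cluster 0 keeps the old label
            labels[idx[sub == j]] = n_clusters
            n_clusters += 1
    return labels
```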
Figure 1. (a-d) Distributions of cluster assignments per speaker in the audio-visual experiments: (a) audio, PCA basis; (b) audio, CCA basis; (c) images + occlusion, PCA basis; (d) images + occlusion, CCA basis. The color of each cell (s, c) corresponds to the empirical probability p(c|s) (darker = higher). (e-f) Wikipedia experiments, for hierarchical CCA and hierarchical PCA versus the number of clusters: (e) conditional perplexity of article category given cluster, 2^{H(a|c)}; (f) perplexity of the cluster distribution, 2^{H(c)}.

References

Achlioptas, D., & McSherry, F. (2005). On spectral learning of mixtures of distributions. Conference on Learning Theory.

Ando, R. K., & Zhang, T. (2007). Two-view feature generation model for semi-supervised learning. International Conference on Machine Learning.

Arora, S., & Kannan, R. (2005). Learning mixtures of separated nonspherical Gaussians. Annals of Applied Probability, 15.

Blaschko, M. B., & Lampert, C. H. (2008). Correlational spectral clustering. Conference on Computer Vision and Pattern Recognition.

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. Conference on Learning Theory.

Brubaker, S. C., & Vempala, S. (2008). Isotropic PCA and affine-invariant clustering. Foundations of Computer Science.

Chaudhuri, K., & Rao, S. (2008). Learning mixtures of distributions using correlations and independence. Conference on Learning Theory.

Dasgupta, S. (1999). Learning mixtures of Gaussians. Foundations of Computer Science.

Dasgupta, S., & Schulman, L. (2000). A two-round variant of EM for Gaussian mixtures. Uncertainty in Artificial Intelligence.

Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28.

Dunn, G., & Everitt, B. (2004). An Introduction to Mathematical Taxonomy. Dover Books.

Kakade, S. M., & Foster, D. P. (2007). Multi-view regression via canonical correlation analysis. Conference on Learning Theory.

Kannan, R., Salmasian, H., & Vempala, S. (2005). The spectral method for general mixture models. Conference on Learning Theory.

Rudelson, M., & Vershynin, R. (2007). Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM.

Sanderson, C. (2008). Biometric Person Recognition: Face, Speech and Fusion. VDM-Verlag.

Vempala, S., & Wang, G. (2002). A spectral algorithm for learning mixtures of distributions. Foundations of Computer Science.