Necessary and Sufficient Conditions for Sketched Subspace Clustering


Necessary and Sufficient Conditions for Sketched Subspace Clustering

Daniel Pimentel-Alarcón (1), Laura Balzano (2), Robert Nowak (1)
(1) University of Wisconsin-Madison, (2) University of Michigan-Ann Arbor

Abstract—This paper is about an interesting phenomenon: two r-dimensional subspaces, even if they are orthogonal to one another, can appear identical if they are only observed on a subset of coordinates. Understanding this phenomenon is of particular importance for many modern applications of subspace clustering where one would like to subsample in order to improve computational efficiency. Examples include real-time video surveillance and datasets so large that they cannot even be stored in memory. In this paper we introduce a new metric between subspaces, which we call partial coordinate discrepancy. This metric captures a notion of similarity between subsampled subspaces that is not captured by other distance measures between subspaces. With this, we are able to show that subspace clustering is theoretically possible without coherence assumptions, using only r + 1 rows of the dataset at hand. This gives precise information-theoretic necessary and sufficient conditions for sketched subspace clustering, and it can greatly improve computational efficiency without compromising performance. We complement our theoretical analysis with synthetic and real data experiments.

I. INTRODUCTION

In subspace clustering (SC), one is given a data matrix X whose columns lie in the union of several (unknown) r-dimensional subspaces, and aims to infer these subspaces and cluster the columns in X accordingly [1]. The union of subspaces model is a powerful and flexible model that applies to a wide variety of practical applications, ranging from computer vision [2] to network inference [3], [4], compression [5], recommender systems and collaborative filtering [6], [7]. Hence there is growing attention to this problem. As a result, existing theory and methods can handle outliers [8]-[13], noisy measurements [14], privacy concerns [15], data constraints [16], and missing data [17]-[21], among other difficulties.

Yet in many relevant applications, such as real-time video surveillance, or cases where X is too large to even store in memory, SC remains infeasible due to computational constraints. In applications like these, it is essential to handle big datasets in a computationally efficient manner, both in terms of storage and processing time. Fortunately, studies regarding missing data show that under this model, very large datasets can be accurately represented using only a small fraction of their entries [17]-[21]. With this in mind, recent studies (e.g., [22]) explore the idea of projecting the data (e.g., subsampling or sketching) as an alternative to reduce computational costs (time and storage). On this matter, it was recently shown that if the subspaces are sufficiently incoherent and separated, and the columns are well-spread over the subspaces, then the popular sparse subspace clustering (SSC) algorithm [23] will find the correct clustering using certain sketches of the data (e.g., Gaussian projection, row subsampling, and the fast Johnson-Lindenstrauss transform) [24]. However, in general, these conditions are unverifiable.

Fig. 1: Left: The columns in X (represented by points) lie in the union of two 1-dimensional subspaces in R^3. We want to cluster these points using only a few coordinates (to reduce computational costs). This can be done if we use coordinates (y, z), as in the center. The main difficulty is that the subspaces may be equal in certain coordinates. In this example, the subspaces are equal on the (x, y) coordinates. So if we use coordinates (x, y), as in the right, then all columns will appear to lie in the same subspace, and clustering would be impossible. We do not know beforehand the coordinates in which the subspaces are different. Searching for such coordinates could result in combinatorial complexity, defeating the purpose of subsampling.

In this paper we show that almost every X can be theoretically clustered using as few as r + 1 rows (the minimum required) of a generic rotation of X. The subtlety of this result is that the underlying subspaces may be equal in certain coordinates. This means that if we sample a column of X on a set of coordinates where the underlying subspaces are equal, one would be unable to determine (based on these observations) to which subspace it truly belongs. See Figure 1 to build some intuition. To give a concrete example, consider images as in Figure 2. It has been shown that the face images of the same individual under varying illumination lie near a low-dimensional subspace [25]. Hence SC can be used to classify faces. However, some coordinates (e.g., the corner pixels) are equal across many individuals. If we only sample those coordinates, we would be unable to cluster. Moreover, those coordinates would only obstruct clustering while consuming computational resources.

To the best of our knowledge, none of the existing distance measures between subspaces captures this notion of partial coordinate similarity. For instance, Example 1 in Section II shows that orthogonal subspaces (maximally apart with respect to the principal angle distance, the affinity distance, and the subspace incoherence distance [10]) can be identical in certain coordinates.

Fig. 2: Images from the Extended Yale B dataset [26]. Each row shows images of the same individual under varying illumination. The vectorized images of each individual lie near a 9-dimensional subspace [25], so the whole dataset lies near a union of subspaces. Some coordinates (e.g., the corner pixels) are equal across many individuals. If we only sample those coordinates, we would be unable to subspace cluster.

In this paper we study this phenomenon to derive precise information-theoretic necessary and sufficient conditions for sketched subspace clustering. To this end we first introduce a new distance measure between subspaces, which we call partial coordinate discrepancy, that captures this relationship between subspaces. This allows us to show that if we generically rotate X, its columns will lie in subspaces that are different on all subsets of more than r coordinates with probability 1. In other words, generic rotations maximize partial coordinate discrepancy. This in turn implies that X can be clustered using only a sketch, that is, a few rows of a generic rotation of X. We complement our theoretical analysis with experiments on synthetic and real data, showing the performance and advantages of sketching.

Organization of the paper: In Section II we formally state the problem, introduce our new distance measure between subspaces, and give our main results. In Section III we make several remarks about our distance measure. In Section IV we present experiments to support our results. We leave all proofs to Section V.

II. MODEL AND MAIN RESULTS

Let U = {S_k}_{k=1}^K be a set of r-dimensional subspaces of R^d, and let X be a d x n data matrix whose columns lie in the union of the subspaces in U. Let X_k denote the matrix with all the columns of X corresponding to S_k. Assume:

A1 The columns of X_k are drawn independently according to an absolutely continuous distribution with respect to the Lebesgue measure on S_k.

A2 X_k has at least r + 1 columns.

Fig. 3: Typical SC assumptions require (i) that the subspaces are sufficiently separated; this would discard subspaces that are too close, as in the top-left; (ii) that the subspaces are sufficiently incoherent; this would discard subspaces that are too aligned with the canonical axes, as in the top-left; and (iii) that the columns of X_k are well-spread over S_k, as in the top-right; this would discard cases where the distribution of columns over S_k is skewed, as in the bottom (left and right) [10]. In contrast, assumption A1 allows any collection of subspaces, including nearby and coherent subspaces, as in the top-left. A1 only requires that the columns of X_k are drawn generically, as in the top-right and bottom-left. A1 excludes ill-conditioned samples with Lebesgue measure zero, as in the bottom-right, where all columns lie in a line (when S_k is a plane).

A1 essentially requires that the columns in X_k are drawn generically from S_k. This allows nearby and coherent subspaces, and skewed distributions of the columns. In contrast, typical SC assumptions require that the subspaces are sufficiently separated, that S_k is incoherent (not too aligned with the canonical axes), and that the columns are well-spread over S_k. See Figure 3 to build some intuition. A2 is a fundamental requirement for subspace clustering, as any K sets of r columns can be clustered into K arbitrary r-dimensional subspaces. Recall that we want to cluster X using only a few of its rows.
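To make assumptions A1-A2 concrete, the following is a minimal Python sketch (an illustration, not the authors' code; the values of d, r, K and n_k are arbitrary choices) that samples a union-of-subspaces data matrix: each S_k is the span of a random d x r basis, and each column of X_k is a generic combination of that basis, so A1 holds almost surely and each X_k has more than r + 1 columns.

import numpy as np

def sample_union_of_subspaces(d=50, r=3, K=4, n_k=20, seed=0):
    """Sample X whose columns lie in a union of K r-dimensional subspaces of R^d.

    Each S_k is spanned by a random d x r basis (A1: columns drawn from an
    absolutely continuous distribution on S_k; A2: n_k >= r + 1 columns).
    Returns the d x (K*n_k) matrix X and the ground-truth labels.
    """
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for k in range(K):
        U_k = rng.standard_normal((d, r))          # basis of S_k
        coeffs = rng.standard_normal((r, n_k))     # generic coefficients
        blocks.append(U_k @ coeffs)                # columns of X_k lie in S_k
        labels += [k] * n_k
    return np.hstack(blocks), np.array(labels)

X, labels = sample_union_of_subspaces()
print(X.shape)  # (50, 80)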
The restriction of an r-dimensional subspace in general position to l <= r coordinates is simply R^l. So if X is sampled on r or fewer rows, any subspace in general position would agree with all the subsampled columns, making clustering impossible. It follows that X must be sampled on at least l = r + 1 rows in order to be clustered. In other words, l = r + 1 rows are necessary for sketched subspace clustering. We will now show that X can be clustered using only this bare minimum of rows, i.e., that l = r + 1 is also theoretically sufficient. To this end, we first introduce our new notion of distance between subspaces, which we call partial coordinate discrepancy.

Let [d]_l denote the collection of all subsets of {1, ..., d} with exactly l distinct elements. Let Gr(r, R^d) denote the Grassmann manifold of r-dimensional subspaces of R^d, and let 1{.} denote the indicator function. For any subspace, matrix or vector that is compatible with a set ω ∈ [d]_l, we will use the subscript ω to denote its restriction to the coordinates/rows in ω. For example, X_ω ∈ R^{l x n} denotes the restriction of X to the rows in ω, and (S_k)_ω ⊂ R^l denotes the restriction of S_k to the coordinates in ω.

Definition 1. Given S, S' ∈ Gr(r, R^d), define the partial coordinate discrepancy between S and S' as

    δ(S, S') = (1 / C(d, r+1)) Σ_{ω ∈ [d]_{r+1}} 1{S_ω ≠ S'_ω},

where C(d, r+1) denotes the binomial coefficient "d choose r + 1".

Example 1. Consider the following 1-dimensional subspaces of R^4: S = span[1 1 1 1]^T and S' = span[1 1 -1 -1]^T. Then δ(S, S') = 4/6, because if ω = {1, 2} or ω = {3, 4}, then S_ω = S'_ω = span[1 1]^T, but for any of the other 4 choices of ω, S_ω ≠ S'_ω. In other words, S and S' would appear to be the same if they were only observed on the first two or the last two coordinates/rows. Notice that S and S' are orthogonal (maximally apart with respect to the principal angle distance, the affinity distance, and the subspace incoherence distance [10]), yet they are identical when restricted to certain coordinates.

Remark 1. Notice that δ takes values in [0, 1]. One can interpret δ as the probability that two subspaces are different on r + 1 coordinates chosen at random. For instance, if two subspaces are drawn independently according to the uniform measure over Gr(r, R^d), then with probability 1 they will have δ = 1.

Example 1 shows that even orthogonal subspaces can appear identical if they are only sampled on a subset of coordinates. Existing measures of distance between subspaces fail to capture this notion of partial coordinate similarity. In contrast, δ is a distance measure (metric) that quantifies the partial coordinate similarity of two subspaces when restricted to subsets of coordinates. We formalize this in the next lemma. The proof is given in Section V.

Lemma 1. Partial coordinate discrepancy is a metric over Gr(r, R^d).

Lemma 1 implies that two different subspaces must be different on at least one set ω with r + 1 coordinates. If subspaces S, S' ∈ U are different on ω, then the columns corresponding to S and S' can be subspace clustered using only X_ω by iteratively trying combinations of r + 1 columns in X_ω. This is because under A1, a set of r + 1 columns in X_ω will be linearly dependent if and only if they correspond to the same subspace in U. This implies that we can cluster X using only r + 1 rows. The challenge is to determine which rows to use. If the subspaces in U have δ = 1 (i.e., they are different on all subsets of r + 1 coordinates), then we can cluster X using any set of r + 1 rows. But if δ is small, we would need to use the right rows, which could be hard to find. This matches the intuition that subspaces that are very similar are harder to cluster.
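Since δ involves only finitely many subsets ω, it can be computed by brute force for small d. The following is a minimal sketch (an illustration under the definitions above; the rank tolerance is an arbitrary choice): it tests S_ω = S'_ω by checking whether stacking the two restricted bases raises the rank, and it reproduces δ = 4/6 for Example 1.

import numpy as np
from itertools import combinations

def restricted_equal(U, V, omega, tol=1e-9):
    """True if span(U[omega, :]) == span(V[omega, :])."""
    Uo, Vo = U[list(omega), :], V[list(omega), :]
    r = np.linalg.matrix_rank(Uo, tol=tol)
    # Equal restrictions iff concatenating the bases does not raise the rank.
    return np.linalg.matrix_rank(np.hstack([Uo, Vo]), tol=tol) == r

def partial_coordinate_discrepancy(U, V, tol=1e-9):
    """delta(S, S') = fraction of (r+1)-subsets omega with S_omega != S'_omega."""
    d, r = U.shape
    subsets = list(combinations(range(d), r + 1))
    diff = sum(not restricted_equal(U, V, omega, tol) for omega in subsets)
    return diff / len(subsets)

# Example 1: two orthogonal lines in R^4 that agree on {1,2} and on {3,4}.
U = np.array([[1.0, 1.0, 1.0, 1.0]]).T
V = np.array([[1.0, 1.0, -1.0, -1.0]]).T
print(partial_coordinate_discrepancy(U, V))  # 0.666... = 4/6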
Fortunately, we will show that generic rotations yield maximal partial coordinate discrepancy. In other words, we will see that if we generically rotate the subspaces in U, then the rotated subspaces will be different on all subsets of r + 1 coordinates. This will imply that we can cluster X using any r + 1 rows of a generic rotation of X. To formalize these ideas, let Γ : R^d → R^d denote a rotation operator, and assume:

A3 The rotation angles of Γ are drawn independently according to an absolutely continuous distribution with respect to the Lebesgue measure on (0, 2π).

Essentially, A3 requires that Γ is a generic rotation. Equivalently, Γ can be considered a generic orthonormal matrix. Rotating X amounts to left-multiplying it by Γ. Similarly, the rotation of a subspace S by Γ (which we denote ΓS) is given by span{ΓU}, where U is a basis of S. The next lemma states that rotating subspaces by a generic rotation yields subspaces with maximal partial coordinate discrepancy. The proof is given in Section V.

Lemma 2. Let Γ denote a rotation operator drawn according to A3. Let S, S' be different subspaces in Gr(r, R^d). Then δ(ΓS, ΓS') = 1 with probability 1.

Lemma 2 states that regardless of δ(S, S'), we can rotate S and S' to obtain new subspaces with maximal partial coordinate discrepancy (i.e., subspaces that are different on all subsets of r + 1 coordinates). See Figure 4 for some insight. Intuitively, a generic rotation distributes the local differences of S and S' across all coordinates. So as long as S ≠ S', then (ΓS)_ω will differ (at least by a little bit) from (ΓS')_ω for every ω ∈ [d]_l with l > r. This implies that ΓX can be perfectly clustered using any subset of l > r rows of ΓX (and clustering ΓX is as good as clustering X). This is summarized in our main result, stated in the next theorem. The proof is given in Section V.

Theorem 1. Let A1-A3 hold, and let ω ∈ [d]_l, with l > r. Let X' be a subset of the columns in X. Transform and row-subsample X' to obtain (ΓX')_ω. Then with probability 1, the columns in X' lie in an r-dimensional subspace of R^d if and only if the columns in (ΓX')_ω lie in an r-dimensional subspace of R^l.

Theorem 1 states that, theoretically, X can be clustered using any r + 1 rows of a generic rotation ΓX. Under A1-A3, perfectly clustering (ΓX)_ω is theoretically possible with probability 1 by iteratively trying combinations of r + 1 columns in (ΓX)_ω and verifying whether they are rank r. This is because under A1 and A3, a set of r + 1 columns in (ΓX)_ω will be linearly dependent if and only if they correspond to the same subspace.
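A generic orthonormal matrix satisfying the role of Γ can be drawn, for instance, as the Q factor of a square Gaussian matrix; genericity (absolute continuity) is all that matters here. The following is a minimal sketch (hypothetical dimensions, not the authors' code) that rotates X, keeps an arbitrary set of l = r + 1 rows, and checks the rank test behind Theorem 1: r + 1 sketched columns are rank deficient exactly when they come from the same subspace.

import numpy as np

rng = np.random.default_rng(1)
d, r, K, n_k = 50, 3, 4, 20
bases = [rng.standard_normal((d, r)) for _ in range(K)]
X = np.hstack([B @ rng.standard_normal((r, n_k)) for B in bases])
labels = np.repeat(np.arange(K), n_k)

# Generic rotation: Q factor of a Gaussian matrix.
Gamma, _ = np.linalg.qr(rng.standard_normal((d, d)))

omega = np.arange(r + 1)                 # any l = r + 1 rows work (Theorem 1)
X_sketch = (Gamma @ X)[omega, :]         # (r+1) x n sketch of the data

def same_subspace(cols):
    """r+1 sketched columns are linearly dependent iff they share a subspace."""
    return np.linalg.matrix_rank(X_sketch[:, cols], tol=1e-8) <= r

same = [0, 1, 2, 3]                      # four columns from the first subspace
mixed = [0, 1, 2, n_k]                   # three from the first, one from the second
print(same_subspace(same), same_subspace(mixed))  # True False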

Fig. 4: Left: Two different subspaces (even orthogonal ones) can appear identical if they are only observed on a subset of coordinates. In this figure, S and S' are identical if they are only observed on the (x, y) coordinates (top view). Right: Lemma 2 shows that if we rotate S and S' generically, the rotated subspaces ΓS and ΓS' will be different on all subsets of more than r coordinates. In this figure, the rotated subspaces ΓS and ΓS' are different on all sets of r + 1 = 2 coordinates, including the (x, y) plane.

Nonetheless, this combinatorial SC algorithm can be computationally prohibitive, especially for large n. In practice, we can use an algorithm such as sparse subspace clustering (SSC) [23]. This algorithm enjoys state-of-the-art performance, works well in practice, and has theoretical guarantees. The main idea behind SSC is that a column x in X lying in subspace S can be written as a linear combination of a few other columns in S (in fact, r or fewer). In contrast, it would require more columns from other subspaces to express x as their linear combination (as many as d). So SSC aims to find a sparse vector c ∈ R^{n-1} such that x = (X/x)c. Here X/x denotes the d x (n-1) matrix formed with all the columns in X except x. The nonzero entries in c index columns from the same subspace as x. SSC aims to find such a vector c by solving

    arg min_{c ∈ R^{n-1}} ||c||_1   s.t.   x = (X/x)c,   (1)

where ||.||_1 denotes the ℓ1-norm, given by the sum of absolute values. SSC then uses spectral clustering on these coefficients to recover the clusters.

Unfortunately, the solution to (1) is not exact. In fact, a typical solution to (1) will have most entries close to zero, and only a few (yet more than r) relevant entries. If we only use l = r + 1 rows, the locations of the relevant entries in c will be somewhat meaningless, in the sense that they could correspond to columns from different subspaces, as it takes at most r + 1 linearly independent columns to represent a column in R^{r+1}. As the number of rows l grows, the relevant entries in c are more likely to correspond to columns from the same subspace as x. On the other hand, as l grows, so does the computational complexity of (1). Without subsampling the rows, the computational complexity of SSC is O(dn^3). In contrast, using l > r rows, the computational complexity of SSC is only O(ln^3). Depending on d, n and r, this can bring substantial computational improvements.
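For intuition, problem (1) can be handed to an off-the-shelf convex solver. The following is a minimal sketch (not the authors' SSC implementation, which additionally handles noise and follows the ℓ1 step with spectral clustering over all columns); it assumes cvxpy is installed and that X_sketch is an l x n sketched data matrix such as the one built above.

import numpy as np
import cvxpy as cp

def ssc_coefficients(X_sketch, j):
    """Solve (1): min ||c||_1  s.t.  x_j = (X/x_j) c, on the l x n sketch."""
    l, n = X_sketch.shape
    x = X_sketch[:, j]
    others = np.delete(X_sketch, j, axis=1)   # the l x (n-1) matrix X / x_j
    c = cp.Variable(n - 1)
    prob = cp.Problem(cp.Minimize(cp.norm(c, 1)), [others @ c == x])
    prob.solve()
    return c.value

# Example usage (with X_sketch from the previous sketch):
# c = ssc_coefficients(X_sketch, j=0)
# Large |c_i| should point to columns from the same subspace as column 0,
# provided l is large enough, as discussed in the text.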
We thus want l to be large enough that the relevant entries in c reveal the clusters of X, but not so large that (1) becomes too computationally expensive. In fact, we know from Wang et al. [24] that SSC will find the correct clustering using only l = O(r log(rK^2) + log n) rows if the following conditions hold (see Figure 3 to build some intuition):

(i) The angles between subspaces are sufficiently large.
(ii) The subspaces are sufficiently incoherent with the canonical basis, or the data is transformed by a Gaussian projection or by the fast Johnson-Lindenstrauss transform [27].
(iii) The columns of X_k are well-spread over S_k.

On the other hand, Theorem 1 states that theoretically it is possible to cluster X using only l = r + 1 rows, without these conditions. This reveals a gap between theory and practice that we further study in our experiments. We have shown that, theoretically, conditions (i)-(iii) are sufficient but not necessary. It remains an open question whether there exists a polynomial-time algorithm that can provably cluster without these requirements.

III. ABOUT δ AND OTHER DISTANCES

In this section we make several remarks about partial coordinate discrepancy and its relation to other distances between subspaces. First recall the definition of the principal angle distance between two subspaces [28].

Definition 2 (Principal angle distance). Let S, S' be subspaces in Gr(r, R^d). The principal angle distance between S and S' is defined as θ(S, S') = 1 - ||U^T U'||_2^2, where U and U' are orthonormal bases of S and S', and ||.||_2 denotes the spectral norm.

It is intuitive that when data are generated from subspaces that are close to one another, it is difficult to cluster these data correctly. Typically, other results use the principal angle distance to measure how close subspaces are. For example, in the previous section we discussed that if conditions (i)-(iii) hold, then O(r log(rK^2) + log n) rows are sufficient for clustering [24]. Condition (i) essentially requires that θ is sufficiently large. The partial coordinate discrepancy δ is another useful metric. Here we use it to show that, theoretically, r + 1 rows are necessary and sufficient for clustering without these assumptions.

We now wish to compare δ and θ. We will see that subspaces close in one metric can in general be far in the other. We believe this is an important observation for bridging the gap between the sufficient oversampling of rows required when using θ and the necessary and sufficient condition of Theorem 1.

In our study we will analyze δ using bases of subspaces, so let us first show that δ shares the important property of being basis independent. To see this, let U, U' ∈ R^{d x r} denote bases of S, S'. Notice that S_ω = S'_ω if and only if there exists a matrix B ∈ R^{r x r} such that U'_ω = U_ω B.

Now suppose that instead of U we choose another basis V of S. Since U and V are both bases of S, there must exist a full-rank matrix Θ ∈ R^{r x r} such that U = VΘ. As before, S_ω = S'_ω if and only if there exists a matrix B' ∈ R^{r x r} such that U'_ω = V_ω B'. Now observe that if there exists B such that U'_ω = U_ω B, then there exists B' (namely B' = ΘB) such that U'_ω = V_ω B'. Similarly, if there exists B' such that U'_ω = V_ω B', then there exists B (namely B = Θ^{-1} B') such that U'_ω = U_ω B.

With this, we can now study the relationship between partial coordinate discrepancy and principal angle distance. The next example shows that two subspaces may be close with respect to θ, but far with respect to δ.

Example 2 (Small θ may coincide with large δ). Consider a subspace S spanned by a generic U ∈ R^{d x r}. Let ε > 0 be given, and let U' = U + ε1, where 1 denotes the d x r all-ones matrix. It is easy to see that θ(S, S') → 0 as ε → 0. In contrast, δ(S, S') = 1 for every ε > 0.

Conversely, the next example shows that two subspaces may be close with respect to δ, but far with respect to θ.

Example 3 (Small δ may coincide with large θ). Consider two subspaces S, S' ∈ Gr(r, R^d) spanned by (stacking blocks vertically)

    U = [I; I; 0]   and   U' = [I; -I; 0],

where I denotes the r x r identity matrix and 0 denotes the (d - 2r) x r zero matrix. For d much larger than r, δ(S, S') will be close to zero, because the two subspaces only differ on subsets ω that contain a matching pair of coordinates i and i + r from the first 2r coordinates. However, the subspaces are orthogonal, and so the principal angle distance is maximal: θ(S, S') = 1.

Examples 2 and 3 show that, in general, subspaces close in one metric can be far in the other. However, for subspaces that are incoherent with the canonical axes, there is an interesting relation between δ and θ. Recall that coherence is a parameter indicating how aligned a subspace is with the canonical axes [29]. More precisely:

Definition 3 (Coherence). Let S ∈ Gr(r, R^d). Let P_S denote the projection operator onto S, and e_i the i-th canonical vector in R^d. The standard coherence parameter μ ∈ [1, d/r] of S is defined as μ = (d/r) max_i ||P_S e_i||_2^2.

Intuitively, an incoherent subspace (small μ) will be well-spread over all the canonical directions. Equivalently, the magnitudes of the rows of its bases will not vary too much. In this case, if δ is small, we can also expect θ to be small. The following example demonstrates one such scenario.

Example 4 (An example where small δ and μ imply small θ). Suppose that S and S' are spanned by orthonormal bases U and U', respectively. Suppose they have d̄ coordinates on which they span the same subspace; for d̄ close to d, this will result in a small δ. Suppose the coherence of each subspace is bounded by μ_0, i.e.,

    (d/r) max_i ||P_S e_i||_2^2 = (d/r) max_i ||U_i||_2^2 ≤ μ_0,

where U_i is the i-th row of U. Further suppose that if we restrict the bases to the d̄ coordinates the two subspaces have in common, we can lower-bound their inner product:

    || Σ_{i=1}^{d̄} U_i^T U'_i ||_2 ≥ c_0.

This is essentially another incoherence condition, which will hold with c_0 close to d̄/d when the subspaces are highly incoherent with the canonical basis. Then

    θ(S, S') ≤ 1 - (c_0 - (d - d̄) μ_0 r / d)^2   when   c_0 - (d - d̄) μ_0 r / d > 0.

From this example our intuition is confirmed: if d̄ is very close to d, c_0 is close to 1, and μ_0 is constant, then the term in the parentheses is near 1 and the angle is small. To see how we get the bound on θ(S, S'), first note that θ(S, S') = 1 - ||U^T U'||_2^2, so it suffices to bound ||U^T U'||_2 from below:

    ||U^T U'||_2 = || Σ_{i=1}^{d} U_i^T U'_i ||_2
                 = || Σ_{i=1}^{d̄} U_i^T U'_i + Σ_{i=d̄+1}^{d} U_i^T U'_i ||_2
                 ≥ || Σ_{i=1}^{d̄} U_i^T U'_i ||_2 - || Σ_{i=d̄+1}^{d} U_i^T U'_i ||_2
                 ≥ c_0 - Σ_{i=d̄+1}^{d} ||U_i^T U'_i||_2            (2)
                 ≥ c_0 - Σ_{i=d̄+1}^{d} ||U_i||_2 ||U'_i||_2
                 ≥ c_0 - (d - d̄) μ_0 r / d,

where we used the triangle inequality and the matrix norm inequality ||U_i^T U'_i||_2 ≤ ||U_i||_2 ||U'_i||_2, and step (2) follows by assumption. This illustrates a case where, if the subspaces in U have low coherence and their partial coordinate discrepancy is small, the angle between them will also be small.
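The quantities in Definitions 2 and 3 are easy to compute from orthonormal bases. The following is a minimal sketch (an illustration only; it uses the form θ = 1 - ||U^T U'||_2^2 adopted in Definition 2 and a brute-force δ, with small arbitrary dimensions) that reproduces the flavor of Example 3: the block construction gives maximal θ but a small δ, while the coherence stays well below its maximum d/r.

import numpy as np
from itertools import combinations

def orth(A):
    """Orthonormal basis of the column span of A (thin QR)."""
    Q, _ = np.linalg.qr(A)
    return Q

def coherence(U):
    """mu = (d/r) * max_i ||P_S e_i||^2, with U an orthonormal basis of S."""
    d, r = U.shape
    return d / r * np.max(np.sum(U**2, axis=1))

def principal_angle_distance(U, V):
    """theta = 1 - ||U^T V||_2^2 for orthonormal bases U, V (Definition 2)."""
    return 1.0 - np.linalg.norm(U.T @ V, 2) ** 2

def delta(U, V, tol=1e-9):
    """Brute-force partial coordinate discrepancy (Definition 1)."""
    d, r = U.shape
    def eq(om):
        Uo, Vo = U[list(om), :], V[list(om), :]
        return (np.linalg.matrix_rank(np.hstack([Uo, Vo]), tol=tol)
                == np.linalg.matrix_rank(Uo, tol=tol))
    subs = list(combinations(range(d), r + 1))
    return sum(not eq(om) for om in subs) / len(subs)

d, r = 8, 2
I, Z = np.eye(r), np.zeros((d - 2 * r, r))
U = orth(np.vstack([I, I, Z]))        # Example 3 blocks
V = orth(np.vstack([I, -I, Z]))
print(principal_angle_distance(U, V), delta(U, V))  # 1.0 (maximal), ~0.21 (small)
print(coherence(U))                                  # 2.0 <= d/r = 4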
Existing analyses show that practical SC algorithms tend to fail if θ is small [23]. It follows that for incoherent subspaces, if δ is small, SC can be very hard in practice. This is illustrated in Figure 5, which shows that the clustering performance of practical algorithms declines as δ decreases.

IV. EXPERIMENTS

Theorem 1 shows that one can cluster X using only r + 1 rows of ΓX. As discussed in Section II, practical algorithms like SSC may require more than this bare minimum number of rows. In this section we present experiments to study the gap between what is theoretically possible and what is practically achievable with state-of-the-art algorithms. In Section III we also explained that for incoherent subspaces the partial coordinate discrepancy δ and the principal angle distance θ have a tight relation: if δ is small, then θ is small too. Existing analyses show that practical SC algorithms tend to fail if θ is small [23]. It follows that for incoherent subspaces, if δ is small, SC can be very hard in practice. The experiments of this section support these results.

In our experiments we compare the following approaches to subspace clustering:

(a) Cluster X directly (full-data).
(b) Cluster l > r rows of ΓX.

To compare the two on equal footing, we study both cases using the sparse subspace clustering (SSC) algorithm [23]. We chose SSC because it enjoys state-of-the-art performance, works well in practice, and has theoretical guarantees. In all our experiments we use the SSC code provided by its authors [23].

A. Simulations

We first use simulations to study the cases above as a function of the ambient dimension d, the partial coordinate discrepancy δ of the subspaces in U, and the number of rows used l. To obtain subspaces with a specific δ, we first generate a d x r matrix V with entries drawn i.i.d. from the standard Gaussian distribution. Subspaces generated this way have low coherence. Then, for k = 1, ..., K, we selected the k-th set of δ' rows in V (i.e., rows (k-1)δ'+1, ..., kδ') and replaced them with fresh entries, also drawn i.i.d. from the standard Gaussian distribution. This yields K bases, which span the subspaces in U. This way, the bases of any S and S' in U differ on exactly 2δ' rows. It follows that δ(S, S') is equal to the probability of selecting any of these 2δ' rows in r + 1 draws (without replacement). That is,

    δ(S, S') = 1 - C(d - 2δ', r+1) / C(d, r+1)   for every S, S' ∈ U.   (3)

Unfortunately, (3) gives little intuition about how small or large δ is. We will thus upper-bound δ by a small number that is easily interpretable. To do this, we use the next simple bound, which gives a clear idea of how small δ is in our experiments. A derivation is given in Section V:

    δ(S, S') ≤ 2δ'(r + 1) / (d - r) = O(rδ'/d).   (4)

In each trial of our experiments we generate a set U of K = 5 subspaces, each of dimension r = 5, using the procedure described above. Next we generate a matrix X with n_k = 100 columns from each subspace. The coefficients of each column in X are drawn i.i.d. from the standard Gaussian distribution. Matrices generated this way satisfy A1 and A2. To measure accuracy, we find the best matching between the identified clusters and the original sets.

In our first simulation we study the dependency on δ' (which gives a proxy for δ through (4)) and l, with d = 10^5 fixed. The results are summarized in Figure 5 (top-left). This figure shows the gap between theory and practice. Theorem 1 shows that, theoretically, all these trials can be perfectly clustered. The figure shows, as predicted in Section III, that for incoherent subspaces clustering becomes harder in practice as δ' (and hence δ) shrinks. Observe that as δ' grows, fewer rows suffice for accurate clustering. For example, in this experiment SSC consistently succeeds with l = δ'.

Next we study the cases above as a function of d and δ', with l = δ'. The results are summarized in Figure 5 (top-right). This also shows a gap between theory and practice. Figure 5 shows, as predicted in Section III, that for incoherent subspaces, if δ' (and hence δ) is too small, the angle between the subspaces in U will be small, whence clustering can be hard in practice.
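The construction of subspaces with a prescribed discrepancy is easy to reproduce. The following is a minimal sketch (not the authors' experiment code; d, r, K and δ' are small arbitrary values): it builds the K bases by overwriting disjoint blocks of δ' rows of a common Gaussian matrix V, and evaluates the exact value (3) and the bound (4).

import numpy as np
from math import comb

def bases_with_prescribed_delta(d, r, K, delta_prime, seed=0):
    """K bases that pairwise differ on exactly 2*delta_prime rows (Sec. IV-A)."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((d, r))
    bases = []
    for k in range(K):
        U_k = V.copy()
        rows = slice(k * delta_prime, (k + 1) * delta_prime)
        U_k[rows, :] = rng.standard_normal((delta_prime, r))  # overwrite block k
        bases.append(U_k)
    return bases

def delta_exact(d, r, delta_prime):
    """Equation (3): probability that omega hits one of the 2*delta_prime rows."""
    return 1.0 - comb(d - 2 * delta_prime, r + 1) / comb(d, r + 1)

def delta_bound(d, r, delta_prime):
    """Equation (4): 2*delta_prime*(r+1)/(d-r) = O(r*delta_prime/d)."""
    return 2 * delta_prime * (r + 1) / (d - r)

d, r, K, dp = 1000, 5, 5, 20
bases = bases_with_prescribed_delta(d, r, K, dp)
print(delta_exact(d, r, dp), delta_bound(d, r, dp))  # exact value <= bound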
Fig. 5: Proportion of correctly classified points by SSC, using only l > r rows of ΓX, with K = 5 subspaces, each of dimension r = 5, and n_k = 100 columns per subspace. The color of each pixel indicates the average over 100 trials (the lighter the better). White represents 100% accuracy, and black represents 20%, which amounts to random guessing. Theorem 1 states that, theoretically, all these trials can be perfectly clustered. This shows a gap between theory and practice. Top-Left: Transition diagram as a function of δ' (which gives a proxy for the partial coordinate discrepancy δ through (4)) and the number of used rows l, with fixed ambient dimension d = 10^5. As discussed in Section III, for incoherent subspaces clustering becomes harder in practice as δ' shrinks. Observe that as δ' grows, fewer rows suffice for accurate clustering. Top-Right: Transition diagram as a function of d and δ', using only l = δ' rows. All pixels above the black point in each column have at least 95% accuracy. These points represent the minimum δ' and l required for a clustering accuracy of at least 95%. As discussed in Section III, for incoherent subspaces, if δ' (and hence δ) is too small, the angle between the subspaces in U will be too small, whence clustering can be hard in practice. Bottom-Left: Partial coordinate discrepancy δ (upper-bounded by O(rδ'/d)) and fraction of rows l/d required by SSC for a clustering accuracy of at least 95%. The curve is the best exponential fit to these points, and it represents the boundary between at least 95% accuracy (above the curve) and less than 95% accuracy (below the curve). This shows that for incoherent subspaces, as d grows, one only requires a vanishing partial coordinate discrepancy δ and a vanishing fraction of rows l/d to succeed. Bottom-Right: Time required to cluster X directly (full-data) and to cluster l = 20 rows of ΓX, as a function of the ambient dimension d (average over 100 trials). In all of these trials, both options achieve 100% accuracy.

In this experiment we also record the minimum δ' and l required for a clustering accuracy of at least 95%. Figure 5 (bottom-left) shows that for incoherent subspaces, as d grows, one only requires a vanishing partial coordinate discrepancy δ and a vanishing fraction of rows l/d to succeed.

In our last simulation we study the computation time required to cluster X directly (full-data) and to cluster l = 20 rows of ΓX, as a function of d. In this experiment we fix l = δ' = 20, known from our previous experiment to produce 100% accuracy for a wide range of d.

Unsurprisingly, Figure 5 (bottom-right) shows that if we only use a constant number of rows, the computation time is virtually unaffected by the ambient dimension, unlike standard (full-data) algorithms. Sketching can thus bring the computational complexity orders of magnitude lower (depending on d and n) than standard (full-data) techniques.

B. Real Data

We now evaluate the performance of sketching on a real-life problem where the phenomenon of partial coordinate similarity arises naturally: classifying faces. To this end we use the Extended Yale B dataset [26], which consists of face images of 38 individuals with a fixed pose under varying illumination (see Figure 6). As discussed in [23], shadows and specularities in these images can be modeled as sparse errors. So, as a preprocessing step, we first apply the augmented Lagrange multiplier method [30] for robust principal component analysis to the images of each individual (using code provided by the authors). This removes the sparse errors, so that the vectorized images of each individual lie near a 9-dimensional subspace [25]. Hence the matrix X containing all the vectorized images lies near a union of 38 nine-dimensional subspaces.

Observe that these images are very similar in several regions. For example, the lower corners are mostly dark. Distinct subspaces can thus appear to be the same if they are only observed on the coordinates corresponding to these pixels. If we only use a few rows of X (without rotating), there is a positive probability of selecting these coordinates, in which case we would be unable to determine the right clustering. Fortunately, Lemma 2 shows that the columns of a generic rotation of X will lie near a union of subspaces that are different on all subsets of l > r coordinates (maximal partial coordinate discrepancy). This implies, as shown in Theorem 1, that the clusters of the original X are the same as the clusters of any l > r rows of the rotated X. This means that we can cluster X using any l > r coordinates of a rotation of X. This is verified by the following experiment.

In this experiment we study classification accuracy as a function of the number of individuals, or equivalently the number of subspaces K, and as a function of the number of rows l used for clustering. We replicate the experiment in [23]: we first divide all individuals into four groups, corresponding to individuals {1, ..., 10}, {11, ..., 20}, {21, ..., 30} and {31, ..., 38}. Next we cluster all possible choices of K ∈ {2, 3, 5, 8, 10} individuals for the first three groups, and K ∈ {2, 3, 5, 8} individuals for the last group. We repeat this experiment for different choices of l and record the classification accuracy. The results are summarized in Figure 6. They show that one can achieve the same performance as standard (full-data) methods using only a small fraction of the data. This results in computational advantages (time and memory).

Fig. 6: Left: Proportion of correctly classified images from the Extended Yale B dataset [26] (see Figure 2), as a function of the number of individuals, or equivalently the number of subspaces K, and as a function of the number of rows l used for clustering. In particular, l = d = 2016 corresponds to standard (full-data) SSC. Right: Computation time as a function of the number of individuals K, with l = 65 fixed (known from the accuracy results to achieve the same accuracy as standard SSC). Recall that the computational complexities of SSC and sketching are O(dn^3) and O(ln^3), respectively. Here d = 2016 and n = 38K. This shows that sketching achieves the same accuracy as standard SSC in only a fraction of the time. This gap becomes more evident as d and n grow, as shown in Figure 5.

V. PROOFS

In this section we give the proofs of all our statements.
Proof of Lemma 1. We need to show that δ satisfies the three properties of a metric. Let S, S', S'' ∈ Gr(r, R^d).

(i) It is easy to see that if S = S', then δ(S, S') = 0. For the converse, suppose δ(S, S') = 0. Let υ = {1, ..., r}, and let ω_i = υ ∪ {i}, with i = r+1, ..., d. Take bases U, U' of S, S' such that U_{ω_{r+1}} = U'_{ω_{r+1}}. We can do this because δ(S, S') = 0, which implies S_ω = S'_ω for every ω ∈ [d]_{r+1}, including ω_{r+1}. Next observe that for i = r+2, ..., d, since S_{ω_i} = S'_{ω_i} and U_υ = U'_υ, it must be that U = U' on the i-th row (otherwise S_{ω_i} ≠ S'_{ω_i}). We thus conclude that U = U', which implies S = S'.

(ii) That δ(S, S') = δ(S', S) follows immediately from the definition.

(iii) To see that δ satisfies the triangle inequality, write:

    δ(S, S'') + δ(S'', S') = (1/C(d, r+1)) Σ_{ω ∈ [d]_{r+1}} ( 1{S_ω ≠ S''_ω} + 1{S''_ω ≠ S'_ω} )
                           ≥ (1/C(d, r+1)) Σ_{ω ∈ [d]_{r+1}} 1{S_ω ≠ S''_ω or S''_ω ≠ S'_ω}
                           ≥ (1/C(d, r+1)) Σ_{ω ∈ [d]_{r+1}} 1{S_ω ≠ S'_ω} = δ(S, S'),

where the last inequality follows because {S_ω = S''_ω and S''_ω = S'_ω} implies {S_ω = S'_ω}, whence 1{S_ω ≠ S''_ω or S''_ω ≠ S'_ω} = 1{S_ω ≠ S'_ω} = 0, and in any other case 1{S_ω ≠ S''_ω or S''_ω ≠ S'_ω} = 1 ≥ 1{S_ω ≠ S'_ω}.

Proof of Lemma 2. We need to show that if S ≠ S', then (ΓS)_ω ≠ (ΓS')_ω for every ω ∈ [d]_{r+1}. Let U and U' denote bases of S and S'. Observe that (ΓS)_ω = (ΓS')_ω if and only if there exists a matrix B ∈ R^{r x r} such that (ΓU')_ω = (ΓU)_ω B, or equivalently, if and only if Γ_ω U' = Γ_ω U B, which we can rewrite as

    Γ_ω (U' - UB) = 0.   (5)

Let υ denote the subset with the first r elements of ω, and let i denote the last element of ω. Then we can rewrite (5) as

    [Γ_υ; Γ_i] (U' - UB) = 0.   (6)

Since Γ is drawn according to A3, the rows in Γ_υ are linearly independent with probability 1. Since U is a basis of an r-dimensional subspace, its r columns are also linearly independent. It follows that Γ_υ U is a full-rank r x r matrix. So we can use the top block in (6) to obtain B = (Γ_υ U)^{-1} Γ_υ U'. We can plug this into the bottom part of (6) to obtain

    Γ_i (U' - U (Γ_υ U)^{-1} Γ_υ U') = 0.   (7)

Recall that (Γ_υ U)^{-1} = adj(Γ_υ U) / det(Γ_υ U), where adj(Γ_υ U) and det(Γ_υ U) denote the adjugate and the determinant of Γ_υ U. Therefore, we may rewrite (7) as the following system of r polynomial equations:

    Γ_i ( det(Γ_υ U) U' - U adj(Γ_υ U) Γ_υ U' ) = 0.   (8)

Observe that the left-hand side of (8) is just another way to write Γ_i (U' - UB), where B is expressed in terms of U, U' and Γ_υ. Since S ≠ S', there exists no B ∈ R^{r x r} such that U' = UB; equivalently, (U' - UB) ≠ 0. Since Γ is drawn according to A3, we conclude that the left-hand side of (8) is a nonzero set of polynomials in the entries of Γ, and so (8) holds with probability zero. Since (ΓS)_ω = (ΓS')_ω if and only if (8) holds, we conclude that with probability 1, (ΓS)_ω ≠ (ΓS')_ω. Since ω was arbitrary (and there are only finitely many choices), this holds for every ω ∈ [d]_{r+1}, as desired.

Proof of Theorem 1. Recall that X_k denotes the matrix formed with all the columns in X corresponding to the k-th subspace in U. Under A1-A2, with probability 1 the partition {X_k}_{k=1}^K is the only way to cluster the columns in X into K r-dimensional subspaces. This is because under A1, the columns in X lie on intersections of the subspaces in U with probability zero, so any combination of more than r columns from different subspaces in U will lie in a subspace of dimension greater than r with probability 1.

Recall that [d]_l denotes the set of all subsets of {1, ..., d} with exactly l distinct elements, and that Γ denotes a generic rotation drawn according to A3. Let ω ∈ [d]_l, and define (ΓU)_ω as the set of rotated subspaces in U restricted to the coordinates in ω, i.e., (ΓU)_ω = {(ΓS_k)_ω}_{k=1}^K. Lemma 2 implies that all the subspaces in (ΓU)_ω are different. It is easy to see that the columns in (ΓX_k)_ω lie in (ΓS_k)_ω. By A1 and A3, the columns in (ΓX)_ω lie on intersections of the subspaces in (ΓU)_ω with probability zero, so any combination of more than r columns from different subspaces in (ΓU)_ω will lie in a subspace of dimension greater than r with probability 1. It follows that the columns of a subset X' of X lie in an r-dimensional subspace of R^d (i.e., they all come from the same S_k) if and only if the columns in (ΓX')_ω lie in an r-dimensional subspace of R^l, as claimed.

Derivation of (4). We want to show that δ(S, S') ≤ 2δ'(r + 1)/(d - r). Recall that δ(S, S') is the probability that S and S' are different on a set of r + 1 coordinates selected uniformly at random (without replacement). In the setup of Section IV, the bases U, U' of S, S' differ on exactly 2δ' rows. Then

    δ(S, S') = P(one of the 2δ' rows is selected in r + 1 draws)
             = P( ∪_{τ=1}^{r+1} {one of the 2δ' rows is selected in the τ-th draw} )
         (a) ≤ Σ_{τ=1}^{r+1} P(one of the 2δ' rows is selected in the τ-th draw)
         (b) = Σ_{τ=1}^{r+1} Σ_{ρ=0}^{τ-1} P(one of the 2δ' rows is selected in the τ-th draw | ρ of the 2δ' rows were selected in the first τ-1 draws) P(ρ of the 2δ' rows were selected in the first τ-1 draws)
         (c) ≤ Σ_{τ=1}^{r+1} Σ_{ρ=0}^{τ-1} (2δ'/(d - r)) P(ρ of the 2δ' rows were selected in the first τ-1 draws),

where (a) follows by the union bound, (b) follows by the law of total probability, and (c) follows because in the τ-th draw (without replacement) at most 2δ' of the remaining rows are among the 2δ' differing rows, while at least d - r rows remain in the pool (since τ ≤ r + 1), so the conditional probability is at most 2δ'/(d - r). Continuing from the last expression, and using that the inner probabilities sum to 1 for each τ, we have

    δ(S, S') ≤ (2δ'/(d - r)) Σ_{τ=1}^{r+1} Σ_{ρ=0}^{τ-1} P(ρ of the 2δ' rows were selected in the first τ-1 draws) = 2δ'(r + 1)/(d - r),

as desired.
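As a numerical sanity check on this derivation, the probability in question can also be estimated by direct Monte Carlo sampling of ω. The sketch below (an illustration with arbitrary parameters; it only verifies the inequality numerically and is not part of the proof) compares the estimate against the exact value (3) and the bound (4).

import numpy as np
from math import comb

def mc_delta(d, r, delta_prime, trials=100000, seed=0):
    """Monte Carlo estimate of P(omega hits one of the 2*delta_prime rows)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        omega = rng.choice(d, size=r + 1, replace=False)
        hits += np.any(omega < 2 * delta_prime)  # differing rows = first 2*delta_prime
    return hits / trials

d, r, dp = 200, 5, 10
exact = 1 - comb(d - 2 * dp, r + 1) / comb(d, r + 1)
bound = 2 * dp * (r + 1) / (d - r)
print(mc_delta(d, r, dp), exact, bound)  # estimate ~ exact <= bound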
VI. ACKNOWLEDGEMENTS

Work by L. Balzano was supported by ARO Grant W911NF.

REFERENCES

[1] R. Vidal, Subspace clustering, IEEE Signal Processing Magazine, 2011.
[2] K. Kanatani, Motion segmentation by subspace separation and model selection, IEEE International Conference on Computer Vision, 2001.
[3] B. Eriksson, P. Barford, J. Sommers and R. Nowak, DomainImpute: Inferring unseen components in the Internet, IEEE INFOCOM Mini-Conference, 2011.
[4] G. Mateos and K. Rajawat, Dynamic network cartography: Advances in network health monitoring, IEEE Signal Processing Magazine, 2013.
[5] W. Hong, J. Wright, K. Huang and Y. Ma, Multi-scale hybrid linear models for lossy image representation, IEEE Transactions on Image Processing, 2006.
[6] J. Rennie and N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, International Conference on Machine Learning, 2005.
[7] A. Zhang, N. Fawaz, S. Ioannidis and A. Montanari, Guess who rated this movie: Identifying users through subspace clustering, Conference on Uncertainty in Artificial Intelligence, 2012.

[8] G. Liu, Z. Lin and Y. Yu, Robust subspace segmentation by low-rank representation, International Conference on Machine Learning, 2010.
[9] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu and Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[10] M. Soltanolkotabi and E. Candès, A geometric analysis of subspace clustering with outliers, Annals of Statistics, 2012.
[11] M. Soltanolkotabi, E. Elhamifar and E. Candès, Robust subspace clustering, Annals of Statistics, 2014.
[12] C. Qu and H. Xu, Subspace clustering with irrelevant features via robust Dantzig selector, Advances in Neural Information Processing Systems, 2015.
[13] X. Peng, Z. Yi and H. Tang, Robust subspace clustering via thresholding ridge regression, AAAI Conference on Artificial Intelligence, 2015.
[14] Y. Wang and H. Xu, Noisy sparse subspace clustering, International Conference on Machine Learning, 2013.
[15] Y. Wang, Y.-X. Wang and A. Singh, Differentially private subspace clustering, Advances in Neural Information Processing Systems, 2015.
[16] H. Hu, J. Feng and J. Zhou, Exploiting unsupervised and supervised constraints for subspace clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[17] L. Balzano, B. Recht and R. Nowak, High-dimensional matched subspace detection when data are missing, IEEE International Symposium on Information Theory, 2010.
[18] B. Eriksson, L. Balzano and R. Nowak, High-rank matrix completion and subspace clustering with missing data, Artificial Intelligence and Statistics, 2012.
[19] D. Pimentel-Alarcón, L. Balzano and R. Nowak, On the sample complexity of subspace clustering with missing data, IEEE Statistical Signal Processing Workshop, 2014.
[20] D. Pimentel-Alarcón and R. Nowak, The information-theoretic requirements of subspace clustering with missing data, International Conference on Machine Learning, 2016.
[21] C. Yang, D. Robinson and R. Vidal, Sparse subspace clustering with missing entries, International Conference on Machine Learning, 2015.
[22] J. He, L. Balzano and A. Szlam, Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video, Conference on Computer Vision and Pattern Recognition, 2012.
[23] E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[24] Y. Wang, Y.-X. Wang and A. Singh, A deterministic analysis of noisy sparse subspace clustering for dimensionality-reduced data, International Conference on Machine Learning, 2015.
[25] R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003.
[26] K. Lee, J. Ho and D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.
[27] N. Ailon and B. Chazelle, The fast Johnson-Lindenstrauss transform and approximate nearest neighbors, SIAM Journal on Computing, 2009.
[28] G. Golub and C. Van Loan, Matrix Computations, The Johns Hopkins University Press, 3rd edition, 1996.
[29] B. Recht, A simpler approach to matrix completion, Journal of Machine Learning Research, 2011.
[30] Z. Lin, R. Liu and Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation, Advances in Neural Information Processing Systems, 2011.


More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

The Press-Schechter mass function

The Press-Schechter mass function The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

Sparse Reconstruction of Systems of Ordinary Differential Equations

Sparse Reconstruction of Systems of Ordinary Differential Equations Sparse Reconstruction of Systems of Orinary Differential Equations Manuel Mai a, Mark D. Shattuck b,c, Corey S. O Hern c,a,,e, a Department of Physics, Yale University, New Haven, Connecticut 06520, USA

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

3.2 Shot peening - modeling 3 PROCEEDINGS

3.2 Shot peening - modeling 3 PROCEEDINGS 3.2 Shot peening - moeling 3 PROCEEDINGS Computer assiste coverage simulation François-Xavier Abaie a, b a FROHN, Germany, fx.abaie@frohn.com. b PEENING ACCESSORIES, Switzerlan, info@peening.ch Keywors:

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

LECTURE NOTES ON DVORETZKY S THEOREM

LECTURE NOTES ON DVORETZKY S THEOREM LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.

More information

Similarity Measures for Categorical Data A Comparative Study. Technical Report

Similarity Measures for Categorical Data A Comparative Study. Technical Report Similarity Measures for Categorical Data A Comparative Stuy Technical Report Department of Computer Science an Engineering University of Minnesota 4-92 EECS Builing 200 Union Street SE Minneapolis, MN

More information

Iterated Point-Line Configurations Grow Doubly-Exponentially

Iterated Point-Line Configurations Grow Doubly-Exponentially Iterate Point-Line Configurations Grow Doubly-Exponentially Joshua Cooper an Mark Walters July 9, 008 Abstract Begin with a set of four points in the real plane in general position. A to this collection

More information

Breaking the Limits of Subspace Inference

Breaking the Limits of Subspace Inference Breaking the Limits of Subspace Inference Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón Emory University, Georgia State University Abstract Inferring low-dimensional subspaces that describe high-dimensional,

More information

Qubit channels that achieve capacity with two states

Qubit channels that achieve capacity with two states Qubit channels that achieve capacity with two states Dominic W. Berry Department of Physics, The University of Queenslan, Brisbane, Queenslan 4072, Australia Receive 22 December 2004; publishe 22 March

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

arxiv: v1 [math.mg] 10 Apr 2018

arxiv: v1 [math.mg] 10 Apr 2018 ON THE VOLUME BOUND IN THE DVORETZKY ROGERS LEMMA FERENC FODOR, MÁRTON NASZÓDI, AND TAMÁS ZARNÓCZ arxiv:1804.03444v1 [math.mg] 10 Apr 2018 Abstract. The classical Dvoretzky Rogers lemma provies a eterministic

More information

Optimal CDMA Signatures: A Finite-Step Approach

Optimal CDMA Signatures: A Finite-Step Approach Optimal CDMA Signatures: A Finite-Step Approach Joel A. Tropp Inst. for Comp. Engr. an Sci. (ICES) 1 University Station C000 Austin, TX 7871 jtropp@ices.utexas.eu Inerjit. S. Dhillon Dept. of Comp. Sci.

More information

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis Chuang Wang, Yonina C. Elar, Fellow, IEEE an Yue M. Lu, Senior Member, IEEE Abstract We present a high-imensional analysis

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything

More information

Integration Review. May 11, 2013

Integration Review. May 11, 2013 Integration Review May 11, 2013 Goals: Review the funamental theorem of calculus. Review u-substitution. Review integration by parts. Do lots of integration eamples. 1 Funamental Theorem of Calculus In

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

arxiv: v3 [cs.lg] 3 Dec 2017

arxiv: v3 [cs.lg] 3 Dec 2017 Context-Aware Generative Aversarial Privacy Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, an Ram Rajagopal arxiv:1710.09549v3 [cs.lg] 3 Dec 2017 Abstract Preserving the utility of publishe atasets

More information

A New Minimum Description Length

A New Minimum Description Length A New Minimum Description Length Soosan Beheshti, Munther A. Dahleh Laboratory for Information an Decision Systems Massachusetts Institute of Technology soosan@mit.eu,ahleh@lis.mit.eu Abstract The minimum

More information

2886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 5, MAY 2015

2886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 5, MAY 2015 886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 61, NO 5, MAY 015 Simultaneously Structure Moels With Application to Sparse an Low-Rank Matrices Samet Oymak, Stuent Member, IEEE, Amin Jalali, Stuent Member,

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

A Second Time Dimension, Hidden in Plain Sight

A Second Time Dimension, Hidden in Plain Sight A Secon Time Dimension, Hien in Plain Sight Brett A Collins. In this paper I postulate the existence of a secon time imension, making five imensions, three space imensions an two time imensions. I will

More information

arxiv: v2 [cs.ds] 11 May 2016

arxiv: v2 [cs.ds] 11 May 2016 Optimizing Star-Convex Functions Jasper C.H. Lee Paul Valiant arxiv:5.04466v2 [cs.ds] May 206 Department of Computer Science Brown University {jasperchlee,paul_valiant}@brown.eu May 3, 206 Abstract We

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH English NUMERICAL MATHEMATICS Vol14, No1 Series A Journal of Chinese Universities Feb 2005 TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH He Ming( Λ) Michael K Ng(Ξ ) Abstract We

More information

Diagonalization of Matrices Dr. E. Jacobs

Diagonalization of Matrices Dr. E. Jacobs Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is

More information

Local Linear ICA for Mutual Information Estimation in Feature Selection

Local Linear ICA for Mutual Information Estimation in Feature Selection Local Linear ICA for Mutual Information Estimation in Feature Selection Tian Lan, Deniz Erogmus Department of Biomeical Engineering, OGI, Oregon Health & Science University, Portlan, Oregon, USA E-mail:

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

Scalable Subspace Clustering

Scalable Subspace Clustering Scalable Subspace Clustering René Vidal Center for Imaging Science, Laboratory for Computational Sensing and Robotics, Institute for Computational Medicine, Department of Biomedical Engineering, Johns

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

Database-friendly Random Projections

Database-friendly Random Projections Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation

Relative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University

More information

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms On Characterizing the Delay-Performance of Wireless Scheuling Algorithms Xiaojun Lin Center for Wireless Systems an Applications School of Electrical an Computer Engineering, Purue University West Lafayette,

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

arxiv: v1 [cs.it] 21 Aug 2017

arxiv: v1 [cs.it] 21 Aug 2017 Performance Gains of Optimal Antenna Deployment for Massive MIMO ystems Erem Koyuncu Department of Electrical an Computer Engineering, University of Illinois at Chicago arxiv:708.06400v [cs.it] 2 Aug 207

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae

More information