ORIE 6340: Mathematics of Data Science


Damek Davis

Contents

1 Estimation in High Dimensions
  1.1 Tools for understanding high-dimensional sets
    1.1.1 Concentration of volume in high dimensions
    1.1.2 Random sections of high-dimensional convex sets
    1.1.3 Gaussian width
  1.2 Estimation from linear observations
    1.2.1 Estimation based on the M* bound
    1.2.2 Estimation as a (tractable) optimization problem
  1.3 A proof of a general M* bound
    1.3.1 From expectation to overwhelming probability
    1.3.2 Consequences: estimation from noisy measurements
  1.4 Applications
    1.4.1 Sparse recovery for general dictionaries
  1.5 Exact recovery
    1.5.1 The geometrical meaning of exact recovery
    1.5.2 Escape through a mesh
    1.5.3 Exact sparse recovery

1 Estimation in High Dimensions

Disclaimer: This section is heavily based on [2, 10, 11].

The main goal. We wish to estimate a vector x contained in a set K ⊆ R^n, from measurements y_1, ..., y_m of x. The vector x may represent a signal, a parameter of a distribution, or an unknown matrix. The set K encodes prior information on x or properties we want to enforce on x.

Figure 1: Estimating a signal in high dimensions

Brief examples (more later). The goal in each problem is to recover x ∈ K, given as few measurements as possible.

1. (Compressed sensing) The set K is the set of k-sparse vectors, i.e., those with at most k nonzeros. Measurements are linear (a_i Gaussian is typical):

    y_i = ⟨a_i, x⟩,  i = 1, ..., m.

Remarkably, with high probability, x can be efficiently recovered from m = O(k log(n/k)) linear measurements.

2. (Matrix completion) The set K is the set of low-rank matrices. Measurements are a sampling of the entries of x:

    y_i = x_{l_i, k_i},  i = 1, ..., m.

Remarkably, with high probability, if the entries are chosen uniformly at random and x is incoherent, the matrix can be efficiently recovered from O(poly(rank(x), log(n), μ) n) measurements.

3. (Nonlinear measurements) Measurements may be nonlinear, e.g., y_i = sign(⟨a_i, x⟩) or E y_i = θ(⟨a_i, x⟩). The first example is simply logistic regression, while the second is called a generalized linear model.

Low-complexity structure. How many measurements are required for efficient estimation? This depends on the complexity or dimension of K. Intuitively, only m = O(dim(K)) measurements are needed to recover x ∈ K.^1 Is that the best we can do? Certainly not: the set of k-sparse vectors is full-dimensional, yet compressive sensing techniques may be used to recover a sparse signal with O(k log(n/k)) ≪ n measurements.

School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14850, USA; people.orie.cornell.edu/dsd95/.

^1 Let us define the algebraic dimension of a set to be the dimension of the smallest subspace containing that set.

In general, feasible sets usually have high algebraic dimension, but low complexity. Examples include images, adjacency matrices of networks, and regression coefficients. Oddly, a certain space of 3 × 3 images of high-contrast patches is close to a set that is topologically equivalent to a Klein bottle, which has low complexity [4].

So our goal for this section is threefold:

1. Quantify the complexity of general sets K ⊆ R^n.
2. Show that estimation is possible from few measurements for low-complexity K.
3. Design algorithmically efficient estimators.

1.1 Tools for understanding high-dimensional sets

What do high-dimensional convex bodies look like?^2 Figure 2 illustrates a counterintuitive feature of the high-dimensional ball of volume 1: most of its volume lies within a fixed slab. Somewhat contradictorily, we will see in a moment that most of the volume of the ball also lies near the boundary of the ball.

Figure 2: Balls of volume 1 in varying dimensions. The region contained within the dashed lines is the slab {−1/2 ≤ x_1 ≤ 1/2}. The slab contains 96% of the volume of each ball.

Heuristic. Convex bodies consist of two parts: the bulk and the outliers (see Figure 3). The bulk makes up most of the volume, but has small diameter (it usually looks like a ball); the outliers contribute little to volume but are large in diameter.

^2 A convex body is a closed, convex, bounded set with nonempty interior.

Figure 3: V. Milman's hyperbolic drawings of high-dimensional convex sets

For example, the Euclidean ball B inscribed in the ℓ_1 ball K = B_1 = {x : ||x||_1 ≤ 1} has radius 1/√n, but

    vol(B)^{1/n} ≍ vol(K)^{1/n} ≍ 1/n.

Conclusion: the ball B, perhaps inflated by a constant factor, forms the bulk of K. The outliers of K are the tentacles shown in Figure 3, which extend far beyond B in the coordinate directions. We can argue more rigorously using concentration inequalities.

1.1.1 Concentration of volume in high dimensions

Consider an isotropic convex body K, meaning that a random vector X distributed uniformly in K satisfies E[X] = 0 and E[XX^T] = I_n. Through translation and scaling, any convex body can be made isotropic (in other words, this is not a restrictive assumption).

A first result. At least 90% of the volume of K lies within the Euclidean ball of radius √(10n). Indeed, E[||X||^2] = Tr(E[XX^T]) = Tr(I_n) = n. Thus, Markov's inequality says that

    P(||X||_2 ≥ √(10n)) ≤ E[||X||^2] / (10n) = 0.1.

A much more powerful concentration result shows that the bulk of the volume of K lies near the sphere of radius √n.

Theorem 1.1 (Volume distribution in high-dimensional sets). For X and K as above, there exist absolute constants c, C > 0 such that the following are true:

1. (Concentration of volume) For t ≥ 1, we have

    P(||X||_2 ≥ t√n) ≤ exp(−ct√n);

2. (Thin shell) For every ε ∈ (0, 1), we have

    P(| ||X||_2 − √n | > ε√n) ≤ C exp(−cε^3 n^{1/2}).

Example: volume distribution in the hypercube. Let K = [−√3, √3]^n be the isotropic hypercube. Then Theorem 1.1 implies that most of the volume lies near the corners of the hypercube (these points have norm Θ(√n)). On the other hand, almost no volume lies near the centers of its facets (these points have norm Θ(1)).

1.1.2 Random sections of high-dimensional convex sets

What do random sections of high-dimensional convex sets look like? A useful (sometimes incorrect) heuristic is that the bulk of a convex body is a Euclidean ball. Thus, if E is a random low-dimensional subspace, we should expect that E misses the outliers and that the intersection E ∩ K looks like a ball (see Figure 4). This is the content of Dvoretsky's theorem.

Figure 4: A random section of a high-dimensional convex set

Theorem 1.2 (Dvoretsky's Theorem). Let K ⊆ R^n be an origin-symmetric convex body whose maximal volume inscribed ellipsoid is the Euclidean ball. Let ε ∈ (0, 1), and let E be a uniformly random subspace (with respect to the Haar measure) of dimension d = cε^2 log n. Then there exists an R > 0 such that, with probability 0.99, we have

    (1 − ε)B(R) ⊆ K ∩ E ⊆ (1 + ε)B(R),

where B(R) ⊆ E is the Euclidean ball of radius R in the subspace E.
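As a numerical illustration of Theorem 1.2 (this sketch and its parameters are ours, not part of the notes), one can sample a Haar-random two-dimensional subspace and compute the radial function of its section of the ℓ_1 ball: for an orthonormal pair (u_1, u_2) spanning E, the section K ∩ E has radial function r(θ) = 1/||cos(θ)u_1 + sin(θ)u_2||_1, and its max/min ratio approaches 1 as n grows, i.e., the section becomes nearly round. In Python with numpy:

    import numpy as np

    # Sketch: random 2D sections of the l1 ball B_1^n look nearly round
    # for large n, as Dvoretsky's theorem predicts.
    rng = np.random.default_rng(1)

    def roundness(n, num_angles=720):
        # Orthonormal basis of a Haar-random 2-dimensional subspace.
        Q, _ = np.linalg.qr(rng.normal(size=(n, 2)))
        u1, u2 = Q[:, 0], Q[:, 1]
        thetas = np.linspace(0, 2 * np.pi, num_angles, endpoint=False)
        r = 1.0 / np.array([np.abs(np.cos(t) * u1 + np.sin(t) * u2).sum()
                            for t in thetas])
        return r.max() / r.min()   # equals 1 for a perfect disk

    for n in (10, 100, 10000):
        print(n, roundness(n))     # the ratio decreases toward 1 as n grows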

John's theorem guarantees that every convex body contains an ellipsoid of maximal volume. Any ellipsoid may be mapped to a Euclidean ball through an affine transformation. Thus, up to an affine transformation, the assumptions of Dvoretsky's theorem are pretty mild.

What about high-dimensional sections? High-dimensional subspaces are more likely to intersect the outliers of K, so we should not expect such sections to be round. Can we estimate other properties of K ∩ E? For example, diam(K ∩ E)? We will see that the diameter of random sections is intimately connected to estimation in high dimensions. One way to get at this quantity is through the mean width of a set.

1.1.3 Gaussian width

The mean width and its variations are some of the most important concepts that we will learn about in this course. It will reappear later, for example, when we study statistical learning theory, estimation problems, and sketching.

Important. In the sequel, we no longer assume K is a convex body; it may be any bounded set.

Figure 5: The width of a set in direction η.

Figure 5 depicts the width of a set K along a direction η ∈ S^{n−1}. The width in direction η may be expressed through the following formula:

    sup_{u,v ∈ K} ⟨η, u − v⟩ = sup_{z ∈ K−K} ⟨η, z⟩,

where K − K = {u − v : u, v ∈ K} is the Minkowski sum of K and −K. This shows that the width may be expressed through the support function of K, a fundamental object in convex analysis:

    σ_K(η) + σ_K(−η) = sup_{z ∈ K−K} ⟨η, z⟩,

where σ_K(η) = sup_{u ∈ K} ⟨η, u⟩. The spherical mean width is then simply the average width over all directions,

    w_S(K) = E [ sup_{z ∈ K−K} ⟨η, z⟩ ],

where η is distributed uniformly on the sphere. While the mean width has an intuitive geometric description, the Gaussian width is a bit simpler to work with and has similar importance.

Definition 1.3. Let K ⊆ R^n, and let g ~ N(0, I_n) be a standard Gaussian vector. Then the Gaussian width of K is defined as

    w(K) = E [ sup_{z ∈ K−K} ⟨g, z⟩ ].

Note. It is common to see other variants of the Gaussian width, for example,

    E [ sup_{z ∈ K} ⟨g, z⟩ ]   and   ( E [ sup_{z ∈ K} ⟨g, z⟩^2 ] )^{1/2}.

It can easily be shown that these definitions are asymptotically equivalent to the Gaussian width defined above.

Relation between spherical and Gaussian widths. Rotation invariance of the Gaussian distribution shows that the random variable ||g|| is independent of the random vector η = (1/||g||) g, which happens to be uniformly distributed on the sphere. Thus,

    w(K) = E [ ||g|| sup_{z ∈ K−K} ⟨η, z⟩ ] = E[||g||] w_S(K).

Since E[||g||] ≈ √n, it follows that w(K) ≈ √n · w_S(K).

Invariance properties. The Gaussian width is invariant under several types of transformations:

Proposition 1.4 (Invariance properties of Gaussian width). The Gaussian width is invariant under translations, orthogonal linear transformations, and taking convex hulls.

The last property is important: the Gaussian width does not distinguish between convex and nonconvex sets: w(K) = w(conv(K)). This fact will justify convexification procedures for estimation problems over nonconvex sets.

Example 1.1. We will now briefly compute a few examples of the Gaussian width. As some of these calculations will appear on your homework, we will not go through all of the details here:

1. (The Euclidean ball.) If K = B_2^n or K = S^{n−1}, we have

    w(K) = E [ sup_{u,v ∈ K} ⟨g, u − v⟩ ] = 2 E [ sup_{u ∈ K} ⟨g, u⟩ ] = 2 E[||g||_2] ≍ √n.

2. (Sets with algebraic dimension d.) Suppose K ⊆ B_2^n is contained in a d-dimensional subspace. Then K is also contained in a d-dimensional ball, so by rotation invariance, we have w(K) ≤ 2√d.

3. (Hypercube.) Let K = [−1, 1]^n. Then

    w(K) = E [ sup_{u,v ∈ K} ⟨g, u − v⟩ ] = 2 E [ sup_{u ∈ K} ⟨g, u⟩ ] = 2 E[||g||_1] = 2√(2/π) n,

where the third equality follows from duality and the fourth is a calculation.

4. (ℓ_1-ball.) Let K = B_1^n. Then

    w(K) = E [ sup_{u,v ∈ K} ⟨g, u − v⟩ ] = 2 E [ sup_{u ∈ K} ⟨g, u⟩ ] = 2 E[||g||_∞] ≍ √(log n),

where the third equality follows from duality and the fourth is a calculation.

5. (Finite sets.) Let K ⊆ B_2^n be a finite set. Then w(K) ≲ √(log |K|). (Independent of dimension!) This will be a homework exercise.

6. (Sparsity.) Let K be the set of s-sparse unit vectors:

    K = {x ∈ R^n : ||x||_2 = 1, ||x||_0 ≤ s},

where ||x||_0 denotes the number of nonzero elements of x. Then

    w(K) ≲ √(s log(2n/s)).

This will be a homework exercise.

7. (Low rank.) Let K be the set of d_1 × d_2 matrices of rank at most r and unit Frobenius norm:

    K = {X : rank(X) ≤ r, ||X||_F = 1}.

Then, we will later see that w(K) ≲ √(r(d_1 + d_2)).
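The hypercube and ℓ_1-ball calculations above are easy to check numerically, since both suprema have closed forms (2||g||_1 and 2||g||_∞ respectively). The following Monte Carlo sketch (ours, not part of the notes; dimensions and sample sizes are illustrative) compares the empirical averages with the formulas in items 3 and 4. In Python with numpy:

    import numpy as np

    # Monte Carlo check of the hypercube and l1-ball Gaussian widths.
    rng = np.random.default_rng(0)
    n, trials = 1000, 2000
    G = rng.normal(size=(trials, n))            # rows are independent g ~ N(0, I_n)

    w_cube = 2 * np.abs(G).sum(axis=1).mean()   # estimate of 2 E||g||_1
    w_l1ball = 2 * np.abs(G).max(axis=1).mean() # estimate of 2 E||g||_inf

    print(w_cube, 2 * np.sqrt(2 / np.pi) * n)       # exact value for the cube
    print(w_l1ball, 2 * np.sqrt(2 * np.log(n)))     # leading-order value for B_1^n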

Odd behavior of the width. The spherical width of B_1^n is much smaller than its diameter:

    w_S(B_1^n) ≍ √(log(n)/n) ≪ 2.

Moreover, the Gaussian width of B_1^n is, up to a logarithmic factor, on the order of the Gaussian width of its inscribed ball. This is odd since B_1^n looks much larger than its inscribed ball; see Figure 6. On the other hand, the Gaussian width of the hypercube is roughly the same as the Gaussian width of its circumscribed ball √n B_2^n. What's going on? The hypercube has 2^n vertices, so intuitively (and rigorously, according to Theorem 1.1), most of its volume must concentrate there. Moreover, the hypercube and its circumscribed ball have roughly the same volume; hence, they are close. On the other hand, the ball B_1^n has only 2n vertices, and so with high probability^3, a random Gaussian vector is nearly orthogonal to all of them. Consequently, the width in a Gaussian direction η is not really influenced by the tentacles.

Figure 6: Odd behavior of Gaussian width.

Squared width as a stable dimension. For any bounded set K, we may define

    h(K) := E [ sup_{z ∈ K−K} ⟨g, z⟩^2 ].

It is easy to show that h(K) ≥ w(K)^2. Define the stable dimension of K to be the quantity

    d(K) = h(K) / diam(K)^2.

The stable dimension acts as a robust variant of the algebraic dimension, which more accurately captures the complexity of the underlying set. In fact, using item 2 above, we automatically get the following bound:

    w(K)^2 / diam(K)^2 ≲ algebraic dimension of K.

A slightly stronger result holds for the stable dimension.

^3 Use a simple union bound.

Lemma 1.5. The stable dimension of a bounded set is always bounded by the algebraic dimension.

This lemma will appear as an (easy) homework problem.

Mean width from a single realization? Gaussian concentration implies that

    w(K, g) = sup_{z ∈ K−K} ⟨g, z⟩

concentrates tightly about its mean, w(K), with high probability. Thus, to estimate the mean, we simply sample one g ~ N(0, I_n) and compute the supremum. Since w(K) = w(conv(K)), the supremum may be computed by solving a convex optimization problem.

Bounds on Gaussian width: connections to covering numbers. The covering number N(K, t) of a set K is the minimal number of balls of radius t whose union covers K. The Gaussian width is deeply connected to covering numbers through the following theorem.

Theorem 1.6 (Sudakov's and Dudley's inequalities). For any bounded subset K ⊆ R^n, we have

    sup_{t > 0} t √(log N(K, t)) ≲ w(K) ≲ ∫_0^∞ √(log N(K, t)) dt.

The lower bound in the above theorem is known as Sudakov's inequality and the upper bound is known as Dudley's inequality.

Random sections of small codimension: the M* bound. Returning to our question, let us bound the diameter of a random section of K.

Theorem 1.7 (M* bound). Let K ⊆ R^n be a bounded set. Let E be a uniformly random subspace of codimension m. Then

    E diam(K ∩ E) ≲ w(K)/√m.

To get a feel for this bound, let's think about the two extremes:

1. (m = Ω(n)) In this case, E diam(K ∩ E) ≲ w(K)/√n ≈ w_S(K). In other words, the size of a constant-dimensional random section is bounded by the spherical mean width. This suggests that low-dimensional subspaces pass through the bulk of K, but ignore the outliers (see Figure 4).

2. (m = O(1)) In this case, E diam(K ∩ E) ≲ w(K) ≈ √n · w_S(K). One interpretation of this bound is that, beyond the bulk, one can pick up an extra factor of √n in diameter from the tentacles.

The first bound of this sort was proven by V. Milman [5]; the statement presented here is due to Pajor and Tomczak-Jaegermann [9]. We will get back to the proof of a general version of this bound, but first let's dive into some consequences.

1.2 Estimation from linear observations

Recall that our goal is to estimate an unknown vector x ∈ K ⊆ R^n given some vector of measurements y = (y_1, ..., y_m) ∈ R^m whose coordinates are i.i.d. draws of a random function of x. In this section, we will study a simple model where the observations y_i come from Gaussian linear functions. In particular,

    y_i = ⟨a_i, x⟩,  i = 1, ..., m,

where the a_i ~ N(0, I_n) are standard Gaussian. We can rewrite this in vector form as y = Ax, where a_i is the i-th row of A. Note that A is full rank with probability one, so if m > n the problem of recovering x is trivial. The problem becomes interesting when the number of measurements is smaller than the dimension, i.e., m < n. Without additional restrictions the problem is ill-posed. Hence, we need the constraint x ∈ K to enforce additional structure.

Figure 7: Feasibility problem: estimating x in the intersection of K and {x' : Ax' = y}

1.2.1 Estimation based on the M* bound

In this setting we only have two pieces of information about x:

1. It satisfies Ax = y;
2. It belongs to K.

It is natural to define an estimator given by the following feasibility problem:

    Find x̂ ∈ K such that A x̂ = y;  (1.1)

see Figure 7 for an illustration. Now, how good of an estimate is x̂?

Theorem 1.8. Assume that K is a closed bounded subset of R^n and A is an m × n Gaussian matrix^4 with m < n. Then, the estimator given by (1.1) satisfies

    E sup_{x ∈ K} ||x̂ − x||_2 ≲ w(K)/√m.

Proof. This is a direct consequence of the M* bound, Theorem 1.7. We will make use of the following well-known fact: the random subspace E = ker A is uniformly distributed over the set of subspaces of dimension n − m. Since x̂, x ∈ K and A(x̂ − x) = 0, we have x̂ − x ∈ (K − K) ∩ E, and this set is symmetric, so ||x̂ − x||_2 ≤ (1/2) diam((K − K) ∩ E). It is easy to see that w(K − K) ≤ 2w(K). Then, using Theorem 1.7, we deduce

    E sup_{x ∈ K} ||x̂ − x||_2 ≤ (1/2) E diam((K − K) ∩ E) ≲ w(K − K)/(2√m) ≤ w(K)/√m.

1.2.2 Estimation as a (tractable) optimization problem

The next question we would like to answer is: how can we compute the estimator in (1.1)? A first step towards this goal is to substitute this feasibility problem with an optimization problem. To do so, we need to introduce an additional (mild) assumption on K. From now on we assume that K has nonempty interior and is star-shaped, i.e., the inclusion tK ⊆ K holds for all t ∈ [0, 1]. This leads us to define the Minkowski functional of K as the function ||·||_K : R^n → R given by

    ||x||_K = inf{λ > 0 : λ^{-1} x ∈ K}.

Minkowski functionals are standard notions in geometric functional analysis and convex analysis. It is not hard to see that under the current assumptions on K, the functional ||·||_K is continuous and positively homogeneous.^5 By definition, we have K = {x : ||x||_K ≤ 1}. Moreover, if K is a symmetric convex body^6, then ||·||_K defines a norm. With this notation in hand we can introduce the optimization problem

    x̂ ∈ arg min ||x'||_K subject to Ax' = y.  (1.2)

^4 As we described above, its entries are i.i.d. standard normal random variables.
^5 ||αx||_K = α ||x||_K for all α > 0.
^6 K is convex, bounded, closed, origin-symmetric (K = −K), and has nonempty interior.
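To make (1.2) concrete, here is a minimal sketch (ours, not part of the notes) for one particular choice of K: a scaled ℓ_1 ball, for which ||·||_K is proportional to the ℓ_1 norm and (1.2) becomes the familiar basis pursuit program. It assumes the cvxpy and numpy packages; the dimensions and data are illustrative.

    import cvxpy as cp
    import numpy as np

    # Program (1.2) with K an l1 ball: minimize ||x||_1 subject to Ax = y.
    rng = np.random.default_rng(0)
    n, m, s = 200, 80, 5
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)   # s-sparse signal
    A = rng.normal(size=(m, n))                                    # Gaussian measurements
    y = A @ x_true

    x = cp.Variable(n)
    problem = cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == y])
    problem.solve()
    print(np.linalg.norm(x.value - x_true))    # estimation error ||x_hat - x||_2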

This program leads to the following guarantee.

Theorem 1.9. Let K be a star-shaped bounded closed set with nonempty interior. Then, the estimator x̂ given by (1.2) satisfies

    E sup_{x ∈ K} ||x̂ − x||_2 ≲ w(K)/√m.

Proof. By Theorem 1.8 it suffices to check that x̂ ∈ K. This follows immediately since ||x̂||_K ≤ ||x||_K ≤ 1, by definition.

Convex relaxations. The issue with the previous optimization problem is that it could be hard to solve (in fact, solving general nonconvex programs is NP-hard). This raises the question: how do we devise a computationally tractable estimator? If K is a convex body, the problem becomes a convex program and we can use off-the-shelf solvers, like interior-point methods, subgradient algorithms, or proximal-splitting methods. Furthermore, if K is a polytope, (1.2) can be cast as a linear program, which opens the door to even faster algorithms. Given the invariance of the Gaussian width under convex hulls, it is natural to consider convexifying the set K. Define the convex relaxation

    x̂ ∈ arg min ||x'||_{conv(K)} subject to Ax' = y.  (1.3)

We recover exactly the error bound of Theorem 1.9: to see this, note that the estimator x̂ defined in (1.3) satisfies

    E sup_{x ∈ K} ||x̂ − x||_2 ≤ E sup_{x ∈ conv(K)} ||x̂ − x||_2 ≲ w(conv(K))/√m = w(K)/√m.

Program (1.3) gives a tractable approach for many convex bodies K. Indeed, there are some convex bodies for which computing ||·||_K is hard. Later we will see some relevant examples with tractable relaxations.

Information-theoretic aspects. The aforementioned results imply that if we fix a desired accuracy, E sup_{x ∈ K} ||x̂ − x||_2 ≤ ε, then m ≳ w^2(K) observations suffice, where the hidden constant depends on the accuracy. It is worth noting that this result is uniform, in the sense that we can use Markov's inequality to ensure that with fixed probability, say 0.9, the estimation is simultaneously accurate for all vectors x ∈ K. Later we will see that the actual probability is much better; it approaches one exponentially fast as a function of m.

1.3 A proof of a general M* bound

Now we will give a proof of a generalization of the M* bound, Theorem 1.7.

Theorem 1.10 (General M* bound). Let T ⊆ R^n be a bounded set. Let A be an m × n Gaussian matrix. Fix ε > 0 and define the set

    T_ε = {u ∈ T : (1/m) ||Au||_1 ≤ ε}.  (1.4)

Then

    E sup_{u ∈ T_ε} ||u||_2 ≤ √(8π/m) E sup_{u ∈ T} |⟨g, u⟩| + √(π/2) ε,  (1.5)

where g is a standard Gaussian random vector.

Exercise 1. Why does the previous statement generalize Theorem 1.7?

Before we go on, some comments are in order.

- This theorem will allow us to handle noisy measurements of the form y = Ax + ν.
- For the conclusion (1.5) we are assuming the convention that the supremum over the empty set is 0.
- A more general statement holds: one can relax the assumption on the distribution of the entries of A to sub-gaussianity. Nonetheless, we will not go into this trickier setting; we refer the interested reader to [10, Section 8].

To prove the bound we will need a couple of simple tools from empirical processes. Recall that a stochastic process is a collection of random variables (Z(t))_{t ∈ T} over the same probability space. The indexing set could denote time, as it does for Brownian motion, or it could be a subset of R^n, as we saw for example with the Gaussian width.

Proposition 1.11. Consider a finite collection of stochastic processes Z_1(t), ..., Z_m(t) indexed by t ∈ T. Let ε_i be independent Rademacher random variables.^7 Then the following hold:

- (Symmetrization)  E sup_{t ∈ T} | Σ_{i=1}^m [Z_i(t) − E Z_i(t)] | ≤ 2 E sup_{t ∈ T} | Σ_{i=1}^m ε_i Z_i(t) |;
- (Contraction)  E sup_{t ∈ T} | Σ_{i=1}^m ε_i |Z_i(t)| | ≤ 2 E sup_{t ∈ T} | Σ_{i=1}^m ε_i Z_i(t) |.

These statements are not too hard to prove, and more general statements can be found in [11] or [3].

Proof of Theorem 1.10. The conclusion (1.5) would follow if we proved the deviation inequality

    E sup_{u ∈ T} | (1/m) Σ_{i=1}^m |⟨a_i, u⟩| − √(2/π) ||u||_2 | ≤ (4/√m) E sup_{u ∈ T} |⟨g, u⟩|.  (1.6)

^7 P(ε_i = 1) = P(ε_i = −1) = 1/2.

If (1.6) holds for T, then it also holds with T_ε ⊆ T inside the supremum on the left-hand side. Moreover, for u ∈ T_ε we have

    (1/m) Σ_{i=1}^m |⟨a_i, u⟩| = (1/m) ||Au||_1 ≤ ε.

Hence (1.5) follows by an application of the triangle inequality: bound √(2/π)||u||_2 by the deviation plus (1/m)||Au||_1, take suprema and expectations, and multiply through by √(π/2).

Rotation invariance gives

    E |⟨a_i, u⟩| = √(2/π) ||u||_2.

Hence, by symmetrization and contraction, we can bound the left-hand side of (1.6):

    E sup_{u ∈ T} | (1/m) Σ_{i=1}^m ( |⟨a_i, u⟩| − E|⟨a_i, u⟩| ) |
        ≤ 2 E sup_{u ∈ T} | (1/m) Σ_{i=1}^m ε_i |⟨a_i, u⟩| |
        ≤ 4 E sup_{u ∈ T} | (1/m) Σ_{i=1}^m ε_i ⟨a_i, u⟩ |.  (1.7)

Observe that ε_i a_i and a_i have the same distribution; hence g := (1/√m) Σ_{i=1}^m ε_i a_i is a standard Gaussian vector. Thus the last quantity can be written as

    (4/√m) E sup_{u ∈ T} |⟨g, u⟩|,

proving the result.

1.3.1 From expectation to overwhelming probability

As it turns out, one can use the M* bound to derive high-probability guarantees via concentration of measure. For this, we will need the Gaussian concentration inequality.

Proposition 1.12. Assume that g ~ N(0, I_m) is a Gaussian vector and let f : R^m → R be an L-Lipschitz function. Then

    P(|f(g) − E f(g)| ≥ t) ≤ 2 exp(−t^2 / (2L^2)).

For a proof of this result see, for example, [3]. With it we get the next result.

Theorem 1.13. Let T be a bounded set. Let A be an m × n Gaussian matrix. Fix ε > 0 and define the set

    T_ε = {u ∈ T : (1/m) ||Au||_1 ≤ ε}.  (1.8)

Then

    sup_{u ∈ T_ε} ||u||_2 ≤ √(8π/m) E sup_{u ∈ T} |⟨g, u⟩| + √(π/2) (ε + t)  (1.9)

with probability at least 1 − 2 exp(−m t^2 / (2 max_{u ∈ T} ||u||_2^2)).

Exercise 2. Prove the previous theorem. Hint: Consider the function

    f(A) = sup_{u ∈ T} | (1/m) Σ_{i=1}^m |⟨a_i, u⟩| − √(2/π) ||u||_2 |.

1.3.2 Consequences: estimation from noisy measurements

Let us use the new M* bound in a slightly more general context. Assume we observe measurements

    y = Ax + ν,

where ν models bounded noise and satisfies (1/m) ||ν||_1 = (1/m) Σ_{i=1}^m |ν_i| ≤ ε. The noise ν is unknown and arbitrary; in particular, it could be correlated with A or x. To handle the noise we could consider the feasibility problem

    Find x̂ ∈ K such that (1/m) ||A x̂ − y||_1 ≤ ε,  (1.10)

leading to the following guarantee.

Theorem 1.14. If x̂ is a solution to (1.10), then

    E sup_{x ∈ K} ||x̂ − x||_2 ≤ √(8π) ( w(K)/√m + ε ).  (1.11)

Proof. Let's apply the M* bound to T = K − K and with 2ε instead of ε. We get

    E sup_{u ∈ T_{2ε}} ||u||_2 ≤ √(8π/m) E sup_{u ∈ T} |⟨g, u⟩| + √(2π) ε ≤ √(8π) ( w(K)/√m + ε ),

where the inequality follows from the definition of the Gaussian width and the fact that T is symmetric. To finish the proof we need to ensure that x̂ − x ∈ T_{2ε} for any x ∈ K. By construction, x̂, x ∈ K, so x̂ − x ∈ T. Now, by the triangle inequality and the constraint on x̂,

    (1/m) ||A(x̂ − x)||_1 = (1/m) ||A x̂ − y + ν||_1 ≤ (1/m) ||A x̂ − y||_1 + (1/m) ||ν||_1 ≤ 2ε.

Consequently, E sup_{x ∈ K} ||x̂ − x||_2 ≤ E sup_{u ∈ T_{2ε}} ||u||_2, finishing the proof.

Similarly to before, if K is a closed star-shaped bounded set with nonempty interior, we define the optimization problem

    x̂ ∈ arg min ||x'||_K subject to (1/m) ||Ax' − y||_1 ≤ ε,  (1.12)

and recover the following theorem.

Theorem 1.15. Let x̂ be a solution of (1.12). Then

    E sup_{x ∈ K} ||x̂ − x||_2 ≤ √(8π) ( w(K)/√m + ε ).

Again, we can take the convex relaxation of (1.12) and still get the same error bound.
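As with (1.2), the noisy program (1.12) is easy to set up with an off-the-shelf convex solver. The sketch below (ours, not part of the notes) again specializes to K a scaled ℓ_1 ball, so the objective is the ℓ_1 norm; it assumes the cvxpy and numpy packages, and the noise level and dimensions are illustrative.

    import cvxpy as cp
    import numpy as np

    # Program (1.12) with K an l1 ball: min ||x||_1 s.t. (1/m)||Ax - y||_1 <= eps.
    rng = np.random.default_rng(1)
    n, m, s, eps = 200, 80, 5, 0.1
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
    A = rng.normal(size=(m, n))
    nu = eps * rng.uniform(-1, 1, size=m)     # bounded noise: (1/m)||nu||_1 <= eps
    y = A @ x_true + nu

    x = cp.Variable(n)
    problem = cp.Problem(cp.Minimize(cp.norm(x, 1)),
                         [cp.norm(A @ x - y, 1) / m <= eps])
    problem.solve()
    print(np.linalg.norm(x.value - x_true))   # compare with the error bound (1.11)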

1.4 Applications

Next, we will discuss explicit applications of the M* bound.

1.4.1 Sparse recovery for general dictionaries

In some fields, such as signal processing and harmonic analysis, it is often convenient to consider redundant data representations. One way to achieve this is by considering a dictionary, i.e., an arbitrary collection of vectors v_1, ..., v_N ∈ R^n that span R^n (they could be linearly dependent). Of course, the choice of the dictionary depends on the application. For a deeper introduction to these ideas see, for example, [6]. Given the redundancy, it is natural to wonder about sparse representations. We say that a vector x ∈ R^n is s-sparse if it can be written as a linear combination of at most s vectors in the dictionary, i.e., x = Σ_{i=1}^N α_i v_i with at most s nonzero coefficients α_i ∈ R. Our goal now is to recover such a sparse representation from noisy measurements. Just as before,

    y = Ax + ν  with  (1/m) ||ν||_1 ≤ ε.

Based on our previous success with convex programs, we consider the problem

    arg min ||α||_1 subject to (1/m) ||Ax − y||_1 ≤ ε,  x = Σ_{i=1}^N α_i v_i.  (1.13)

Exercise 3. Define K̃ := conv{±v_i}_{i=1}^N. Prove that

    ||x||_{K̃} = min{ ||α||_1 : x = Σ_{i=1}^N α_i v_i }  for all x ∈ R^n.

Theorem 1.16. Assume that all dictionary vectors satisfy ||v_i||_2 ≤ 1. Let x̂ be a solution of the convex problem (1.13). Then

    E ||x̂ − x||_2 ≤ C ||α||_2 √(s log N / m) + √(2π) ε.

Proof. Fix x ∈ R^n; we will apply the M* bound to the polytope K := ||α||_1 K̃. By assumption x ∈ K. Hence, by Exercise 3, the two problems (1.12) and (1.13) are equivalent. Thus, we immediately get the bound (1.11).

Next, let us compute the Gaussian width appearing in this error bound. By invariance under convexification and Example 1.1 we deduce

    w(K) = ||α||_1 w(K̃) ≤ C ||α||_1 √(log N) ≤ C √s ||α||_2 √(log N),

where the last inequality follows since for any s-sparse vector α we have ||α||_1 ≤ √s ||α||_2. Substituting this into (1.11) gives the desired result.

We could generalize this proof to arbitrary dictionaries at the expense of an extra factor of max_i ||v_i||_2. An important advantage of (1.13) is that it can be cast as a linear program (how?) and thus we can rely on very fast solvers! Another positive feature of this formulation is that it automatically gives us a representation α̂. Since we are only approximating x, there is a priori no reason to believe that α̂ will be sparse. Nonetheless, one can prove^8, using a simple dimensionality argument, that if x is s-sparse in the dictionary then α̂ will be s-sparse.

Recovery for the canonical dictionary. To crystallize ideas, let us consider one of the simplest dictionaries, the canonical basis:

    v_i = e_i for all i = 1, ..., n.

In this case sparsity in the dictionary coincides with sparsity of the vector itself, and (1.13) becomes

    arg min ||x'||_1 subject to (1/m) ||Ax' − y||_1 ≤ ε.  (1.14)

As a direct corollary of the previous theorem we obtain:

Corollary 1.17. Fix an s-sparse vector x ∈ R^n and let x̂ be a solution of (1.14). Then

    E ||x̂ − x||_2 ≤ C ||x||_2 √(s log n / m) + √(2π) ε.

Takeaway. Using linear programming, we can approximately recover an s-sparse vector in a general dictionary of size N from m ≍ s log N random measurements.

1.5 Exact recovery

Previously, we saw guarantees on the accuracy of our estimator. Surprisingly, in some scenarios it is possible to ensure perfect recovery, i.e., x̂ = x, with overwhelming probability. Assume for this section that we have noiseless measurements

    y = Ax,

where A is an m × n Gaussian matrix.

^8 Under very mild regularity conditions.

(a) Descent cone D(K, x) and E_x. (b) Illustration of the recovery condition in terms of the spherical cap S(K, x).

Figure 8: Geometrical concepts involved in the exact recovery condition

1.5.1 The geometrical meaning of exact recovery

In order to derive results we need to find a characterization of exact recovery. Remember that we have two pieces of information. The first one tells us that the signal we would like to recover lies in the set K, and the second one tells us that it belongs to the affine space E_x = {x' : Ax' = y}. These two pieces completely determine x if, and only if,

    K ∩ E_x = {x}.  (1.15)

We cannot hope to capture this with the M* bound; in fact, this equality implies the diameter of the section is zero. How do we describe this phenomenon geometrically? To simplify the scenario, let us assume for now that K is convex. Then it is clear that exact recovery (1.15) holds if and only if the affine subspace E_x is tangent to K at x. Notice that even when K is nonconvex this equivalence still holds locally, i.e., if we restrict (intersect) everything to a sufficiently small ball around x. Hence, we will lose nothing if we replace K by its tangent cone, also known as the descent cone: the cone of all directions that go into K emanating from x,

    D(K, x) := cone{z − x : z ∈ K},  where cone(C) = ∪_{t ≥ 0} tC.

Translating condition (1.15) to zero gives

    (K − x) ∩ (E_x − x) = {0}.  (1.16)

Locally (and globally for convex sets), this is equivalent to the same condition with D(K, x) in place of (K − x); noting that E_x − x = ker A, we arrive at

    D(K, x) ∩ ker(A) = {0}.  (1.17)

For our purposes, the descent cone can be substituted by its intersection with the sphere, i.e.,

    S(K, x) := D(K, x) ∩ S^{n−1} = { (z − x)/||z − x||_2 : z ∈ K, z ≠ x }.

Hence we obtain the following equivalent form of exact recovery (see Figure 8):

    S(K, x) ∩ ker A = ∅.

1.5.2 Escape through a mesh

Studying the likelihood of a nonempty intersection between two sets has a long history and has been widely studied in geometric probability; see, for example, the fantastic book by Klain and Rota [8]. Indeed, for the specific case of a subspace and a spherical cap there exists a sharp result; this is known as the escape through a mesh theorem and is due to Gordon [7]. The theorem is stated in terms of a slightly different version of the Gaussian width,

    w̄(S) = E sup_{u ∈ S} ⟨g, u⟩.

Theorem 1.18 (Escape through a mesh). Let S be a fixed subset of S^{n−1} and let E be a uniformly random subspace of R^n of fixed codimension m. Assume that w̄(S) < √m. Then

    S ∩ E = ∅

with probability at least 1 − 3.5 exp(−(√m − w̄(S))^2 / 18).

This theorem is much stronger than the M* bound. In fact, one can recover a similar statement using that bound, but one can only ensure that P(S ∩ E ≠ ∅) ≲ w̄(S)/√m.

Now, how do we convert this into an algorithmic statement? We already developed estimators that are able to find a point x̂ in K ∩ E_x, either by solving a feasibility problem or an optimization problem. By the discussion above, if we assume convexity of K, such a point is the true signal, x̂ = x, if, and only if, S(K, x) ∩ ker A = ∅. Thus, we immediately get the following theorem.

Theorem 1.19. Assume that K is a symmetric convex body and let x̂ be the solution of (1.1) or (1.2). Further, suppose that the number of measurements satisfies

    √m > w̄(S(K, x)).

Then x̂ = x with overwhelming probability (the same as in Theorem 1.18).

Takeaway. Provided we can solve the associated optimization problem, exact recovery is possible whenever the number of observations exceeds the squared Gaussian width of D(K, x) ∩ S^{n−1}.
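To see Theorem 1.19 in action for the ℓ_1 ball (the case worked out in the next subsection), one can run the following experiment, sketched here for illustration (ours, not part of the notes; it assumes the cvxpy and numpy packages, and the sizes and success tolerance are arbitrary): for an s-sparse x, solve the noiseless ℓ_1 program for a range of m and record how often the solution coincides with x.

    import cvxpy as cp
    import numpy as np

    # Empirical exact-recovery rate of l1 minimization as m varies.
    rng = np.random.default_rng(0)
    n, s, trials = 100, 5, 20

    def recovery_rate(m):
        successes = 0
        for _ in range(trials):
            x_true = np.zeros(n)
            x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
            A = rng.normal(size=(m, n))
            x = cp.Variable(n)
            cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == A @ x_true]).solve()
            successes += np.linalg.norm(x.value - x_true) < 1e-5
        return successes / trials

    for m in (10, 20, 40, 80):
        print(m, recovery_rate(m))
    # Recovery becomes reliable once m exceeds a constant multiple of s*log(n/s),
    # matching the phase transition picture discussed below.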

Recurrent takeaway. The Gaussian width measures the sample complexity of these estimation problems!

1.5.3 Exact sparse recovery

Let us demonstrate the power of this result by considering a concrete example. Consider again the sparse recovery problem. Let x be an s-sparse vector in R^n and define the set

    K = ||x||_1 B_1^n = {x' : ||x'||_1 ≤ ||x||_1}.

One can write the descent cone as

    D(K, x) = cone{z : ||x + z||_1 ≤ ||x||_1}.^9

Then, after some bits of magic math, one gets (you will have to do this in a homework)

    w̄(S(K, x)) ≲ √(s log(2n/s)).

Thanks to our construction, ||·||_K = (1/||x||_1) ||·||_1 (why?), so in this case we can rewrite program (1.2) with the proportional objective ||·||_1. That is,

    arg min ||x'||_1 subject to Ax' = y,  (1.18)

and we get an exact-recovery analogue of Corollary 1.17.

Theorem 1.20. Assume that x ∈ R^n is an unknown s-sparse vector. Let x̂ be a solution of (1.18). Then, with probability at least 1 − 3 exp(−m), we have x̂ = x, provided that m > Cs log(n/s) for some universal constant C > 0.

This is actually sharp; see Figure 9. One can prove that if m < s log(n/s), the probability of recovery using this method is very small. We will not go into this, but we recommend [1] as a reference for phase transition phenomena in recovery problems.

^9 This is why they are called descent cones: they usually contain the directions in which some function (the Minkowski functional of K) decreases.

References

[1] Dennis Amelunxen, Martin Lotz, Michael B. McCoy, and Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. Information and Inference: A Journal of the IMA, 3(3):224-294, 2014.

Figure 9: Empirical probability of exact recovery for different parameters. White means probability one, black probability zero. Taken from [1].

[2] Keith Ball. An elementary introduction to modern convex geometry. Flavors of Geometry, 31:1-58, 1997.

[3] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.

[4] Gunnar Carlsson, Tigran Ishkhanov, Vin De Silva, and Afra Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1):1-12, 2008.

[5] V. D. Milman. Random subspaces of proportional dimension of finite dimensional normed spaces: Approach through the isoperimetric inequality. Volume 1166 of Lecture Notes in Mathematics. Springer, 1985.

[6] David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197-2202, 2003.

[7] Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in R^n. In Joram Lindenstrauss and Vitali D. Milman, editors, Geometric Aspects of Functional Analysis, pages 84-106. Springer, Berlin, Heidelberg, 1988.

[8] Daniel A. Klain and Gian-Carlo Rota. Introduction to geometric probability. Cambridge University Press, 1997.

[9] Alain Pajor and Nicole Tomczak-Jaegermann. Subspaces of small codimension of finite-dimensional Banach spaces. Proceedings of the American Mathematical Society, 97(4):637-642, 1986.

[10] Roman Vershynin. Estimation in high dimensions: a geometric perspective. In Sampling Theory, a Renaissance, pages 3-66. Springer, 2015.

[11] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.


More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Ocean 420 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers

Ocean 420 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers Ocean 40 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers 1. Hydrostatic Balance a) Set all of the levels on one of the coluns to the lowest possible density.

More information

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 17th February 2010 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion

More information

PREPRINT 2006:17. Inequalities of the Brunn-Minkowski Type for Gaussian Measures CHRISTER BORELL

PREPRINT 2006:17. Inequalities of the Brunn-Minkowski Type for Gaussian Measures CHRISTER BORELL PREPRINT 2006:7 Inequalities of the Brunn-Minkowski Type for Gaussian Measures CHRISTER BORELL Departent of Matheatical Sciences Division of Matheatics CHALMERS UNIVERSITY OF TECHNOLOGY GÖTEBORG UNIVERSITY

More information

Lecture 21 Nov 18, 2015

Lecture 21 Nov 18, 2015 CS 388R: Randoized Algoriths Fall 05 Prof. Eric Price Lecture Nov 8, 05 Scribe: Chad Voegele, Arun Sai Overview In the last class, we defined the ters cut sparsifier and spectral sparsifier and introduced

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Finite fields. and we ve used it in various examples and homework problems. In these notes I will introduce more finite fields

Finite fields. and we ve used it in various examples and homework problems. In these notes I will introduce more finite fields Finite fields I talked in class about the field with two eleents F 2 = {, } and we ve used it in various eaples and hoework probles. In these notes I will introduce ore finite fields F p = {,,...,p } for

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

A Simple Homotopy Algorithm for Compressive Sensing

A Simple Homotopy Algorithm for Compressive Sensing A Siple Hootopy Algorith for Copressive Sensing Lijun Zhang Tianbao Yang Rong Jin Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China Departent of Coputer

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

4 = (0.02) 3 13, = 0.25 because = 25. Simi-

4 = (0.02) 3 13, = 0.25 because = 25. Simi- Theore. Let b and be integers greater than. If = (. a a 2 a i ) b,then for any t N, in base (b + t), the fraction has the digital representation = (. a a 2 a i ) b+t, where a i = a i + tk i with k i =

More information

M ath. Res. Lett. 15 (2008), no. 2, c International Press 2008 SUM-PRODUCT ESTIMATES VIA DIRECTED EXPANDERS. Van H. Vu. 1.

M ath. Res. Lett. 15 (2008), no. 2, c International Press 2008 SUM-PRODUCT ESTIMATES VIA DIRECTED EXPANDERS. Van H. Vu. 1. M ath. Res. Lett. 15 (2008), no. 2, 375 388 c International Press 2008 SUM-PRODUCT ESTIMATES VIA DIRECTED EXPANDERS Van H. Vu Abstract. Let F q be a finite field of order q and P be a polynoial in F q[x

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

1 Identical Parallel Machines

1 Identical Parallel Machines FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j

More information

Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers Roy D. Yates and David J.

Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers Roy D. Yates and David J. Probability and Stochastic Processes: A Friendly Introduction for Electrical and oputer Engineers Roy D. Yates and David J. Goodan Proble Solutions : Yates and Goodan,1..3 1.3.1 1.4.6 1.4.7 1.4.8 1..6

More information