Efficient Sketches for the Set Query Problem

Size: px
Start display at page:

Download "Efficient Sketches for the Set Query Problem"

Transcription

1 Efficient Setches for the Set Query Problem Eric Price Abstract We develop an algorithm for estimating the values of a vector x R n over a support S of size from a randomized sparse binary linear setch Ax of size O(. Given Ax and S, we can recover x with x x S x x S with probability at least Ω(. The recovery taes O( time. While interesting in its own right, this primitive also has a number of applications. For example, we can:. Improve the linear -sparse recovery of heavy hitters in Zipfian distributions with O( log n space from a + approximation to a + o( approximation, giving the first such approximation in O( log n space when O(n.. Recover bloc-sparse vectors with O( space and a + approximation. Previous algorithms required either ω( space or ω( approximation. Introduction In recent years, a new linear approach for obtaining a succinct approximate representation of n-dimensional vectors (or signals has been discovered. For any signal x, the representation is equal to Ax, where A is an m n matrix, or possibly a random variable chosen from some distribution over such matrices. The vector Ax is often referred to as the measurement vector or linear setch of x. Although m is typically much smaller than n, the setch Ax often contains plenty of useful information about the signal x. A particularly useful and well-studied problem is that of stable sparse recovery. The problem is typically defined as follows: for some norm parameters p and q and an approximation factor C >, given Ax, recover a vector x such that ( x x p C Err q (x,, where Err q (x, = min -sparse ˆx ˆx x q where we say that ˆx is -sparse if it has at most non-zero coordinates. Sparse recovery has applications to numerous areas such as data stream computing [Mut3, Ind7] This research has been supported in part by the David and Lucille Pacard Fellowship, MADALGO (Center for Massive Data Algorithmics, funded by the Danish National Research Association, NSF grant CCF-78645, a Cisco Fellowship, and the NSF Graduate Research Fellowship Program. MIT CSAIL and compressed sensing [CRT6, Don6], notably for constructing imaging systems that acquire images directly in compressed form (e.g., [DDT + 8, Rom9]. The problem has been a subject of extensive study over the last several years, with the goal of designing schemes that enjoy good compression rate (i.e., low values of m as well as good algorithmic properties (i.e., low encoding and recovery times. It is nown that there exist distributions of matrices A and associated recovery algorithms that for any x with high probability produce approximations x satisfying Equation ( with l p = l q = l, constant approximation factor C = +, and setch length m = O( log(n/; it is also nown that this setch length is asymptotically optimal [DIPW, FPRU]. Similar results for other combinations of l p /l q norms are nown as well. Because it is impossible to improve on the setch size in the general sparse recovery problem, recently there has been a large body of wor on more restricted problems that are amenable to more efficient solutions. This includes model-based compressive sensing [BCDH], which imposes additional constraints (or models on x beyond near-sparsity. Examples of models include bloc sparsity, where the large coefficients tend to cluster together in blocs [BCDH, EKB9]; tree sparsity, where the large coefficients form a rooted, connected tree structure [BCDH, LD5]; and being Zipfian, where we require that the histogram of coefficient size follow a Zipfian (or power law distribution. 
A sparse recovery algorithm needs to perform two tass: locating the large coefficients of x and estimating their value. Existing algorithms perform both tass at the same time. In contrast, we propose decoupling these tass. In models of interest, including Zipfian signals and bloc-sparse signals, existing techniques can locate the large coefficients more efficiently or accurately than they can estimate them. Prior to this wor, however, estimating the large coefficients after finding them had no better solution than the general sparse recovery problem. We fill this gap by giving an optimal method for estimating the values of the large coefficients after locating them. In particular, a random Gaussian matrix [CD4] or a random sparse binary matrix ([GLPS9], building on [CCF, CM4] has this property with overwhelming probability. See [GI] for an overview.

2 We refer to this tas as the Set Query Problem. Main result. (Set Query Algorithm. We give a randomized distribution over O( n binary matrices A such that, for any vector x R n and set S {,..., n} with S =, we can recover an x from Ax + ν and S with x x S ( x x S + ν where x S R n equals x over S and zero elsewhere. The matrix A has O( non-zero entries per column, recovery succeeds with probability Ω(, and recovery taes O( time. This can be achieved for arbitrarily small >, using O(/ rows. We achieve a similar result in the l norm. The set query problem is useful in scenarios when, given a setch of x, we have some alternative methods for discovering a good support of an approximation to x. This is the case, e.g., in bloc-sparse recovery, where (as we show in this paper it is possible to identify heavy blocs using other methods. It is also a natural problem in itself. In particular, it generalizes the well-studied point query problem [CM4], which considers the case that S is a singleton. We note that, although the set query problem for sets of size can be reduced to instances of the point query problem, this reduction is less space-efficient than the algorithm we propose, as elaborated below. Techniques. Our method is related to existing sparse recovery algorithms, including Count-Setch [CCF] and Count-Min [CM4]. In fact, our setch matrix A is almost identical to the one used in Count-Setch each column of A has d random locations out of O(d each independently set to ±, and the columns are independently generated. We can view such a matrix as hashing each coordinate to d bucets out of O(d. The difference is that the previous algorithms require O( log measurements to achieve our error bound (and d = O(log, while we only need O( measurements and d = O(. We overcome two obstacles to bring d down to O( and still achieve the error bound with high probability 3. First, in order to estimate the coordinates x i, we need a more elaborate method than, say, taing the median of the bucets that i was hashed into. This is because, with constant probability, all such bucets might contain some other elements from S (be heavy and therefore using any of them as an estimator for y i would result in too much error. Since, for super-constant values of S, it is highly liely that such an event will occur for at least one i S, it follows that this type of estimation results in large error. The term set query is in contrast to point query, used in e.g. [CM4] for estimation of a single coordinate. 3 In this paper, high probability means probability at least / c for some constant c >. We solve this issue by using our nowledge of S. We now when a bucet is corrupted (that is, contains more than one element of S, so we only estimate coordinates that lie in a large number of uncorrupted bucets. Once we estimate a coordinate, we subtract our estimation of its value from the bucets it is contained in. This potentially decreases the number of corrupted bucets, allowing us to estimate more coordinates. We show that, with high probability, this procedure can continue until it estimates every coordinate in S. The other issue with the previous algorithms is that their analysis of their probability of success does not depend on. This means that, even if the head did not interfere, their chance of success would be a constant (lie Ω(d rather than high probability in (meaning Ω(d. 
We show that the errors in our estimates of coordinates have low covariance, which allows us to apply Chebyshev s inequality to get that the total error is concentrated around the mean with high probability. Related wor. A similar recovery algorithm (with d = has been analyzed and applied in a streaming context in [EG7]. However, in that paper the authors only consider the case where the vector y is -sparse. In that case, the termination property alone suffices, since there is no error to bound. Furthermore, because d = they only achieve a constant probability of success. In this paper we consider general vectors y so we need to mae sure the error remains bounded, and we achieve a high probability of success. The recovery procedure also has similarities to recovering LDPCs using belief propagation, especially over the binary erasure channel. The similarities are strongest for exact recovery of -sparse y; our method for bounding the error from noise is quite different. Applications. Our efficient solution to the set query problem can be combined with existing techniques to achieve sparse recovery under several models. We say that a vector x follows a Zipfian or power law distribution with parameter α if xr(i = Θ( xr( i α where r(i is the location of the ith largest coefficient in x. When α > /, x is well approximated in the l norm by its sparse approximation. Because a wide variety of real world signals follow power law distributions ([Mit4, BKM + ], this notion (related to compressibility 4 is often considered to be much of the reason why sparse recovery is interesting [CT6, Cev8]. Prior to this wor, sparse recovery of power law distributions has only been solved via general sparse recovery methods: ( + Err (x, error in O( log(n/ measurements. However, locating the large coefficients in a power law 4 A signal is compressible when xr(i = O( xr( i α rather than Θ( xr( i α [CT6]. This allows it to decay very quicly then stop decaying for a while; we require that the decay be continuous.

3 distribution has long been easier than in a general distribution. Using O( log n measurements, the Count- Setch algorithm [CCF] can produce a candidate set S {,..., b} with S = O( that includes all of the top positions in a power law distribution with high probability (if α > /. We can then apply our set query algorithm to recover an approximation x to x S. Because we already are using O( log n measurements on Count- Setch, we use O( log n rather than O( measurements in the set query algorithm to get an / log n rather than approximation. This lets us recover a -sparse x with O( log n measurements with x x ( + Err (x,. log n This is especially interesting in the common regime where < n c for some constant c >. Then no previous algorithms achieve better than a ( + approximation with O( log n measurements, and the lower bound in [DIPW] shows that any O( approximation requires Ω( log n measurements 5. This means at Θ( log n measurements, the best approximation changes from ω( to + o(. Another application is that of finding bloc-sparse approximations. In this application, the coordinate set {... n} is partitioned into n/b blocs, each of length b. We define a (, b-bloc-sparse vector to be a vector where all non-zero elements are contained in at most /b blocs. An example of bloc-sparse data is time series data from n/b locations over b time steps, where only /b locations are active. We can define Err (x,, b = min x ˆx. (,b bloc-sparse ˆx The bloc-sparse recovery problem can now be formulated analogously to Equation. Since the formulation imposes restrictions on the sparsity patterns, it is natural to expect that one can perform sparse recovery from fewer than O( log(n/ measurements needed in the general case. Because of that reason and the prevalence of approximately bloc-sparse signals, the problem of stable recovery of variants of bloc-sparse approximations has been recently a subject of extensive research (e.g., see [EB9, SPH9, BCDH, CIHB9]. The state of the art algorithm has been given in [BCDH], who gave a probabilistic construction of a single m n matrix A, with m = O(+ b log n, and an n logo( n-time algorithm for performing the bloc-sparse recovery in the l norm (as well as other variants. If the blocs have size Ω(log n, the algorithm uses only O( measurements, which is a 5 The lower bound only applies to geometric distributions, not Zipfian ones. However, our algorithm applies to more general sub- Zipfian distributions (defined in Section 4., which includes both. substantial improvement over the general bound. However, the approximation factor C guaranteed by that algorithm was super-constant. In this paper, we provide a distribution over matrices A, with m = O( + b log n, which enables solving this problem with a constant approximation factor and in the l norm, with high probability. As with Zipfian distributions, first one algorithm tells us where to find the heavy hitters and then the set query algorithm estimates their values. In this case, we modify the algorithm of [ABI8] to find bloc heavy hitters, which enables us to find the support of the b most significant blocs using O( b log n measurements. The essence is to perform dimensionality reduction of each bloc from b to O(log n dimensions, then estimate the result with a linear hash table. For each bloc, most of the projections are estimated pretty well, so the median is a good estimator of the bloc s norm. Once the support is identified, we can recover the coefficients using the set query algorithm. Preliminaries. 
Notation For n Z +, we denote {,..., n} by [n]. Suppose x R n. Then for i [n], x i R denotes the value of the i- th coordinate in x. As an exception, e i R n denotes the elementary unit vector with a one at position i. For S [n], x S denotes the vector x R n given by x i = x i if i S, and x i = otherwise. We use supp(x to denote the support of x. We use upper case letters to denote sets, matrices, and random distributions. We use lower case letters for scalars and vectors.. Negative Association This paper would lie to mae a claim of the form We have observations each of whose error has small expectation and variance. Therefore the average error is small with high probability in. If the errors were independent this would be immediate from Chebyshev s inequality, but our errors depend on each other. Fortunately, our errors have some tendency to behave even better than if they were independent: the more noise that appears in one coordinate, the less remains to land in other coordinates. We use negative dependence to refer to this general class of behavior. The specific forms of negative dependence we use are negative association and approximate negative correlation; see Appendix A for details on these notions. 3

4 3 Set-Query Algorithm Theorem 3.. There is a randomized sparse binary setch matrix A and recovery algorithm A, such that for any x R n, S [n] with S =, x = A (Ax + ν, S R n has supp(x S and x x S ( x x S + ν with probability at least / c. A has O( c rows and O(c non-zero entries per column, and A runs in O(c time. One can achieve x x S ( x x S + ν under the same conditions, but with only O( c rows. We will first show Theorem 3. for a constant c = /3 rather than for general c. Parallel repetition gives the theorem for general c, as described in Section 3.7. We will also only show it with entries of A being in {,, }. By splitting each row in two, one for the positive and one for the negative entries, we get a binary matrix with the same properties. The paper focuses on the more difficult l result; see Appendix B for details on the l result. 3. Intuition We call x S the head and x x S the tail. The head probably contains the heavy hitters, with much more mass than the tail of the distribution. We would lie to estimate x S with zero error from the head and small error from the tail with high probability. Our algorithm is related to the standard Count- Setch [CCF] and Count-Min [CM4] algorithms. In order to point out the differences, let us examine how they perform on this tas. These algorithms show that hashing into a single w = O( sized hash table is good in the sense that each point x i has:. Zero error from the head with constant probability (namely w.. A small amount of error from the tail in expectation (and hence with constant probability. They then iterate this procedure d times and tae the median, so that each estimate has small error with probability Ω(d. With d = O(log, we get that all estimates in S are good with O( log measurements with high probability in. With fewer measurements, however, some x i will probably have error from the head. If the head is much larger than the tail (such as when the tail is zero, this is a major problem. Furthermore, with O( measurements the error from the tail would be small only in expectation, not with high probability. We mae three observations that allow us to use only O( measurements to estimate x S with error relative to the tail with high probability in.. The total error from the tail over a support of size is concentrated more strongly than the error at a single point: the error probability drops as Ω(d rather than Ω(d.. The error from the head can be avoided if one nows where the head is, by modifying the recovery algorithm. 3. The error from the tail remains concentrated after modifying the recovery algorithm. For simplicity this paper does not directly show (, only ( and (3. The modification to the algorithm to achieve ( is quite natural, and described in detail and illustrated in Section 3.. Rather than estimate every coordinate in S immediately, we only estimate those coordinates which mostly do not overlap with other coordinates in S. In particular, we only estimate x i as the median of at least d positions that are not in the image of S \ {i}. Once we learn x i, we can subtract Ax i e i from the observed Ax and repeat on A(x x i e i and S \ {i}. Because we only loo at positions that are in the image of only one remaining element of S, this avoids any error from the head. We show in Section 3.3 that this algorithm never gets stuc; we can always find some position that mostly doesn t overlap with the image of the rest of the remaining support. 
We then show that the error from the tail has low expectation, and that it is strongly concentrated. We thin of the tail as noise located in each cell (coordinate in the image space. We decompose the error of our result into two parts: the point error and the propagation. The point error is error introduced in our estimate of some x i based on noise in the cells that we estimate x i from, and equals the median of the noise in those cells. The propagation is the error that comes from point error in estimating other coordinates in the same connected component; these errors propagate through the component as we subtract off incorrect estimates of each x i. Section 3.4 shows how to decompose the total error in terms of point errors and the component sizes. The two following sections bound the expectation and variance of these two quantities and show that they obey some notions of negative dependence. We combine these errors in Section 3.7 to get Theorem 3. with a specific c (namely c = /3. We then use parallel repetition to achieve Theorem 3. for arbitrary c. 3. Algorithm We describe the setch matrix A and recovery procedure in Algorithm 3.. Unlie Count-Setch [CCF] or Count-Min [CM4], our A is not split into d hash tables 4

5 Figure : An instance of the set query problem. There are n vertices on the left, corresponding to x, and the table on the right represents Ax. Each vertex i on the left maps to d cells on the right, randomly increasing or decreasing the value in each cell by x i. We represent addition by blac lines, and subtraction by red lines. We are told the locations of the heavy hitters, which we represent by blue circles; the rest is represented by yellow circles. of size O(. Instead, it has a single w = O(d / sized hash table where each coordinate is hashed into d unique positions. We can thin of A as a random d-uniform hypergraph, where the non-zero entries in each column correspond to the terminals of a hyperedge. We say that A is drawn from G d (w, n with random signs associated with each (hyperedge, terminal pair. We do this so we will be able to apply existing theorems on random hypergraphs. Figure shows an example Ax for a given x, and Figure demonstrates running the recovery procedure on this instance. Lemma 3.. Algorithm 3. runs in time O(d. Proof. A has d entries per column. For each of the at most d rows q in the image of S, we can store the preimages P (q. We also eep trac of the sets of possible next hyperedges, J i = {j L j d i} for i {, }. We can compute these in an initial pass in O(d. Then in each iteration, we remove an element j J or J and update x j, b, and T in O(d time. We then loo at the two or fewer non-isolated vertices q in hyperedge j, and remove j from the associated P (q. If this maes P (q =, we chec whether to insert the element in P (q into the J i. Hence the inner loop taes O(d time, for O(d total. 6.5 (a (c (b Figure : Example run of the algorithm. Part (a shows the state as considered by the algorithm: Ax and the graph structure corresponding to the given support. In part (b, the algorithm chooses a hyperedge with at least d isolated vertices and estimates the value as the median of those isolated vertices multiplied by the sign of the corresponding edge. In part (c, the image of the first vertex has been removed from Ax and we repeat on the smaller graph. We continue until the entire support has been estimated, as in part (d. 3 (d 3.3 Exact Recovery The random hypergraph G d (w, of random d-uniform hyperedges on w vertices is well studied in [K L]. We use their results to show that the algorithm successfully 5

6 Definition of setch matrix A. For a constant d, let A be a w n = O( d n matrix where each column is chosen independently uniformly at random over all exactly d-sparse columns with entries in {,, }. We can thin of A as the incidence matrix of a random d-uniform hypergraph with random signs. Recovery procedure. : procedure SetQuery(A, S, b Recover approximation x to x S from b = Ax + ν : T S 3: while T > do 4: Define P (q = {j A qj, j T } as the set of hyperedges in T that contain q. 5: Define L j = {q A qj, P (q = } as the set of isolated vertices in hyperedge j. 6: Choose a random j T such that L j d. If this is not possible, find a random j T such that L j d. If neither is possible, abort. 7: x j median q L j A qj b q 8: b b x j Ae j 9: T T \ {j} : end while : return x : end procedure Algorithm 3.: Recovering a signal given its support. terminates with high probability, and that most hyperedges are chosen with at least d isolated vertices: Lemma 3.. With probability at least O(/, Algorithm 3. terminates without aborting. Furthermore, in each component at most one hyperedge is chosen with only d isolated vertices. We will show this by building up a couple lemmas. We define a connected hypergraph H with r vertices on s hyperedges to be a hypertree if r = s(d + and to be unicyclic if r = s(d. Then Theorem 4 of [K L] shows that, if the graph is sufficiently sparse, G d (w, is probably composed entirely of hypertrees and unicyclic components. The precise statement is as follows 6 : Lemma 3.3 (Theorem 4 of [K L]. Let m = w/d(d. Then with probability O(d 5 w /m 3, G d (w, is composed entirely of hypertrees and unicyclic components. We use a simple consequence: Corollary 3.. If d = O( and w d(d, then with probability O(/, G d (w, is composed entirely of hypertrees and unicyclic 6 Their statement of the theorem is slightly different. This is the last equation in their proof of the theorem. We now prove some basic facts about hypertrees and unicyclic components: Lemma 3.4. Every hypertree has a hyperedge incident on at least d isolated vertices. Every unicyclic component either has a hyperedge incident on d isolated vertices or has a hyperedge incident on d isolated vertices, the removal of which turns the unicyclic component into a hypertree. Proof. Let H be a connected component of s hyperedges and r vertices. If H is a hypertree, r = (d s +. Because H has only ds total (hyperedge, incident vertex pairs, at most (s of these pairs can involve vertices that appear in two or more hyperedges. Thus at least one of the s edges is incident on at most one vertex that is not isolated, so some edge has d isolated vertices. If H is unicyclic, r = (d s and so at most s of the (hyperedge, incident vertex pairs involve non-isolated vertices. Therefore on average, each edge has d isolated vertices. If no edge is incident on at least d isolated vertices, every edge must be incident on exactly d isolated vertices. In that case, each edge is incident on exactly two non-isolated vertices and each non-isolated vertex is in exactly two edges. Hence we can perform an Eulerian tour of all the edges, so removing any edge does not disconnect the graph. After removing the edge, the graph has s = s edges and r = r d + vertices; therefore r = (d s + so the graph is a hypertree. Corollary 3. 
and Lemma 3.4 combine to show Lemma Total error in terms of point error and component size Define C i,j to be the event that hyperedges i and j are in the same component, and D i = j C i,j to be the number of hyperedges in the same component as i. Define L i to be the cells that are used to estimate i; so L i = {q A qj, P (q = } at the round of the algorithm when i is estimated. Define Y i = median q Li A qi (b Ax S q to be the point error for hyperedge i, and x to be the output of the algorithm. Then the deviation of the output at any coordinate i is at most twice the sum of the point errors in the component containing i: Lemma 3.5. (x x S i j S Y j C i,j. Proof. Let T i = (x x S i, and define R i = {j j i, q L i s.t. A qj } to be the set of hyperedges that 6

7 overlap with the cells used to estimate i. Then from the description of the algorithm, it follows that T i = median A qi ((b Ax S q q L i j T i Y i + T j. j R i A qj T j We can thin of the R i as a directed acyclic graph (DAG, where there is an edge from j to i if j R i. Then if p(i, j is the number of paths from i to j, T i j p(j, i Y i. Let r(i = {j i R j } be the outdegree of the DAG. Because the L i are disjoint, r(i d L i. From Lemma 3., r(i for all but one hyperedge in the component, and r(i for that one. Hence p(i, j for any i and j, giving the result. We use the following corollary: Corollary 3.. Proof. x x S = i S x x S 4 i S 4 i S D i Y i (x x S i 4 i S D i j S C i,j= Y j = 4 i S ( Y j j S C i,j= D i Y i where the second inequality is the power means inequality. The D j and Y j are independent from each other, since one depends only on A over S and one only on A over [n] \ S. Therefore we can analyze them separately; the next two sections show bounds and negative dependence results for Y j and D j, respectively. 3.5 Bound on point error Recall from Section 3.4 that based entirely on the set S and the columns of A corresponding to S, we can identify the positions L i used to estimate x i. We then defined the point error Y i = median q L i A qi (b Ax S q = median A qi (A(x x S +ν q q L i and showed how to relate the total error to the point error. Here we would lie to show that the Y i have bounded moments and are negatively dependent. Unfortunately, it turns out that the Y i are not negatively associated so it is unclear how to show negative dependence directly. Instead, we will define some other variables Z i that are always larger than the corresponding Y i. We will then show that the Z i have bounded moments and negative association. We use the term NA throughout the proof to denote negative association. For the definition of negative association and relevant properties, see Appendix A. Lemma 3.6. Suppose d 7 and define µ = O( ( x x S + ν. There exist random variables Z i such that the variables Yi are stochastically dominated by Z i, the Z i are negatively associated, E[Z i ] = µ, and E[Zi ] = O(µ. Proof. The choice of the L i depends only on the values of A over S; hence conditioned on nowing L i we still have A(x x S distributed randomly over the space. Furthermore the distribution of A and the reconstruction algorithm are invariant under permutation, so we can pretend that ν is permuted randomly before being added to Ax. Define B i,q to be the event that q supp(ae i, and define D i,q {, } independently at random. Then define the random variable V q = (b Ax S q = ν q + x i B i,q D i,q. i [n]\s Because we want to show concentration of measure, we would lie to show negative association (NA of the Y i = median q Li A qi V q. We now ν is a permutation distribution, so it is NA [JP83]. The B i,q for each i as a function of q are chosen from a Fermi-Dirac model, so they are NA [DR96]. The B i,q for different i are independent, so all the B i,q variables are NA. Unfortunately, the D i,q can be negative, which means the V q are not necessarily NA. Instead we will find some NA variables that dominate the V q. We do this by considering V q as a distribution over D. Let W q = E D [Vq ] = νq + i [n]\s x i B i,q. As increasing functions of NA variables, the W q are NA. By Marov s inequality Pr D [Vq cw q ] c, so after choosing the B i,q and as a distribution over D, Vq is dominated by the random variable U q = W q F q where F q is, independently for each q, given by the p.d.f. 
f(c = /c for c and f(c = otherwise. Because the distribution of V q over D is independent for each q, the U q jointly dominate the Vq. The U q are the componentwise product of the W q with independent positive random variables, so they too are NA. Then define Z i = median q L i U q. 7

8 As an increasing function of disjoint subsets of NA variables, the Z i are NA. We also have that Y i = (median A qi V q (median V q q L i q L i = median Vq q L i median U q = Z i q L i so the Z i stochastically dominate Yi. We now will bound E[Zi ]. Define µ = E[W q ] = E[νq ] + x i E[B i,q ] Then we have i [n]\s = d w x x S + w ν ( x x S + ν. Pr[W q cµ] c Pr[U q cµ] = c f(x Pr[W q cµ/x]dx x x c dx + c + ln c dx = x c Because the U q are NA, they satisfy marginal probability bounds [DR96]: Pr[U q t q, q [w]] Pr[U q t q ] for any t q. Therefore ( Pr[Z i cµ] i [n] T L i q T T = L i / ( + ln c Li c ( Pr[Z i cµ] 4 + ln c c P r[u q cµ] d/ Li / If d 7, this maes E[Z i ] = O(µ and E[Z i ] = O(µ. 3.6 Bound on component size Lemma 3.7. Let D i be the number of hyperedges in the same component as hyperedge i. Then for any i j, Cov(D i, D j = E[D i D j ] E[D i ] O( log6. Proof. The intuition is that if one component gets larger, other components tend to get smaller. Also the graph is very sparse, so component size is geometrically distributed. There is a small probability that i and j are connected, in which case D i and D j are positively correlated, but otherwise D i and D j should be negatively correlated. However analyzing this directly is rather difficult, because as one component gets larger, the remaining components have a lower average size but higher variance. Our analysis instead taes a detour through the hypergraph where each hyperedge is piced independently with a probability that gives the same expected number of hyperedges. This distribution is easier to analyze, and only differs in a relatively small Õ( hyperedges from our actual distribution. This allows us to move between the regimes with only a loss of Õ(, giving our result. Suppose instead of choosing our hypergraph from G d (w, we chose it from G d (w, ; that is, each hyperedge appeared independently with the appropriate prob- ( w d ability to get hyperedges in expectation. This model is somewhat simpler, and yields a very similar hypergraph G. One can then modify G by adding or removing an appropriate number of random hyperedges I to get exactly hyperedges, forming a uniform G G d (w,. By the Chernoff bound, I O( log with probability. Ω( Let D i be the size of the component containing i in G, and H i = Di D i. Let E denote the event that any of the D i or D i is more than C log, or that more than C log hyperedges lie in I, for some constant C. Then E happens with probability less than for some C, so 5 it has negligible influence on E[Di D j ]. Hence the rest of this proof will assume E does not happen. Therefore H i = if none of the O( log random hyperedges in I touch the O(log hyperedges in the components containing i in G, so H i = with probability at least O( log. Even if H i, we still have H i (Di + D j O(log. Also, we show that the D i are negatively correlated, when conditioned on being in separate components. Let D(n, p denote the distribution of the component size of a random hyperedge on G d (n, p, where p is the probability an hyperedge appears. Then D(n, p dominates D(n, p whenever n > n the latter hypergraph is contained within the former. If C i,j is the event that i and j are connected in G, this means E[D i D j = t, C i,j = ] Furthermore, E[D i ] = O( and E[D4 i ] = O(. is a decreasing function in t, so we have negative corre- 8

9 lation: E[D i D j C i,j = ] E[D i C i,j = ] E[D j C i,j = ] E[D i ] E[D j]. Furthermore for i j, Pr[C i,j = ] = E[C i,j ] = l i E[C i,l] = E[Di] = O(/. Hence E[D i D j] = E[D i D j C i,j = ] Pr[C i,j = ]+ Therefore E[D i D j ] E[D i D j C i,j = ] Pr[C i,j = ] E[D i ] E[D j] + O( log4. = E[(D i + H i (D j + H j ] = E[D i D j] + E[H i D j] + E[H i H j ] E[D i ] E[D j] + O( log = E[D i H i ] + O( log6 log 4 + log log = E[D i ] E[H i ] E[D i ] + E[H i ] + O( log6 E[D i ] + O( log6 Now to bound E[Di 4 ] in expectation. Because our hypergraph is exceedingly sparse, the size of a component can be bounded by a branching process that dies out with constant probability at each step. Using this method, Equations 7 and 7 of [COMS7] state that Pr[D ] e Ω(. Hence E[D i ] = O( and E[D 4 i ] = O(. Because H i is with high probability and O(log otherwise, this immediately gives E[Di ] = O( and E[Di 4] = O(. 3.7 Wrapping it up Recall from Corollary 3. that our total error x x S 4 i Y i D i 4 i Z i D i. The previous sections show that Z i and Di each have small expectation and covariance. This allows us to apply Chebyshev s inequality to concentrate 4 i Z idi about its expectation, bounding x x S with high probability: Lemma 3.8. We can recover x from Ax + ν and S with x x S ( x x S + ν with probability at least in O( recovery time. c /3 Our A has O( c rows and sparsity O( per column. Proof. Our total error is x x S 4 i Y i D i 4 i Then by Lemma 3.6 and Lemma 3.7, E[4 i Z i D i ] = 4 i Z i D i. E[Z i ] E[D i ] = µ where µ = O( ( x x S + ν. Furthermore, E[( i Var( i Z i D i ] = i = i i E[Z i D 4 i ] + i j E[Z i Z j D i D j ] E[Z i ] E[D 4 i ] + i j E[Z i Z j ] E[D i D j ] O(µ + i j E[Z i ] E[Z j ](E[D i ] + O( log6 = O(µ log 6 + ( E[Z i D i ] Z i D i = E[( i Z i D i ] E[Z i D i ] O(µ log 6 By Chebyshev s inequality, this means Pr[4 i Z i D i ( + cµ] O( log6 c Pr[ x x S ( + cc ( x x S + ν ] O( c /3 for some constant C. Rescaling down by C( + c, we can get x x S ( x x S + ν with probability at least c /3 : Now we shall go from /3 probability of error to c error for arbitrary c, with O(c multiplicative cost in time and space. We simply perform Lemma 3.8 O(c times in parallel, and output the pointwise median of the results. By a standard parallel repetition argument, this gives our main result: Theorem 3.. We can recover x from Ax + ν and S with x x S ( x x S + ν with probability at least in O(c recovery time. c Our A has O( c rows and sparsity O(c per column. 9

10 Proof. Lemma 3.8 gives an algorithm that achieves O( /3 probability of error. We will show here how to achieve c probability of error with a linear cost in c, via a standard parallel repetition argument. Suppose our algorithm gives an x such that x x S µ with probability at least p, and that we run this algorithm m times independently in parallel to get output vectors x,..., x m. We output y given by y i = median j [m] (x j i, and claim that with high probability y x S µ 3. Let J = {j [m] x j x S µ}. Each j [m] lies in J with probability at least p, so the chance that J 3m/4 is less than ( m m/4 p m/4 (4ep m/4. Suppose that J 3m/4. Then for all i S, {j J (x j i y i } J m J /3 and similarly {j J (x j i y i } J /3. Hence for all i S, y i x i is smaller than at least J /3 of the (x j i x i for j J. Hence or J µ i S j J((x j i x i i S = J 3 y x y x 3µ J 3 (y i x i with probability at least (4ep m/4. Using Lemma 3.8 to get p = and µ = 6 /3 ( x x S + ν, with m = c repetitions we get Theorem Applications We give two applications where the set query algorithm is a useful primitive. 4. Heavy Hitters of sub-zipfian distributions For a vector x, let r i be the index of the ith largest element, so x ri is non-increasing in i. We say that x is Zipfian with parameter α if x ri = Θ( x r i α. We say that x is sub-zipfian with parameters (, α if there exists a non-increasing function f with x ri = Θ(f(ii α for all i. A Zipfian with parameter α is a sub-zipfian with parameter (, α for all, using f(i = x r. The Zipfian heavy hitters problem is, given a linear setch Ax of a Zipfian x and a parameter, to find a -sparse x with minimal x x (up to some approximation factor. We require that x be -sparse (and no more because we want to find the heavy hitters themselves, not to find them as a proxy for approximating x. Zipfian distributions are common in real-world data sets, and finding heavy hitters is one of the most important problems in data streams. Therefore this is a very natural problem to try to improve; indeed, the original paper on Count-Setch discussed it [CCF]. They show a result complementary to our wor, namely that one can find the support efficiently: Lemma 4. (Section 4. of [CCF]. If x is sub-zipfian with parameter (, α and α > /, one can recover a candidate support set S with S = O( from Ax such that {r,..., r } S. A has O( log n rows and recovery succeeds with high probability in n. Proof setch. Let S = {r,..., r }. With O( log n measurements, Count-Setch identifies each x i to within x x S with high probability. If α > /, this is less than x r /3 for appropriate. But x r9 x r /3. Hence only the largest 9 elements of x could be estimated as larger than anything in x S, so the locations of the largest 9 estimated values must contain S. It is observed in [CCF] that a two-pass algorithm could identify the heavy hitters exactly. However, with a single pass, no better method has been nown for Zipfian distributions than for arbitrary distributions; in fact, the lower bound [DIPW] on linear sparse recovery uses a geometric (and hence sub-zipfian distribution. As discussed in [CCF], using Count-Setch 7 with O( log n rows gets a -sparse x with x x r x ( + Err (x, = Θ( / α. α where, as in Section, Err (x, = min ˆx x. -sparse ˆx The set query algorithm lets us improve from a + approximation to a + o( approximation. This is not useful for approximating x, since increasing is much more effective than decreasing. 
Instead, it is useful for finding elements that are quite close to being the actual heavy hitters of x. Naïve application of the set query algorithm to the output set of Lemma 4. would only get a close O(-sparse vector, not a -sparse vector. To get a -sparse vector, we must show a lemma that generalizes one used in the proof of sparse recovery of Count-Setch (first in [CM6], but our description is more similar to [GI]. 7 Another analysis ([CM5] uses Count-Min to achieve a better polynomial dependence on, but at the cost of using the l norm. Our result is an improvement over this as well.

11 Lemma 4.. Let x, x R n. Let S and S be the locations of the largest elements (in magnitude of x and x, respectively. Then if (* (x x S S Err (x,, for, we have x S x ( + 3Err (x,. Previous proofs have shown the following weaer form: Corollary 4.. If we change the condition (* to x x Err (x,, the same result holds. The corollary is immediate from Lemma 4. and (x x S S S S (x x S S. Therefore xs\s xs \S E(E + E ( + E 5E. Plugging into Equation 3, and using (x x S E, x S x E + 5E + xs \S + x[n]\(s S 6E + x[n]\s = ( + 6E x S x ( + 3E. Proof of Lemma 4.. We have (3 x S x = (x x S + xs\s + x[n]\(s S The tricy bit is to bound the middle term x S\S. We will show that it is not much larger than xs \S. Let d = S \ S, and let a be the d-dimensional vector corresponding to the absolute values of the coefficients of x over S \ S. That is, if S \ S = {j,..., j d }, then a i = x ji for i [d]. Let a be analogous for x over S \ S, and let b and b be analogous for x and x over S \ S, respectively. Let E = Err (x, = x x S. We have xs\s xs \S = a b = (a b (a + b a b a + b a b ( b + a b a b (E + a b So we should bound a b. We now that p q p q for all p and q, so a a + b b (x x S\S + (x x S \S (x x S S E. We also now that a b and b a both contain all nonnegative coefficients. Hence a b a b + b a ( a a + b b a a + b b E a b E. With this lemma in hand, on Zipfian distributions we can get a -sparse x with a +o( approximation factor. Theorem 4.. Suppose x comes from a sub-zipfian distribution with parameter α > /. Then we can recover a -sparse x from Ax with x x log n Err (x,. with O( c log n rows and O(n log n recovery time, with probability at least c. Proof. By Lemma 4. we can identify a set S of size O( that contains all the heavy hitters. We then run the set query algorithm of Theorem 3. with 3 substituted log n for. This gives an ˆx with ˆx x S 3 log n Err (x,. Let x contain the largest coefficients of ˆx. By Lemma 4. we have x x ( + Err (x,. log n 4. Bloc-sparse vectors In this section we consider the problem of finding blocsparse approximations. In this case, the coordinate set {... n} is partitioned into n/b blocs, each of length b. We define a (, b-bloc-sparse vector to be a vector where all non-zero elements are contained in at most /b blocs. That is, we partition {,..., n} into T i = {(i b +,..., ib}. A vector x is (, b-bloc-sparse if there exist S,..., S /b {T,..., T n/b } with supp(x S i. Define Err (x,, b = min x ˆx. (,b bloc-sparse ˆx

12 Finding the support of bloc-sparse vectors is closely related to finding bloc heavy hitters, which is studied for the l norm in [ABI8]. The idea is to perform dimensionality reduction of each bloc into log n dimensions, then perform sparse recovery on the resulting log n b - sparse vector. The differences from previous wor are minor, so we relegate the details to Appendix C. Lemma 4.3. For any b and, there exists a family of matrices A with O( 5 b log n rows and column sparsity O( log n such that we can recover a support S from Ax in O( n b log n time with x x S ( + Err (x,, b with probability at least n Ω(. Once we now a good support S, we can run Algorithm 3. to estimate x S : Theorem 4.. For any b and, there exists a family of binary matrices A with O( + 5 b log n rows such that we can recover a (, b-bloc-sparse x in O( + n b log n time with x x ( + Err (x,, b with probability at least Ω(. Proof. Let S be the result of Lemma 4.3 with approximation /3, so x x S ( + 3 Err (x,, b. Then the set query algorithm on x and S uses O(/ rows to return an x with Therefore as desired. x x S 3 x x S. x x x x S + x x S ( + 3 x x S ( + 3 Err (x,, b ( + Err (x,, b If the bloc size b is at least log n and is constant, this gives an optimal bound of O( rows. 5 Conclusion and Future Wor We show efficient recovery of vectors conforming to Zipfian or bloc sparse models, but leave open extending this to other models. Our framewor decomposes the tas into first locating the heavy hitters and then estimating them, and our set query algorithm is an efficient general solution for estimating the heavy hitters once found. The remaining tas is to efficiently locate heavy hitters in other models. Our analysis assumes that the columns of A are fully independent. It would be valuable to reduce the independence needed, and hence the space required to store A. We show -sparse recovery of Zipfian distributions with + o( approximation in O( log n space. Can the o( be made smaller, or a lower bound shown, for this problem? Acnowledgments I would lie to than my advisor Piotr Indy for much helpful advice, Anna Gilbert for some preliminary discussions, and Joseph O Roure for pointing me to [K L]. References [ABI8] A. Andoni, K. Do Ba, and P. Indy. Bloc heavy hitters. MIT Technical Report TR- 8-4, 8. [BCDH] R. G. Baraniu, V. Cevher, M. F. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Transactions on Information Theory, 56, No. 4:98,. [BKM + ] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomins, and J. Wiener. Graph structure in the web. Comput. Netw., 33(-6:39 3,. [CCF] M. Chariar, K. Chen, and M. Farach- Colton. Finding frequent items in data streams. ICALP,. [CD4] [Cev8] Z. Chen and J. J. Dongarra. Condition numbers of gaussian random matrices. SIAM Journal on Matrix Analysis and Applications, 7:63 6, 4. V. Cevher. Learning with compressible priors. In NIPS, Vancouver, B.C., Canada, 7 December 8. [CIHB9] V. Cevher, P. Indy, C. Hegde, and R. G. Baraniu. Recovery of clustered sparse signals from compressive measurements. SAMPTA, 9.

13 [CM4] G. Cormode and S. Muthurishnan. Improved data stream summaries: The countmin setch and its applications. Latin, 4. [CM5] Graham Cormode and S. Muthurishnan. Summarizing and mining sewed data streams. In SDM, 5. [CM6] G. Cormode and S. Muthurishnan. Combinatorial algorithms for compressed sensing. Sirocco, 6. [COMS7] A. Coja-Oghlan, C. Moore, and V. Sanwalani. Counting connected graphs and hypergraphs via the probabilistic method. Random Struct. Algorithms, 3(3:88 39, 7. [CRT6] [CT6] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8:8 3, 6. E.J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? Information Theory, IEEE Transactions on, 5(: , dec. 6. [EG7] D. Eppstein and M. T. Goodrich. Spaceefficient straggler identification in round-trip data streams via Newton s identitities and invertible Bloom filters. WADS, 7. [EKB9] [FPRU] [GI] Y. C. Eldar, P. Kuppinger, and H. Bölcsei. Compressed sensing of bloc-sparse signals: Uncertainty relations and efficient recovery. CoRR, abs/96.373, 9. S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The Gelfand widths of lp-balls for < p. preprint,. A. Gilbert and P. Indy. Sparse recovery using sparse matrices. Proceedings of IEEE,. [GLPS9] A. C. Gilbert, Y. Li, E. Porat, and M. J. Strauss. Approximate sparse recovery: Optimizing time and measurements. CoRR, abs/9.9, 9. [Ind7] P. Indy. Setching, streaming and sublinearspace algorithms. Graduate course notes, available at http: // stellar. mit. edu/ S/ course/ 6/ fa7/ /, 7. [DDT + 8] M. Duarte, M. Davenport, D. Tahar, J. Lasa, T. Sun, K. Kelly, and R. Baraniu. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 8. [DIPW] K. Do Ba, P. Indy, E. Price, and D. Woodruff. Lower bounds for sparse recovery. SODA,. [Don6] D. L. Donoho. Compressed Sensing. IEEE Trans. Info. Theory, 5(4:89 36, Apr. 6. [JP83] [K L] [LD5] K. Joag-Dev and F. Proschan. Negative association of random variables with applications. The Annals of Statistics, (:86 95, 983. M. Karońsi and T. Lucza. The phase transition in a random hypergraph. J. Comput. Appl. Math., 4(:5 35,. C. La and M. N. Do. Signal reconstruction using sparse tree representation. In in Proc. Wavelets XI at SPIE Optics and Photonics, 5. [DPR96] [DR96] [EB9] D. Dubhashi, V. Priebe, and D. Ranjan. Negative dependence through the FKG inequality. In Research Report MPI-I-96--, Max-Planc-Institut fur Informati, Saarbrucen, 996. D. Dubhashi and D. Ranjan. Balls and bins: A study in negative dependence. Random Structures & Algorithms, 3:99 4, 996. Y.C. Eldar and H. Bolcsei. Bloc-sparsity: Coherence and efficient recovery. IEEE Int. Conf. Acoustics, Speech and Signal Processing, 9. [Mit4] M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, :6 5, 4. [Mut3] S. Muthurishnan. Data streams: Algorithms and applications (invited tal at SODA 3. Available at http: // athos. rutgers. edu/ \ sim muthu/ stream--. ps, 3. [Rom9] J. Romberg. Compressive sampling by random convolution. SIAM Journal on Imaging Science, 9. 3

14 [SPH9] A M. Stojnic, F. Parvaresh, and B. Hassibi. On the reconstruction of bloc-sparse signals with an optimal number of measurements. IEEE Trans. Signal Processing, 9. Negative Dependence Negative dependence is a fairly common property in balls-and-bins types of problems, and can often cleanly be analyzed using the framewor of negative association ([DR96, DPR96, JP83]. Definition (Negative Association. Let (X,..., X n be a vector of random variables. Then (X,..., X n are negatively associated if for every two disjoint index sets, I, J [n], E[f(X i, i Ig(X j, j J] E[f(X i, i I]E[g(X j, j J] for all functions f : R I R and g : R J R that are both non-decreasing or both non-increasing. If random variables are negatively associated then one can apply most standard concentration of measure arguments, such as Chebyshev s inequality and the Chernoff bound. This means it is a fairly strong property, which maes it hard to prove directly. What maes it so useful is that it remains true under two composition rules: Lemma A. ([DR96], Proposition 7.. If (X,..., X n and (Y,..., Y m are each negatively associated and mutually independent, then (X,..., X n, Y,..., Y m is negatively associated.. Suppose (X,..., X n is negatively associated. Let I,..., I [n] be disjoint index sets, for some positive integer. For j [], let h j : R Ij R be functions that are all non-decreasing or all nonincreasing, and define Y j = h j (X i, i I j. Then (Y,..., Y is also negatively associated. Lemma A. allows us to relatively easily show that one component of our error (the point error is negatively associated without performing any computation. Unfortunately, the other component of our error (the component size is not easily built up by repeated applications of Lemma A. 8. Therefore we show something much weaer for this error, namely approximate negative correlation: E[X i X j ] E[X i ]E[X j ] Ω( E[X i] E[X j ] 8 This paper considers the component size of each hyperedge, which clearly is not negatively associated: if one hyperedge is in a component of size than so is every other hyperedge. But one can consider variants that just consider the distribution of component sizes, which seems plausibly negatively associated. However, this is hard to prove. for all i j. This is still strong enough to use Chebyshev s inequality. B Set Query in the l norm This section wors through all the changes to prove the set query algorithm wors in the l norm with w = O( measurements. We use Lemma 3.5 to get an l analog of Corollary 3.: (4 x x S = i S i S (x x S i C i,j Y j = D i Y i. j S i S Then we bound the expectation, variance, and covariance of D i and Y i. The bound on D i wors the same as in Section 3.6: E[D i ] = O(, E[Di ] = O(, E[D i D j ] E[D i ] O(log 4 /. The bound on Y i is slightly different. We define U q = ν q + x i B i,q i [n]\s and observe that U q V q, and U q is NA. Hence is NA, and Y i Z i. Define then Z i = median U q q L i µ = E[U q] = d w x x S + w ν ( x x S + ν Pr[Z i cµ] Li ( c Li / ( d 4 c so E[Z i ] = O(µ and E[Z i ] = O(µ. Now we will show the analog of Section 3.7. We now x x S i D i Z i and E[ D i Z i] = E[D i ] E[Z i] = µ i i for some µ = O( ( x x S + ν. Then E[( D i Z i ] = i Var( i i E[D i ] E[Z i ] + i j E[D i D j ] E[Z iz j] O(µ + i j (E[D i ] + O(log 4 / E[Z i] = O(µ log 4 + ( E[D i Z i] Z id i O(µ log 4. 4

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 Instructor: Chandra Cheuri Scribe: Chandra Cheuri The Misra-Greis deterministic counting guarantees that all items with frequency > F 1 /

More information

MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing

MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing Afonso S. Bandeira April 9, 2015 1 The Johnson-Lindenstrauss Lemma Suppose one has n points, X = {x 1,..., x n }, in R d with d very

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

Linear Sketches A Useful Tool in Streaming and Compressive Sensing Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =

More information

Notes on Discrete Probability
